Configs · July 4, 2026 · 1 min read

Biopython — Python Tools for Computational Biology

Biopython is a collection of Python modules for biological computation, providing parsers for bioinformatics file formats, interfaces to online databases, and tools for sequence analysis, phylogenetics, and structural biology.

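Agent ready

Review before installing

This asset needs review first. The copied instructions ask the agent to do a dry run and list what will be written, then continue only after confirmation.

Needs Confirmation · 64/100 · Policy: confirmation required
Agent entry point: any MCP/CLI agent
Type: Skill
Install: Single
Trust level: Established
Entry: Biopython
Review-first command:
npx -y tokrepo@latest install 76d29f6c-7761-11f1-9bc6-00163e2b0d79 --target codex

Do a dry run first and confirm the items to be written before running this command.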

Introduction

Biopython is one of the oldest and most widely used Python libraries for bioinformatics and computational biology. Started in 1999, it provides parsers for common biological data formats (FASTA, GenBank, PDB, BLAST output), interfaces to NCBI Entrez and other online databases, and tools for sequence alignment, phylogenetics, and protein structure analysis. Biopython is a member project of the Open Bioinformatics Foundation (OBF).

What Biopython Does

  • Parse and write bioinformatics file formats (FASTA, GenBank, PDB, BLAST XML)
  • Access NCBI databases (PubMed, GenBank) via the Entrez E-utilities, and submit remote BLAST searches via Bio.Blast
  • Perform pairwise and multiple sequence alignment
  • Build and manipulate phylogenetic trees
  • Analyze protein 3D structures from PDB files
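
As an illustration of the pairwise alignment capability listed above, here is a minimal sketch using Bio.Align.PairwiseAligner. The two sequences and all scoring values are arbitrary assumptions chosen for the example, not recommended defaults.

```python
from Bio import Align

# Configure a global aligner; every score below is an illustrative assumption.
aligner = Align.PairwiseAligner()
aligner.mode = "global"
aligner.match_score = 1.0
aligner.mismatch_score = -1.0
aligner.open_gap_score = -2.0
aligner.extend_gap_score = -0.5

target = "GATTACA"
query = "GATCA"

# score() returns only the optimal score; align() yields the alignments.
best_score = aligner.score(target, query)
best = next(iter(aligner.align(target, query)))

print(best_score)
print(best)
```

Printing an alignment object shows the two sequences with gaps and match markers, which is usually enough for a quick visual check.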

Architecture Overview

Biopython is organized into modules: Bio.SeqIO for sequence file I/O, Bio.Entrez for NCBI web services, Bio.Blast for BLAST parsing and remote execution, Bio.PDB for protein structure analysis, Bio.Phylo for phylogenetic trees, and Bio.Align for sequence alignment. The sequence parsers are iterator-based, so large files can be processed record by record instead of being loaded into memory at once. The Seq object represents biological sequences with standard string operations plus translation and complement methods.
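
The sketch below shows the two ideas from this overview: record-by-record parsing with Bio.SeqIO and sequence operations on Seq. The FASTA text is made-up example data held in memory; with a real file you would pass a filename to SeqIO.parse instead.

```python
from io import StringIO

from Bio import SeqIO
from Bio.Seq import Seq

# Made-up FASTA content, used in place of a file on disk.
fasta_text = (
    ">seq1 example\n"
    "ATGGCCATTGTAATGGGCCGCTGA\n"
    ">seq2 example\n"
    "ATGAAATAG\n"
)

# SeqIO.parse returns an iterator, so only one record is held at a time.
lengths = {}
for record in SeqIO.parse(StringIO(fasta_text), "fasta"):
    lengths[record.id] = len(record.seq)

# Seq behaves like a string but adds biological methods.
dna = Seq("ATGGCCATTGTAATGGGCCGCTGA")
protein = dna.translate(to_stop=True)  # stop translating at the first stop codon
revcomp = Seq("ATGC").reverse_complement()

print(lengths)
print(protein)
print(revcomp)
```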

Self-Hosting & Configuration

  • Install via pip: pip install biopython
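  • Requires Python 3 and NumPy; the minimum supported Python version depends on the Biopython release (recent releases have dropped Python 3.8), so check the release notes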
  • Optional: ReportLab for graphics, matplotlib for plotting
  • No external services required for file parsing (Entrez queries need internet)
  • Set Entrez.email before making NCBI API requests
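
A quick way to confirm the setup described in this list is to check which packages are importable. The helper below is a hypothetical sketch (check_dependencies is not part of Biopython); it uses only the standard library and assumes the import names Bio, numpy, reportlab, and matplotlib.

```python
import importlib.util


def check_dependencies(required=("Bio", "numpy"),
                       optional=("reportlab", "matplotlib")):
    """Return (status, missing_required) for the given top-level import names.

    Hypothetical helper: it only checks that a package can be found,
    without importing it or validating its version.
    """
    status = {}
    for name in tuple(required) + tuple(optional):
        status[name] = importlib.util.find_spec(name) is not None
    missing_required = [name for name in required if not status[name]]
    return status, missing_required


status, missing = check_dependencies()
for name, found in status.items():
    print(f"{name}: {'found' if found else 'not installed'}")
if missing:
    print("Missing required packages:", ", ".join(missing))
```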

Key Features

  • Parsers for 20+ bioinformatics file formats with a unified SeqIO interface
  • NCBI Entrez integration for PubMed and GenBank queries, plus remote BLAST submission through Bio.Blast
  • PDB structure parser with atom-level access and DSSP integration
  • Phylogenetic tree construction and visualization
  • Active development since 1999 with extensive documentation
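
To make the phylogenetics feature concrete, here is a small sketch that reads a tree with Bio.Phylo and queries it. The Newick string, taxon names, and branch lengths are invented for the example.

```python
from io import StringIO

from Bio import Phylo

# Invented four-taxon tree in Newick format.
newick = "((A:0.1,B:0.2):0.3,(C:0.4,D:0.5):0.6);"
tree = Phylo.read(StringIO(newick), "newick")

# Terminal clades are the leaves of the tree.
leaf_names = [leaf.name for leaf in tree.get_terminals()]

# Path length between two leaves, summed over branch lengths.
dist_ab = tree.distance("A", "B")

print(leaf_names)
print(round(dist_ab, 3))
Phylo.draw_ascii(tree)  # plain-text rendering, no plotting library needed
```

Phylo.draw_ascii is convenient in terminals; graphical rendering with Phylo.draw needs matplotlib installed.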

Comparison with Similar Tools

  • BioPandas — tabular access to PDB files; Biopython covers a wider range of bioinformatics tasks
  • scikit-bio — newer library focused on microbial ecology; Biopython has broader format support
  • Biotite — modern structure-focused library; Biopython is more established with wider community support
  • BioPerl/BioJava — equivalent libraries in Perl/Java; Biopython is the Python standard

FAQ

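Q: Can Biopython run BLAST locally? A: Yes, provided the BLAST+ executables are installed separately. Biopython parses BLAST output formats such as XML. The command-line wrappers in Bio.Blast.Applications were deprecated in newer releases, where the recommended route is to call the BLAST+ executables directly, for example through Python's subprocess module.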
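
One way to drive a local BLAST+ installation from Python, independent of any wrapper classes, is to call the executable with subprocess and write XML output that Biopython's BLAST parser can then read. This is a sketch under assumptions: the helper names, file paths, and E-value cutoff are hypothetical, and it requires blastn to be on the PATH with a database already built.

```python
import shutil
import subprocess


def build_blastn_command(query_fasta, database, out_xml, evalue=1e-5):
    """Assemble a blastn command line (hypothetical helper).

    -outfmt 5 asks BLAST+ for XML output.
    """
    return [
        "blastn",
        "-query", str(query_fasta),
        "-db", str(database),
        "-evalue", str(evalue),
        "-outfmt", "5",
        "-out", str(out_xml),
    ]


def run_local_blastn(query_fasta, database, out_xml, evalue=1e-5):
    """Run blastn and return the path of the XML result file."""
    if shutil.which("blastn") is None:
        raise FileNotFoundError("blastn executable not found on PATH")
    command = build_blastn_command(query_fasta, database, out_xml, evalue)
    subprocess.run(command, check=True)  # raises CalledProcessError on failure
    return out_xml


# Placeholder paths, shown only to illustrate the resulting command.
print(" ".join(build_blastn_command("query.fasta", "my_db", "hits.xml")))
```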

Q: Does Biopython support next-generation sequencing data? A: Biopython can parse FASTQ files via SeqIO. For heavy NGS workflows, pysam or HTSeq may be more suitable.
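
For small FASTQ inputs, a sketch like the following shows how SeqIO exposes per-base quality scores. The two reads are invented, the mean-quality threshold of 30 is an arbitrary assumption, and mean_quality is a hypothetical helper rather than a Biopython function.

```python
from io import StringIO

from Bio import SeqIO

# Two invented reads in Sanger-style FASTQ ('I' encodes Phred 40, '!' encodes 0).
fastq_text = (
    "@read1\n"
    "ACGTACGT\n"
    "+\n"
    "IIIIIIII\n"
    "@read2\n"
    "ACGTNNNN\n"
    "+\n"
    "IIII!!!!\n"
)


def mean_quality(record):
    """Average Phred score of one read (hypothetical helper)."""
    scores = record.letter_annotations["phred_quality"]
    return sum(scores) / len(scores)


# Keep only the IDs of reads whose mean quality meets the assumed threshold.
kept = [
    record.id
    for record in SeqIO.parse(StringIO(fastq_text), "fastq")
    if mean_quality(record) >= 30
]

print(kept)
```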

Q: How do I access NCBI databases? A: Use Bio.Entrez with your email set. Functions like efetch, esearch, and einfo mirror the NCBI E-utilities API.
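
A minimal sketch of that workflow is shown below: set the email, search with esearch, then retrieve records with efetch. The email address is a placeholder you must replace, and fetch_genbank_text is a hypothetical helper. The function is defined but not called here because it needs network access.

```python
from Bio import Entrez

# Placeholder address: NCBI asks for a real contact email.
Entrez.email = "you@example.org"


def fetch_genbank_text(term, retmax=1):
    """Search the nucleotide database and return GenBank text for the hits.

    Hypothetical helper; returns an empty string when nothing matches.
    """
    handle = Entrez.esearch(db="nucleotide", term=term, retmax=retmax)
    try:
        result = Entrez.read(handle)
    finally:
        handle.close()

    ids = result["IdList"]
    if not ids:
        return ""

    handle = Entrez.efetch(db="nucleotide", id=ids, rettype="gb", retmode="text")
    try:
        return handle.read()
    finally:
        handle.close()
```

The returned text can be written to a file or parsed with SeqIO using the "genbank" format. For repeated queries, keep request rates within NCBI's usage guidelines.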

Q: Is Biopython suitable for large-scale genomics? A: Biopython is best for scripting and moderate-scale analysis. For genome-scale pipelines, consider integrating it with tools like Snakemake or Nextflow.
