Configs · Jul 4, 2026 · 3 min read

Biopython — Python Tools for Computational Biology

Biopython is a collection of Python modules for biological computation, providing parsers for bioinformatics file formats, interfaces to online databases, and tools for sequence analysis, phylogenetics, and structural biology.

Agent-ready

Install with prior review

This asset requires review. The copied prompt requests a dry run, shows the writes it would make, and continues only after confirmation.

Needs Confirmation · 64/100 · Policy: confirm
Agent surface: Any MCP/CLI agent
Type: Skill
Installation: Single
Trust: Established
Entry point: Biopython
Command with prior review:
npx -y tokrepo@latest install 76d29f6c-7761-11f1-9bc6-00163e2b0d79 --target codex

Dry-run first, confirm the writes, then run this command.

Introduction

Biopython is one of the oldest and most widely used Python libraries for bioinformatics and computational biology. Started in 1999, it provides parsers for common biological data formats (FASTA, GenBank, PDB, BLAST output), interfaces to NCBI Entrez and other online databases, and tools for sequence alignment, phylogenetics, and protein structure analysis. Biopython is a project of the Open Bioinformatics Foundation.

What Biopython Does

  • Parse and write bioinformatics file formats (FASTA, GenBank, PDB, BLAST XML)
  • Access NCBI databases (PubMed, GenBank) via the Entrez E-utilities and submit BLAST searches to NCBI
  • Perform pairwise sequence alignment and read, write, and manipulate multiple sequence alignments
  • Build and manipulate phylogenetic trees
  • Analyze protein 3D structures from PDB files
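As a small illustration of pairwise alignment, the sketch below uses Bio.Align.PairwiseAligner. The two sequences and the scoring values (match 1, mismatch -1, gap open -2, gap extend -0.5) are arbitrary choices for the example, not recommended defaults.

```python
from Bio import Align

# Global aligner with illustrative (assumed) scoring parameters
aligner = Align.PairwiseAligner()
aligner.mode = "global"
aligner.match_score = 1.0
aligner.mismatch_score = -1.0
aligner.open_gap_score = -2.0
aligner.extend_gap_score = -0.5

seq_a = "GATTACA"
seq_b = "GATCA"

# score() computes only the optimal score; align() also builds the alignments
score = aligner.score(seq_a, seq_b)
best = aligner.align(seq_a, seq_b)[0]

print(score)
print(best)
```

Printing `best` shows the aligned sequences with gaps; the exact text layout differs between Biopython releases.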

Architecture Overview

Biopython is organized into modules: Bio.SeqIO for sequence file I/O, Bio.Entrez for NCBI web services, Bio.Blast for BLAST parsing and remote execution, Bio.PDB for protein structure analysis, Bio.Phylo for phylogenetic trees, and Bio.Align for sequence alignment. Parsers such as SeqIO.parse return iterators, so large files can be processed one record at a time instead of being loaded into memory all at once. The Seq object represents biological sequences with standard string operations plus translation and complement methods.
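A minimal sketch of the Seq object and iterator-based SeqIO parsing described above. The FASTA text is made up and read from an in-memory StringIO; in practice you would pass a file path or an open file handle instead.

```python
from io import StringIO

from Bio import SeqIO
from Bio.Seq import Seq

# A Seq behaves like a string but adds biology-aware methods
dna = Seq("ATGGCCATTGTAATGGGCCGCTGA")
print(dna.complement())
print(dna.reverse_complement())
print(dna.translate())  # the stop codon is rendered as "*"

# SeqIO.parse yields SeqRecord objects one at a time
fasta_text = """>seq1 example
ATGGCCATT
>seq2 example
GTAATGGGC
"""
lengths = {
    record.id: len(record.seq)
    for record in SeqIO.parse(StringIO(fasta_text), "fasta")
}
print(lengths)
```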

Self-Hosting & Configuration

  • Install via pip: pip install biopython
  • Requires Python 3.8+ and NumPy
  • Optional: ReportLab for graphics, matplotlib for plotting
  • No external services required for file parsing (Entrez queries need internet)
  • Set Entrez.email before making NCBI API requests

Key Features

  • Parsers for 20+ bioinformatics file formats with a unified SeqIO interface
  • NCBI Entrez integration for PubMed and GenBank queries, plus remote NCBI BLAST via Bio.Blast
  • PDB structure parser with atom-level access and DSSP integration
  • Phylogenetic tree construction and visualization
  • Active development since 1999 with extensive documentation
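Tree construction and visualization can be sketched with Bio.Phylo. The four taxon names and the distance values are invented for illustration; UPGMA is used only because it is simple, and DistanceTreeConstructor also offers neighbor joining through its `nj` method.

```python
from io import StringIO

from Bio import Phylo
from Bio.Phylo.TreeConstruction import DistanceMatrix, DistanceTreeConstructor

# Lower-triangular distance matrix, diagonal included (made-up values)
dm = DistanceMatrix(
    names=["A", "B", "C", "D"],
    matrix=[
        [0],
        [2, 0],
        [6, 6, 0],
        [6, 6, 4, 0],
    ],
)

# Build a UPGMA tree and print a text rendering of it
constructor = DistanceTreeConstructor()
upgma_tree = constructor.upgma(dm)
Phylo.draw_ascii(upgma_tree)

# Trees can also be read from Newick text
newick_tree = Phylo.read(StringIO("((A,B),(C,D));"), "newick")
print(newick_tree.count_terminals())
```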

Comparison with Similar Tools

  • BioPandas — tabular access to PDB files; Biopython covers a wider range of bioinformatics tasks
  • scikit-bio — newer library focused on microbial ecology; Biopython has broader format support
  • Biotite — modern structure-focused library; Biopython is more established with wider community support
  • BioPerl/BioJava — sister Open Bioinformatics Foundation projects for Perl and Java; Biopython fills the same role for Python

FAQ

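Q: Can Biopython run BLAST locally? A: Yes, together with a separately installed NCBI BLAST+. Biopython does not include the BLAST programs themselves; you run the local BLAST+ executables (older releases also provided command-line wrappers in Bio.Blast.Applications) and parse their output with Bio.Blast.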
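Because the BLAST programs live outside Biopython, a local search comes down to calling the installed executable. The sketch below assumes NCBI BLAST+ is on PATH and that a BLAST database already exists; `build_blastn_command` and `run_blastn` are hypothetical helper names, while `-query`, `-db`, `-outfmt 5` (XML output) and `-out` are standard blastn options.

```python
import shutil
import subprocess


def build_blastn_command(query_fasta, database, output_xml):
    """Hypothetical helper: assemble a blastn command line.

    -outfmt 5 requests XML output, which Biopython's BLAST parsers can read.
    """
    return [
        "blastn",
        "-query", query_fasta,
        "-db", database,
        "-outfmt", "5",
        "-out", output_xml,
    ]


def run_blastn(query_fasta, database, output_xml):
    """Hypothetical helper: run blastn if it is installed, else raise."""
    if shutil.which("blastn") is None:
        raise FileNotFoundError("blastn (NCBI BLAST+) is not installed or not on PATH")
    cmd = build_blastn_command(query_fasta, database, output_xml)
    subprocess.run(cmd, check=True)  # raises CalledProcessError on failure
    return output_xml
```

The XML written to `output_xml` can then be read with the parsers in Bio.Blast; which entry point is preferred depends on the Biopython version, so check the documentation for the release you use.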

Q: Does Biopython support next-generation sequencing data? A: Biopython can parse FASTQ files via SeqIO. For heavy NGS workflows, pysam or HTSeq may be more suitable.
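For light FASTQ work, SeqIO exposes per-base Phred scores through `letter_annotations`. The two reads below are made up and use the standard Phred+33 ("fastq", i.e. Sanger) encoding, where "I" means quality 40 and "#" means quality 2.

```python
from io import StringIO

from Bio import SeqIO

# Two tiny made-up FASTQ records
fastq_text = """@read1
ACGT
+
IIII
@read2
ACGA
+
II#I
"""

# Mean Phred quality per read
mean_quality = {}
for record in SeqIO.parse(StringIO(fastq_text), "fastq"):
    quals = record.letter_annotations["phred_quality"]
    mean_quality[record.id] = sum(quals) / len(quals)

print(mean_quality)
```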

Q: How do I access NCBI databases? A: Use Bio.Entrez with your email set. Functions like efetch, esearch, and einfo mirror the NCBI E-utilities API.
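A minimal Bio.Entrez sketch. The email address is a placeholder and the helper names `search_pubmed` and `fetch_genbank` are hypothetical; both helpers need internet access, so nothing is sent to NCBI when the module is merely imported.

```python
from Bio import Entrez, SeqIO

# NCBI asks for a contact email with every request (placeholder address)
Entrez.email = "your.name@example.org"


def search_pubmed(term, retmax=5):
    """Return PubMed IDs matching term via ESearch (needs internet)."""
    handle = Entrez.esearch(db="pubmed", term=term, retmax=retmax)
    try:
        result = Entrez.read(handle)
    finally:
        handle.close()
    return list(result["IdList"])


def fetch_genbank(accession):
    """Fetch one nucleotide record via EFetch and parse it as a SeqRecord (needs internet)."""
    handle = Entrez.efetch(db="nucleotide", id=accession, rettype="gb", retmode="text")
    try:
        return SeqIO.read(handle, "genbank")
    finally:
        handle.close()
```

For example, `search_pubmed("biopython", retmax=3)` would return a list of PubMed ID strings, and `fetch_genbank` takes a nucleotide accession and returns a parsed SeqRecord.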

Q: Is Biopython suitable for large-scale genomics? A: Biopython is best for scripting and moderate-scale analysis. For genome-scale pipelines, consider integrating it with tools like Snakemake or Nextflow.
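For moderate-scale scripting, SeqIO.parse and SeqIO.write can be chained through a generator so records stream through without the whole file being held in memory; a script like this could then be called from a Snakemake or Nextflow step. `filter_by_length` is a hypothetical helper, and the in-memory FASTA text stands in for real input and output files.

```python
from io import StringIO

from Bio import SeqIO


def filter_by_length(records, min_length):
    """Hypothetical helper: yield only records with at least min_length bases."""
    for record in records:
        if len(record.seq) >= min_length:
            yield record


fasta_text = """>short
ACG
>long_gc_rich
GGCCGGCCAT
"""

# Records flow from the parser, through the filter, into the writer one at a time
out = StringIO()
kept = SeqIO.write(
    filter_by_length(SeqIO.parse(StringIO(fasta_text), "fasta"), 5),
    out,
    "fasta",
)
print(kept)  # number of records written
print(out.getvalue())
```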

Similar assets