# Biopython: Python Tools for Computational Biology

> Biopython is a collection of Python modules for biological computation, providing parsers for bioinformatics file formats, interfaces to online databases, and tools for sequence analysis, phylogenetics, and structural biology.

## Install

```bash
pip install biopython
```

## Quick Use

```python
from Bio import SeqIO, Entrez

# Parse a FASTA file, one record at a time
for record in SeqIO.parse("sequences.fasta", "fasta"):
    print(record.id, len(record.seq))

# Fetch a GenBank record (NCBI asks for a contact email on every request)
Entrez.email = "you@example.com"
handle = Entrez.efetch(db="nucleotide", id="NM_001301717", rettype="gb", retmode="text")
record = SeqIO.read(handle, "genbank")
handle.close()
print(record.description)
```

## Introduction

Biopython is one of the oldest and most widely used Python libraries for bioinformatics and computational biology. Started in 1999, it provides parsers for common biological data formats (FASTA, GenBank, PDB, BLAST output), interfaces to NCBI Entrez and other online databases, and tools for sequence alignment, phylogenetics, and protein structure analysis. Biopython is a member project of the Open Bioinformatics Foundation.

## What Biopython Does

- Parse and write bioinformatics file formats (FASTA, GenBank, PDB, BLAST XML)
- Access NCBI databases (PubMed, GenBank) via the Entrez API and run remote BLAST searches via Bio.Blast
- Perform pairwise and multiple sequence alignment
- Build and manipulate phylogenetic trees
- Analyze protein 3D structures from PDB files

## Architecture Overview

Biopython is organized into modules: Bio.SeqIO for sequence file I/O, Bio.Entrez for NCBI web services, Bio.Blast for BLAST parsing and remote execution, Bio.PDB for protein structure analysis, Bio.Phylo for phylogenetic trees, and Bio.Align for sequence alignment. Parsers are iterator-based: records are yielded one at a time rather than loaded into memory all at once, which keeps memory use low on large files.
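
The iterator-based parsing described above can be sketched as follows. This is a minimal illustration, not a recommended pipeline: the FASTA text and the `summarize` helper are hypothetical, and an in-memory `StringIO` handle stands in for a real file so the snippet runs on its own.

```python
from io import StringIO

from Bio import SeqIO

# Hypothetical in-memory FASTA data standing in for a file on disk.
FASTA_TEXT = """>seq1 first example
ATGGCCATTGTAATGGGCCGCTGAAAGGGTGCCCGATAG
>seq2 second example
ATGAAATTTGGGCCC
"""


def summarize(handle):
    """Yield (id, length) pairs, holding only one record in memory at a time."""
    for record in SeqIO.parse(handle, "fasta"):
        yield record.id, len(record.seq)


# SeqIO.parse returns a lazy iterator; list() is used here only to show the result.
summary = list(summarize(StringIO(FASTA_TEXT)))
print(summary)
```

Because `SeqIO.parse` yields records lazily, the same loop should work on a multi-gigabyte file without reading it into memory first; only the final `list()` call here materializes everything.
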
The Seq object represents biological sequences and supports standard string operations plus translation and complement methods.

## Self-Hosting & Configuration

- Install via pip: `pip install biopython`
- Requires Python 3.8+ and NumPy
- Optional: ReportLab for graphics, matplotlib for plotting
- No external services required for file parsing (Entrez queries need internet access)
- Set `Entrez.email` before making NCBI API requests

## Key Features

- Parsers for 20+ bioinformatics file formats with a unified SeqIO interface
- NCBI Entrez integration for PubMed and GenBank, plus remote BLAST queries through Bio.Blast
- PDB structure parser with atom-level access and DSSP integration
- Phylogenetic tree construction and visualization
- Active development since 1999 with extensive documentation

## Comparison with Similar Tools

- **BioPandas**: tabular access to PDB files; Biopython covers a wider range of bioinformatics tasks
- **scikit-bio**: newer library focused on microbial ecology; Biopython has broader format support
- **Biotite**: modern structure-focused library; Biopython is more established with wider community support
- **BioPerl/BioJava**: equivalent libraries in Perl and Java; Biopython fills the same role for Python

## FAQ

**Q: Can Biopython run BLAST locally?**
A: Yes, with BLAST+ installed separately. Run the BLAST+ executables locally (for example through Python's `subprocess` module) and parse their output with Bio.Blast. Older Biopython releases also shipped command-line wrapper classes for BLAST+, but these have been deprecated in favor of calling the executables directly.

**Q: Does Biopython support next-generation sequencing data?**
A: Biopython can parse FASTQ files via SeqIO. For heavy NGS workflows, pysam or HTSeq may be more suitable.

**Q: How do I access NCBI databases?**
A: Use Bio.Entrez with your email set. Functions like `efetch`, `esearch`, and `einfo` mirror the NCBI E-utilities API.

**Q: Is Biopython suitable for large-scale genomics?**
A: Biopython is best for scripting and moderate-scale analysis. For genome-scale pipelines, consider integrating it with workflow managers such as Snakemake or Nextflow.
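
As a rough illustration of the FASTQ answer above, the sketch below parses reads with SeqIO and keeps those whose mean Phred quality clears a threshold. The two-read sample, the `mean_quality` helper, and the threshold of 20 are all assumptions made for the example; it also assumes the standard Sanger (Phred+33) encoding that the `"fastq"` format name selects.

```python
from io import StringIO

from Bio import SeqIO

# Hypothetical two-read FASTQ sample; real data would come from a file.
FASTQ_TEXT = """@read1
ACGTACGT
+
IIIIIIII
@read2
ACGTACGT
+
!!!!!!!!
"""

MIN_MEAN_QUALITY = 20  # illustrative threshold, not a recommendation


def mean_quality(record):
    """Return the average Phred score of a FASTQ record."""
    scores = record.letter_annotations["phred_quality"]
    return sum(scores) / len(scores)


# Keep the IDs of reads whose mean quality meets the threshold.
kept = [
    record.id
    for record in SeqIO.parse(StringIO(FASTQ_TEXT), "fastq")
    if mean_quality(record) >= MIN_MEAN_QUALITY
]
print(kept)
```

This is fine for inspecting or filtering a modest number of reads; for full sequencing runs the specialized tools named in the FAQ are likely a better fit.
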
## Sources

- https://github.com/biopython/biopython
- https://biopython.org/

---

Source: https://tokrepo.com/en/workflows/asset-76d29f6c
Author: AI Open Source