Scripts · Apr 28, 2026 · 3 min read

AlphaFold — AI-Powered 3D Protein Structure Prediction

AlphaFold by Google DeepMind predicts three-dimensional protein structures from amino acid sequences with atomic-level accuracy, enabling breakthroughs in drug discovery, enzyme engineering, and structural biology research.

Introduction

AlphaFold is a deep learning system developed by Google DeepMind that predicts the 3D structure of a protein from its amino acid sequence. At the CASP14 assessment in 2020, AlphaFold 2 reached accuracy competitive with experimental methods such as X-ray crystallography and cryo-EM on many targets, a major advance on one of biology's grand challenges. Its predictions now cover more than 200 million proteins, which is nearly every catalogued protein sequence.

What AlphaFold Does

  • Predicts 3D atomic coordinates of a protein from its amino acid sequence
  • Generates per-residue confidence scores (pLDDT, on a 0 to 100 scale) to indicate prediction reliability
  • Supports multimer prediction for protein complexes (AlphaFold-Multimer)
  • Produces multiple ranked structure candidates per input sequence
  • AlphaFold 3 extends predictions to DNA, RNA, ligands, and post-translational modifications
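
As a rough illustration of how the per-residue confidence scores can be read back, the sketch below pulls pLDDT values out of PDB-format text. It assumes the AlphaFold convention of writing pLDDT into the B-factor column of each ATOM record. The function names, the sample coordinates, and the handling of scores that fall exactly on a band boundary are illustrative choices, not part of AlphaFold itself.

```python
def plddt_per_residue(pdb_text):
    """Map (chain_id, residue_number) -> pLDDT from PDB-format text.

    Assumes pLDDT is stored in the B-factor field (columns 61-66)
    of every ATOM record, as in AlphaFold PDB output.
    """
    scores = {}
    for line in pdb_text.splitlines():
        if not line.startswith("ATOM"):
            continue
        chain_id = line[21]
        res_num = int(line[22:26])
        plddt = float(line[60:66])
        # Every atom of a residue carries the same score; keep the first.
        scores.setdefault((chain_id, res_num), plddt)
    return scores


def confidence_band(plddt):
    """Label a score with the bands shown in the AlphaFold database."""
    if plddt > 90:
        return "very high"
    if plddt > 70:
        return "confident"
    if plddt > 50:
        return "low"
    return "very low"


# Made-up ATOM records for three residues (coordinates are not real).
SAMPLE_PDB = "\n".join([
    "ATOM      1  N   MET A   1      -0.966  11.357  14.128  1.00 62.31",
    "ATOM      2  CA  MET A   1       0.112  10.500  13.700  1.00 62.31",
    "ATOM      9  N   VAL A   2       1.500   9.800  12.900  1.00 93.40",
    "ATOM     16  N   GLY A   3       2.750   8.100  11.300  1.00 41.87",
])

for (chain, number), score in plddt_per_residue(SAMPLE_PDB).items():
    print(chain, number, score, confidence_band(score))
```

Column slicing is used instead of splitting on whitespace because PDB is a fixed-width format, and adjacent fields can run together without a separating space.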

Architecture Overview

AlphaFold 2 uses an Evoformer module that processes multiple sequence alignments (MSAs) and pairwise residue features through iterative attention layers. A structure module then converts these representations into 3D coordinates using invariant point attention. To build the MSAs, the system searches genetic databases (UniRef, BFD, MGnify); the aligned homologous sequences provide evolutionary context. AlphaFold 3 replaces the Evoformer with a simpler Pairformer trunk and swaps the structure module for a diffusion-based module that generates atomic coordinates directly.
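
To make "evolutionary context" concrete, here is a small sketch of the kind of signal an MSA carries: positions that mutate together across homologs tend to be in contact in the folded protein. The sketch scores that co-variation with mutual information between alignment columns. This is a classical statistic used for illustration only; it is not what the Evoformer computes, and the toy alignment and function name are made up.

```python
import math
from collections import Counter


def column_mutual_information(msa, i, j):
    """Mutual information in bits between alignment columns i and j."""
    n = len(msa)
    freq_i = Counter(seq[i] for seq in msa)
    freq_j = Counter(seq[j] for seq in msa)
    freq_pair = Counter((seq[i], seq[j]) for seq in msa)
    mi = 0.0
    for (a, b), count in freq_pair.items():
        p_ab = count / n
        p_a = freq_i[a] / n
        p_b = freq_j[b] / n
        mi += p_ab * math.log2(p_ab / (p_a * p_b))
    return mi


# Toy alignment: column 0 is conserved, columns 1 and 2 change together
# (K with L, R with I), and column 3 varies independently of column 1.
TOY_MSA = [
    "AKLE",
    "AKLD",
    "ARIE",
    "ARID",
]

print(column_mutual_information(TOY_MSA, 1, 2))  # co-varying pair
print(column_mutual_information(TOY_MSA, 1, 3))  # independent pair
```

On this toy input the co-varying pair scores 1 bit and the independent pair scores 0. A learned model can exploit far subtler versions of the same pattern, which is why deeper alignments generally help.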

Self-Hosting & Configuration

  • Clone the repository and use the provided Docker setup for reproducible environments
  • Requires downloading reference databases (about 2.6 TB once unpacked for the full genetic databases)
  • A reduced database mode is available for faster setup, with slightly lower accuracy
  • Needs an NVIDIA GPU with at least 12 GB VRAM for standard predictions
  • Configure input via FASTA files; output includes PDB/mmCIF structure files and confidence metrics
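
The steps above might be scripted roughly as follows. The sketch writes a single-sequence FASTA file and assembles, without running it, a command line for the `docker/run_docker.py` launcher in the AlphaFold 2 repository. The flag names follow the repository README as I understand it and should be checked against the version you clone. The helper names, paths, template date, and example sequence are placeholders.

```python
from pathlib import Path

# The 20 standard amino acids in one-letter code.
VALID_RESIDUES = set("ACDEFGHIKLMNPQRSTVWY")


def write_fasta(path, name, sequence):
    """Write one sequence as FASTA after a basic residue check."""
    sequence = "".join(sequence.split()).upper()
    unknown = sorted(set(sequence) - VALID_RESIDUES)
    if not sequence or unknown:
        raise ValueError("empty sequence or unexpected residues: %s" % unknown)
    path = Path(path)
    path.write_text(">%s\n%s\n" % (name, sequence))
    return path


def build_run_command(fasta_path, data_dir, output_dir,
                      reduced_dbs=False, multimer=False,
                      max_template_date="2022-01-01"):
    """Return the launcher command as an argument list (not executed)."""
    return [
        "python3", "docker/run_docker.py",
        "--fasta_paths=%s" % fasta_path,
        "--max_template_date=%s" % max_template_date,
        "--data_dir=%s" % data_dir,
        "--output_dir=%s" % output_dir,
        "--model_preset=%s" % ("multimer" if multimer else "monomer"),
        "--db_preset=%s" % ("reduced_dbs" if reduced_dbs else "full_dbs"),
    ]


if __name__ == "__main__":
    fasta = write_fasta("query.fasta", "example_protein",
                        "MKTAYIAKQRQISFVKSHFSRQ")
    command = build_run_command(fasta, "/data/alphafold_dbs", "/data/out",
                                reduced_dbs=True)
    print(" ".join(command))
```

Returning the command as a list keeps it easy to inspect or hand to `subprocess.run` once the paths and flags have been verified.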

Key Features

  • Near-experimental accuracy on most single-chain protein structures
  • Per-residue confidence scores help identify reliable regions
  • Multimer mode predicts protein-protein complex structures
  • Pre-computed structures for 200+ million proteins available in the AlphaFold Protein Structure Database
  • AlphaFold 2 source code is open source under the Apache 2.0 license
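
For proteins already in the AlphaFold Protein Structure Database, downloading the pre-computed model avoids running a prediction at all. The sketch below builds a download URL from a UniProt accession. The file-naming pattern and the default model version are assumptions based on how the database has published files and may change, so check the database's download documentation before relying on them. `afdb_model_url` is a hypothetical helper name.

```python
import re

# Accession pattern as published by UniProt.
UNIPROT_ACCESSION = re.compile(
    r"[OPQ][0-9][A-Z0-9]{3}[0-9]|[A-NR-Z][0-9]([A-Z][A-Z0-9]{2}[0-9]){1,2}"
)


def afdb_model_url(accession, version=4, file_format="pdb"):
    """Build a download URL for a pre-computed model.

    Assumption: files are named AF-<accession>-F1-model_v<version>.<ext>;
    the version number in particular may differ from the default here.
    """
    accession = accession.strip().upper()
    if not UNIPROT_ACCESSION.fullmatch(accession):
        raise ValueError("not a UniProt accession: %r" % accession)
    if file_format not in ("pdb", "cif"):
        raise ValueError("file_format must be 'pdb' or 'cif'")
    return "https://alphafold.ebi.ac.uk/files/AF-%s-F1-model_v%d.%s" % (
        accession, version, file_format)


# P69905 is the UniProt accession for human hemoglobin subunit alpha.
print(afdb_model_url("P69905"))
```

The resulting URL can be fetched with any HTTP client, and the downloaded file carries pLDDT values alongside the coordinates.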

Comparison with Similar Tools

  • ESMFold — Meta's single-sequence predictor; faster but less accurate without MSA
  • RoseTTAFold — Baker Lab alternative; competitive accuracy with different architecture
  • ColabFold — accelerated AlphaFold using MMseqs2 for MSA; easier cloud setup
  • OpenFold — trainable reimplementation of AlphaFold 2 in PyTorch
  • Chai-1 — newer multi-modal structure prediction; predicts drug-target interactions

FAQ

Q: How much disk space do the databases require?
A: The full genetic databases need approximately 2.6 TB. A reduced version requires about 500 GB, with some accuracy tradeoff.

Q: Can AlphaFold predict protein-ligand binding?
A: AlphaFold 3 can predict ligand binding poses. AlphaFold 2 predicts protein structure only, so ligands need a separate docking tool.

Q: How long does a prediction take?
A: A typical single-chain prediction takes from 30 minutes to several hours, depending on sequence length and GPU.

Q: Is AlphaFold suitable for disordered proteins?
A: Only partly. Intrinsically disordered regions usually receive low pLDDT scores. The model still outputs coordinates for these regions, but the low scores flag them as unreliable and they should not be read as a real fold.
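
One way to act on this is to flag long stretches of low-confidence residues before interpreting a model. The sketch below finds contiguous runs where pLDDT stays under a cutoff. The cutoff of 50 and the minimum run length of 5 are illustrative assumptions, and a low score suggests disorder without proving it.

```python
def low_confidence_segments(plddt, threshold=50.0, min_length=5):
    """Return 1-based inclusive (start, end) runs with pLDDT < threshold.

    Runs shorter than min_length residues are ignored.
    """
    segments = []
    start = None
    for position, score in enumerate(plddt, start=1):
        if score < threshold:
            if start is None:
                start = position  # a new low-confidence run begins
        elif start is not None:
            if position - start >= min_length:
                segments.append((start, position - 1))
            start = None
    # Close a run that extends to the last residue.
    if start is not None and len(plddt) - start + 1 >= min_length:
        segments.append((start, len(plddt)))
    return segments


# Made-up scores: residues 3-7 and 10-11 fall below the cutoff.
scores = [95, 92, 40, 35, 30, 45, 48, 88, 91, 30, 20]
print(low_confidence_segments(scores))
print(low_confidence_segments(scores, min_length=2))
```

With the default settings only the five-residue run is reported; lowering `min_length` to 2 also reports the short run at the C-terminal end.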
