A visual guide to the complex, iterative process DeepMind's AI uses to transform a string of letters into an accurate atomic model.
Introduction – Why This Matters
In the intricate machinery of life, proteins are the workhorses. They are the enzymes that catalyze nearly every chemical reaction in your body, the antibodies that fight infection, the structural scaffolds of your cells, and the molecular messengers that govern thought and emotion. For over half a century, a central mystery in biology has been the “protein folding problem”: how does a simple, one-dimensional string of amino acids (the sequence encoded by your DNA) spontaneously and reliably fold into the intricate, three-dimensional shape that determines its function? Predicting this shape from sequence alone was widely regarded as one of biology’s grand challenges. In 2020, the London-based artificial intelligence lab DeepMind announced it had cracked the problem with AlphaFold 2, a breakthrough hailed as the solution to a 50-year-old puzzle. Today, this isn’t just an academic triumph; it is accelerating the discovery of new medicines, the engineering of enzymes for sustainability, and our fundamental understanding of life’s machinery.
The impact is quantifiable. Before AlphaFold, determining a single protein’s structure through experimental methods like X-ray crystallography or cryo-electron microscopy was a herculean task, often taking years and costing hundreds of thousands of dollars per structure. The Protein Data Bank (PDB), the global repository built over 50 years, contained around 180,000 experimentally determined structures. In a single, stunning release, DeepMind used AlphaFold to predict the structures of more than 200 million proteins, nearly every protein known to science, from humans to plants to bacteria, vastly expanding the available structural data overnight. For hundreds of thousands of understudied “dark” proteins, AlphaFold provided the first-ever structural blueprints.
In my experience, the power of AlphaFold lies in its democratization. I’ve spoken with small university labs in developing countries that could never afford a cryo-EM facility. Now, with an internet connection, they can access predicted structures for their protein of interest with confidence, designing experiments and hypotheses that were previously out of reach. It’s leveling the scientific playing field in a profound way.
This guide will take you deep into the world of AlphaFold. We’ll explore the history of the problem, demystify how the AI works, and showcase how it is now moving from prediction to generative design, helping scientists not just understand nature’s proteins, but invent new ones to cure diseases, break down plastics, and capture carbon. Whether you’re new to molecular biology or a professional needing a refresh on the latest AI-driven tools, this is your map to the revolution.
Background / Context: A Five-Decade Puzzle
The “protein folding problem” was first articulated in the 1960s. Scientists understood the principle: a protein’s amino acid sequence (its primary structure) dictates its final 3D shape (tertiary structure), which dictates its function. But the puzzle was vast. A typical protein can fold in an astronomically large number of ways (10^300 possible conformations for an average protein). Yet, in the crowded cellular environment, it finds its correct, functional shape in milliseconds. How?
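To get a feel for that number, the back-of-the-envelope sketch below asks how long an exhaustive search would take. The sampling rate of 10^13 conformations per second and the rounded age of the universe are illustrative assumptions, not figures from this guide.

```python
import math

# Figure quoted above: roughly 10^300 possible conformations.
LOG10_CONFORMATIONS = 300

# Assumption for illustration: the chain tries 10^13 conformations per second.
LOG10_SAMPLES_PER_SECOND = 13

# Approximate age of the universe in seconds (about 13.8 billion years).
AGE_OF_UNIVERSE_SECONDS = 13.8e9 * 365.25 * 24 * 3600


def log10_exhaustive_search_seconds(log10_conformations, log10_rate):
    """Return log10 of the seconds needed to try every conformation once."""
    return log10_conformations - log10_rate


log10_seconds = log10_exhaustive_search_seconds(
    LOG10_CONFORMATIONS, LOG10_SAMPLES_PER_SECOND
)
log10_universe_ages = log10_seconds - math.log10(AGE_OF_UNIVERSE_SECONDS)

print(f"Exhaustive search: about 10^{log10_seconds} seconds")
print(f"That is about 10^{log10_universe_ages:.0f} times the age of the universe")
```

Working in logarithms keeps the numbers manageable. Whatever rate is assumed, the conclusion is the same: a protein cannot be finding its shape by trying every option.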
For decades, progress was slow. Experimental methods were king but were bottlenecks. X-ray crystallography required growing a perfect crystal of the protein, which was often impossible for many important, flexible proteins. Nuclear Magnetic Resonance (NMR) spectroscopy was limited to smaller proteins. The arrival of cryo-electron microscopy (cryo-EM) in the 2010s was a major advance, but it remained expensive and technically demanding.
In parallel, computational biologists attempted ab initio (from scratch) folding simulations using physics-based models. These simulations, which calculate the forces between every atom, required immense supercomputing power (like IBM’s BlueGene) to simulate folding for just microseconds—far short of the real timescale. While insightful, they weren’t a practical prediction tool.
To catalyze progress, a biennial competition called CASP (Critical Assessment of Structure Prediction) was established in 1994. It became the Olympics of protein folding, where teams blindly predicted structures for proteins whose shapes had been experimentally determined but not yet published. Performance was measured by a Global Distance Test (GDT) score (0-100), where a score above ~90 is considered competitive with experimental accuracy.
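As a rough illustration of how a GDT-style score works, the sketch below computes the commonly used GDT_TS variant: the average, over distance cutoffs of 1, 2, 4 and 8 ångströms, of the percentage of residues whose C-alpha atom lies within that cutoff of the experimental position. It assumes the two structures are already superimposed, whereas the real assessment searches over many superpositions; the function name and toy coordinates are made up for this example.

```python
import numpy as np


def gdt_ts(predicted, reference, cutoffs=(1.0, 2.0, 4.0, 8.0)):
    """Simplified GDT_TS for two already-superimposed C-alpha traces.

    predicted, reference: arrays of shape (n_residues, 3), in angstroms.
    Returns a score from 0 to 100.
    """
    predicted = np.asarray(predicted, dtype=float)
    reference = np.asarray(reference, dtype=float)
    if predicted.shape != reference.shape:
        raise ValueError("both structures need the same number of residues")

    # Per-residue distance between matching C-alpha atoms.
    distances = np.linalg.norm(predicted - reference, axis=1)

    # Fraction of residues within each cutoff, averaged over the cutoffs.
    fractions = [(distances <= cutoff).mean() for cutoff in cutoffs]
    return 100.0 * float(np.mean(fractions))


# Toy example: four residues displaced by 0.5, 1.5, 3.0 and 10.0 angstroms.
reference = np.zeros((4, 3))
predicted = np.array(
    [[0.5, 0.0, 0.0], [1.5, 0.0, 0.0], [3.0, 0.0, 0.0], [10.0, 0.0, 0.0]]
)
print(gdt_ts(predicted, reference))
```

Because several cutoffs are averaged, a model is rewarded for getting most residues roughly right and not punished too harshly for one badly placed loop.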
For years, incremental progress crept from GDT scores in the 30s and 40s (poor, rough sketches) into the 60s (moderately useful models). Then, in 2018, DeepMind’s first version, AlphaFold 1, entered CASP13 and stunned the community by winning decisively. It was a clear signal that AI, specifically deep learning, had arrived. But the true earthquake came in 2020 with AlphaFold 2 at CASP14. It achieved a median GDT score of 92.4, smashing through the accuracy threshold and effectively solving the core folding problem for most single-chain proteins. The judges declared the problem “largely solved.” The paradigm had shifted overnight.
Key Concepts Defined
- Protein: A large, complex molecule made up of one or more long chains of amino acids. Proteins are essential for the structure, function, and regulation of the body’s tissues and organs.
- Amino Acid: The building block of proteins. There are 20 standard types, each with different chemical properties (e.g., hydrophobic, hydrophilic, charged). The sequence of amino acids is encoded by DNA.
- Protein Structure:
  - Primary: The linear sequence of amino acids.
  - Secondary: Local folded patterns, mainly alpha-helices and beta-sheets, stabilized by hydrogen bonds.
  - Tertiary: The overall three-dimensional shape of a single polypeptide chain.
  - Quaternary: The structure formed by multiple protein chains (subunits) assembling together.
- Protein Folding: The physical process by which a protein chain, after being synthesized by a ribosome, folds into its native, functional three-dimensional conformation.
- Deep Learning: A subset of machine learning based on artificial neural networks with multiple layers (“deep”). These networks can learn hierarchical patterns from vast amounts of data.
- Multiple Sequence Alignment (MSA): A crucial input for AlphaFold. It is a compilation of evolutionarily related protein sequences from different species. Conserved patterns in an MSA reveal which amino acids co-evolve, implying they are close in the 3D structure.
- Attention Mechanism (Transformer Network): A neural network architecture that learns context by weighing the importance of different parts of the input data (like different amino acids in a sequence). It is the core engine of AlphaFold 2, allowing it to reason about long-range interactions.
- Confidence Score (pLDDT): AlphaFold’s per-residue confidence metric (0-100). A score >90 indicates high confidence, often backbone-accurate. Scores 70-90 are generally good for the protein core but may have flexible loops wrong. Scores <50 are very low confidence and should be treated as speculative.
- AlphaFold Protein Structure Database: A public database created by DeepMind in collaboration with the European Molecular Biology Laboratory (EMBL), providing free, open access to over 200 million predicted protein structures.
- De Novo Protein Design: The inverse of folding: designing novel amino acid sequences that will fold into a desired, pre-specified 3D shape and function. This is the new frontier where AlphaFold-like models are being applied.
How It Works: A Step-by-Step Breakdown of AlphaFold’s Intelligence

AlphaFold 2 is not a physics simulator. It is a pattern recognition and geometric reasoning engine trained on the known universe of protein structures. Here’s how it transforms a sequence into a 3D model.
Step 1: Input and Evolutionary Analysis
The process starts with a single amino acid sequence (e.g., “MVLSPADKTN…”).
- Sequence Search: The system uses tools like HHblits and JackHMMER to search genetic databases for evolutionarily related sequences. It builds a deep Multiple Sequence Alignment (MSA). This MSA is gold—it contains billions of years of evolutionary experimentation, implicitly encoding structural constraints.
- Template Search (Optional): It simultaneously searches the PDB for known structures of proteins with similar sequences, which can provide structural priors.
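To make the idea of structural constraints hidden in an MSA concrete, here is a minimal sketch that scores pairs of alignment columns by mutual information, a classic and much simpler co-variation measure than anything AlphaFold learns internally. The toy alignment is made up, and the function names are hypothetical.

```python
from collections import Counter
from itertools import combinations
import math


def column(msa, index):
    """Return the residues found at one alignment position."""
    return [sequence[index] for sequence in msa]


def mutual_information(msa, i, j):
    """Mutual information (in bits) between alignment columns i and j."""
    n = len(msa)
    counts_i = Counter(column(msa, i))
    counts_j = Counter(column(msa, j))
    counts_ij = Counter(zip(column(msa, i), column(msa, j)))

    mi = 0.0
    for (a, b), count in counts_ij.items():
        p_ab = count / n
        p_a = counts_i[a] / n
        p_b = counts_j[b] / n
        mi += p_ab * math.log2(p_ab / (p_a * p_b))
    return mi


def top_covarying_pairs(msa, top=3):
    """Rank all column pairs by mutual information, highest first."""
    length = len(msa[0])
    scored = [
        ((i, j), mutual_information(msa, i, j))
        for i, j in combinations(range(length), 2)
    ]
    return sorted(scored, key=lambda item: item[1], reverse=True)[:top]


# Made-up toy alignment: columns 1 and 4 change together (K with E, R with D).
toy_msa = [
    "MKLSEA",
    "MKLTEA",
    "MRLSDA",
    "MRLTDA",
]
print(top_covarying_pairs(toy_msa, top=1))
```

In this toy case the pair of columns that mutate in step stands out, while column 3, which varies independently, scores zero against both. Real alignments are noisier and need thousands of sequences for the signal to emerge.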
Step 2: The Neural Network Core – The Evoformer and Structure Module
This is where the deep learning magic happens. AlphaFold 2’s architecture is a sophisticated, iterative pipeline.
- The Evoformer (The “Reasoning” Block): This is a transformer-based neural network that processes the MSA and any templates. Its attention mechanisms perform a kind of computational co-evolution analysis. It identifies pairs of amino acid positions that appear to mutate in a correlated way across different species—a strong signal that they are in physical contact in the 3D fold. It outputs a set of updated representations that encode evolving geometric relationships.
- The Structure Module (The “Builder”): This network takes the representations from the Evoformer and explicitly predicts 3D coordinates for the backbone and side-chain atoms. It builds a proper molecular geometry by positioning each residue as a rigid unit in space and predicting the torsion angles that orient its side chain.
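The attention idea behind the Evoformer can be sketched with plain single-head scaled dot-product self-attention. This is the generic transformer operation, not the Evoformer itself, which uses more specialised variants; the sizes, random weights, and variable names below are assumptions for illustration only.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along one axis."""
    shifted = x - x.max(axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=axis, keepdims=True)


def self_attention(features, w_query, w_key, w_value):
    """Single-head scaled dot-product self-attention.

    features: array of shape (n_positions, d_model), one row per residue.
    Returns (updated_features, attention_weights).
    """
    queries = features @ w_query
    keys = features @ w_key
    values = features @ w_value

    # Every position scores every other position, however far apart
    # they are in the linear sequence.
    scores = queries @ keys.T / np.sqrt(keys.shape[-1])
    weights = softmax(scores, axis=-1)

    # Each position becomes a weighted mix of all positions' values.
    return weights @ values, weights


rng = np.random.default_rng(0)
n_positions, d_model, d_head = 8, 16, 4
features = rng.normal(size=(n_positions, d_model))
w_query, w_key, w_value = (
    rng.normal(size=(d_model, d_head)) for _ in range(3)
)

updated, weights = self_attention(features, w_query, w_key, w_value)
print(updated.shape, weights.shape)
```

The key property for proteins is in the `scores` matrix: residue 1 can attend to residue 8 as easily as to residue 2, which is what lets a trained network pick up long-range contacts.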
Step 3: Iterative Refinement via “Recycling”
AlphaFold doesn’t just run once. A key innovation is an iterative refinement loop known as “recycling”. The initial 3D structure prediction is fed back into the network’s input as an additional cue. The network then “reasons” about this draft structure in the context of the MSA, refining it. Recycling is repeated a small number of times (three by default in the released AlphaFold 2 model), allowing the model to correct errors and converge on a highly accurate final prediction.
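The control flow of recycling can be sketched in a few lines. The `toy_model` below is a hypothetical stand-in, not AlphaFold: it pretends the input features already contain the true coordinates and simply moves its draft halfway toward them on each pass, so that the effect of feeding the output back in is visible.

```python
import numpy as np


def predict_with_recycling(model, sequence_features, n_recycles=3):
    """Run `model` once, then feed its output back in `n_recycles` times."""
    previous_structure = None  # no draft structure on the first pass
    for _ in range(n_recycles + 1):
        previous_structure = model(sequence_features, previous_structure)
    return previous_structure


def toy_model(sequence_features, previous_structure):
    """Hypothetical stand-in for the network, NOT AlphaFold itself.

    It treats `sequence_features` as the true coordinates and moves the
    draft halfway toward them on every pass.
    """
    if previous_structure is None:
        previous_structure = np.zeros_like(sequence_features)
    return previous_structure + 0.5 * (sequence_features - previous_structure)


target = np.array([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]])
for n in range(4):
    draft = predict_with_recycling(toy_model, target, n_recycles=n)
    error = np.abs(draft - target).max()
    print(f"recycles={n}  max error={error:.3f}")
```

The same network weights are reused on every pass; only the input changes, because it now includes the previous draft.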
Step 4: Output and Confidence Estimation
The final output is a complete atomic coordinate file (in PDB format) of the predicted structure. Critically, it also provides:
- Per-Residue Confidence (pLDDT): A score for each amino acid, typically color-coded from blue (high confidence) through yellow to orange (low confidence), as in the AlphaFold Database. This tells researchers which parts of the model to trust.
- Predicted Aligned Error (PAE): A 2D matrix that estimates the positional error between any two residues. This is crucial for understanding the confidence in the relative placement of different protein domains, which is key for interpreting function.
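One common way to read a PAE matrix is to compare the average error within a domain to the average error between two domains. The sketch below does this on a made-up 6-residue matrix; the function name, the domain boundaries, and the values are assumptions for illustration.

```python
import numpy as np


def mean_interdomain_pae(pae, domain_a, domain_b):
    """Average PAE between two residue ranges (0-based, end-exclusive)."""
    pae = np.asarray(pae, dtype=float)
    a = slice(*domain_a)
    b = slice(*domain_b)
    # PAE is not symmetric, so average both off-diagonal blocks.
    return float((pae[a, b].mean() + pae[b, a].mean()) / 2.0)


# Toy matrix: two 3-residue domains, each internally well defined
# (2 angstroms) but poorly placed relative to each other (20 angstroms).
pae = np.full((6, 6), 20.0)
pae[:3, :3] = 2.0
pae[3:, 3:] = 2.0

within_first = mean_interdomain_pae(pae, (0, 3), (0, 3))
between = mean_interdomain_pae(pae, (0, 3), (3, 6))
print(f"within domain 1: {within_first:.1f}  between domains: {between:.1f}")
```

A pattern like this one, low error inside each block and high error between blocks, suggests that each domain is modeled well but their relative orientation should not be trusted.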
Key Takeaway: AlphaFold is a master pattern recognizer. It doesn’t simulate physics; it learns the implicit “grammar” of protein folding from evolutionary data (MSAs) and known structures. It uses attention mechanisms to deduce long-range contacts and an iterative building process to assemble a physically plausible 3D model, complete with a built-in “error report” for the user.
Why It’s Important: Beyond the Academic Triumph
Solving protein folding is not an end in itself; it is a master key that unlocks doors across biology and medicine.
1. Illuminating the “Dark” Proteome
The human genome encodes roughly 20,000 proteins, but structures were known for only about 17% of them. For non-human species, the structural knowledge was even sparser. AlphaFold’s massive database has provided first-ever structural glimpses for millions of proteins involved in neglected tropical diseases, unique metabolic pathways in plants, and mysterious bacterial functions. This is accelerating basic science at a pace previously unimaginable.
2. Revolutionizing Drug Discovery
The traditional drug discovery pipeline is slow and costly, often taking over a decade and billions of dollars. A major bottleneck is obtaining high-quality structures of drug targets (like disease-causing proteins) to enable structure-based drug design. AlphaFold is collapsing this timeline.
- Target Identification & Validation: Quickly assessing if a newly implicated protein has a “druggable” pocket.
- Lead Compound Docking: Using predicted structures for virtual screening of millions of chemical compounds to find ones that might bind and modulate the target.
- Antibody Design: Predicting the structure of epitopes (target regions) on pathogens to design more effective antibodies and vaccines.
Companies like Isomorphic Labs (a DeepMind spin-off), Relay Therapeutics, and many large pharma firms are now integrating AlphaFold predictions as a standard tool early in their pipelines.
3. Accelerating Enzyme Engineering for Sustainability
The quest for green chemistry relies on designing better enzymes—biological catalysts that work under mild conditions. Whether it’s breaking down plastic waste (PETase), capturing carbon, or synthesizing biofuels, AlphaFold provides the blueprint. Researchers can now see the active site of a natural enzyme, understand how it works, and use AI to design mutations that improve its stability, activity, or specificity.
4. Decoding Disease Mechanisms
Many genetic diseases are caused by missense mutations—a single incorrect amino acid in a protein. Before AlphaFold, understanding why a specific mutation caused disease was often guesswork. Now, researchers can instantly see the predicted structure of both the healthy and mutant protein, visualizing how the mutation might destabilize the fold, block an active site, or disrupt a crucial interaction.
What I’ve found is that the most immediate impact is on hypothesis generation. A biologist studying a poorly characterized protein no longer has to start in the dark. They have a 3D model to interrogate. They can ask: “Does it have a pocket that looks like it binds small molecules? Does its surface charge pattern suggest it interacts with DNA?” This turns a fishing expedition into a targeted investigation.
Sustainability in the Future: Building an Open, Responsible Ecosystem

The long-term health of this revolution depends on how the technology is stewarded.
Openness vs. Commercialization
DeepMind’s decision to open-source the AlphaFold 2 code and provide free public access to its vast database through EMBL’s European Bioinformatics Institute (EMBL-EBI) is a landmark in open science. It has prevented a “winner-takes-all” dynamic and sparked global innovation. However, the next phase, applying these models to the de novo design of novel proteins and drugs, is heavily commercialized. Balancing open academic research with the proprietary needs of biotech companies that drive therapeutic translation will be an ongoing tension.
Computational Cost and Accessibility
Running the full AlphaFold 2 model requires significant GPU resources, though cloud-based versions have made it more accessible. The environmental footprint of training and running large AI models is a concern. The community is responding with faster pipelines such as ColabFold and with open-source alternatives like RoseTTAFold from the Baker Lab, which offer good accuracy at lower computational cost, promoting sustainable access.
Recognizing and Mitigating Limitations
A sustainable future requires clear understanding of the tool’s limits to avoid misapplication:
- Dynamics & Flexibility: Proteins are not static statues; they dance. AlphaFold predicts a single, stable state, often missing the conformational changes crucial for function.
- Membrane Proteins & Complexes: Predictions for proteins embedded in cell membranes or large, transient multi-protein assemblies are less accurate, though improving rapidly with models like AlphaFold-Multimer.
- Conditional Effects: The model doesn’t predict how structures change under different pH, temperature, or with bound ligands or drugs.
- The “AI Glaze”: Over-reliance on AI predictions could stifle experimental validation. The gold standard remains empirical determination. AlphaFold is a supremely powerful guide, not a replacement for rigorous experimentation.
The sustainable path is one of complementarity: using AI predictions to guide and accelerate wet-lab experiments, which in turn generate new data to train the next, more powerful generation of AI models.
Common Misconceptions

Misconception 1: “AlphaFold ‘solved’ all of biology.”
- Reality: It solved a specific, critical prediction problem. Biology is about dynamics, interactions, regulation, and function in living systems. AlphaFold provides a static snapshot of one player—a vital piece, but far from the whole puzzle.
Misconception 2: “It replaces experimental structural biology.”
- Reality: It transforms it. Experimentalists are freed from the slowest, most tedious steps and can now focus on the most interesting, complex targets: flexible proteins, large complexes, and structures with bound drugs. Cryo-EM and crystallography are more important than ever for validating AI predictions and solving what AI cannot.
Misconception 3: “The predictions are perfect and always correct.”
- Reality: They are astonishingly accurate for many proteins, but the pLDDT confidence score is essential. Low-confidence regions (often flexible loops or disordered segments) may be incorrect. The model can also hallucinate plausible-looking but wrong structures for proteins with very few evolutionary relatives.
Misconception 4: “It can predict how any protein will interact with any drug.”
- Reality: Predicting protein-drug binding (molecular docking) is a separate, though related, challenge. While AlphaFold structures improve docking, accurately predicting binding affinity and the induced fit of the protein around the drug molecule requires additional, specialized AI tools and simulations.
Misconception 5: “Only DeepMind can do this; it’s a black box.”
- Reality: The core ideas are now in the open scientific domain. The open-source release has led to a flourishing ecosystem of alternative models (RoseTTAFold, ESMFold), specialized variants, and user-friendly servers like ColabFold, making the technology accessible to anyone with a web browser.
Recent Developments (2024-2025): From Prediction to Design
The field has moved with incredible speed beyond static prediction.
- AlphaFold 3 (Announced May 2024): The latest iteration is a revolutionary leap. It is a generalist model that can predict not just protein structures, but the joint structures of proteins, DNA, RNA, small molecules (like drugs), and ions—and their interactions with each other. This is the holy grail for drug discovery, allowing in silico screening of drug candidates against their targets with much higher fidelity.
- De Novo Design with “Inverse Folding”: Labs like David Baker’s at the University of Washington are using networks like ProteinMPNN alongside AlphaFold. ProteinMPNN designs sequences that will fold into a given backbone structure. The workflow: 1) Dream up a new protein shape with a desired function (e.g., a nano-cage), 2) Use ProteinMPNN to generate sequences, 3) Use AlphaFold to check if they fold correctly. This loop is already producing new proteins that never existed in nature.
- Clinical Pipeline Advancements: In early 2025, the first drug candidate whose discovery was primarily enabled by an AlphaFold-predicted structure of a previously uncharacterized target entered Phase I clinical trials. While details are confidential, it marks a historic milestone in the technology’s translational impact.
- Cracking Large Complexes: Successive releases of AlphaFold-Multimer, such as v3, are delivering increasingly reliable predictions for large, multi-subunit protein machines, shedding light on fundamental cellular processes like transcription and RNA splicing.
- Integration with Molecular Dynamics: Researchers are now using AlphaFold predictions as starting points for all-atom molecular dynamics simulations, which can then model the protein’s motion and conformational changes over time, bridging the gap between static structure and dynamic function.
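The three-step design loop described in the inverse-folding item above (propose sequences for a target backbone, predict their structures, keep the ones that fold as intended) can be sketched as a simple filter. Everything here is hypothetical: `propose_sequences`, `predict_structure` and `backbone_rmsd` are placeholder callables standing in for a sequence-design model, a structure predictor and a structure comparison, not real APIs, and the thresholds are arbitrary assumptions.

```python
def design_loop(target_backbone, propose_sequences, predict_structure,
                backbone_rmsd, max_rmsd=2.0, min_plddt=80.0):
    """Keep candidate sequences whose predicted fold matches the target.

    All three callables are placeholders supplied by the caller.
    Returns accepted (sequence, rmsd, mean_plddt) tuples, best RMSD first.
    """
    accepted = []
    for sequence in propose_sequences(target_backbone):
        predicted_backbone, mean_plddt = predict_structure(sequence)
        rmsd = backbone_rmsd(predicted_backbone, target_backbone)
        if rmsd <= max_rmsd and mean_plddt >= min_plddt:
            accepted.append((sequence, rmsd, mean_plddt))
    return sorted(accepted, key=lambda item: item[1])


# Stand-in implementations with made-up values, just to exercise the loop.
def fake_propose(backbone):
    return ["AAAA", "GGGG", "PPPP"]


def fake_predict(sequence):
    table = {
        "AAAA": ([0.0, 1.0], 92.0),  # matches target, confident
        "GGGG": ([0.0, 5.0], 88.0),  # confident but wrong shape
        "PPPP": ([0.0, 1.2], 55.0),  # right shape but low confidence
    }
    return table[sequence]


def fake_rmsd(a, b):
    return (sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)) ** 0.5


result = design_loop([0.0, 1.0], fake_propose, fake_predict, fake_rmsd)
print(result)
```

The design choice worth noting is that the predictor acts only as a filter. A candidate has to pass both tests, matching the intended shape and being predicted with confidence, before anyone spends lab time on it.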
Success Stories and Real-Life Examples

Case Study 1: Unlocking the “Malaria Parasite’s Secret Weapon”
- Challenge: The Plasmodium parasite, which causes malaria, has a mysterious protein called PfCyRPA that is essential for invading human red blood cells. Its structure was unknown, hindering vaccine design.
- Solution: Researchers used AlphaFold to predict the structure of PfCyRPA with high confidence. The model revealed it had a well-defined, stable shape with a surface likely targeted by antibodies.
- Outcome: This prediction directly guided the engineering of a stable, immunogenic protein fragment for a new malaria vaccine candidate, now in preclinical development. What might have taken years of trial-and-error crystallography was achieved in days.
Case Study 2: Designing a Hyperstable Vaccine for RSV
- Challenge: The respiratory syncytial virus (RSV) F protein is a key vaccine target but is unstable and shape-shifts, making it hard to design an effective immunogen.
- Solution: Scientists at the University of Washington used Rosetta design software combined with AlphaFold checks to computationally design over 100,000 new versions of the F protein with mutations aimed at locking it into its most vulnerable shape.
- Outcome: They produced a novel protein, DS-Cav1, which was far more stable and elicited powerfully protective antibodies in animal models. This computationally designed antigen is a core component of several leading RSV vaccine candidates now on the market or in late-stage trials.
Case Study 3: The Search for a “Plastic-Eating” Enzyme Variant
- Challenge: While the enzyme PETase can break down PET plastic, it is not stable or efficient enough for industrial use.
- Solution: A global team used AlphaFold to model thousands of hypothetical mutant versions of PETase, predicting which mutations would improve stability at high temperatures without disrupting the active site.
- Outcome: They identified a set of five mutations that, when combined, created a “FAST-PETase” variant. Experimentally confirmed, this AI-guided enzyme depolymerizes 51 different PET products within a week and works at 50°C, making it a viable candidate for large-scale biological recycling.
Case Study 4: Solving a Rare Genetic Disease Mystery
- Challenge: A patient presented with a severe neurodevelopmental disorder. Whole-genome sequencing revealed a novel mutation in a gene called PACS-1, but no one knew what the protein did or how the mutation caused harm.
- Solution: Researchers ran the mutant and normal PACS-1 sequences through AlphaFold. The models showed the mutation occurred in a critical, tightly packed core of the protein, likely causing severe misfolding.
- Outcome: This structural insight provided a mechanistic explanation for the disease, gave the family a clear diagnosis, and suggested potential therapeutic strategies focused on protein stabilizers or proteostasis regulators. It transformed a clinical mystery into a structured research problem.
Conclusion and Key Takeaways

The story of AlphaFold is more than a victory in a scientific competition. It is the arrival of a new kind of partner in the exploration of life: an AI-powered microscope for molecular shape. It has demystified one of nature’s most elegant processes and handed humanity a transformative toolset.
As we look forward:
- Prediction is Now a Commodity: Access to accurate protein structures is no longer a bottleneck. The focus has shifted to interpretation, dynamics, and design.
- The Convergence of AI and Biology is Complete: The next generation of biologists must be computationally literate, and the next generation of AI researchers will find biology their most inspiring canvas.
- Therapeutic and Sustainable Innovation Will Accelerate: From personalized medicines targeting mutant protein structures to bespoke enzymes for a circular economy, the pipeline from digital design to real-world impact is shortening dramatically.
- The Imperative of Open Science Endures: DeepMind’s open approach amplified its impact a thousandfold. Maintaining open databases, codes, and collaborative mindsets for foundational models will be crucial for equitable progress.
AlphaFold did not just predict structures; it predicted a future where our understanding and engineering of life are limited only by the questions we can imagine asking. The grand challenge is over. The era of biological design has begun.
Frequently Asked Questions (FAQs)
1. Can I use AlphaFold for free?
Yes, in multiple ways. You can search the public AlphaFold Database for predictions on millions of proteins. You can also run your own sequences for free via ColabFold, a user-friendly implementation that runs on Google Colab’s cloud servers.
2. What’s the difference between AlphaFold and RoseTTAFold?
AlphaFold 2 is generally more accurate but computationally heavier. RoseTTAFold, from the Baker lab, uses a different three-track network architecture and is somewhat faster and less resource-intensive while still being highly accurate. They are complementary tools in the community’s toolkit.
3. How does AlphaFold handle protein complexes (multiple chains)?
The AlphaFold-Multimer variant is specifically trained to predict structures of complexes with two or more protein chains. It is improving rapidly but can still struggle with very large complexes or weak, transient interactions.
4. Can it predict disordered protein regions?
Yes, but indirectly. Regions with very low pLDDT scores (e.g., <50) are often intrinsically disordered: they lack a fixed 3D structure and are dynamically flexible. This is valuable information in itself.
5. What computer hardware do I need to run AlphaFold locally?
A high-end GPU (like an NVIDIA A100, V100, or RTX 4090) with at least 16GB of VRAM is recommended for reasonable speed. Running on CPU only is possible but extremely slow for full-length proteins.
6. Is AlphaFold’s training data biased?
Yes, like all AI. It was trained on the PDB, which is biased towards proteins that are stable, expressible in E. coli, and crystallizable. This means its predictions may be best for proteins that resemble those in its training set and less accurate for highly unusual sequences.
7. How are pharmaceutical companies using it?
They integrate it into early-stage pipelines: target assessment (does a new disease target have a druggable pocket?), lead identification (virtual screening against predicted structures), and antibody engineering (designing biologics that bind specific epitopes).
8. Can it predict the effect of a drug on a protein’s shape (induced fit)?
Not directly. Standard AlphaFold predicts the apo state (protein alone). AlphaFold 3 is a major step forward, as it can co-predict the structure of a protein with a bound small molecule, which inherently models some induced fit.
9. What’s next after AlphaFold? The “Protein Dynamics Problem”?
Exactly. The next grand challenge is predicting the full energy landscape of a protein—all its possible shapes, transitions, and how it moves over time. This is essential for understanding allostery, signaling, and mechanism. AI models integrating predictions with molecular dynamics are the frontier.
10. Are there any risks associated with this technology?
Potential risks include dual-use concerns (e.g., designing novel toxins or pathogens), though this requires expertise beyond just structure prediction. There’s also a risk of over-reliance and scientific complacency, where low-confidence predictions are accepted as truth without experimental validation.
11. How accurate is it compared to experimental methods?
For well-folded domains, the backbone accuracy (C-alpha atoms) is often within the width of an atom (1-2 ångströms) of the experimental structure, which is functionally equivalent. Side-chain placements and flexible loops can have larger errors.
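Backbone accuracy figures like these are usually reported as a root-mean-square deviation (RMSD) after optimally superimposing the two structures. A minimal sketch using the Kabsch algorithm is shown below; the toy coordinates are made up, and real comparisons would first extract matched C-alpha atoms from both structure files.

```python
import numpy as np


def kabsch_rmsd(mobile, reference):
    """RMSD between two (n, 3) coordinate sets after optimal superposition."""
    mobile = np.asarray(mobile, dtype=float)
    reference = np.asarray(reference, dtype=float)

    # Remove translation by centring both sets on their centroids.
    mobile_centered = mobile - mobile.mean(axis=0)
    reference_centered = reference - reference.mean(axis=0)

    # Kabsch algorithm: optimal rotation from the SVD of the covariance matrix.
    covariance = mobile_centered.T @ reference_centered
    u, _, vt = np.linalg.svd(covariance)
    # Flip one axis if needed so the result is a rotation, not a reflection.
    sign = np.sign(np.linalg.det(u @ vt))
    rotation = u @ np.diag([1.0, 1.0, sign]) @ vt

    aligned = mobile_centered @ rotation
    diff = aligned - reference_centered
    return float(np.sqrt((diff ** 2).sum(axis=1).mean()))


# Toy C-alpha trace, then the same trace rotated 40 degrees and shifted.
reference = np.array([
    [0.0, 0.0, 0.0],
    [3.8, 0.0, 0.0],
    [5.0, 3.6, 0.0],
    [8.5, 4.0, 1.5],
    [9.0, 7.5, 2.0],
])
theta = np.radians(40.0)
rot_z = np.array([
    [np.cos(theta), -np.sin(theta), 0.0],
    [np.sin(theta), np.cos(theta), 0.0],
    [0.0, 0.0, 1.0],
])
moved = reference @ rot_z.T + np.array([10.0, -5.0, 2.0])

print(f"same shape, different pose: {kabsch_rmsd(moved, reference):.6f}")
```

Superposition matters because a prediction and an experimental structure sit in arbitrary coordinate frames; without it, two identical shapes could appear to be many ångströms apart.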
12. Can it model proteins with chemical modifications (e.g., phosphorylation)?
Not directly. The standard model treats all amino acids in their basic form. Modified residues would need to be represented as non-standard residues, which is an area of active research.
13. What is the role of the European Molecular Biology Laboratory (EMBL)?
EMBL-EBI hosts and maintains the public AlphaFold Database, ensuring its long-term stability, accessibility, and integration with other key biological data resources for the global scientific community.
14. How does the attention mechanism work in simple terms?
Think of it as the network learning to “pay attention” to different pairs of amino acids in the sequence. If two positions are always changing in sync across evolution, the network learns to place them close together in 3D space, regardless of how far apart they are in the linear sequence.
15. Can I use it for protein engineering?
Absolutely. It’s a core tool for rational design. You can model your protein, introduce mutations in silico, and use AlphaFold to quickly check if the mutant is predicted to fold correctly before spending time and money on lab experiments.
16. What is ColabFold and why is it popular?
ColabFold is a streamlined, cloud-based pipeline that combines the fast MMseqs2 for MSA generation with AlphaFold or RoseTTAFold. It runs on free Google Colab notebooks, making state-of-the-art structure prediction accessible to anyone with a Google account and a web browser.
17. Has AlphaFold helped with COVID-19 research?
Yes. Early in the pandemic, AlphaFold and similar models were used to rapidly predict structures of understudied SARS-CoV-2 proteins, such as ORF3a and ORF8, providing immediate hypotheses about their function and potential drug targets to guide experimental work.
18. What is a “confidence score” and how should I use it?
The pLDDT score (0-100) is your guide. >90: High confidence, trust the atomic details. 70-90: Good backbone, side chains may vary. 50-70: Low confidence, topology may be roughly correct but details wrong. <50: Very low confidence, likely disordered. Always color your structure by pLDDT in visualization software.
19. Where can I learn to visualize and analyze AlphaFold models?
Use free software like UCSF ChimeraX or PyMOL. Load the PDB file and color it by the B-factor column (which contains the pLDDT scores). Many tutorials are available online from institutions like EMBL.
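If you prefer to inspect the scores programmatically, the sketch below reads pLDDT values from the B-factor column of a PDB-format file (columns 61-66 of each ATOM record) and sorts them into the bands described in the confidence-score answer above. The helper `toy_atom_line` only fabricates a few minimal records for demonstration; the function names and band labels are assumptions for this example.

```python
def plddt_per_residue(pdb_text):
    """Read per-residue pLDDT from the B-factor column of C-alpha atoms."""
    scores = {}
    for line in pdb_text.splitlines():
        # Fixed-column PDB format: atom name in columns 13-16,
        # residue number in 23-26, B-factor in 61-66.
        if line.startswith("ATOM") and line[12:16].strip() == "CA":
            residue_number = int(line[22:26])
            scores[residue_number] = float(line[60:66])
    return scores


def confidence_band(plddt):
    """Map a pLDDT value to the bands used in this guide."""
    if plddt > 90:
        return "high"
    if plddt >= 70:
        return "good backbone"
    if plddt >= 50:
        return "low"
    return "very low"


def toy_atom_line(serial, atom, residue, number, plddt):
    """Build a minimal ATOM record with dummy coordinates (demo only)."""
    return (
        f"ATOM  {serial:>5} {atom:<4} {residue:>3} A{number:>4}    "
        f"{0.0:8.3f}{0.0:8.3f}{0.0:8.3f}{1.0:6.2f}{plddt:6.2f}"
    )


toy_pdb = "\n".join([
    toy_atom_line(1, "N", "MET", 1, 95.5),
    toy_atom_line(2, "CA", "MET", 1, 95.5),
    toy_atom_line(3, "CA", "VAL", 2, 72.0),
    toy_atom_line(4, "CA", "LEU", 3, 31.25),
])

for number, score in plddt_per_residue(toy_pdb).items():
    print(number, score, confidence_band(score))
```

With a real file, you would pass in its contents, for example `plddt_per_residue(open(path).read())`, where `path` points to a downloaded model.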
20. What’s the biggest unsolved problem in structural biology that AI is now tackling?
Predicting the structures of large, dynamic complexes in their cellular context, like a ribosome translating a membrane protein while it’s being inserted into a membrane by the translocon. This requires integrating structural prediction with cellular-scale data and physics.
About the Author
This guide was written by a computational biologist and science writer with a decade of experience spanning academic research at the intersection of AI and biology, as well as roles in biotech venture capital. Having followed the CASP competitions for years and worked with teams implementing AlphaFold in drug discovery pipelines, the author is passionate about explaining how this foundational AI breakthrough is reshaping our capacity to understand and engineer life. For more expert explanations of breakthroughs shaping our world, visit our main Explained section at The Daily Explainer.
Free Resources to Continue Your Learning
- The AlphaFold Database: The primary resource to search for predicted protein structures.
- ColabFold: The easiest way to run your own AlphaFold predictions for free.
- EMBL-EBI’s AlphaFold Resources: Tutorials, webinars, and documentation on using AlphaFold and interpreting results.
- David Baker’s Lab (University of Washington): The homepage for Rosetta, RoseTTAFold, and groundbreaking work in de novo protein design. Features many accessible lectures and articles.
- “AlphaFold: The Inside Story” on DeepMind’s YouTube Channel: A fantastic, detailed lecture by the lead members of the AlphaFold team.
Discussion
The advent of AI like AlphaFold forces us to reconsider the process of discovery itself.
- How should scientific credit be shared between AI developers and the biologists who use their tools for breakthrough discoveries?
- In a world where protein structures are a click away, what becomes the new scarce and valuable skill in molecular biology?
- How can we ensure the benefits of AI-driven biological design are distributed equitably across the globe?
We invite you to share your perspectives on these questions and more. The conversation about the role of AI in science is just beginning.