Highly accurate protein structure prediction with AlphaFold
`
`https://doi.org/10.1038/s41586-021-03819-2
`Received: 11 May 2021
`Accepted: 12 July 2021
`Published online: 15 July 2021
`Open access
`
`John Jumper1,4 ✉, Richard Evans1,4, Alexander Pritzel1,4, Tim Green1,4, Michael Figurnov1,4,
`Olaf Ronneberger1,4, Kathryn Tunyasuvunakool1,4, Russ Bates1,4, Augustin Žídek1,4,
`Anna Potapenko1,4, Alex Bridgland1,4, Clemens Meyer1,4, Simon A. A. Kohl1,4,
`Andrew J. Ballard1,4, Andrew Cowie1,4, Bernardino Romera-Paredes1,4, Stanislav Nikolov1,4,
`Rishub Jain1,4, Jonas Adler1, Trevor Back1, Stig Petersen1, David Reiman1, Ellen Clancy1,
`Michal Zielinski1, Martin Steinegger2,3, Michalina Pacholska1, Tamas Berghammer1,
`Sebastian Bodenstein1, David Silver1, Oriol Vinyals1, Andrew W. Senior1, Koray Kavukcuoglu1,
`Pushmeet Kohli1 & Demis Hassabis1,4 ✉

1DeepMind, London, UK. 2School of Biological Sciences, Seoul National University, Seoul, South Korea. 3Artificial Intelligence Institute, Seoul National University, Seoul, South Korea. 4These
authors contributed equally: John Jumper, Richard Evans, Alexander Pritzel, Tim Green, Michael Figurnov, Olaf Ronneberger, Kathryn Tunyasuvunakool, Russ Bates, Augustin Žídek, Anna
Potapenko, Alex Bridgland, Clemens Meyer, Simon A. A. Kohl, Andrew J. Ballard, Andrew Cowie, Bernardino Romera-Paredes, Stanislav Nikolov, Rishub Jain, Demis Hassabis.
✉e-mail: jumper@deepmind.com; dhcontact@deepmind.com
`
`Proteins are essential to life, and understanding their structure can facilitate a
`mechanistic understanding of their function. Through an enormous experimental
`effort1–4, the structures of around 100,000 unique proteins have been determined5, but
`this represents a small fraction of the billions of known protein sequences6,7. Structural
`coverage is bottlenecked by the months to years of painstaking effort required to
`determine a single protein structure. Accurate computational approaches are needed
`to address this gap and to enable large-scale structural bioinformatics. Predicting the
`three-dimensional structure that a protein will adopt based solely on its amino acid
`sequence—the structure prediction component of the ‘protein folding problem’8—has
`been an important open research problem for more than 50 years9. Despite recent
`progress10–14, existing methods fall far short of atomic accuracy, especially when no
`homologous structure is available. Here we provide the first computational method
`that can regularly predict protein structures with atomic accuracy even in cases in which
`no similar structure is known. We validated an entirely redesigned version of our neural
`network-based model, AlphaFold, in the challenging 14th Critical Assessment of protein
`Structure Prediction (CASP14)15, demonstrating accuracy competitive with
`experimental structures in a majority of cases and greatly outperforming other
`methods. Underpinning the latest version of AlphaFold is a novel machine learning
`approach that incorporates physical and biological knowledge about protein structure,
`leveraging multi-sequence alignments, into the design of the deep learning algorithm.
`
`The development of computational methods to predict
`three-dimensional (3D) protein structures from the protein sequence
`has proceeded along two complementary paths that focus on either the
`physical interactions or the evolutionary history. The physical interac-
`tion programme heavily integrates our understanding of molecular
`driving forces into either thermodynamic or kinetic simulation of pro-
`tein physics16 or statistical approximations thereof17. Although theoreti-
`cally very appealing, this approach has proved highly challenging for
`even moderate-sized proteins due to the computational intractability
`of molecular simulation, the context dependence of protein stability
`and the difficulty of producing sufficiently accurate models of protein
`physics. The evolutionary programme has provided an alternative in
`recent years, in which the constraints on protein structure are derived
`from bioinformatics analysis of the evolutionary history of proteins,
`homology to solved structures18,19 and pairwise evolutionary correla-
tions20–24. This bioinformatics approach has benefited greatly from
the steady growth of experimental protein structures deposited in
`the Protein Data Bank (PDB)5, the explosion of genomic sequencing
`and the rapid development of deep learning techniques to interpret
`these correlations. Despite these advances, contemporary physical
`and evolutionary-history-based approaches produce predictions that
`are far short of experimental accuracy in the majority of cases in which
`a close homologue has not been solved experimentally and this has
`limited their utility for many biological applications.
`In this study, we develop the first, to our knowledge, computational
`approach capable of predicting protein structures to near experimental
`accuracy in a majority of cases. The neural network AlphaFold that we
`developed was entered into the CASP14 assessment (May–July 2020;
`entered under the team name ‘AlphaFold2’ and a completely different
`model from our CASP13 AlphaFold system10). The CASP assessment is
`carried out biennially using recently solved structures that have not
`been deposited in the PDB or publicly disclosed so that it is a blind test
`
`
`
`
`
[Figure 1 panels (graphics not reproduced). a, bar plot of median Cα r.m.s.d.95 (Å) for AlphaFold and the other top CASP14 groups (G427, G009, G473, G129, G403, G032, G420, G480, G498, G488, G368, G324, G362, G253, G216). b, AlphaFold (blue) vs experiment (green) for T1049: r.m.s.d.95 = 0.8 Å; TM-score = 0.93. c, AlphaFold vs experiment at a zinc-binding site: r.m.s.d. = 0.59 Å within 8 Å of Zn. d, AlphaFold vs experiment for T1044: r.m.s.d.95 = 2.2 Å; TM-score = 0.96. e, architecture schematic: input sequence → genetic database search → MSA representation (s,r,c); structure database search → templates → pair representation (r,r,c); Evoformer (48 blocks) → structure module (8 blocks) → 3D structure with high/low per-residue confidence; recycling (three times).]
`
`Fig. 1 | AlphaFold produces highly accurate structures. a, The performance
`of AlphaFold on the CASP14 dataset (n = 87 protein domains) relative to the top-
15 entries (out of 146 entries); group numbers correspond to the numbers
`assigned to entrants by CASP. Data are median and the 95% confidence interval
`of the median, estimated from 10,000 bootstrap samples. b, Our prediction of
`CASP14 target T1049 (PDB 6Y4F, blue) compared with the true (experimental)
`structure (green). Four residues in the C terminus of the crystal structure are
`B-factor outliers and are not depicted. c, CASP14 target T1056 (PDB 6YJ1).
`
`An example of a well-predicted zinc-binding site (AlphaFold has accurate side
`chains even though it does not explicitly predict the zinc ion). d, CASP target
`T1044 (PDB 6VR4)—a 2,180-residue single chain—was predicted with correct
`domain packing (the prediction was made after CASP using AlphaFold without
`intervention). e, Model architecture. Arrows show the information flow among
`the various components described in this paper. Array shapes are shown in
`parentheses with s, number of sequences (Nseq in the main text); r, number of
`residues (Nres in the main text); c, number of channels.
`
`for the participating methods, and has long served as the gold-standard
`assessment for the accuracy of structure prediction25,26.
`In CASP14, AlphaFold structures were vastly more accurate than
`competing methods. AlphaFold structures had a median backbone
`accuracy of 0.96 Å r.m.s.d.95 (Cα root-mean-square deviation at 95%
`residue coverage) (95% confidence interval = 0.85–1.16 Å) whereas
`the next best performing method had a median backbone accuracy
`of 2.8 Å r.m.s.d.95 (95% confidence interval = 2.7–4.0 Å) (measured on
`CASP domains; see Fig. 1a for backbone accuracy and Supplementary
`Fig. 14 for all-atom accuracy). As a comparison point for this accuracy,
`the width of a carbon atom is approximately 1.4 Å. In addition to very
`accurate domain structures (Fig. 1b), AlphaFold is able to produce
`highly accurate side chains (Fig. 1c) when the backbone is highly accu-
`rate and considerably improves over template-based methods even
`when strong templates are available. The all-atom accuracy of Alpha-
`Fold was 1.5 Å r.m.s.d.95 (95% confidence interval = 1.2–1.6 Å) compared
`with the 3.5 Å r.m.s.d.95 (95% confidence interval = 3.1–4.2 Å) of the best
`alternative method. Our methods are scalable to very long proteins with
`accurate domains and domain-packing (see Fig. 1d for the prediction
`of a 2,180-residue protein with no structural homologues). Finally, the
`model is able to provide precise, per-residue estimates of its reliability
`that should enable the confident use of these predictions.
`We demonstrate in Fig. 2a that the high accuracy that AlphaFold dem-
onstrated in CASP14 extends to a large sample of recently released PDB
structures; in this dataset, all structures were deposited in the PDB after
`our training data cut-off and are analysed as full chains (see Methods,
`Supplementary Fig. 15 and Supplementary Table 6 for more details).
`Furthermore, we observe high side-chain accuracy when the back-
`bone prediction is accurate (Fig. 2b) and we show that our confidence
`measure, the predicted local-distance difference test (pLDDT), reliably
`predicts the Cα local-distance difference test (lDDT-Cα) accuracy of the
`corresponding prediction (Fig. 2c). We also find that the global super-
`position metric template modelling score (TM-score)27 can be accu-
`rately estimated (Fig. 2d). Overall, these analyses validate that the high
`accuracy and reliability of AlphaFold on CASP14 proteins also transfers
`to an uncurated collection of recent PDB submissions, as would be
`expected (see Supplementary Methods 1.15 and Supplementary Fig. 11
`for confirmation that this high accuracy extends to new folds).
`
`The AlphaFold network
`AlphaFold greatly improves the accuracy of structure prediction by
`incorporating novel neural network architectures and training proce-
`dures based on the evolutionary, physical and geometric constraints
`of protein structures. In particular, we demonstrate a new architecture
`to jointly embed multiple sequence alignments (MSAs) and pairwise
`features, a new output representation and associated loss that enable
`accurate end-to-end structure prediction, a new equivariant attention
`
`architecture, use of intermediate losses to achieve iterative refinement
`of predictions, masked MSA loss to jointly train with the structure,
`learning from unlabelled protein sequences using self-distillation and
`self-estimates of accuracy.
`The AlphaFold network directly predicts the 3D coordinates of all
`heavy atoms for a given protein using the primary amino acid sequence
`and aligned sequences of homologues as inputs (Fig. 1e; see Methods
`for details of inputs including databases, MSA construction and use of
`templates). A description of the most important ideas and components
`is provided below. The full network architecture and training procedure
`are provided in the Supplementary Methods.
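
As an orienting sketch (an illustration only, not the released AlphaFold code), the two trunk representations can be initialized from an integer-encoded MSA roughly as follows; the alphabet size, channel widths and random weights are assumptions of the sketch.

```python
# Hedged sketch: building initial MSA and pair representations from an
# integer-encoded MSA. All sizes and weights are illustrative assumptions,
# not the actual AlphaFold input embedder.
import jax
import jax.numpy as jnp

A = 21                # amino-acid alphabet incl. gap/unknown (assumed)
C_M, C_Z = 64, 32     # channel widths, arbitrary for this sketch

def init_representations(msa, key):
    """msa: (N_seq, N_res) integer MSA, row 0 is the query sequence."""
    k1, k2 = jax.random.split(key)
    w_msa = 0.02 * jax.random.normal(k1, (A, C_M))
    w_pair = 0.02 * jax.random.normal(k2, (A, C_Z))
    one_hot = jax.nn.one_hot(msa, A)            # (s, r, A)
    msa_repr = one_hot @ w_msa                  # (s, r, c_m)
    q = one_hot[0] @ w_pair                     # (r, c_z), query embedding
    pair_repr = q[:, None, :] + q[None, :, :]   # (r, r, c_z), outer sum
    return msa_repr, pair_repr
```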
`The network comprises two main stages. First, the trunk of the net-
`work processes the inputs through repeated layers of a novel neural
`network block that we term Evoformer to produce an Nseq × Nres array
`(Nseq, number of sequences; Nres, number of residues) that represents
`a processed MSA and an Nres × Nres array that represents residue pairs.
`The MSA representation is initialized with the raw MSA (although
`see Supplementary Methods 1.2.7 for details of handling very deep
`MSAs). The Evoformer blocks contain a number of attention-based
`and non-attention-based components. We show evidence in ‘Interpret-
`ing the neural network’ that a concrete structural hypothesis arises
`early within the Evoformer blocks and is continuously refined. The key
`innovations in the Evoformer block are new mechanisms to exchange
`information within the MSA and pair representations that enable direct
`reasoning about the spatial and evolutionary relationships.
`The trunk of the network is followed by the structure module that
`introduces an explicit 3D structure in the form of a rotation and transla-
`tion for each residue of the protein (global rigid body frames). These
`representations are initialized in a trivial state with all rotations set to
`the identity and all positions set to the origin, but rapidly develop and
`refine a highly accurate protein structure with precise atomic details.
`Key innovations in this section of the network include breaking the
`chain structure to allow simultaneous local refinement of all parts of
`the structure, a novel equivariant transformer to allow the network to
`implicitly reason about the unrepresented side-chain atoms and a loss
`term that places substantial weight on the orientational correctness
`of the residues. Both within the structure module and throughout
`the whole network, we reinforce the notion of iterative refinement
`by repeatedly applying the final loss to outputs and then feeding the
`outputs recursively into the same modules. The iterative refinement
`using the whole network (which we term ‘recycling’ and is related to
`approaches in computer vision28,29) contributes markedly to accuracy
`with minor extra training time (see Supplementary Methods 1.8 for
`details).
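
The recycling loop can be written as a short sketch; `model_fn` and the choice to backpropagate only through the final pass (one common way to keep the extra training cost small) are assumptions of the illustration, not a transcription of the Supplementary Methods.

```python
# Minimal sketch of recycling: the whole network is applied repeatedly and
# its outputs are fed back in as additional inputs. `model_fn` and the
# stop-gradient treatment of earlier passes are illustrative assumptions.
import jax

def predict_with_recycling(model_fn, inputs, num_recycles=3):
    """model_fn(inputs, prev_outputs) -> outputs. Three recycles = 4 passes."""
    prev = None
    for _ in range(num_recycles):
        prev = jax.lax.stop_gradient(model_fn(inputs, prev))
    return model_fn(inputs, prev)   # only this pass contributes gradients
```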
`
`Evoformer
`The key principle of the building block of the network—named Evo-
`former (Figs. 1e, 3a)—is to view the prediction of protein structures
`as a graph inference problem in 3D space in which the edges of the
`graph are defined by residues in proximity. The elements of the pair
`representation encode information about the relation between the
`residues (Fig. 3b). The columns of the MSA representation encode the
`individual residues of the input sequence while the rows represent
`the sequences in which those residues appear. Within this framework,
we define a number of update operations that are applied in each block,
with the different update operations applied in series.
`The MSA representation updates the pair representation through an
`element-wise outer product that is summed over the MSA sequence
`dimension. In contrast to previous work30, this operation is applied
`within every block rather than once in the network, which enables the
`continuous communication from the evolving MSA representation to
`the pair representation.
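
A minimal sketch of this update (an 'outer product mean') follows; the projection widths and weight matrices are assumptions for illustration.

```python
# Sketch of the element-wise outer product, averaged over the sequence
# dimension, that sends information from the MSA representation to the pair
# representation. Projection widths and weights are illustrative assumptions.
import jax.numpy as jnp

def outer_product_mean(msa_repr, w_a, w_b, w_out):
    """msa_repr: (s, r, c); returns an (r, r, c_z) pair update."""
    a = msa_repr @ w_a                                   # (s, r, c1)
    b = msa_repr @ w_b                                   # (s, r, c2)
    outer = jnp.einsum('sic,sjd->ijcd', a, b) / msa_repr.shape[0]
    return outer.reshape(outer.shape[0], outer.shape[1], -1) @ w_out
```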
`
`20
`
`80
`60
`40
`lDDT-C of a residue
`
`100
`
`100
`
`90
`
`1.0
`
`0.9
`
`0.8
`
`0.7
`
`0.6
`
`0.5
`
`b
`
`Fraction of correct (cid:70)1 rotamers
`
`>8
`
`4–8
`
`2–4
`
`1–2
`
`0.5–1
`
`0–0.5
`
`Full chain C r.m.s.d.95 (Å)
`
`0.30
`
`0.25
`
`0.20
`
`0.15
`
`0.10
`
`0.05
`
`0
`
`Fraction of proteinsa
`
`c
`
`100
`
`80
`
`60
`
`40
`
`20
`
`lDDT-C
`
`80
`60
`40
`20
`Average pLDDT on the resolved region
`
`100
`
`80
`
`80
`
`90
`
`100
`
`1.0
`
`0.9
`
`0.8
`
`0.8
`
`0.9
`
`1.0
`
`0.8
`0.6
`0.4
`0.2
`pTM on the resolved region
`
`1.0
`
`0
`
`0
`
`d
`
`1.0
`
`0.8
`
`0.6
`
`0.4
`
`0.2
`
`TM-score
`
`0
`
`0
`
`Fig. 2 | Accuracy of AlphaFold on recent PDB structures. The analysed
`structures are newer than any structure in the training set. Further filtering is
`applied to reduce redundancy (see Methods). a, Histogram of backbone
`r.m.s.d. for full chains (Cα r.m.s.d. at 95% coverage). Error bars are 95%
`confidence intervals (Poisson). This dataset excludes proteins with a template
`(identified by hmmsearch) from the training set with more than 40% sequence
`identity covering more than 1% of the chain (n = 3,144 protein chains). The
`overall median is 1.46 Å (95% confidence interval = 1.40–1.56 Å). Note that this
`measure will be highly sensitive to domain packing and domain accuracy; a
`high r.m.s.d. is expected for some chains with uncertain packing or packing
`errors. b, Correlation between backbone accuracy and side-chain accuracy.
`Filtered to structures with any observed side chains and resolution better than
`2.5 Å (n = 5,317 protein chains); side chains were further filtered to
`B-factor <30 Å2. A rotamer is classified as correct if the predicted torsion angle
`is within 40°. Each point aggregates a range of lDDT-Cα, with a bin size of 2 units
`above 70 lDDT-Cα and 5 units otherwise. Points correspond to the mean
`accuracy; error bars are 95% confidence intervals (Student t-test) of the mean
`on a per-residue basis. c, Confidence score compared to the true accuracy on
`chains. Least-squares linear fit lDDT-Cα = 0.997 × pLDDT − 1.17 (Pearson’s
`r = 0.76). n = 10,795 protein chains. The shaded region of the linear fit
`represents a 95% confidence interval estimated from 10,000 bootstrap
`samples. In the companion paper39, additional quantification of the reliability
`of pLDDT as a confidence measure is provided. d, Correlation between pTM
`and full chain TM-score. Least-squares linear fit TM-score = 0.98 × pTM + 0.07
`(Pearson’s r = 0.85). n = 10,795 protein chains. The shaded region of the linear fit
`represents a 95% confidence interval estimated from 10,000 bootstrap
`samples.
`
`
`
`
[Figure 3 panels (schematics not reproduced). a, Evoformer block (48 blocks, no shared weights): the MSA representation (s,r,c) passes through row-wise gated self-attention with pair bias, column-wise gated self-attention and a transition; an outer product mean feeds the pair representation (r,r,c), which passes through triangle updates using outgoing and incoming edges, triangle self-attention around the starting and ending nodes, and a transition. b, the pair representation as directed edges ij of a graph over residues i, j and k. c, diagrams of the triangle multiplicative updates using 'outgoing' and 'incoming' edges and of triangle self-attention around the starting and ending nodes. d, structure module (8 blocks, shared weights): the single representation (r,c) is updated by the IPA module, relative rotations and translations are predicted, and the backbone frames (r,3×3) and (r,3) (initially all at the origin) are updated; χ angles are predicted and all atom positions computed. e, residue gas of free-floating backbone rigid bodies with side-chain χ angles. f, FAPE with frames (Rk, tk) and atom positions xi.]
`
`Fig. 3 | Architectural details. a, Evoformer block. Arrows show the information
`flow. The shape of the arrays is shown in parentheses. b, The pair representation
`interpreted as directed edges in a graph. c, Triangle multiplicative update and
`triangle self-attention. The circles represent residues. Entries in the pair
`representation are illustrated as directed edges and in each diagram, the edge
`being updated is ij. d, Structure module including Invariant point attention (IPA)
`
`module. The single representation is a copy of the first row of the MSA
`representation. e, Residue gas: a representation of each residue as one
`free-floating rigid body for the backbone (blue triangles) and χ angles for the
`side chains (green circles). The corresponding atomic structure is shown below.
`f, Frame aligned point error (FAPE). Green, predicted structure; grey, true
`structure; (Rk, tk), frames; xi, atom positions.
`
Within the pair representation, there are two different update pat-
terns. Both are inspired by the necessity of consistency of the pair
representation—for a pairwise description of amino acids to be represent-
`able as a single 3D structure, many constraints must be satisfied including
`the triangle inequality on distances. On the basis of this intuition, we
`arrange the update operations on the pair representation in terms of
`triangles of edges involving three different nodes (Fig. 3c). In particular,
`we add an extra logit bias to axial attention31 to include the ‘missing edge’
`of the triangle and we define a non-attention update operation ‘triangle
`multiplicative update’ that uses two edges to update the missing third
`edge (see Supplementary Methods 1.6.5 for details). The triangle multipli-
`cative update was developed originally as a more symmetric and cheaper
`replacement for the attention, and networks that use only the attention or
`multiplicative update are both able to produce high-accuracy structures.
`However, the combination of the two updates is more accurate.
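
The outgoing-edge variant of the triangle multiplicative update can be sketched as follows; the gating is simplified and the weight matrices are assumptions for illustration.

```python
# Sketch of the triangle multiplicative update using 'outgoing' edges: the
# edge ij is updated from the edges ik and jk that close the triangle over
# every third node k. Weights and gating are illustrative assumptions.
import jax
import jax.numpy as jnp

def triangle_multiply_outgoing(pair, w_a, w_b, w_g, w_out):
    """pair: (r, r, c); returns an (r, r, c_out) update for each edge ij."""
    a = pair @ w_a                                # (r, r, c_h), edges i->k
    b = pair @ w_b                                # (r, r, c_h), edges j->k
    tri = jnp.einsum('ikc,jkc->ijc', a, b)        # combine over node k
    gate = jax.nn.sigmoid(pair @ w_g)             # (r, r, c_out) per-edge gate
    return gate * (tri @ w_out)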
`We also use a variant of axial attention within the MSA representation.
`During the per-sequence attention in the MSA, we project additional
`logits from the pair stack to bias the MSA attention. This closes the loop
`by providing information flow from the pair representation back into
`the MSA representation, ensuring that the overall Evoformer block is
`able to fully mix information between the pair and MSA representations
`and prepare for structure generation within the structure module.
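
A single-head sketch of this pair-biased row attention follows; the multi-head structure and gating of the real block are omitted, and the weight matrices are assumptions for illustration.

```python
# Simplified sketch of row-wise self-attention with pair bias: each MSA row
# attends over residue positions, with an extra logit bias projected from
# the pair representation, closing the pair-to-MSA information loop.
import jax
import jax.numpy as jnp

def msa_row_attention_with_pair_bias(msa_repr, pair, w_q, w_k, w_v, w_b):
    """msa_repr: (s, r, c); pair: (r, r, c_z). Single head for brevity."""
    q, k, v = msa_repr @ w_q, msa_repr @ w_k, msa_repr @ w_v   # (s, r, c_h)
    logits = jnp.einsum('sic,sjc->sij', q, k) / jnp.sqrt(q.shape[-1])
    bias = (pair @ w_b)[..., 0]                  # (r, r), shared by all rows
    attn = jax.nn.softmax(logits + bias[None], axis=-1)
    return jnp.einsum('sij,sjc->sic', attn, v)
```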
`
`End-to-end structure prediction
`The structure module (Fig. 3d) operates on a concrete 3D backbone
`structure using the pair representation and the original sequence row
`(single representation) of the MSA representation from the trunk. The
`3D backbone structure is represented as Nres independent rotations
`and translations, each with respect to the global frame (residue gas)
`(Fig. 3e). These rotations and translations—representing the geometry
`of the N-Cα-C atoms—prioritize the orientation of the protein back-
`bone so that the location of the side chain of each residue is highly
`constrained within that frame. Conversely, the peptide bond geometry
`is completely unconstrained and the network is observed to frequently
`violate the chain constraint during the application of the structure mod-
`ule as breaking this constraint enables the local refinement of all parts
`of the chain without solving complex loop closure problems. Satisfac-
`tion of the peptide bond geometry is encouraged during fine-tuning
`by a violation loss term. Exact enforcement of peptide bond geometry
`is only achieved in the post-prediction relaxation of the structure by
`gradient descent in the Amber32 force field. Empirically, this final relaxa-
tion does not improve the accuracy of the model as measured by the
global distance test (GDT)33 or lDDT-Cα34 but does remove distracting
stereochemical violations without the loss of accuracy.
`The residue gas representation is updated iteratively in two stages
`(Fig. 3d). First, a geometry-aware attention operation that we term
`‘invariant point attention’ (IPA) is used to update an Nres set of neural
`activations (single representation) without changing the 3D positions,
`then an equivariant update operation is performed on the residue gas
`using the updated activations. The IPA augments each of the usual
`attention queries, keys and values with 3D points that are produced
`in the local frame of each residue such that the final value is invariant
`to global rotations and translations (see Methods ‘IPA’ for details). The
`3D queries and keys also impose a strong spatial/locality bias on the
`attention, which is well-suited to the iterative refinement of the protein
`structure. After each attention operation and element-wise transition
`block, the module computes an update to the rotation and translation
`of each backbone frame. The application of these updates within the
`local frame of each residue makes the overall attention and update
`block an equivariant operation on the residue gas.
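
The geometric heart of this construction can be sketched in a few lines; the sketch uses one point per residue and assumed weight shapes, a deliberate simplification of the full multi-head IPA.

```python
# Heavily simplified sketch of the invariance in IPA: residues propose 3D
# points in their local frames; the backbone frames (R, t) map them to the
# global frame, and attention logits depend only on inter-point distances,
# which are unchanged by any global rotation and translation of the
# structure. One point per residue and weight shapes are assumptions.
import jax.numpy as jnp

def ipa_distance_logits(single, rot, trans, w_qp, w_kp):
    """single: (r, c); rot: (r, 3, 3); trans: (r, 3)."""
    qp = jnp.einsum('rab,rb->ra', rot, single @ w_qp) + trans  # global frame
    kp = jnp.einsum('rab,rb->ra', rot, single @ w_kp) + trans
    sq_dist = jnp.sum((qp[:, None] - kp[None, :]) ** 2, axis=-1)   # (r, r)
    return -sq_dist   # nearby residue pairs receive larger attention logits
```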
`Predictions of side-chain χ angles as well as the final, per-residue
`accuracy of the structure (pLDDT) are computed with small per-residue
`networks on the final activations at the end of the network. The estimate
`of the TM-score (pTM) is obtained from a pairwise error prediction that
`is computed as a linear projection from the final pair representation. The
`final loss (which we term the frame-aligned point error (FAPE) (Fig. 3f))
`compares the predicted atom positions to the true positions under
`many different alignments. For each alignment, defined by aligning
`the predicted frame (Rk, tk) to the corresponding true frame, we com-
`pute the distance of all predicted atom positions xi from the true atom
`positions. The resulting Nframes × Natoms distances are penalized with a
`clamped L1 loss. This creates a strong bias for atoms to be correct relative
`to the local frame of each residue and hence correct with respect to its
`side-chain interactions, as well as providing the main source of chirality
`for AlphaFold (Supplementary Methods 1.9.3 and Supplementary Fig. 9).
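
A minimal sketch of FAPE follows; the 10 Å clamp and length scale match the spirit of the description above but are stated here as assumptions of the sketch.

```python
# Sketch of frame-aligned point error (FAPE): express every predicted atom
# in the local frame of every predicted residue, do the same for the true
# structure, and penalize the per-frame, per-atom distances with a clamped
# L1 loss. Clamp and scale constants are illustrative assumptions.
import jax.numpy as jnp

def fape_loss(pred_R, pred_t, true_R, true_t, pred_x, true_x, clamp=10.0):
    """*_R: (F, 3, 3); *_t: (F, 3); *_x: (A, 3) atom positions."""
    def to_local(R, t, x):
        # Inverse rigid transform per frame: x_local = R^T (x - t).
        return jnp.einsum('fij,fai->faj', R, x[None] - t[:, None])
    diff = to_local(pred_R, pred_t, pred_x) - to_local(true_R, true_t, true_x)
    d = jnp.sqrt(jnp.sum(diff ** 2, axis=-1) + 1e-8)   # (F, A) distances
    return jnp.mean(jnp.minimum(d, clamp)) / 10.0      # clamped L1, scaled
```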
`
`Training with labelled and unlabelled data
`The AlphaFold architecture is able to train to high accuracy using only
`supervised learning on PDB data, but we are able to enhance accuracy
`(Fig. 4a) using an approach similar to noisy student self-distillation35.
`In this procedure, we use a trained network to predict the structure of
`around 350,000 diverse sequences from Uniclust3036 and make a new
`dataset of predicted structures filtered to a high-confidence subset. We
`then train the same architecture again from scratch using a mixture of
`PDB data and this new dataset of predicted structures as the training
`data, in which the various training data augmentations such as crop-
`ping and MSA subsampling make it challenging for the network to
`recapitulate the previously predicted structures. This self-distillation
`procedure makes effective use of the unlabelled sequence data and
`considerably improves the accuracy of the resulting network.
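
In outline, the procedure looks like the following hedged pseudocode; every function and attribute name here is a placeholder assumption, not DeepMind's training pipeline.

```python
# Hedged sketch of noisy-student self-distillation as described above.
from typing import Callable, Iterable, List, Tuple

def self_distillation(train_model: Callable[[list], "Model"],
                      pdb_data: list,
                      unlabelled_seqs: Iterable[str],
                      plddt_cutoff: float = 85.0) -> "Model":
    """plddt_cutoff is an illustrative confidence threshold (assumption)."""
    teacher = train_model(pdb_data)              # supervised on PDB only
    distilled: List[Tuple[str, object]] = []
    for seq in unlabelled_seqs:                  # ~350,000 Uniclust30 seqs
        pred = teacher.predict(seq)
        if pred.mean_plddt > plddt_cutoff:       # keep high-confidence subset
            distilled.append((seq, pred.atom_positions))
    # Retrain the same architecture from scratch on the mixture; cropping
    # and MSA subsampling make it hard to recapitulate teacher outputs.
    return train_model(pdb_data + distilled)
```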
`Additionally, we randomly mask out or mutate individual residues
`within the MSA and have a Bidirectional Encoder Representations from
`Transformers (BERT)-style37 objective to predict the masked elements of
`the MSA sequences. This objective encourages the network to learn to
`interpret phylogenetic and covariation relationships without hardcoding
`a particular correlation statistic into the features. The BERT objective is
`trained jointly with the normal PDB structure loss on the same training
`examples and is not pre-trained, in contrast to recent independent work38.
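
The masked-MSA objective itself is standard and can be sketched directly; the mask rate, token codes and readout weights below are illustrative assumptions.

```python
# Sketch of the BERT-style masked-MSA objective trained jointly with the
# structure loss: mask random MSA entries, predict them from the trunk's
# final MSA representation, apply cross-entropy on masked positions only.
import jax
import jax.numpy as jnp

MASK_TOKEN = 21      # assumed extra token id outside the 21-letter alphabet

def masked_msa_loss(key, msa, trunk_msa_repr_fn, w_readout, mask_rate=0.15):
    """msa: (s, r) ints; trunk_msa_repr_fn: masked MSA -> (s, r, c)."""
    mask = jax.random.bernoulli(key, mask_rate, msa.shape)
    corrupted = jnp.where(mask, MASK_TOKEN, msa)
    logits = trunk_msa_repr_fn(corrupted) @ w_readout        # (s, r, A)
    log_p = jax.nn.log_softmax(logits, axis=-1)
    nll = -jnp.take_along_axis(log_p, msa[..., None], axis=-1)[..., 0]
    return jnp.sum(nll * mask) / jnp.maximum(jnp.sum(mask), 1)
```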
`
`Interpreting the neural network
`To understand how AlphaFold predicts protein structure, we trained
`a separate structure module for each of the 48 Evoformer blocks in
`the network while keeping all parameters of the main network fro-
`zen (Supplementary Methods 1.14). Including our recycling stages,
`this provides a trajectory of 192 intermediate structures—one per full
Evoformer block—in which each intermediate represents the belief of
the network of the most likely structure at that block. The resulting
trajectories are surprisingly smooth after the first few blocks, show-
ing that AlphaFold makes constant incremental improvements to the
structure until it can no longer improve (see Fig. 4b for a trajectory of
accuracy). These trajectories also illustrate the role of network depth.
For very challenging proteins such as ORF8 of SARS-CoV-2 (T1064),
the network searches and rearranges secondary structure elements
for many layers before settling on a good structure. For other proteins
such as LmrP (T1024), the network finds the final structure within the
first few layers. Structure trajectories of CASP14 targets T1024, T1044,
T1064 and T1091 that demonstrate a clear iterative building process
for a range of protein sizes and difficulties are shown in Supplementary
Videos 1–4. In Supplementary Methods 1.16 and Supplementary Figs. 12,
13, we interpret the attention maps produced by AlphaFold layers.
Figure 4a contains detailed ablations of the components of AlphaFold
that demonstrate that a variety of different mechanisms contribute
to AlphaFold accuracy. Detailed descriptions of each ablation model,
their training details, extended discussion of ablation results and the
effect of MSA depth on each ablation are provided in Supplementary
Methods 1.13 and Supplementary Fig. 10.

[Figure 4 panels (plots not reproduced). a, GDT difference (CASP14 domains) and lDDT-Cα difference (PDB chains) compared with baseline for: with self-distillation training; baseline; no templates; no auxiliary distogram head; no raw MSA (use MSA pairwise frequencies); no IPA (use direct projection); no auxiliary masked MSA head; no recycling; no triangles, biasing or gating (use axial attention); no end-to-end structure gradients (keep auxiliary heads); no IPA and no recycling. b, domain GDT vs Evoformer block (0–192) for T1024 D1, T1024 D2 and T1064 D1.]

Fig. 4 | Interpreting the neural network. a, Ablation results on two target sets:
the CASP14 set of domains (n = 87 protein domains) and the PDB test set of
chains with template coverage of ≤30% at 30% identity (n = 2,261 protein
chains). Domains are scored with GDT and chains are scored with lDDT-Cα. The
ablations are reported as a difference compared with the average of the three
baseline seeds. Means (points) and 95% bootstrap percentile intervals (error
bars) are computed using bootstrap estimates of 10,000 samples. b, Domain
GDT trajectory over 4 recycling iterations and 48 Evoformer blocks on CASP14
targets LmrP (T1024) and Orf8 (T1064) where D1 and D2 refer to the individual
domains as defined by the CASP assessment. Both T1024 domains obtain the
correct structure early in the network, whereas the structure of T1064 changes
multiple times and requires nearly the full depth of the network to reach the
final structure. Note, 48 Evoformer blocks comprise one recycling iteration.
`
`
`
`
[Figure 5 panels (plots not reproduced). a, lDDT-Cα vs median per-residue Neff for the chain (10^0 to 10^4, log scale), shown separately for template coverage < 30% and > 60% of the chain. b, AlphaFold (blue) vs experiment (green) overlay of an intertwined homotrimer.]
`
`Fig. 5 | Effect of MSA depth and cross-chain contacts. a, Backbone accuracy
`(lDDT-Cα) for the redundancy-reduced set of the PDB after our training data
`cut-off, restricting to proteins in which at most 25% of the long-range contacts
`are between different heteromer chains. We further consider two groups of
`proteins based on template coverage at 30% sequence identity: covering more
`than 60% of the chain (n = 6,743 protein chains) and covering less than 30% of
`the chain (n = 1,596 protein chains). MSA depth is computed by counting the
`
`number of non-gap residues for each position in the MSA (using the Neff
`weighting scheme; see Methods for details) and taking the median across
`residues. The curves are obtained through Gaussian kernel average smoothing
`(window size is 0.2 units in log10(Neff)); the shaded area is the 95% confidence
`interval estimated using bootstrap of 10,000 samples. b, An intertwined
`homotrimer (PDB 6SK0) is correctly predicted without input stoichiometry
`and only a weak template (blue is predicted and green is experimental).
`
`
`MSA depth and cross-chain contacts
`Although AlphaFold has a high accuracy across the vast majority of
`deposited PDB structures, we note that there are still factors that affect
`accuracy or limit the applicability of the model. The model uses MSAs
`and the accuracy decreases substantially when the median alignment
`depth is less than around 30 sequences (see Fig. 5a for details). We
`observe a threshold effect where improvements in MSA depth over
`around 100 sequences lead to small gains. We hypothesize that the MSA
`information is needed to coarsely find the correct structure within the
`early stages of the network, but refinement of that prediction into a
`high-accuracy model does not depend crucially on the MSA information.
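
For concreteness, the depth measure discussed here can be sketched as follows; the 62% identity threshold and gap code are assumptions of the sketch, not the exact settings of the Methods.

```python
# Hedged sketch of median per-residue MSA depth: down-weight redundant
# sequences (an Neff-style scheme), sum the weights of sequences that are
# not gapped at each position, and take the median over positions.
import jax.numpy as jnp

GAP = 20   # assumed integer code for a gap column

def median_per_residue_neff(msa, identity_cutoff=0.62):
    """msa: (s, r) integer-encoded alignment including gaps."""
    same = (msa[:, None, :] == msa[None, :, :]).mean(axis=-1)   # (s, s)
    weights = 1.0 / (same >= identity_cutoff).sum(axis=-1)      # (s,)
    depth = ((msa != GAP) * weights[:, None]).sum(axis=0)       # (r,)
    return jnp.median(depth)
```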
`The other substantial limitation that we have observed is that AlphaFold
`is much weaker for proteins that have few intra-chain or homotypic con-
`tacts compared to the number of heterotypic contacts (further details
`are provided in a companion paper39). This typically occurs for bridging
`domains within larger complexes in which the shape of the protein is
`created almost entirely by interactions with other chains in the complex.
`Conversely, AlphaFold is often able to give high-accuracy predictions for
`homomers, even when the chains are substantially intertwined (Fig. 5b).
`We expect that the ideas of AlphaFold are readily applicable to predicting
`full hetero-complexes in a future system and that this will remove the dif-
`ficulty with protein chains that have a large number of hetero-contacts.
`
`Related work
`The prediction of protein structures has had a long and varied develop-
`ment, which is extensively covered in a number of reviews14,40–43. Despite
`the lon