`
`PERSPECTIVE
`
`The nature of protein folding pathways
`
`S. Walter Englander1 and Leland Mayne
`Johnson Research Foundation, Department of Biochemistry and Biophysics, Perelman School of Medicine, University of Pennsylvania,
`Philadelphia, PA 19104
`
`Edited by Alan R. Fersht, Medical Research Council Laboratory of Molecular Biology, Cambridge, United Kingdom, and approved September 23, 2014 (received for review June
`24, 2014)
`
`How do proteins fold, and why do they fold in that way? This Perspective integrates earlier and more recent advances over the 50-y history of
`the protein folding problem, emphasizing unambiguously clear structural information. Experimental results show that, contrary to prior belief,
`proteins are multistate rather than two-state objects. They are composed of separately cooperative foldon building blocks that can be seen to
`repeatedly unfold and refold as units even under native conditions. Similarly, foldons are lost as units when proteins are destabilized to
`produce partially unfolded equilibrium molten globules. In kinetic folding, the inherently cooperative nature of foldons predisposes the
`thermally driven amino acid-level search to form an initial foldon and subsequent foldons in later assisted searches. The small size of foldon
`units, ∼20 residues, resolves the Levinthal time-scale search problem. These microscopic-level search processes can be identified with the
`disordered multitrack search envisioned in the “new view” model for protein folding. Emergent macroscopic foldon–foldon interactions then
`collectively provide the structural guidance and free energy bias for the ordered addition of foldons in a stepwise pathway that sequentially
`builds the native protein. These conclusions reconcile the seemingly opposed new view and defined pathway models; the two models account
`for different stages of the protein folding process. Additionally, these observations answer the “how” and the “why” questions. The protein
`folding pathway depends on the same foldon units and foldon–foldon interactions that construct the native structure.
`protein folding | hydrogen exchange | protein structure
`
`Proteins must fold to their active native state
`when they emerge from the ribosome and
`when they repeatedly unfold and refold during
`their lifetime (1, 2). The folding process is
`difficult (3, 4) and potentially dangerous (5).
`Biological health depends on its success and
`disease on its failure. However, more than 50 y
`after the formative demonstration that protein
`folding is a straightforward biophysical pro-
`cess (6), there is not general agreement on
`the overarching questions of how proteins fold
`and why they fold in that way. Given this
`uncertainty, one is not sure how to even think
`about many related biophysical and biological
`problems.
`Early in the history of the folding field,
`experimentalists simply assumed that proteins
`fold through distinct intermediate states in a
`distinct pathway (Fig. 1A), as seen for classical
`biochemical pathways. Following Anfinsen’s
`demonstration that proteins can fold all by
`themselves without outside help (6), Levinthal
`perceived that no undirected folding process
`would be able to find the native structure by
`random searching through the vast number of
`structural options (3, 4). Proteins must solve
`the problem, he believed, by folding through
`predetermined pathways, although one had
`no clue how or why that should occur.
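The scale of the random-search problem can be illustrated with a short calculation. The sketch below is not from the article: it assumes, purely for illustration, 3 conformations per residue and a sampling rate of 10^13 conformations per second, and compares a 100-residue chain with a foldon-sized unit of about 20 residues.

```python
# Illustrative assumptions (hypothetical, not taken from the article):
# 3 backbone conformations per residue, 1e13 conformations sampled per second.
STATES_PER_RESIDUE = 3
SAMPLES_PER_SECOND = 1e13


def exhaustive_search_time(n_residues: int) -> float:
    """Seconds needed to visit every chain conformation once by random search."""
    return STATES_PER_RESIDUE ** n_residues / SAMPLES_PER_SECOND


if __name__ == "__main__":
    for n in (20, 100):
        print(f"{n:3d} residues: {exhaustive_search_time(n):.2e} s")
```

Under these assumptions a 20-residue search finishes in well under a millisecond, whereas a 100-residue search would take vastly longer than the age of the universe. The exact numbers depend entirely on the assumed parameters; only the exponential scaling matters.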
`A realization of the inability to equilibrate
`to a common structure (3, 4) and the en-
`semble nature of partially folded forms led
`the theoretical community to a very different,
`more statistical “new view” (7–11). It
`was inferred that proteins must fold to their
`
`unique native state through multiple un-
`predictable routes and intermediate con-
`formations. Another prominent inference
`configured the Anfinsen thermodynamic
`hypothesis (6) in terms of a funnel-shaped
`energy landscape diagram (Fig. 1B), which
`pictures that proteins must fold energeti-
`cally downhill (the Z axis) and shrink in
`conformational extent (the generalized XY
`plane) as they go (9, 12–14). To fill out the
`landscape picture, classical rate-determining
`kinetic barriers are often replaced by qual-
`itative concepts such as ruggedness, frus-
`tration, and traps, and major species by
`deep wells, all forming a kind of metalan-
`guage known as “energy landscape theory”
`(10, 15–17). The graphic funnel picture is
`a generic representation,
`independent of
`structural and thermodynamic detail and
`equally applicable to any protein, RNA, or
`other compact polymer. Although it pro-
`vides no constraints that would exclude
`any realistic folding scenario, even a defined
`pathway model, it has been widely inter-
`preted to require that proteins fold through
`many independent pathways.
`R. L. Baldwin took up the challenge and
`led the field in a multiyear effort to exper-
`imentally define kinetic folding intermediates
`and pathways (18–20). In a thoughtful pro-
`tein folding review 20 y ago, Baldwin consid-
`ered the disparate insights available at the
`time from both theory and experiment
`(21). He highlighted uncertainties in the ex-
`perimental evidence for classical pathways.
`
`Kinetic folding intermediates seemed to form
`asynchronously over a range of time scales.
`Equilibrium analogs of folding intermediates
`called molten globules yielded mixed results,
`sometimes agreeing with kinetic folding in-
`formation and sometimes not. Baldwin’s ar-
`ticle served to alert the experimental protein
`folding community to the new view of het-
`erogeneous folding and helped to establish
`the current paradigm of a multipath fun-
`neled energy landscape.
`The distinction between the classical view
`of a more or less single pathway through
`defined intermediates and the disordered
`many-pathway new view has broad signifi-
`cance for the understanding of protein bio-
`physics and biological function. The question
`could be resolved by determining experimen-
`tally the structure of the intermediate forms
`that bridge between unfolded and native
`states in real proteins, but this effort has
`turned out to be exceptionally difficult. The
`usual methods, crystallography and NMR,
`cannot define partial structures that form and
`decay in less than 1 s. Experimentalists have
`been forced to depend on spectroscopic
`methods (fluorescence, CD, IR) that can fol-
`low kinetic folding in real time but are blind
`to the specifics of structure and so allow the
`
`Author contributions: S.W.E. and L.M. wrote the paper.
`
`The authors declare no conflict of interest.
`
`This article is a PNAS Direct Submission.
`
`1To whom correspondence should be addressed. Email: engl@mail.
`med.upenn.edu.
`
`www.pnas.org/cgi/doi/10.1073/pnas.1411798111
`
`PNAS | November 11, 2014 | vol. 111 | no. 45 | 15873–15880
`
`
`
`
`encounter a sizeable kinetic barrier and so
`reach significant population.
`Initial results obtained for cytochrome c
`(Cyt c; Fig. 2) showed that approximately half
`of the molecules form their sequentially
`remote but structurally contiguous N- and
`C-terminal helical segments early (12 ms),
`suggesting the formation of a specific on-
`pathway native-like intermediate. However,
`Baldwin’s review (21) emphasized the asyn-
`chrony in these kinetic results; some of the
`molecules protect their N- and C-terminal
`helices early, whereas others do so at later
`times, along with other regions. Other proteins
`similarly studied have often yielded analogous
`results. This heterogeneous behavior conflicts
`with a well-defined sequential pathway model
`but seems more consistent with the new view
`of different routes, rates, and traps.
`One now knows that the heterogeneous
`folding seen in kinetic HX pulse labeling
`experiments can be due to previously un-
`recognized experimental issues. One prob-
`lem concerns the tendency of refolding
`proteins to transiently aggregate, especially
`at the high concentrations used to facilitate
`the preparation of samples for NMR analy-
`sis (25, 26). Another unexpected effect was
`revealed in a sophisticated analysis of the
`HX pulse labeling experiment, which showed
`that intermediates populated during kinetic
`folding may repeatedly unfold and refold on
`a fast time scale. Sites that are already folded
`and protected can nevertheless become
`H-labeled during the intense high pH in-
`terrogation pulse even with only a single
`reversible unfolding during the pulse (the
`so-called EX1 HX regime; 50-ms pulse,
`back unfolding rate 12 s⁻¹ for Fig. 2) (27,
`28). Other HX NMR pulse labeling studies
`have been compromised by similar aggrega-
`tion and HX EX1 behavior, and also by the
`inability to differentiate mixtures of states due
`to the ensemble averaging that occurs when
`NMR is used to obtain a single measurement
`for each individual residue.
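The size of the back-unfolding artifact can be estimated with a simple first-order model. The sketch below assumes the EX1 limit, in which every unfolding event during the pulse leads to labeling, so that the spuriously labeled fraction is 1 − exp(−k·t). The function name is hypothetical; the rate (12 s⁻¹) and pulse length (50 ms) are the values quoted above.

```python
import math


def ex1_labeled_fraction(k_unfold_per_s: float, pulse_s: float) -> float:
    """Fraction of already-protected sites that become H-labeled during the pulse.

    Assumes the EX1 limit: a single unfolding event during the pulse is
    enough for exchange, so labeling follows first-order unfolding kinetics.
    """
    return 1.0 - math.exp(-k_unfold_per_s * pulse_s)


if __name__ == "__main__":
    k_back = 12.0  # s^-1, back-unfolding rate quoted in the text
    for pulse_ms in (50, 10):
        frac = ex1_labeled_fraction(k_back, pulse_ms / 1000.0)
        print(f"{pulse_ms} ms pulse: {frac:.2f} of protected sites labeled")
```

With these numbers a 50-ms pulse would label about 45% of sites in an intermediate that is in fact folded, whereas a shorter pulse such as 10 ms would label only about 11%. This is an idealized estimate, not a substitute for the analysis in refs. 27 and 28.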
`
`HX Pulse Labeling and MS Analysis. A
`recently developed variant of the HX pulse
`labeling experiment can produce a more
`explicit description of the kinetic folding
`process. The new technology replaces NMR
`analysis with a mass spectrometry technique
`(HX MS) that allows folding experiments
`at 1,000-fold lower concentration and thus
`excludes aggregation. As before, the un-
`folded and D-exchanged protein is mixed
`into folding conditions and is subjected to
`a D to H exchange labeling pulse after var-
`ious folding times. The labeling pulse can
`be adjusted to avoid or to study the back-
`unfolding behavior of transiently populated
`
`Fig. 1. (A) The classical view of a defined folding pathway, and (B) the new view of multiple routes through a funneled
`landscape. Reprinted with permission from ref. 13. Dashed line in A illustrates the insertion of an optional error-dependent
`kinetic barrier, which can affect some population fraction and not others and thus mimic multipathway folding.
`[Figure labels: U, search, TS, N; Entropy, Energy, Q, Collapse, Molten globule states, Transition state, Folding intermediates, Native.]
`
`
`possibility of alternative folding mechanisms.
`Theorists have attempted to avoid these dif-
`ficulties by simulating the folding process in
`computers. Theory-based computer simula-
`tions can be remarkably powerful. For exam-
`ple, one can compute the path of a multiton
`rocket through 150 million miles of
`free
`space to a pinpoint landing on Mars. The
`equations that govern space flight are known
`precisely (22), computer power is ample, and
`the track to be controlled is clear. Computing
`the structural journey of minuscule protein
`molecules through submicrons of space has
`proved to be more difficult. The computer
`power required to track the folding process
`at the level of thermally driven residue-level
`dynamics is immense. The forces that direct
`protein folding are delicately balanced, inter-
`locking, and not describable in exact terms.
`The reaction path(s) to be mined from the
`mass of computer data are unknown.
`For both the classical and new view models,
`Fig. 1 implies that the structure of folding
`intermediates and their pathway connec-
`tions might be determined in three different
`ways: (i) as intermediates that reach signif-
`icant occupancy during kinetic folding; (ii)
`as conformationally excited states that exist
`at their equilibrium Boltzmann level in the
`high free energy space above the native pro-
`tein; and (iii) as modified molten globule
`forms made by destabilizing the native pro-
`tein so that higher energy states become the
`lowest free energy equilibrium form. Exper-
`imental advances have accomplished all
`three of these approaches. At any time point
`during kinetic folding, a briefly present fold-
`ing intermediate can be marked in a struc-
`ture-sensitive way by hydrogen exchange
`(HX) pulse labeling and defined by later
`analysis. The structure of partially folded
`states minimally populated in the high free
`
`energy space can be determined by HX la-
`beling, sulfhydryl labeling, and NMR meth-
`ods. Partially unfolded molten globules can
`be labeled in a structure-sensitive way by
`hydrogen–deuterium (H-D) exchange and
`analyzed later by site-resolved NMR in the
`reformed native state or directly by mass
`spectrometry.
`These advances now make it possible to
`determine the structure and properties of
`intermediate protein folding states and their
`pathway connections and so place the study
`of folding pathways on the solid ground of
`structural biology. Experiment can now ask
`whether proteins fold through a limited num-
`ber of distinct obligatory intermediate struc-
`tures in an ordered kinetic sequence as
`suggested in Fig. 1A, or through a heteroge-
`neous collection of independent multiply par-
`allel forms and routes as in Fig. 1B, or through
`some other combination of conformations.
`
`Intermediates During Kinetic Folding
`HX Pulse Labeling and NMR Analysis. It
`first became possible to obtain detailed
`structural
`information on briefly present
`protein folding intermediates with the de-
`velopment of the HX pulse labeling meth-
`od (23, 24). The initially unfolded and
`D-exchanged protein is mixed into folding
`conditions and then, at various times during
`folding, is subjected to a short, selective D to
`H exchange labeling pulse. The protein folds
`to the native state, and D vs. H placement is
`analyzed by NMR to identify amide sites that
`were already protected (still D-labeled) or not
`yet protected (H-labeled) at the time of
`the labeling pulse. The results provide a series
`of snapshots during the time course over
`which folding converts identifiable main chain
`amides to a protected H-bonded condition.
`The results will detect
`intermediates that
`
`
`temporal detail (30). The overlapping peptide
`MS results allow transiently formed inter-
`mediates to be defined at near amino acid
`resolution. In each case they are composed of
`sets of residues that form well-defined
`H-bonded elements in the native protein
`(foldons). The results display a stepwise as-
`sembly of the native structure, first helix A +
`strand 4 (blue in Fig. 4), then the neighboring
`helix D + strand 5 (green), then the inter-
`acting B/C helix (yellow), and finally the
`terminal segments (red). The yellow foldon
`does not reach complete protection because
`of some back-unfolding (∼20%) during the
`10-ms HX labeling pulse which, fortuitously,
`helps to distinguish the yellow and green
`foldons along with the small difference in
`their formation rates seen in the renormal-
`ized kinetic phases (Fig. 4, Inset).
`We used the HX MS method to reexamine
`the ambiguous kinetic folding results of Cyt
`c measured before by HX NMR (Fig. 2).
`Low folding concentration (2 μM) avoided
`the previous transient aggregation problem,
`and a short labeling pulse (10 ms rather
`than 50 ms) minimized spurious labeling
`due to back-unfolding during the pulse.
`The results confirm that all of the proteins
`
`
`Fig. 2. Initial HX NMR pulse labeling results for Cyt c (24). A brief D to H labeling pulse imposed after various folding
`times was used to track the increasing protection (decreasing H-labeling) of individual residues and the segments that
`they represent. The results suggested early formation of a native-like N/C bihelical folding intermediate. Baldwin’s
`review (21) noted the kinetic asynchrony, with the N- and C-terminal helical segments in different molecules folding at
`different rates. Later work shows that the asynchrony is caused by protein aggregation and by HX pulse breakthrough
`due to back-unfolding of the transiently populated intermediate during the H-labeling pulse (50 ms).
`[Figure panels: N-Terminal helix, C-Terminal helix, 60s and 70s helix, Tertiary H-bond; residues V11, C14, W59, L64, I75, L98, K100; axes: Time (s), Proton occupancy.]
`
`
`tendency to form helical structure. The more
`protected segments (black curves in Fig. 3)
`are not the ones that form the emergent 7 s
`native-like foldon.
`The same technology was able to resolve
`the entire folding trajectory of Ribonuclease
`H (155 residues; Fig. 4) in structural and
`
`
`Fig. 3. Pulse labeling HX MS results for maltose binding protein (29). (A) The time-dependent folding (HX protection)
`of 116 highest-quality MBP peptide fragments representing different protein regions. Black kinetic curves show the
`slow time course for folding of peptide fragments that are most protected in the initial collapse. (B–D) Representative
`HX-labeled MS fragments from different protein regions (colored) define the separate folding steps, display their
`concerted two-state nature, measure their formation rates, and show that the entire protein population (>95%)
`experiences the same steps. (E) The course of folding. On dilution from denaturant into folding conditions, MBP
`rapidly collapses into a heterogeneous polyglobular state (SAXS envelope reconstruction in gray) with widespread low
`level HX protection, then slowly folds (kinetic curves in A) through an initial native-like intermediate (blue, τ = 7 s) and
`later kinetically unresolved steps (green, gray, red; τ ∼60 s to 120 s; fastest green segments shown in C and E).
`Mutations known to greatly slow folding (stars) are all within the 7 s intermediate.
`[Figure axes: Folding Time (s), Heavy Population Fraction, Mass (Da).]
`
`
`intermediates. To terminate labeling and
`prepare for analysis, the selectively labeled
`protein is plunged into slow HX conditions
`(low pH and temperature), then cleaved into
`short
`fragments, and the fragments are
`separated and analyzed by fast HPLC and
`mass spectrometry. The two examples so far
`published, illustrated in Figs. 3 and 4, pro-
`vide detailed pathway information.
`When the large (370 residues) two-
`domain and aggregation prone maltose
`binding protein (MBP) is diluted into
`folding conditions at <1 μM concentration
`it does not aggregate, but it does rapidly
`collapse into a dynamic polyglobular state
`with heterogeneous low level HX protection
`(Fig. 3)
`(29). This condition might be
`expected to spawn multiple folding routes
`as in the new view model, but it does not.
`The microsecond and millisecond time
`scales pass with no indication of native-like
`structure formation, perhaps because con-
`formational searching in the collapsed state
`is difficult. Ultimately, the entire protein
`population assembles sequentially remote
`segments into a specific native-like interme-
`diate with a single exponential time con-
`stant of 7 s (blue in Fig. 3). Other peptides
`then report on later folding events that
`move to the native state over a broader time
`scale (60–120 s), suggesting several folding
`steps, but their kinetics are too compressed
`to allow clear resolution. These experiments
`largely avoided the back-unfolding HX la-
`beling artifact by using a short labeling
`pulse (12 ms). Longer pulses (up to 42 ms)
`allowed the back-unfolding of the weakly
`protected regions in the initially collapsed
`form to be studied (29). Higher protection
`seems to correlate with the amphipathic
`nature of different segments and their
`
`
`method, although much less used, is more
`definitive. It finds the same distinct partially
`formed native-like structure for the entire
`folding population (36). These results favor
`the distinct pathway hypothesis.
`Many proteins, especially small ones, tend
`to fold and unfold in a kinetically two-state
`manner, each with a single exponential rate.
`The same kinetic barrier is rate-limiting in
`both folding and unfolding directions, and
`their ratio gives the correct equilibrium sta-
`bility constant. In this case, intermediates will
`not be seen to populate either before or after
`the barrier, whether they exist or not, and the
`usual kinetic folding experiment
`simply
`cannot distinguish whether separate pathway
`steps do or do not occur. For example, the
`defined pathway model in Fig. 1A will pro-
`duce two-state kinetic folding and unfolding
`(and linear chevron plots) in the absence of
`the inserted misfolding barrier noted. Un-
`fortunately, the absence of explicit evidence
`for multiple kinetic steps is often taken, in-
`correctly, as evidence for
`their absence.
`However, again here one can note that the
`observation of the same folding rate for the
`whole protein population tends to favor
`a single common pathway rather than mul-
`tiple independent paths.
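The two-state relationship between rates and stability can be written out explicitly. In the sketch below the rate constants are hypothetical values chosen only for illustration; the relations themselves (equilibrium constant as the ratio of folding to unfolding rates, stability as RT ln K, observed relaxation rate as the sum of the two rates) are the standard two-state expressions.

```python
import math

R_KCAL = 1.987e-3  # gas constant, kcal/(mol K)


def stability_from_rates(k_fold: float, k_unfold: float, temp_k: float = 298.15) -> float:
    """Unfolding free energy (kcal/mol) implied by two-state rate constants."""
    k_eq = k_fold / k_unfold  # equilibrium constant [N]/[U]
    return R_KCAL * temp_k * math.log(k_eq)


def observed_rate(k_fold: float, k_unfold: float) -> float:
    """Single exponential relaxation rate seen in a two-state experiment."""
    return k_fold + k_unfold


if __name__ == "__main__":
    # Hypothetical rate constants, not taken from the article.
    kf, ku = 100.0, 0.01  # s^-1
    print(f"stability = {stability_from_rates(kf, ku):.2f} kcal/mol")
    print(f"k_obs     = {observed_rate(kf, ku):.2f} s^-1")
```

Because only the two rate constants enter, any intermediates that lie on either side of the single rate-limiting barrier leave no trace in these quantities, which is why such data cannot decide whether separate pathway steps occur.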
`Thus, much of available kinetic infor-
`mation is unable to distinguish alterna-
`tive pathway behaviors, although some
`observations can be deemed supportive of
`the distinct pathway model.
`
`Multiple Pathways and Misfolding. Some
`other optically measured kinetic results have
`been thought to support multiple pathways,
`although only a small number. The conflict is
`often due to the chance occurrence of partial
`misfolding, which inserts an optional kinetic
`barrier into the folding pathway, differently
`affecting the folding of different population
`fractions (37). In this case kinetic folding will
`appear heterogeneous and asynchronous,
`even when all of the molecules fold through
`the same sequence of intermediate structures.
`This barrier-based problem is common and
`has greatly confused protein folding studies.
`Known optional errors include aggregation
`(26), partial proline mis-isomerization (38),
`incorrect disulfide pairing (39), nonnative
`hydrophobic clustering (40), and partial
`heme mis-ligation (24). (Note: The term
`“misfolding” has become associated with
`amyloid formation; we use it in a more
`general sense.)
`In a prime example, folding experiments
`on the large TIM barrel protein α-Trp syn-
`thase found several kinetically distinct
`population fractions and intermediates, sug-
`gesting four parallel folding tracks (41).
`
`
`Fig. 4. Pulse labeling HX MS results for Ribonuclease H (30). (A and B) Kinetic curves for time-dependent HX
`protection of peptide fragments that define the blue, green, yellow, and red foldons. (C–F) HX MS pulse labeling
`results for representative peptide fragments show the time course and two-state concerted nature of foldon folding
`steps, and that the entire protein population (>95%) experiences the same sequence of concerted steps in a single
`dominant pathway. The yellow foldon does not reach complete protection because of partial labeling due to back-
`unfolding during the 10-ms labeling pulse, which helps to distinguish the yellow foldon from the green foldon, along
`with the small difference in their formation rates seen in the renormalized kinetic phases (A, Inset).
`[Figure axes: Time (ms), Time (sec), Population Fraction, m/z; peptide fragments 54-67 +2, 73-103 +6, 108-117 +2, 137-155 +2.]
`
`
`fold and dock their two terminal helices in
`a single early step (∼12 ms), and the rest of
`the native structure folds later. A method
`for studying kinetic folding intermediates
`at equilibrium, known as native state HX,
`described below,
`independently confirms
`this result and elucidates the entire sub-
`sequent folding pathway.
`Unlike all ensemble-level measurements
`including HX NMR, these pulse labeling HX
`MS results provide snapshots that show the
`structurally different populations that are
`already formed and not yet formed at any
`time point during kinetic folding rather than
`a potentially misleading population average.
`The HX MS data show that folding occurs in
`a stepwise manner and that each kinetic step
`is individually two-state representing the
`cooperative formation of an additional
`folding unit (foldon) (MS data in Figs. 3 and
`4). The time-dependent MS data show that,
`once each foldon unit is formed, it remains
`in place as subsequent foldons are added,
`demonstrating a stepwise buildup through
`distinct, progressively more folded forms.
`Essentially the entire refolding population
`joins synchronously in the same stepwise
`sequence of
`intermediate structures,
`in-
`dicating a single dominant folding pathway.
`The data show explicitly that less than 5%
`of
`the protein population folds through
`any other pathway(s). However, other Cyt c
`results do detect minimal branching in
`the special case where the prior structure
`can support two different but essentially
`
`equivalent subsequent steps; either step
`can occur before the other (31).
`These results support a picture of protein
`folding in which the entire protein pop-
`ulation folds
`through the same distinct
`intermediates and kinetic barriers in the same
`defined pathway, as in Fig. 1A. A seminal
`observation is that the intermediates form
`by assembling pieces of the native protein,
`called foldons.
`
`Other Kinetic Studies. A large fraction of
`the protein folding literature is directed at
`finding the determinants of folding rates.
`Prominent issues, highlighted by reviewers,
`concern the nucleation–condensation model,
`the φ value analysis method, and two-state
`folding. Is the distinct pathway model con-
`sistent with current kinetic information?
`The nucleation–condensation model sug-
`gests that folding is initiated by a nucleation
`event that potentiates subsequent structural
`consolidation (32, 33). The φ value analysis
`method attempts to define the parts of
`a protein that gain structure in the initial
`rate-limiting transition state, the nucleating
`event, by measuring the effect of specific
`mutations on folding rate (34). The usual
`result, that φ values are small and fractional
`(∼0.3 ± 0.2) (35), can be explained either by
`multiple pathways or by the likelihood that
`flexible partially folded structures can ac-
`commodate disruptive mutations more easily
`than the rigid native state. Thus, implications
`for the question of one pathway vs. many
`are ambiguous. A related ψ value analysis
`
`
`Cyt c (Fig. 6) repeatedly unfold and refold,
`accessing partially unfolded high energy
`states with ΔG° of 4–13 kcal/mol above the
`native state, corresponding to steady-state
`populations between 10⁻³ and 10⁻⁹ of the
`total protein. These results identify foldon
`unfolding units in terms of their detailed
`residue composition, specify the free energy
`of the partially unfolded states relative to the
`native state, and can measure unfolding and
`refolding rates.
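The correspondence between free energy and steady-state population follows from the Boltzmann relation. The sketch below is a minimal illustration, assuming a simple two-level equilibrium between the native state and one partially unfolded state at 30 °C; the function name is hypothetical.

```python
import math

R_KCAL = 1.987e-3  # gas constant, kcal/(mol K)


def excited_state_population(dg_kcal: float, temp_k: float = 303.15) -> float:
    """Equilibrium fraction of molecules in a state dg_kcal above the native state."""
    k_eq = math.exp(-dg_kcal / (R_KCAL * temp_k))  # [excited]/[native]
    return k_eq / (1.0 + k_eq)


if __name__ == "__main__":
    for dg in (4.0, 13.0):
        print(f"dG = {dg:4.1f} kcal/mol -> population {excited_state_population(dg):.1e}")
```

For 4 and 13 kcal/mol this gives populations of roughly 1 × 10⁻³ and 4 × 10⁻¹⁰, in line with the range quoted above. States this sparsely populated are invisible to most methods, which is why a measurement sensitive to rare unfolding events is needed to detect them.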
`However, these results do not fully identify
`the different partially unfolded forms (PUFs).
`At each intermediate state, sites that have
`already exchanged to D are invisible (NMR);
`one cannot tell whether they are structured
`or not in the given intermediate. Therefore,
`one cannot tell whether the different foldons
`simply unfold independently or in a pathway
`sequence, as posed in Fig. 5E. This is unlike
`the kinetic HX MS experiments in Figs. 3 and
`4, where the pulse labeling approach directly
`provides a snapshot of the folded condition
`of all of the residues during the folding pro-
`cess. Ultimately, a series of “stability labeling”
`experiments showed that the high energy
`states seen for Cyt c do represent a quantized
`stepwise series of progressively more un-
`folded PUFs, as pictured in Fig. 5E (48). In
`the unfolding direction, the transition to each
`higher energy PUF unfolds one more foldon
`in a sequential pathway manner. Because
`these experiments were done under equi-
`librium native conditions (pD 7, 30 °C),
`each uphill unfolding step must be matched
`by an equivalent refolding step. The down-
`hill sequence defines a stepwise sequential
`folding pathway.
`In detailed confirmation, these equilibrium
`results identified the same N/C bihelical
`foldon (blue) as did the kinetic pulse label-
`ing experiment. The pulse labeling experi-
`ment places this state as first in the folding
`sequence; the native state HX experiment
`places it as last in the unfolding sequence.
`The initially folded N/C bihelical PUF
`accumulates in Cyt c kinetic folding when it
`encounters a histidine to heme mis-ligation
`barrier; both peripheral histidines of Cyt c
`are placed on and therefore block formation
`of the green foldon segment, which is pro-
`grammed to fold next. An independent
`kinetic mode native state HX experiment
`showed that the various foldons unfold in
`the kinetic order shown in the rising ladder
`in Fig. 5E. The unfolding rate for a first
`unfolding step (by EX1 HX) accurately
`matches the independently measured Cyt c
`global unfolding rate in two-state unfolding
`conditions (49).
`Distinct native-like pathway intermediates
`have been found for other proteins by HX
`
`Subsequent work found that each additional
`track could be suppressed, one at a time, by
`mutational replacement of one or more pro-
`line residues or by addition of a prolyl isom-
`erase (42), as expected for a defined pathway
`interrupted in some fraction of the folding
`population by optional mis-isomerized pro-
`line barriers. In similar work, multiple kinetic
`folding phases observed by optical methods
`for hen egg lysozyme (43, 44) and Staphylo-
`coccal nuclease (45) were also fit by the
`authors to multiple pathway models, but it
`was shown that the data can be fit at least as
`well by a single pathway in which some
`fraction of the molecules experiences an error
`that slows their folding (37, 46). In the absence
`of structural information, it is not possible to
`distinguish between a multiple pathway in-
`terpretation and a given pathway with op-
`tional barriers. This can be seen intuitively by
`considering the insertion of an optional bar-
`rier at any step in a well-defined pathway as
`in Fig. 1A (dashed line).
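The effect of an optional barrier can be made concrete with a small kinetic simulation. The sketch below is a hypothetical scheme with made-up rate constants, not a model from the article: all molecules fold through U → I → N, but a fraction of them forms the intermediate with an error (I_err) that must be repaired before folding can continue.

```python
def simulate_native_fraction(t_end: float, f_err: float = 0.4, k1: float = 100.0,
                             k2: float = 50.0, k_rep: float = 0.1,
                             dt: float = 1e-3) -> float:
    """Native fraction at time t_end (s) for U -> I -> N with an optional error.

    A fraction f_err of molecules forms I with an error (I_err) and must be
    repaired (rate k_rep) before continuing. All parameters are hypothetical.
    Integrated with a simple forward Euler scheme.
    """
    u, i_ok, i_err, n = 1.0, 0.0, 0.0, 0.0
    for _ in range(int(round(t_end / dt))):
        du = -k1 * u
        d_err = f_err * k1 * u - k_rep * i_err
        d_ok = (1.0 - f_err) * k1 * u + k_rep * i_err - k2 * i_ok
        dn = k2 * i_ok
        u += du * dt
        i_err += d_err * dt
        i_ok += d_ok * dt
        n += dn * dt
    return n


if __name__ == "__main__":
    for t in (0.1, 1.0, 10.0, 60.0):
        print(f"t = {t:5.1f} s  native fraction = {simulate_native_fraction(t):.3f}")
```

With these parameters about 60% of the population reaches the native state within a fraction of a second and the rest follows over tens of seconds. The resulting biphasic time course looks like two parallel folding tracks, although every molecule passes through the same intermediate.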
`The common occurrence of on-pathway
`optional errors has led to other incorrect sug-
`gestions:
`that well-populated kinetic inter-
`mediates are grossly misfolded artifacts rather
`than constructive on-pathway structures with
`some particular misfolding error; that visible
`intermediates hinder rather than promote
`folding because visible intermediates and
`slowed folding occur together. Other literature
`
`results have been interpreted in terms of
`multiple pathways, either during unfolding at
`conditions far from native, or during folding
`but potentially confounded by ensemble av-
`eraging, or by complex spectroscopic phases
`that allow different interpretations, as well as
`by spurious barriers due to optional errors. In
`all of these cases, the structural information
`that is necessary to support a definitive con-
`clusion is absent.
`More definitive information comes from
`the kinetic HX MS experiments illustrated
`above, which do document a distinct path-
`way, and from a number of equilibrium-
`based methods described in the following,
`which have been able to reveal multiple
`native-like partially folded on-pathway inter-
`mediates, even when simple folding seems to
`be kinetically two-state.
`
`Intermediates Observed at Equilibrium
`Intermediates as Conformationally Ex-
`cited States. An experiment called equilib-
`rium native state HX, explained in Fig. 5, first
`detected and described cooperative foldon
`units (2, 47). The experiment uses low con-
`centrations of denaturant (or other destabi-
`lant) to promote sizeable unfolding reactions
`to the point where they come to dominate
`the H-exchange of the amides that they ex-
`pose. The results, reproduced in Fig. 5A,
`showed that specific structural elements of
`
`
`[Figure axes: ΔG (kcal/mol), [GdmCl] (M); free energy ladder levels 0.0, 4.3, 6.0, 7.4, 10.0, 12.8 between N and U; panel labels: independent, sequential.]
`Fig. 5. Initial equilibrium native state HX NMR results for Cyt c (47). (A–D) HX rates of many individual Cyt c residues,
`measured by NMR as a function of low levels of added denaturant far below the melting transition, are plotted in
`terms of the free energy of the exposure reaction that determines each amide HX rate. HX governed by a small local
`fluctuation is insensitive to denaturant and produces a horizontal curve. HX determined by a large un