Proc. Natl. Acad. Sci. USA
`Vol. 95, pp. 11158–11162, September 1998
`Biophysics
`
`Clustering of low-energy conformations near the native structures
`of small proteins
`DAVID SHORTLE*, KIM T. SIMONS, AND DAVID BAKER†
`Department of Biochemistry, University of Washington School of Medicine, Seattle, WA 98195
`
`Edited by Peter G. Wolynes, University of Illinois, Urbana, IL, and approved July 17, 1998 (received for review May 13, 1998)
`
ABSTRACT
Recent experimental studies of the denatured state and
theoretical analyses of the folding landscape
suggest that there is a large multiplicity of low-energy,
partially folded conformations near the native state. In this
`report, we describe a strategy for predicting protein structure
`based on the working hypothesis that there are a greater
`number of low-energy conformations surrounding the correct
`fold than there are surrounding low-energy incorrect folds. To
`test this idea, 12 ensembles of 500 to 1,000 low-energy struc-
`tures for 10 small proteins were analyzed by calculating the
rms deviation of the Cα coordinates between each conformation
and every other conformation in the ensemble. In all 12
`cases, the conformation with the greatest number of confor-
`mations within 4-Å rms deviation was closer to the native
`structure than were the majority of conformations in the
`ensemble, and in most cases it was among the closest 1 to 5%.
`These results suggest that, to fold efficiently and retain
`robustness to changes in amino acid sequence, proteins may
`have evolved a native structure situated within a broad basin
`of low-energy conformations, a feature which could facilitate
`the prediction of protein structure at low resolution.
`
`Prediction of the structures of proteins from their amino acid
`sequence traditionally has followed one approach. First, a
`candidate conformation is generated, either by a de novo
`conformational search method or by turning to the database of
`known protein structures. This conformation then is scored for
`the quality of the match between the sequence of the target
`protein and the spatial positions forced on the residues when
`placed in the candidate conformation. This process is contin-
`ued until practical limitations force termination of the search,
`at which point the conformation with the most favorable score
`is considered to be the best candidate for the structure of the
`target protein.
`A central assumption underlying this standard approach is
`that the native state is the conformation of lowest energy. The
`configurational entropy of the protein chain cannot be in-
`cluded in the scoring function because the focus is on finding
`one conformation. Consequently, this approach is not rigor-
`ously based on Anfinsen’s hypothesis that the native state lies
`at the global minimum in free energy (1). Interpreted literally,
`the Anfinsen hypothesis implies that, because the native state
`of a protein is an ensemble of many similar conformations, the
`target or goal of protein structure prediction should be this
`ensemble rather than just a single conformation. Because this
`ensemble of conformations is probably very narrowly distrib-
`uted around the mean, it is often considered a safe assumption
`to ignore this source of complexity and concentrate on one
`representative conformation, which in all likelihood would
`approximate the mean of the ensemble.
`
`Proteins participate in a second, much larger ensemble of
`conformations, usually referred to as the ‘‘denatured state’’ (2,
`3, 4). In the past few years, considerable attention has been
`given to experimental and theoretical characterization of this
`complex and structurally diverse ensemble. Although current
`physical methods do not provide as high a resolution descrip-
`tion of denatured states as they do for native states, the
`emerging picture is one of significant population of transient
`native-like local structures weakly coupled to each other (5, 6).
`An analysis of long range structure in an expanded denatured
`state of staphylococcal nuclease suggests that many of the
`global topological features of the native state are retained in
`the denatured state (7). In other words, the ensemble average
`structure of the denatured state resembles the native state,
`albeit at very low resolution.
`In addition to forming a much more diverse ensemble, the
`conformations in the denatured state may have their structure
`and dynamic behavior determined by a smaller number of
`energy terms. Several authors have argued that burial of
`hydrophobic surface is the dominant force shaping structure in
`the denatured state ensemble (3, 5). In addition, the highly
`dynamic character and the much lower density of atoms
`suggest that dispersion forces, hydrogen bonds, and salt bridges
`may contribute little to the properties of denatured proteins.
`If the energetics are less dependent on the high resolution
`details of chain–chain interactions, the ensemble-averaged
`properties of the denatured state might be easier to predict
than those of the native state. The database-derived energy/scoring
functions currently used for structure prediction are
`thought to model primarily hydrophobic interactions (8, 9, 10)
`and thus may be suitable for prediction of structure in the
`denatured state ensemble.
`If the ensemble-averaged topology (or low resolution struc-
`ture) of the denatured state is approximately the same as that
`of the native state, the basin or minimum in the energy
`landscape containing both native and denatured states must
`have a partition function that is much larger than any other
`ensemble of structurally similar conformations. This idea is the
`fundamental hypothesis underlying the approach to structure
`prediction described in this paper.
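The role of the partition function in this hypothesis can be made explicit with standard statistical mechanics. The following is only a sketch, in notation that does not appear in the original text: $B$ labels a basin (a set of structurally similar conformations), $E_i$ is the energy of conformation $i$, $k_B$ is Boltzmann's constant, $T$ is the temperature, and $N_B$ and $\bar{E}_B$ are an assumed effective number of conformations in the basin and their typical energy.

```latex
\begin{align}
Z_B &= \sum_{i \in B} e^{-E_i / k_B T}, \\
F_B &= -k_B T \ln Z_B, \\
\frac{p_A}{p_B} &= \frac{Z_A}{Z_B} = e^{-(F_A - F_B)/k_B T}, \\
Z_B &\approx N_B \, e^{-\bar{E}_B / k_B T}
\quad\Longrightarrow\quad
F_B \approx \bar{E}_B - k_B T \ln N_B .
\end{align}
```

Under the last approximation, two basins with similar typical energies differ in free energy only through the $\ln N_B$ term, so the basin that holds more low-energy conformations has the larger partition function and the higher population.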

The publication costs of this article were defrayed in part by page charge
payment. This article must therefore be hereby marked “advertisement” in
accordance with 18 U.S.C. §1734 solely to indicate this fact.
© 1998 by The National Academy of Sciences 0027-8424/98/9511158-5$2.00/0
PNAS is available online at www.pnas.org.

This paper was submitted directly (Track II) to the Proceedings office.
Abbreviation: rmsd, rms deviation.
*Permanent Address: Department of Biological Chemistry, The Johns
Hopkins University School of Medicine, Baltimore, MD 21205.
†To whom reprint requests should be addressed. e-mail: baker@ben.bchem.washington.edu.

RESULTS
Current knowledge of the residual structure in the denatured
state and its energetic basis is too limited to reach definitive
conclusions about the appropriateness of this large, dynamic
ensemble as a target for predicting structural features of folded
proteins. Therefore, we consider arguments concerning the
structural correspondence between the denatured state and
the native fold only as a general point of departure for the
analysis reported here. In this spirit, we present two conjectures
as working assumptions to be verified by future experimental
and computational work rather than as established
facts.

FIG. 1. Schematic diagram of a hypothetical folding energy landscape.
The x axis corresponds to a generalized structure coordinate
(17, 26). The solid line corresponds to the internal free energy (17),
and the dashed line corresponds to the value of a database-derived
scoring function such as the one used in this work. The scoring function
follows the true potential because it is sensitive to hydrophobic burial
but produces noise and fails to detect the sharp drop in energy of the
native state because of inaccuracies in quantifying hydrogen bonds,
electrostatic, and van der Waals interactions. However, the scoring
function is able to detect the higher density of low-energy states in the
broad region surrounding the native state.

`The first assumption is that the minimum in which native-
`like conformations reside is broader than any other minimum
`at the lowest energy levels in the folding landscape. The second
`assumption is that the breadth of this minimum results from
`the long range character of hydrophobic interactions and
`consequently should be detectable by using database-derived
energy/scoring functions, which capture some of the features
of hydrophobic interactions. Together, these two assumptions
are equivalent to the statement that effective burial of hydrophobic
residues can be retained throughout a larger range of
structural perturbations of the native topology than of any
other topology.

FIG. 2. Histograms of the rmsd (Cα coordinates) from the native
state to members of each of the Park–Levitt sets. An arrow marks the
position of the center of the largest cluster of conformations by using
a 4-Å rmsd cutoff. The bin intervals along the x axis are in 0.5-Å
increments.

`These assumptions are illustrated in a schematic energy
`landscape shown in Fig. 1, which positions the native state in
`a deep, narrow well located near the center of a broad, shallow
`minimum (solid line) (11). In a search to find the structure of
`a protein of known sequence, a relatively coarse grid search
`may generate multiple conformations within this broad min-
`imum. Although an energy function that does not correctly
`quantify dispersion interactions, hydrogen bonds, and electro-
`static interactions (Fig. 1, dashed line) may miss the steep drop
`in energy for conformations that comprise the native state, it
`may succeed in detecting the broad minimum if hydrophobic
`interactions are more or less correctly modeled.
`To the extent that these two assumptions are correct, protein
`structure at low resolution may be predicted by carrying out a
`coarse-grained sampling of conformational space and choos-
`ing the low-energy conformation having the largest number of
`structurally related low-energy conformations. In a situation
`such as that depicted in Fig. 1, relatively uniform sampling of
`conformation space followed by identification of the largest
`cluster of structurally related low-energy conformations would
`be expected to find the region of conformation space that
`contains the native state.
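The selection rule described above can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' code: the function name `largest_cluster_center`, the use of `<=` at the cutoff, the exclusion of a conformation from its own neighbor count, and the lowest-index tie-break are all assumptions, and the one-dimensional toy data merely stand in for a real rmsd matrix.

```python
from typing import Sequence, Tuple


def largest_cluster_center(dist: Sequence[Sequence[float]], cutoff: float) -> Tuple[int, int]:
    """Return (index, neighbor_count) of the conformation that has the most
    other conformations within `cutoff` of it.

    `dist` is a symmetric matrix of pairwise distances (for example rmsd in Å).
    Ties go to the lowest index, which is an arbitrary choice.
    """
    best_index, best_count = -1, -1
    for i, row in enumerate(dist):
        # Count every other conformation j that lies within the cutoff of i.
        count = sum(1 for j, d in enumerate(row) if j != i and d <= cutoff)
        if count > best_count:
            best_index, best_count = i, count
    return best_index, best_count


# Toy 1-D "conformations": the distance is the absolute coordinate difference.
coords = [0.0, 1.0, 2.0, 3.0, 10.0, 20.0]
toy_dist = [[abs(a - b) for b in coords] for a in coords]
center, size = largest_cluster_center(toy_dist, cutoff=2.0)
print(center, size)  # prints "1 3": index 1 has three neighbors within 2.0
```

In the toy data the four points between 0 and 3 form the dense region, and the isolated points at 10 and 20 are never selected, however favorable their individual scores might be.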
Two Sets of Computer-Generated “Decoy” Conformations
for 10 Small Proteins. To test this idea, we examined large sets
of structures generated by Park and Levitt (12) for eight small
proteins: cro repressor (2cro), a fragment of ribosomal protein
L7/L12 (1ctf), the 434 repressor (1r69), calbindin (3icb),
scorpion neurotoxin (1sn3), pancreatic trypsin inhibitor (4pti),
ubiquitin (1ubq), and an electron transfer protein with an
iron-sulfur center (4rxn). In brief, these structures were produced
by an exhaustive search in which the angular relationships
between five or six segments of fixed secondary structure
were allowed to vary. After optimally fitting the native conformation
as a trace of virtual Cα atoms with only four allowed
torsion angles between residues, segments of the protein chain
that closely followed a straight line (namely helices and
strands) were identified, and residues between these straight
segments became candidates for hinge angles (13). A total of
10 moveable residue positions were introduced as adjacent
pairs in four or five hinge regions in each protein. Because
torsion angles only were allowed to assume one of four possible
values, each starting structure could be converted to 4^10 − 1
alternative conformations by exhaustively enumerating all
possible combinations of torsion angle values. After generating
approximately 1,000,000 conformations, the 80% with the greatest
number of steric clashes were discarded, leaving a set of 200,000 decoy
structures. After most remaining steric clashes were removed
from these decoys by minimization in dihedral angle space, a
more realistic chain representation was obtained by fitting
backbone atoms (N, CA, C, O, CB) with correct bond distances
and angles to the virtual Cα trace by using fragments of known
proteins (K.T.S. and D.B., unpublished work).
A second, more structurally diverse set of conformations for
four small all-helical proteins, namely staphylococcal protein A
(1fc2), homeodomain repressor (1hdd), cro repressor (2cro),
and calbindin (4icb), also was analyzed. In previous work
from this laboratory, ensembles of protein-like structures were
generated by a Monte Carlo simulated annealing procedure in
which segments of structure from the protein database were
recombined to generate compact composites that scored well
on the basis of a knowledge-based scoring function (13). To
obtain local secondary structure compatible with the local
structure, the protein segments used in this construction
process were selected on the basis of similarity in amino acid
sequence between the source protein and the target protein.
To avoid biasing the generated set toward the wild-type
conformation, all known structural homologues were removed
from the set of proteins used as sources of structural segments.
The 500 structures with the best overall score were saved for
analysis. Unlike the conformations in the Park–Levitt sets, the
conformations in the Simons sets showed considerable variability
in the exact position of some helices (13).
The scoring functions used to evaluate the decoy structures
were based on the decomposition

$$P(\text{structure} \mid \text{sequence}) \propto P(\text{sequence} \mid \text{structure}) \times P(\text{structure}),$$

or Bayes’ rule, where P(x) is the a priori probability of the
occurrence of x and P(y | x) is the conditional probability of y,
given the occurrence of x. The first term on the right hand side
quantifies the fit between the sequence and the structure and
consisted of a residue-environment term that depends primarily
on the hydrophobic interaction and a specific pair interaction
term that captures interactions such as salt bridges and
disulfide bonds. The second term on the right hand side is the
probability that a candidate conformation is a properly folded
protein structure. For scoring the Park–Levitt sets, P(structure)
consisted of an excluded volume component plus a
secondary structure packing term that is sensitive primarily to
the relative orientation and packing of β strands (K.T.S. and
D.B., unpublished work). For the Simons sets, this term only
depended on excluded volume and the radius of gyration (13).
Results of Cluster Analysis. Analysis of the eight Park–Levitt
sets began with scoring each of the 200,000 conformations
and saving the 1,000 with the best scores, which were
defined as the low-energy ensemble for each protein sequence.
The rms deviation (rmsd) of the Cα coordinates was
calculated for each pair of conformations within a set, and
the results stored in a 1,000 × 1,000 “distance matrix.” For
each of a series of distance cutoffs ranging from 4 to 6 Å, the
conformation having the most neighboring conformations
within the distance cutoff was selected as the most central
conformation. These results are listed in Table 1.

Table 1. Clustering by structural similarity of the 1,000 lowest
energy conformations in the Park–Levitt sets

Protein  rmsd cutoff  Cluster  rmsd center to     rmsd lowest energy  Mean rmsd of
         (Å)          size     native, Å (rank*)  conformation (Å)    ensemble (Å)
2cro     4            44       4.7 (44)           5.6                 8.8
         5            85       3.2
         6            151      6.7
1ctf     4            69       1.7 (2)            2.0                 8.1
         5            132      2.9
         6            247      2.9
1r69     4            45       3.3 (12)           5.2                 8.0
         5            129      4.2
         6            257      3.9
3icb     4            51       1.7 (1)            4.7                 9.2
         5            83       2.0
         6            137      1.7
1sn3     4            18       8.1 (417)          2.1                 8.4
         5            40       7.2
         6            120      6.9
4pti     4            22       2.5 (10)           10.0                9.2
         5            44       5.0
         6            100      6.1
1ubq     4            52       2.0 (3)            5.3                 9.2
         5            94       2.0
         6            154      3.7
4rxn     4            36       3.1 (13)           8.4                 8.4
         5            82       3.2
         6            153      5.4
*Rank in proximity to native state.

Fig. 2 shows the distribution of rmsd distances between the
wild-type structure and each conformation in the set, along
with the position of the center conformation for the largest
cluster within 4-Å rmsd. As can be seen, in all cases but one
(protein 1sn3, which has a very irregular structure held together
by four disulfide bonds), the center conformation is
significantly more similar to the native structure than the
average member of the low-energy ensemble. In addition, for
six of these seven cases, the center conformation was in the
closest 1.5% of conformations with regard to rmsd from the
native state.

FIG. 3. Multidimensional scaling maps of the ensemble of conformations
in the Park–Levitt sets of conformations for 4pti (Upper) and
4rxn (Lower). The distance in rmsd between each pair of conformations
is projected onto two dimensions, retaining relative distance
relationships so that two structurally similar conformations tend to be
located near each other. The position of each conformation is indicated
by a small white dot. The position of the native state is marked
with a white diamond, and the three conformations with the three lowest
(best) energy scores are marked with white boxes. The gray scale value
of each pixel is determined by the lowest energy conformation within
that small region of the map, with black being the very lowest energies.

More graphic displays of the structural similarities among
the 1,000 conformations in the Park–Levitt sets for proteins
4pti and 4rxn are shown in Fig. 3. By applying the statistical
method of multidimensional scaling to the set of 500,000
pairwise distances, a map can be generated that represents an
optimal solution of separating conformations in two dimensions
in relationship to their distance in rmsd. The resulting
physical distances between points on the map are not related
linearly; only local rank ordering of distances is preserved by
this scaling method. As can be seen, the conformations of very
lowest energy are distributed fairly randomly over the maps of
both proteins. Yet, in both cases, there is a significantly higher
number density of low-energy conformations in a region near
the native state.
To demonstrate that low energy plays an important role in
these observed clusters of native-like conformations, the
200,000 conformations in each of the Park–Levitt sets were
sorted by compactness, and the 1,000 most compact conformations
were selected. Clustering this ensemble of conformations
in the same manner gave the results seen in Table 2. For
five of the proteins, the centers of the largest clusters are no
more similar to the native structure than an average conformation
within the ensemble.

Table 2. Clustering by structural similarity of the 1,000 most
compact conformations in the Park–Levitt sets

Protein  rmsd cutoff  Cluster  rmsd center to  Mean rmsd of
         (Å)          size     native (Å)      ensemble (Å)
2cro     4            68       3.0             8.2
         5            123      2.3
         6            210      4.1
1ctf     4            33       8.6             9.6
         5            43       11.4
         6            84       7.2
1r69     4            18       9.7             8.8
         5            31       3.7
         6            87       2.9
3icb     4            16       6.1             10.0
         5            28       6.1
         6            53       4.6
1sn3     4            18       9.0             9.4
         5            61       9.0
         6            149      8.1
4pti     4            19       10.3            9.3
         5            32       10.4
         6            68       10.4
1ubq     4            23       10.1            9.8
         5            37       10.1
         6            74       11.9
4rxn     4            27       10.0            9.0
         5            47       9.4
         6            102      10.4
`
FIG. 4. Histograms of the rmsd (Cα coordinates) from the native
state to members of each of the Simons sets. An arrow marks the
position of the center of the largest cluster of conformations by using
a 4-Å rmsd cutoff. The bin intervals along the x axis are in 0.5-Å
increments.

Table 3. Clustering by structural similarity of the Simons sets of
500 conformations

Protein  rmsd cutoff  Cluster  rmsd center to     rmsd of lowest energy  Mean rmsd of
         (Å)          size     native, Å (rank*)  conformation (Å)       ensemble (Å)
1fc2     4            410      4.0 (193)          3.8                    4.9
         5            419      4.2
         6            431      5.3
1hdd     4            209      3.5 (17)           5.2                    6.8
         5            296      4.5
         6            348      4.8
2cro     4            16       4.4 (3)            7.9                    8.7
         5            37       7.2
         6            90       4.9
4icb     4            43       6.5 (82)           5.8                    9.4
         5            89       7.0
         6            143      5.9
*Rank in proximity to native state.
`
Surprisingly, for the three all-helical proteins, 2cro, 1r69,
and 3icb, the cluster centers are considerably closer to the
native structure than are the majority of configurations in the
ensemble, although this trend may not be significant for 3icb.
For 2cro, the centers of the 4-, 5-, and 6-Å groupings are closer
to native than the cluster centers from the corresponding
lowest energy ensemble. This may be a consequence of the fact
that there are a limited number of ways to arrange helices to
achieve a compact, self-avoiding configuration (15), and there
is a greater number of such well packed configurations around
the native fold than around other topological arrangements accessible
in this ensemble.
`To analyze a second set of decoy structures constructed in
`an entirely different manner, the 500 conformations in the
`Simons sets also were clustered on the basis of structural
similarity as measured by rmsd of the Cα coordinates (Table
3). As shown in Fig. 4, these four sets of proteins contained
very few conformations within 3-Å rmsd of the native state.
Nevertheless, for the two proteins 1hdd and 2cro, the center of
the largest 4-Å cluster was among the closest 5% in rmsd. For the
4icb set, which contained no member closer than 4 Å to the
native state, the center of the largest 4-Å cluster had an rmsd
from native of 6.2 Å, placing it only in the closest 20% of
`conformations. The fourth protein, 1fc2 or staphylococcal
`protein A, consists of a three-helical bundle. Not surprisingly,
`the level of structural diversity in the starting ensemble was
`relatively small. The bimodal distribution seen in Fig. 4 reflects
`the fact that there are only two topologies for packing three-
`helical bundles with very short connecting loops. The center of
`the largest cluster was only average in structural similarity to
`the native state, yet it did have the third helix on the correct
`side of the plane defined by the first two helices.
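The pairwise rmsd values that underlie Tables 1 to 3 can be sketched as follows. The text does not spell out the procedure, so this block assumes that each rmsd is taken after optimal rigid-body superposition of the two Cα traces. The function names, the use of Horn's quaternion formulation, and the shifted power iteration used to find the largest eigenvalue are choices made here for a dependency-free illustration, not the authors' implementation.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]


def _center(coords: Sequence[Point]) -> List[Point]:
    """Translate coordinates so that their centroid sits at the origin."""
    n = len(coords)
    cx, cy, cz = (sum(p[k] for p in coords) / n for k in range(3))
    return [(x - cx, y - cy, z - cz) for x, y, z in coords]


def superposed_rmsd(a: Sequence[Point], b: Sequence[Point], iters: int = 500) -> float:
    """rmsd between two equal-length coordinate sets after the best
    rotation plus translation (assumed, see the note above)."""
    if len(a) != len(b) or len(a) == 0:
        raise ValueError("coordinate sets must be non-empty and of equal length")
    a, b = _center(a), _center(b)
    n = len(a)
    # Cross-covariance sums: s[i][j] = sum over atoms of a[i] * b[j].
    s = [[sum(p[i] * q[j] for p, q in zip(a, b)) for j in range(3)] for i in range(3)]
    (sxx, sxy, sxz), (syx, syy, syz), (szx, szy, szz) = s
    # Horn's symmetric 4x4 matrix; its largest eigenvalue equals the maximum
    # over proper rotations R of sum_i (R a_i) . b_i.
    key = [
        [sxx + syy + szz, syz - szy, szx - sxz, sxy - syx],
        [syz - szy, sxx - syy - szz, sxy + syx, szx + sxz],
        [szx - sxz, sxy + syx, -sxx + syy - szz, syz + szy],
        [sxy - syx, szx + sxz, syz + szy, -sxx - syy + szz],
    ]
    # Shifting by the Frobenius norm makes every eigenvalue non-negative,
    # so plain power iteration converges toward the largest one.
    shift = math.sqrt(sum(x * x for row in key for x in row))
    v = [1.0, 0.3, 0.2, 0.1]  # arbitrary, deliberately asymmetric start vector
    for _ in range(iters):
        w = [sum(key[i][j] * v[j] for j in range(4)) + shift * v[i] for i in range(4)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:  # degenerate input: every point sits at the centroid
            break
        v = [x / norm for x in w]
    kv = [sum(key[i][j] * v[j] for j in range(4)) for i in range(4)]
    lam = sum(x * y for x, y in zip(v, kv)) / sum(x * x for x in v)  # Rayleigh quotient
    ga = sum(x * x + y * y + z * z for x, y, z in a)
    gb = sum(x * x + y * y + z * z for x, y, z in b)
    return math.sqrt(max(0.0, (ga + gb - 2.0 * lam) / n))


def rmsd_matrix(confs: Sequence[Sequence[Point]]) -> List[List[float]]:
    """Symmetric matrix of pairwise superposed rmsd values."""
    n = len(confs)
    d = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d[i][j] = d[j][i] = superposed_rmsd(confs[i], confs[j])
    return d


# A tetrahedral toy trace and a copy rotated 90 degrees about z, then shifted.
tetra = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)]
moved = [(-y + 5.0, x - 2.0, z + 1.0) for x, y, z in tetra]
print(superposed_rmsd(tetra, moved))  # very close to zero
```

For 1,000 conformations the matrix requires about 500,000 pairwise comparisons, so a production calculation would normally rely on an optimized linear algebra library; the power iteration here is only meant to keep the sketch self-contained.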
`
`DISCUSSION
`We describe a strategy for predicting protein structure at low
`resolution that goes beyond the standard approach of search-
`ing for the single lowest energy conformation. Instead of
`focusing on the lowest energy conformation, we search for the
`largest cluster of structurally related low-energy conforma-
`tions. In all 12 sets of low-energy conformations studied, the
`conformation with the most other conformations within 4-Å
`rmsd was much more similar to the native structure than the
`majority of the conformations, and, in 9 of the 12 cases, this
`conformation was more similar to the native structure than the
`lowest energy conformation in the set.
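The two tallies quoted above can be checked against the 4-Å rows of Tables 1 and 3. The sketch below simply transcribes those rows; the pairing of the lowest-energy rmsd column with individual proteins follows the reading of the tables adopted here, which is an assumption wherever the extracted layout is ambiguous, and the dictionary name `SETS` is hypothetical.

```python
# Each entry: (rmsd of the 4-Å cluster center to native in Å,
#              rank of the center in proximity to native,
#              ensemble size,
#              rmsd of the lowest-energy conformation to native in Å)
SETS = {
    "Park-Levitt 2cro": (4.7, 44, 1000, 5.6),
    "Park-Levitt 1ctf": (1.7, 2, 1000, 2.0),
    "Park-Levitt 1r69": (3.3, 12, 1000, 5.2),
    "Park-Levitt 3icb": (1.7, 1, 1000, 4.7),
    "Park-Levitt 1sn3": (8.1, 417, 1000, 2.1),
    "Park-Levitt 4pti": (2.5, 10, 1000, 10.0),
    "Park-Levitt 1ubq": (2.0, 3, 1000, 5.3),
    "Park-Levitt 4rxn": (3.1, 13, 1000, 8.4),
    "Simons 1fc2": (4.0, 193, 500, 3.8),
    "Simons 1hdd": (3.5, 17, 500, 5.2),
    "Simons 2cro": (4.4, 3, 500, 7.9),
    "Simons 4icb": (6.5, 82, 500, 5.8),
}

# Center closer to native than the majority: its rank falls in the closer half.
in_closer_half = [name for name, (c, rank, n, low) in SETS.items() if rank <= n / 2]

# Center closer to native than the single lowest-energy conformation.
beats_lowest = [name for name, (c, rank, n, low) in SETS.items() if c < low]

print(len(in_closer_half), "of", len(SETS))  # prints "12 of 12"
print(len(beats_lowest), "of", len(SETS))    # prints "9 of 12"
```

Under this reading, the three sets in which the lowest-energy conformation is closer to native than the cluster center are the Park–Levitt 1sn3 set and the Simons 1fc2 and 4icb sets.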
`Because the conformations in the Park–Levitt sets are
`rigidly fixed in secondary structure and have only four or five
`degrees of freedom for repositioning helices and strands, they
`correspond to a very limited search of conformation space. On
`
`the other hand, the algorithm used to generate the conforma-
`tions in the Simons sets explores many more degrees of
`freedom. In this case, the type and position of secondary
`structures are constrained only by similarity in sequence
`between short segments of the target protein under construc-
`tion and the template proteins from which structural segments
`were obtained. Thus, these sets represent a more realistic
attempt to predict the structure of a protein from sequence
`information alone. Overall, clustering of the Park–Levitt sets
`gave cluster centers closer to the native structure than did the
`Simons sets. Presumably, this is a consequence of the larger
`number of degrees of freedom used to generate the Simons
`sets. It will be important to determine in future work how
`readily the native minimum can be identified by using still
`more diverse conformational sampling strategies.
`The higher density of low-energy conformations near the
`native structure is not an artifact built into these sets of
`conformations by the algorithms used to generate them. That
`the native state does not occupy a unique position in an
`ensemble is fairly obvious for the Simons sets. In this case, only
`the sequence of the target protein was used in the build-up
`process. Because the structural segments used in this process
`were derived from a subset of proteins that did not include
`known homologues, there should be no intrinsic bias toward
`over-representation of the tertiary structure of the native state
`among the conformations generated.
`The conformations in the Park–Levitt sets, on the other
`hand, were derived from the native structure of the target
protein, after it had been configured as a discrete-state virtual
Cα chain, by varying four or five hinge angles between fixed
`secondary structural segments. Because this construction pro-
`cess searched all allowed values of these angles, the resulting
`set of conformations is independent of the starting conforma-
`tion. In other words, if one picked the lowest energy member
`of the ensemble and repeated the construction algorithm,
`exactly the same conformational set would be regenerated.
`Thus, there is no bias toward the more native-like members of
`the ensemble.
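The independence argument can be illustrated with a short enumeration. This sketch uses the figures given in the description of the Park–Levitt sets (10 moveable positions, four allowed torsion values, 80% of conformations discarded); the integer state labels, the function names, and the omission of the later clash minimization step are simplifying assumptions.

```python
from itertools import product

N_POSITIONS = 10               # moveable residue positions per protein (from the text)
TORSION_STATES = (0, 1, 2, 3)  # placeholder labels for the four allowed torsion values


def all_assignments(n_positions=N_POSITIONS, states=TORSION_STATES):
    """Iterate over every combination of torsion states at the moveable positions."""
    return product(states, repeat=n_positions)


def generated_set(start, states=TORSION_STATES):
    """Set of conformations produced by exhaustive enumeration from `start`.

    Only the length of `start` is used, never its values, which is why the
    result cannot depend on the starting conformation.
    """
    return set(product(states, repeat=len(start)))


total = sum(1 for _ in all_assignments())      # 4**10 = 1,048,576
alternatives = total - 1                       # excluding the starting structure
kept_after_clash_filter = round(0.20 * total)  # 80% discarded, roughly 200,000 remain

# Small demonstration with 3 positions: two different starts give the same set.
same_set = generated_set((0, 0, 0)) == generated_set((3, 1, 2))
print(total, alternatives, kept_after_clash_filter, same_set)
# prints "1048576 1048575 209715 True"
```

The count of 209,715 is consistent with the rounded figure of 200,000 decoys quoted for each set.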
`Why does clustering identify conformations considerably
`closer to the native structure than the conformation of lowest
`energy? One explanation is that the native topology provides
`the most robust arrangement of the chain for burying hydro-
`phobic residues, in the sense that large structural perturbations
`can be tolerated without steric clashes and with relatively small
`increases in hydrophobic exposure. For example, in a four-
`helix bundle protein, relatively large translations of the helices
`relative to one another plus moderate rotations of the helices
preserve hydrophobic burial. Similarly, in α/β sandwich proteins,
the two layers may undergo rotations and translations
`relative to one another without exposing large amounts of
`hydrophobic surface. From the standpoint of the “new view”
`of protein folding (15–17), the greater breadth of the native
`minimum is a consequence of the assumption that native
`interactions are stronger on average than nonnative interac-
`tions, which results in a lowering of the energy of conforma-
`tions with some native interactions formed. Our strategy also
`may be viewed as a type of signal averaging to compensate for
`noisy scoring functions, in which repeated independent at-
`tempts to find the native state are combined by picking the
`most common topology (the mode) rather than the lowest
`energy conformation.
`The structural elements in native structures would be robust
`to displacement if they (i) often have sufficient local interac-
`tions to be low in energy in isolation; (ii) minimally restrict the
`ability of the remainder of the chain to form structural
`elements low in energy; and (iii) readily combine with other
`low-energy elements to form conformations that are low in
energy. These features are consistent with the known modularity
of structure in partially folded states of proteins:
synthetic peptides, large protein fragments, and denatured
proteins. Structural characterization of these types of systems
has demonstrated that segments of a protein chain frequently
`have a high propensity in isolation to form local structures
`similar to those formed in the native protein (18–20).
`Though limited to a very small sample, these results are
`encouraging and suggest that proteins in general may conform
`to some of the conditions we postulate might permit the
`prediction of structure at low resolution. If these results should
`prove to be general, they support the hypothesis that the native
`structures of proteins are in some sense surrounded by a large
`ensemble of low-energy conformations. In ascribing physical
`reality to this ensemble, we consider it most probable that it
`corresponds predominantly to the denatured state but also
`includes some high-energy forms of the native state involving
`large scale vibrational modes (21) plus partially unfolded states
`(22, 23).
`Recently, the claim has been made that structures of natu-
`rally occurring proteins are selected by evolution because they
`have a high ‘‘designability,’’ i.e., a large tolerance to changes in
`amino acid sequence (a high sequence entropy). One plausible
`mechanism for such designability observed in simple lattice
`models is negative in character: minimization of the likelihood
`of favorable interactions in alternative structural states (24,
`25). The results presented here suggest that a high tolerance of
`structural perturbation (high structural entropy) may be an
`additional, positive mechanism underlying tolerance of se-
`quence perturbations.
`
`The authors thank Ingo Ruczinski for the multidimensional scaling
`analysis shown in Fig. 3, Enoch Huang for helpful discussions and Britt
`Park and Michael Levitt for their decoy set. This work was supported
`by National Institutes of Health Grants GM34171 (to D.S.) and young
`investigator grants to D.B. from the National Science Foundation and
`the Packard Foundation. K.T.S. was supported by National Institutes
`of Health Training Grant PHS NRSA T32 GM07270.
`
`1. Anfinsen, C. B. (1973) Science 181, 223–230.
`2. Tanford, C. (1968) Adv. Protein Chem. 23, 121–282.
`3. Tanford, C. (1970) Adv. Protein Chem. 24, 1–95.
`4. Shortle, D. (1996) FASEB J. 10, 27–34.
`5. Dill, K. A. & Shortle, D. (1991) Annu. Rev. Biochem. 60, 795–825.
`6. Shortle, D. (1996) Curr. Opin. Struct. Biol. 6, 24–30.
`7. Gillespie, J. & Shortle, D. (1997) J. Mol. Biol. 268, 170–184.
`8. Thomas, P. D. & Dill, K. A. (1996) J. Mol. Biol. 257, 457–469.
9. Jernigan, R. L. & Bahar, I. (1996) Curr. Opin. Struct. Biol. 6, 195–209.
10. Vajda, S., Sippl, M. & Novotny, J. (1997) Curr. Opin. Struct. Biol. 7,
222–229.
`11. Onuchic, J. N. (1997) Proc. Natl. Acad. Sci. USA 94, 7129–7131.
`12. Park, B. & Levitt, M. (1996) J. Mol. Biol. 258, 367–392.
`13. Simons, K. T., Kooperberg, C., Huang, E. & Baker, D. (1997) J. Mol.
`Biol. 268, 209–255.
`14. Murzin, A. G. & Finkelstein, A. V. (1988) J. Mol. Biol. 204, 749–769.
`15. Bryngelson, J. D. & Wolynes, P. G. (1987) Proc. Natl. Acad. Sci. USA
`84, 7524–7528.
16. Leopold, P. E., Montal, M. & Onuchic, J. N. (1992) Proc. Natl. Acad.
Sci. USA 89, 8721–8725.
17. Chan, H. S. & Dill, K. A. (1998) Proteins Struct. Funct. Genet. 32, 2–33.
`18. Dobson, C. M. (1992) Curr. Opin. Struct. Biol. 2, 6–12.
`19. Shortle, D., Wang, Yi, Gillespie, J. & Wrabl, J. O. (1996) Protein Sci.
`5, 991–1000.
`20. Schulman, B. A., Kim, P. S., Dobson, C. M. & Redfield, C. (1997) Nat.
`Struct. Biol. 4, 630–634.
`21. Tolman, J. R., Flanagan, J. M., Kennedy, M. A. & Prestegard, J. H.
`(1997) Nat. Struct. Biol. 4, 292–297.
`22. Bai, Y., Sosnick, T. R., Mayne, L. & Englander, S. W. (1995) Science
`269, 192–197.
`23. Chamberlain, A. K., Handel, T. M. & Marqusee, S. (1996) Nat. Struct.
`Biol. 3, 782–787.
`24. Li, H., Helling, R., Tang, C. & Wingreen, N. (1996) Science 273,
`666–669.
`25. Yue, K. & Dill, K. A. (1995) Proc. Natl. Acad. Sci. USA 92, 146–150.
26. Bryngelson, J. D., Onuchic, J. N., Socci, N. D. & Wolynes, P. G. (1995)
Proteins Struct. Funct. Genet. 21, 167–195.
`