Copyright © 1993 The Protein Society
`
`Disulfide bonding patterns and protein topologies
`
`CRAIG J. BENHAM AND M. SALEET JAFRI
`Department of Biomathematical Sciences, Mount Sinai School of Medicine, New York, New York 10029
`(RECEIVED August 4, 1992; REVISED MANUSCRIPT RECEIVED September 10, 1992)
`
`Abstract
`This paper examines the topological properties of protein disulfide bonding patterns. First, a description of these
`patterns in terms of partially directed graphs is developed. The topologically distinct disulfide bonding patterns
`available to a polypeptide chain containing n disulfide bonds are enumerated, and their symmetry and reducibil-
`ity properties are examined. The theoretical probabilities are calculated that a randomly chosen pattern of n bonds
`will have any combination of symmetry and reducibility properties, given that all patterns have equal probability
`of being chosen. Next, the National Biomedical Research Foundation protein sequence and Brookhaven National
`Laboratories protein structure (PDB) databases are examined, and the occurrences of disulfide bonding patterns
`in them are determined. The frequencies of symmetric and/or reducible patterns are found to exceed theoretical
`predictions based on equiprobable pattern selection. Kauzmann’s model, in which disulfide bonds form during
`random encounters as the chain assumes random coil conformations, finds that bonds are more likely to form
`with near neighbor cysteines than with remote cysteines. The observed frequencies of occurrence of disulfide pat-
`terns are found here to be virtually uncorrelated with the predictions of this alternative random bonding model.
`These results strongly suggest that disulfide bond pattern formation is not the result of random factors, but in-
`stead is a directed process.
Finally, the PDB structure database is examined to determine the extrinsic topologies of polypeptides contain-
ing disulfide bonds. A complete survey of all structures in the database found no instances in which two loops
`formed by disulfide bonds within the same polypeptide chain are topologically linked. Similarly, no instances are
`found in which two loops present on different polypeptide chains in a structure are catenated. Further, no exam-
`ples of topologically knotted loops occur. In contrast, pseudolinking has been found to be a relatively frequent
`event. These results show a complete avoidance of nontrivial topological entanglements that is unlikely to be the
`result of chance events. A hypothesis is presented to account for some of these observations.
`Keywords: covalent bond topology; entanglements; knots; protein structure
`
`Topology is the branch of mathematics that studies those
`properties of shape that remain invariant under continu-
`ous deformations. Topological properties naturally sub-
`divide into two types - those that derive from the intrinsic
`structure of the object under study, and those that relate
`to how that structure is embedded in space. For example,
`a closed circle has a different intrinsic topological struc-
`ture than a finite line segment. One can convert a circle
`into a line segment only by introducing a cut, which is a
`discontinuous deformation. As these two structures have
`different intrinsic topologies, one naturally might expect
them also to have different ranges of possible realizations
in space. All embeddings of a finite linear segment in
three-dimensional space are topologically equivalent in
the sense that any one can be converted to any other by
a continuous deformation. In particular, a segment can-
not be topologically knotted, because any candidate knot
can be undone without recourse to cutting. One need only
pass the ends of the segment back through whatever loops
have been formed, which is a continuous deformation. It
follows that all geometric shapes having the topological
structure of finite line segments are topologically equiv-
alent, both intrinsically and in all spatial embeddings. In
contrast, a closed circular curve can be knotted. Differ-
ent knot types cannot be interconverted without introduc-
ing transient cuts. Two circular curves having distinct
knot types differ only in the way they are embedded in
space. Both have the same intrinsic topology, that of a
closed circle.

Reprint requests to: Craig J. Benham, Department of Biomathemat-
ical Sciences, Box 1023, Mount Sinai School of Medicine, 1 Gustave
Levy Place, New York, New York 10029.

`The pattern of covalent connections among amino acid
`residues imparts topological structure to a polypeptide
`chain. (Small loops, such as those occurring in aromatic
`rings, fused rings, and similar local structures, commonly
`are disregarded because their topologies show no variabil-
`ity.) Although a polypeptide chain is synthesized as a lin-
`ear polymer, it need not always have the trivial intrinsic
`topology of a line segment. The formation of covalent di-
`sulfide bonds between cysteine residues within a polypep-
`tide chain produces circular loops
`of covalent bonds
`(Thornton, 1981). These covalent self-associations impart
`nontrivial intrinsic topology to the polypeptide. Molecules
`containing such covalent loops also may have nontrivial
`embedded topologies. Possible examples include knotted
`loops, interlinked pairs of loops on the same polymeric
`backbone, catenanes between loops on different back-
`bones, as well as other forms of entanglement (Crippen,
`1974, 1975). As this paper treats only topological prop-
`erties, loop penetrations that are not topological in char-
`acter are not considered, although
`these also may be
`important in practice (Connolly et al., 1980; Klapper &
`Klapper, 1980).
`The topological state of a molecule constrains its geom-
`etry in specific and potentially important ways (Meiro-
`vitch & Scheraga, 1981a,b; Kikuchi et al., 1986, 1989). A
`protein can fold only into those conformations that are
`consistent with its topology. This limits the portion
`of
`conformation space that a molecule containing disulfide
`bonds may sample. The change in entropy consequent on
this restriction can stabilize the conformation, as demon-
strated by the increase in denaturation temperature ob-
served when a disulfide bond is introduced (Johnson
`et al., 1978). Moreover, the folding pathway of a protein
`may involve the transient or permanent formation of spe-
`cific disulfide bonds that constrain the molecule in a way
`that directs it toward its correct final conformation
`(Creighton & Goldenberg, 1984; Scheraga et al., 1984;
`Weissman & Kim, 1991).
`
`Disulfide bonding patterns and intrinsic topologies
`Consider the distinct disulfide bonding patterns (i.e., states
`of connectivity) available to a polypeptide containing M
`cysteine residues. The backbone of this polymeric chain
`consists of the sequence of residues covalently connected
through peptide bonds, which are oriented in the N → C
`direction. Covalent disulfide bonds may form
`between
`pairs of cysteines, with any single cysteine residue partic-
`ipating in at most one such bond. These disulfide bonds
`possess a chemical symmetry that does not endow them
`with a natural orientation.
`A disulfide bonding pattern has the mathematical struc-
`ture of a partially directed graph. The vertices of this
`graph are the C- and N-termini of the chain, plus each of
`the cysteine residues that participates in a disulfide bond.
`The edges of this graph are the covalent connections be-
`tween these vertices. The polypeptide backbone of the
molecule consists of directed edges, each oriented ac-
cording to its N → C chemical direction, forming a
unique, directed, unbranched tree that spans every ver-
tex. Because disulfide bonds are unoriented, the edges
`corresponding to them are undirected. The end vertices
`have order one, and all others have order three. (The or-
`der of a vertex, also called its valence by graph theorists,
`is the number of edges that are connected to it.) The three
`edges impinging on an interior vertex have distinct prop-
`erties: one edge is directed into the vertex, one is directed
`away from the vertex, and the edge corresponding to the
`disulfide bond
`is undirected. This formulation differs
`from earlier graph-theoretic treatments of disulfide bond-
`ing patterns in that here the direction corresponding to the
`chemical orientation of the polymeric backbone
`is in-
`cluded. Earlier approaches used undirected graphs only
`(Walba, 1985; Mao, 1989).
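
As a rough illustration of this graph structure, the following Python sketch builds the directed backbone edges and the undirected disulfide edges for a given pattern and tallies the order of every vertex. The function names `pattern_graph` and `vertex_orders`, and the convention of numbering the N-terminus 0 and the C-terminus 2n + 1, are assumptions made for the example and are not taken from the paper.

```python
from collections import Counter


def pattern_graph(bonds):
    """Build the partially directed graph of a disulfide bonding pattern.

    `bonds` lists (i, j) pairs of bonded cysteines, numbered 1..2n from the
    N-terminus. Vertex 0 stands for the N-terminus and vertex 2n + 1 for the
    C-terminus (a numbering convention assumed for this sketch).
    """
    n_cys = 2 * len(bonds)
    used = sorted(c for bond in bonds for c in bond)
    if used != list(range(1, n_cys + 1)):
        raise ValueError("each cysteine 1..2n must appear in exactly one bond")
    # Directed backbone edges run N -> C and form the spanning path.
    directed = [(v, v + 1) for v in range(n_cys + 1)]
    # Disulfide edges have no orientation; store each as a sorted pair.
    undirected = [tuple(sorted(bond)) for bond in bonds]
    return directed, undirected


def vertex_orders(directed, undirected):
    """Return {vertex: order}, counting every edge that touches the vertex."""
    order = Counter()
    for u, v in directed + undirected:
        order[u] += 1
        order[v] += 1
    return dict(order)


directed, undirected = pattern_graph([(1, 4), (2, 3)])
orders = vertex_orders(directed, undirected)
print(orders)  # the two termini have order 1, every cysteine has order 3
```

Keeping the two edge lists separate preserves the distinction between directed backbone edges and undirected disulfide edges that the text relies on.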
`Disulfide bonding patterns may be depicted by draw-
`ing the polymer backbone as a straight line, oriented left
to right in the N → C direction, with the disulfide bonds
`shown as interconnections between the vertices corre-
`sponding to the pairs of cysteine residues involved. When
`not indicated by arrows, the backbone orientation always
`is chosen to be left to right as described. When necessary
`the vertices may be numbered in the order they are en-
`countered as the backbone is traversed in the direction as-
`signed by its orientation. For example, the three different
`patterns containing two disulfide bonds are shown in Fig-
`ure 1. Because we are concerned with topological prop-
`erties relating to connectivity, not at present with metric
`properties, the numbers of residues in each part
`of the
`polymer chain are not relevant.
An alternative representation of a pattern labels the di-
sulfide bonds alphabetically in the order they are first en-
countered, starting from the N-terminus. The pair of
cysteines connected by a particular bond are given its al-
phabetic label. An n-bond pattern is specified by giving
the sequence of letters associated with the bonded cys-
teines, as they are encountered when the chain is traversed
starting from the N-terminus. Thus, each pattern contain-
ing n disulfide bonds determines a sequence of length 2n
whose entries are the first n letters of the alphabet, each
of which appears twice, with new letters appearing in al-
phabetic order. In this notation the three two-bond disul-
fide patterns are aabb, abab, and abba.

Fig. 1. The three different disulfide bond patterns in polypeptides con-
taining two such bonds. All three patterns are symmetric, whereas only
pattern A is reducible. (Panels A, B, and C each show a backbone with
vertices numbered 1 through 6; the drawings are not reproduced here.)

`In this paper the pattern associated with a state of disul-
`fide bonding of a polypeptide chain is a partially directed
`graph of the type shown in Figure 1, having a unique di-
`rected spanning tree corresponding to the backbone. The
`graph associated with the pattern is the simple collection
`of edges and vertices shown, with no orientation and no
`distinction between different types of edges.
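
The alphabetic notation described above can be generated mechanically from a list of bonded cysteine pairs. The sketch below is one way to do it; the function name `pattern_string` and the sample residue numbers are illustrative assumptions, and the sketch handles at most 26 bonds because it uses single lowercase letters.

```python
def pattern_string(bonds):
    """Return the alphabetic representation of a disulfide bonding pattern.

    `bonds` lists (i, j) pairs of bonded cysteine positions. Only the order
    of the positions along the chain matters, so residue numbers work as
    well as the indices 1..2n.
    """
    partner = {}
    for i, j in bonds:
        partner[i] = j
        partner[j] = i
    labels = {}
    letters = []
    # Walk the chain from the N-terminus; a bond receives the next unused
    # letter the first time one of its cysteines is encountered.
    for cys in sorted(partner):
        if cys not in labels:
            letter = chr(ord("a") + len(labels) // 2)
            labels[cys] = letter
            labels[partner[cys]] = letter
        letters.append(labels[cys])
    return "".join(letters)


print(pattern_string([(1, 2), (3, 4)]))  # aabb
print(pattern_string([(1, 3), (2, 4)]))  # abab
print(pattern_string([(1, 4), (2, 3)]))  # abba
```

Because only chain order is used, hypothetical residue numbers such as (10, 70), (25, 40), (33, 61) give the string abcbca.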
`Two patterns have the same topological structure if one
`can be transformed into the other by a continuous defor-
`mation. This transition must preserve the directed nature
`of the polypeptide chain connections. Therefore its action
`on the directed backbone spanning tree is unique. In par-
`ticular, it associates corresponding vertices in the order
`they are encountered along the chain. It maps directed
`edges to their corresponding directed edges, and disulfide
`bonds to disulfide bonds. It follows that two patterns are
`topologically equivalent exactly when all their disulfide
`bonds connect corresponding pairs of vertices. That is,
`only identical patterns are topologically equivalent. Two
`patterns are topologically distinct if no continuous trans-
`formation between them exists. This means that their in-
`terconversion requires the
`formation, disruption or
`rearrangement of disulfide bonds. Distinct patterns are
`always topologically nonequivalent.
`It is important to note that the topological properties
`of patterns are not the same as the topological properties
`of their underlying graphs. Two graphs have the same in-
`trinsic topology (i.e., are isomorphic) when there is a way
`of numbering the vertices of each so that corresponding
`edges join pairs of vertices having the same numbers in
`both graphs (Roberts, 1984). In graphs the numbering of
`vertices may be chosen arbitrarily and is not determined
`by a directed spanning tree (i.e., polypeptide backbone),
`as was the case for patterns. Thus, two topologically dis-
`tinct patterns may have isomorphic underlying graphs. For
`example, two graphs that are mirror images are isomor-
`phic, although asymmetric patterns in which the disulfide
`bonds occur in mirror image order are not topologically
`equivalent because the mirror image mapping does not
`preserve the backbone orientation. Another example of
`distinct patterns having isomorphic graphs is shown in
Figure 2.
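
The mirror-image case mentioned above can be made concrete with the letter notation introduced earlier: reading a pattern from the C-terminus and relabeling the bonds in order of first appearance gives the mirror pattern. The following sketch does this; the helper names `relabel` and `mirror` are assumptions chosen for the example.

```python
def relabel(letters):
    """Rename bond labels so that new letters appear in alphabetic order."""
    mapping = {}
    out = []
    for letter in letters:
        if letter not in mapping:
            mapping[letter] = chr(ord("a") + len(mapping))
        out.append(mapping[letter])
    return "".join(out)


def mirror(pattern):
    """Return the pattern obtained by traversing the chain in the C -> N direction."""
    return relabel(reversed(pattern))


print(mirror("aabcbc"))  # ababcc: a different pattern, so aabcbc is asymmetric
print(mirror("abba"))    # abba: unchanged, so abba is symmetric
```

The patterns aabcbc and ababcc have isomorphic underlying graphs, yet as patterns they are distinct because the mirror mapping reverses the backbone orientation.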
`Disulfide bonding patterns have specific attributes that
`could be important for protein structure. One such prop-
`erty is symmetry. A pattern
`is symmetric if it and its
`mirror image both have the same disulfide bonding con-
`nections. Alternatively, the pattern is symmetric if its al-
`phabetic representation reads the same when labels are
assigned in the N → C direction as when they are assigned
`in the opposite direction. For example, all of the two-
`bond patterns are symmetric, although patterns with three
`or more disulfide bonds may be asymmetric, as is the case
`for both patterns shown in Figure 2. The second impor-
`tant property is reducibility. A reducible pattern is one in
`which a single cut somewhere along the backbone can sep-
`arate the pattern into two nontrivial subpatterns. That
`is, some disulfide bonds occur entirely to the left of the
`cut point and others entirely to the right, but no disul-
`fide bonds span the cut point. The pattern in Figure 1A
`comprised of two disjoint loops is reducible, whereas both
`of the other patterns are
`irreducible. A third intrinsic
`topological property of a disulfide bonding pattern is non-
`planarity. A pattern is nonplanar if its graph cannot be
`drawn in a plane in a way in which no edges cross (Crip-
`pen, 1974). A pattern is nonplanar exactly when it contains
`the (sub-)pattern abcdbcda. (This topological definition
`of nonplanarity differs from that used by Kikuchi et al.
[1986, 1989].)
`In the following sections formulas are derived express-
`ing the numbers of distinct (hence topologically nonequiv-
`alent) disulfide bond patterns, as well as the numbers of
`these that have all combinations of symmetry and reduc-
`ibility properties. Intrinsic nonplanarity will not be con-
`sidered in detail here, as it is less likely to be of practical
`importance in protein structure.
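
Before any formulas are derived, the definitions of symmetry and reducibility can be checked on small cases by direct enumeration. The sketch below lists every n-bond pattern by repeatedly pairing the first unattached cysteine with each remaining candidate, then tests each pattern against the two definitions. All function names are assumptions made for this illustration, and the approach is practical only for small n because the number of patterns grows very quickly.

```python
def patterns(n):
    """Yield every n-bond pattern as a tuple of (i, j) pairs with i < j."""
    def extend(free):
        if not free:
            yield ()
            return
        first, rest = free[0], free[1:]
        # Pair the first unattached cysteine with each remaining candidate.
        for k, partner in enumerate(rest):
            for tail in extend(rest[:k] + rest[k + 1:]):
                yield ((first, partner),) + tail
    yield from extend(tuple(range(1, 2 * n + 1)))


def is_symmetric(bonds, m):
    """True if mirroring every bond (i, j) -> (m+1-j, m+1-i) gives the same set."""
    return {(m + 1 - j, m + 1 - i) for i, j in bonds} == set(bonds)


def is_reducible(bonds, m):
    """True if some cut between positions k and k+1 is spanned by no bond."""
    return any(all(j <= k or i > k for i, j in bonds) for k in range(1, m))


def census(n):
    """Count all, symmetric, reducible, and symmetric-and-reducible patterns."""
    m = 2 * n
    counts = {"P": 0, "S": 0, "R": 0, "Sr": 0}
    for bonds in patterns(n):
        sym, red = is_symmetric(bonds, m), is_reducible(bonds, m)
        counts["P"] += 1
        counts["S"] += sym
        counts["R"] += red
        counts["Sr"] += sym and red
    return counts


print(census(2))  # {'P': 3, 'S': 3, 'R': 1, 'Sr': 1}
print(census(3))  # {'P': 15, 'S': 7, 'R': 5, 'Sr': 1}
```

For n = 2 the census agrees with the statement that all three two-bond patterns are symmetric and only one is reducible.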
`
Fig. 2. An example of two different patterns whose underlying graphs
are isomorphic. The top graph is the original pattern, where all edges
now are regarded as undirected. If the vertices of this graph are visited
along the path shown in the middle graph, and then this path is drawn
as a straight line, the graph at the bottom results. Here the vertices re-
tain their original numbering for clarity. (Three panels, each with
vertices numbered 1 through 10; in the bottom panel the vertices appear
in the order 1, 2, 6, 5, 7, 8, 4, 3, 9, 10. The drawings are not
reproduced here.)
`
The number of disulfide bonding patterns
Consider a polypeptide chain containing M cysteine residues in which n
disulfide bonds are formed, so M ≥ 2n. The number of ways of choosing the
2n cysteines participating in the disulfide bonding is
${}_M C_{2n} = M!/[(2n)!\,(M - 2n)!]$. Now suppose that the participating
cysteines have been specified. The number of distinct patterns containing
n disulfide bonds may be found by the following procedure (Cantor &
Schimmel, 1980). Consider the participating cysteine nearest the
N-terminus. There are 2n - 1 other cysteines to which it may be attached
by a disulfide bond. Specify to which of these that bond is made. This
leaves 2n - 2 cysteines whose disulfide connections remain to be
determined. Of these, choose the unattached cysteine closest to the
N-terminus. There are 2n - 3 possible choices for which other cysteine
forms the disulfide bond with this one. Specify to which of these
candidates that bond is to be made. Continue this process until all 2n
cysteines have been connected. At the first step there were 2n - 1
choices, at the second 2n - 3, at the third 2n - 5, etc. The total number
of choices is the product of all the odd numbers from 1 to 2n:

$$P(n) = \prod_{i=1}^{n} (2i - 1) = \frac{(2n)!}{2^n\, n!}. \qquad (1)$$

These equations give the number of different patterns containing n
disulfide bonds. The factorial form of this expression was first
presented by Kauzmann (1959). As noted above, all of these possibilities
are topologically distinct as patterns, although some of their underlying
graphs may be isomorphic.
The number of arrangements of n disulfide bonds among M cysteine residues
on a polypeptide chain therefore is

$$a(M, n) = {}_M C_{2n}\, P(n) = \frac{M!}{2^n\, n!\, (M - 2n)!}, \qquad M \ge 2n. \qquad (2)$$

This expression was derived by Sela and Lifson (1959).
Hereafter we will not consider cysteines that do not participate in
disulfide bonds.
(In mathematics, an algebraic structure can be given to the set of
patterns by defining a multiplication operation on them. However, it is
not known whether the resulting construct, called the full connection
monoid on 2n points [Kaufmann & Vogel, 1992], is relevant to protein
structure.)

The number of symmetric patterns
The patterns involving n disulfide bonds may be classified according to
whether or not they possess symmetry. This attribute may reflect (or
dictate) a folding pattern having approximately symmetric regions or
other regularities. Numbering the 2n bonded cysteines starting at the
N-terminus, a pattern is symmetric if, whenever cysteines i and j are
bonded, then so are cysteines 2n - i + 1 and 2n - j + 1. We note that
this symmetry relates only to the topological pattern of disulfide
bonding, not to metric properties such as the lengths of the polypeptide
chain spanned by the bonds.

Fig. 3. The two cases encountered in the derivation of the recursion re-
lation for S(n), as described in the text. In the first case (A) a disulfide
bond joins the first and last (2nth) cysteines, whereas in the second case
(B) the first cysteine bonds to some cysteine other than the last. The
disulfide bond shown in the first case is symmetric. However, in the sec-
ond case the symmetry condition requires the presence of a mirror im-
age disulfide bond as shown. (Panel A shows a bond joining cysteines 1
and 2n; panel B shows bonds joining 1 to j and 2n - j + 1 to 2n, drawn for
both relative orders of j and 2n - j + 1. The drawings are not reproduced
here.)

The number S(n) of symmetric disulfide bonding patterns may be found as
follows. All patterns containing either one or two disulfide bonds are
symmetric, so S(1) = 1 and S(2) = 3. For the general case, we first
enumerate those symmetric patterns in which a disulfide bond connects the
first cysteine to the last (i.e., 2nth) cysteine, as shown in Figure 3A.
This is a symmetric arrangement of that bond. There remain n - 1 other
bonds to specify. For the entire pattern to be symmetric, these other
bonds must be arranged in a symmetric manner. As there are S(n - 1) ways
in which this can be done, this gives the number of symmetric patterns of
this first type. Alternatively, suppose the pattern has a disulfide bond
connecting the first cysteine to the jth cysteine, j ≠ 2n. There are
(2n - 2) choices for the cysteine to which this connection is made: only
1 and 2n are excluded. For the entire pattern to be symmetric, the 2nth
cysteine must be connected to the (2n - j + 1)st cysteine, as shown in
Figure 3B. Also, the remaining (n - 2) disulfide bonds must be arranged
in a symmetric manner, which can be done in S(n - 2) ways. Hence the total
number of symmetric patterns of this type is (2n - 2)S(n - 2). Putting
these results together, the total number of symmetric disulfide bonding
patterns is shown to obey the following recursion relation:

$$S(1) = 1, \qquad S(2) = 3, \qquad S(n) = S(n - 1) + 2(n - 1)\,S(n - 2), \quad n \ge 3. \qquad (3)$$
`
This recursion relation may be solved explicitly, yielding
the following closed form expression:

$$S(n) = \sum_{k=0}^{[n/2]} \frac{{}_n P_{2k}}{k!}. \qquad (4)$$

(Here ${}_i P_j = i!/(i - j)!$ is the permutation of i objects
taken j at a time, which is the number of different ways
of choosing j objects, in order and without replacement,
from a collection of i objects. Throughout this paper
square brackets in equations denote the greatest integer
function.)
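
The counting results of the last two sections can be checked numerically. The sketch below computes P(n) as a product of odd numbers, the number of arrangements among M cysteines, and S(n) from its recursion. It also evaluates a closed form for S(n), the sum over k from 0 to [n/2] of n!/((n - 2k)! k!). That sum is written here as an assumption that is consistent with the permutation notation and greatest integer brackets defined in the text, and the code only confirms that it agrees with the recursion. The function names are illustrative.

```python
from math import comb, factorial


def P(n):
    """Number of n-bond patterns: the product 1 * 3 * 5 * ... * (2n - 1)."""
    result = 1
    for i in range(1, n + 1):
        result *= 2 * i - 1
    return result


def arrangements(M, n):
    """Number of ways to place n disulfide bonds among M cysteines (M >= 2n)."""
    if M < 2 * n:
        raise ValueError("need at least 2n cysteines")
    return comb(M, 2 * n) * P(n)


def S(n):
    """Number of symmetric n-bond patterns, from the recursion for n >= 3."""
    if n == 1:
        return 1
    before_last, last = 1, 3  # S(1), S(2)
    for k in range(3, n + 1):
        before_last, last = last, last + 2 * (k - 1) * before_last
    return last


def S_closed(n):
    """Assumed closed form: sum over k of n! / ((n - 2k)! k!)."""
    return sum(
        factorial(n) // (factorial(n - 2 * k) * factorial(k))
        for k in range(n // 2 + 1)
    )


for n in range(1, 7):
    print(n, P(n), S(n), S_closed(n))
```

The integer division in `S_closed` is exact, because n!/(n - 2k)! is a product of 2k consecutive integers and is therefore divisible by k!.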
`
`The number of reducible patterns
`
`A disulfide bonding pattern is reducible if it consists of
`two nonoverlapping, nontrivial subpatterns. In other
`words, if there is a site on the polypeptide backbone
`where a single cut will decompose the pattern into two
`subpatterns, then the pattern is reducible.
`Recursion relations enumerating the reducible and ir-
`reducible patterns are derived as follows. A pattern con-
`taining n disulfide bonds is reducible exactly when it has
`at least one interior cut point, as described above. Tra-
`versing the sequence starting from the N-terminus, sup-
`pose the first such cut point that is encountered has i
`disulfide bonds on its N-terminal side and n - i bonds on
its C-terminal side, 1 ≤ i < n. Then the subpattern con-
sisting of the i bonds on the N-terminal side must be ir-
reducible, because this is the first cut site encountered.
The subpattern comprised of the n - i bonds on the C-
terminal side of the cut can have any form, reducible or
irreducible. So there are P(n - i) choices for this pattern.
Therefore the number of ways in which an n bond pat-
tern can be chosen whose first cut site occurs as stated is
the product I(i)P(n - i), where I(i) denotes the num-
ber of irreducible patterns with i bonds. For a pattern to
be reducible it must have a cut point of this type at some
position for which 1 ≤ i ≤ n - 1, so the total number R(n)
of reducible patterns is the sum

$$R(n) = \sum_{i=1}^{n-1} I(i)\, P(n - i), \qquad (5)$$

where

$$I(n) = P(n) - R(n).$$
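
The first-cut argument translates directly into a pair of mutually recursive functions: the reducible count R(n) sums I(i)P(n - i) over the possible positions of the first cut, and the irreducible count is I(n) = P(n) - R(n). The sketch below is a minimal version; the use of `functools.lru_cache` for memoization is an implementation choice made for this example.

```python
from functools import lru_cache


def P(n):
    """Number of n-bond patterns: the product of the odd numbers up to 2n - 1."""
    result = 1
    for i in range(1, n + 1):
        result *= 2 * i - 1
    return result


@lru_cache(maxsize=None)
def R(n):
    """Number of reducible n-bond patterns, summed over the first cut position."""
    # i bonds lie on the N-terminal side of the first cut and must form an
    # irreducible subpattern; the remaining n - i bonds are unconstrained.
    return sum(I(i) * P(n - i) for i in range(1, n))


def I(n):
    """Number of irreducible n-bond patterns."""
    return P(n) - R(n)


for n in range(1, 7):
    print(n, R(n), I(n))
```

For n = 1 the sum is empty, so R(1) = 0 and the single one-bond pattern is irreducible, which anchors the recursion.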
`
A similar calculation derives the recursion relation giv-
ing the number S_r(n) of n-bond patterns that are both
symmetric and reducible. Again, suppose the first cut
point occurs after i bonds, so the number of choices for
the subpattern of these initial bonds is I(i). Because the
complete pattern is symmetric as well as reducible, the last
i bonds must be the mirror images of the first ones. It fol-
lows that n ≥ 2i, and that the subpattern of the middle
n - 2i bonds, if any, is all that remains to be determined.
For the entire pattern to be symmetric, the subpattern of
the middle n - 2i bonds must be symmetric. Hence there
are S(n - 2i) choices for this structure. It follows that the
number of symmetric, reducible patterns in which the first
cut occurs after i bonds is the product I(i)S(n - 2i), so
the total number of patterns that are both symmetric and
reducible is

$$S_r(n) = \sum_{i=1}^{[n/2]} I(i)\, S(n - 2i), \qquad (6)$$

where S(0) = 1 covers the case n = 2i, in which no middle bonds remain.
The above results determine the number A(n) of non-
symmetric patterns on n disulfide bonds to be

$$A(n) = P(n) - S(n). \qquad (7)$$

Similarly, the number of patterns that are symmetric and
irreducible is

$$S_i(n) = S(n) - S_r(n). \qquad (8)$$

The number of nonsymmetric, reducible patterns is

$$A_r(n) = R(n) - S_r(n), \qquad (9)$$

and the number of patterns that are both nonsymmetric
and irreducible is

$$A_i(n) = A(n) - A_r(n). \qquad (10)$$
`
Table 1 displays the numbers of patterns P(n) contain-
ing n disulfide bonds, 1 ≤ n ≤ 12, together with the num-
bers of these patterns that are symmetric, reducible, or
both. From these values the numbers of patterns with all
other combinations of symmetry and reducibility prop-
erties may be calculated according to the above equations.
Table 2 shows the fractions of patterns with given sym-
metry and reducibility properties for the cases 1 ≤ n ≤ 12.
`These are the probabilities that a randomly chosen pat-
`tern of n disulfide bonds has the given attribute(s), pro-
`vided every pattern is equally likely to be chosen. One sees
`that the fractions of patterns that are asymmetric or ir-
`reducible or both grow with n, while the fractions with
`all other combinations of attributes decrease. The prob-
`ability of symmetry decreases rapidly as n grows, while
`the probability of reducibility decreases more slowly.
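
The entries of Tables 1 and 2 can be regenerated from the recursions with a short script. The sketch below is one possible arrangement, not the authors' program: it assumes the convention S(0) = 1 for the empty middle subpattern, obtains the asymmetric classes by subtraction, and uses exact fractions so that rounding is left to the caller. All function and key names are illustrative.

```python
from fractions import Fraction
from functools import lru_cache


def P(n):
    """Total number of n-bond patterns."""
    result = 1
    for i in range(1, n + 1):
        result *= 2 * i - 1
    return result


@lru_cache(maxsize=None)
def S(n):
    """Symmetric patterns; S(0) = 1 is a convention assumed for this sketch."""
    if n < 2:
        return 1
    return S(n - 1) + 2 * (n - 1) * S(n - 2)


@lru_cache(maxsize=None)
def R(n):
    """Reducible patterns, using I(i) = P(i) - R(i) for the first block."""
    return sum((P(i) - R(i)) * P(n - i) for i in range(1, n))


def S_r(n):
    """Patterns that are both symmetric and reducible."""
    return sum((P(i) - R(i)) * S(n - 2 * i) for i in range(1, n // 2 + 1))


def class_counts(n):
    """Counts for every combination of symmetry and reducibility."""
    total, sym, red, sym_red = P(n), S(n), R(n), S_r(n)
    asym_red = red - sym_red
    return {
        "P": total, "S": sym, "R": red, "S_r": sym_red,
        "S_i": sym - sym_red,            # symmetric, irreducible
        "A_r": asym_red,                 # asymmetric, reducible
        "A_i": total - sym - asym_red,   # asymmetric, irreducible
    }


def class_fractions(n):
    """Exact probability of each class when all patterns are equally likely."""
    counts = class_counts(n)
    return {k: Fraction(v, counts["P"]) for k, v in counts.items() if k != "P"}


for n in range(1, 7):
    f = class_fractions(n)
    print(n, *(f"{float(f[k]):.6f}" for k in ("S", "R", "S_r", "A_r", "A_i")))
```

The four classes S_r, S_i, A_r, and A_i partition the patterns, so their counts sum to P(n) for every n.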
`
Table 1. Number P(n) of patterns of n disulfide bonds,
together with the numbers of these patterns possessing
specific symmetry and reducibility properties^a

 n              P(n)       S(n)            R(n)   S_r(n)
 1                 1          1               0        0
 2                 3          3               1        1
 3                15          7               5        1
 4               105         25              31        5
 5               945         81             239        9
 6            10,395        331           2,233       41
 7           135,135      1,303          24,725      105
 8         2,027,025      5,937         318,631      485
 9        34,459,425     26,785       4,707,359    1,609
10       654,729,075    133,651      78,691,633    7,777
11    13,749,310,575    669,351   1,471,482,725   31,425
12   316,234,143,225  3,609,673  30,469,552,111  160,965

^a These quantities were calculated using the methods described in the
text.
`
`Observed protein topologies
`In this section we describe the results of database surveys
`evaluating the intrinsic and embedded topological prop-
`erties of known polypeptide disulfide bonding patterns.
`The intrinsic topologies are given by the corresponding di-
`sulfide bonding patterns, whereas the embedded topolog-
`ical properties considered include knotting of loops and
`interlinking of pairs of loops. Intrinsic topologies are de-
`termined by disulfide bond connections alone, whereas
`the evaluation of embedded topologies requires knowl-
`edge of the structure of the protein.
`
Table 2. Fractions of n-bond patterns having specific
symmetry and reducibility properties^a

 n    p_s(n)    p_r(n)   p_sr(n)   p_ar(n)   p_ai(n)
 1  1.000000  0.000000  0.000000  0.000000  0.000000
 2  1.000000  0.333333  0.333333  0.000000  0.000000
 3  0.466667  0.333333  0.066667  0.266667  0.266667
 4  0.238095  0.295238  0.047619  0.247619  0.514286
 5  0.085714  0.252910  0.009524  0.243386  0.670899
 6  0.031842  0.214815  0.003944  0.210871  0.757287
 7  0.009642  0.182965  0.000777  0.182188  0.808170
 8  0.002929  0.157191  0.000239  0.156952  0.840119
 9  0.000777  0.136606  0.000047  0.136559  0.862664
10  0.000204  0.120190  0.000012  0.120178  0.879618
11  0.000049  0.107022  0.000002  0.107020  0.892931
12  0.000011  0.096351  0.000001  0.096351  0.903638

^a In terms of the quantities calculated in Equations 1-10, these
fractions are: p_s(n) = S(n)/P(n), p_r(n) = R(n)/P(n), p_sr(n) =
S_r(n)/P(n), p_ar(n) = A_r(n)/P(n), and p_ai(n) = A_i(n)/P(n). These
fractions also give the probability that a randomly selected pattern has
the corresponding set of attributes, provided all patterns have equal
probabilities of selection. Here the subscript s stands for symmetric, a
for asymmetric, r for reducible, and i for irreducible.
`
Intrinsic topologies: Disulfide bond patterns
`
`Information regarding known disulfide bond patterns in
`proteins has been culled from two databases. The Brook-
`haven National Laboratories protein structural database
`(PDB) contains atomic coordinates for the structures of
approximately 600 molecules (Bernstein et al., 1977). Most
`of these structures have been found by crystallography,
`although some are theoretical predictions. In several cases
`a single database entry contains information on multiple
`subunits of the molecule, or on an additional molecule
`such as a bound inhibitor. A total of 259 protein molecules
`in the structural database were found to have disulfide
`bonds. This total includes duplicate entries, successive re-
`finements of the same molecule, and entries for identical
`molecules from closely related species. Some structures
`are reported only for fragments of molecules or for mol-
`ecules that have been altered by mutations affecting the
`number of cysteines present. In developing the population
`of observed structures examined here, theoretically pre-
`dicted structures, mutated molecules, and fragments were
`removed from further consideration, as the information
`in the database does not specify the disulfide bonding pat-
`tern of the actual complete molecule in these cases. When
`duplicate and closely related entries also are deleted, a
population of 62 distinct, complete polypeptide molecules
containing disulfide bonds remains (listed in the kinemage
`file). The numbers of occurrences in this database of each
`type of observed disulfide bonding pattern are given in the
`fourth column of Table 3 below.
`The National Biomedical Research Foundation (NBRF)
`protein sequence database (Barker et al., 1986) contains
`many thousands of entries, only some of which report di-
`sulfide bonding information. However, the absence of
`this information for a given molecule does not necessar-
`ily imply that it lacks disulfide bonds. In the small num-
`ber of cases where
`disulfide bonding is reported, the
`accuracy of the pattern is not always known. Some en-
`tries rate bonds as certain, probable, or possible, whereas
`others give alternative possible disulfide bonding patterns.
`In some cases bonding patterns have been inferred by ho-
`mology with other molecules. The disulfide bonding in-
`formation derived from this database, although more
`plentiful than that found from the PDB structure data-
`base, must be regarded as being less reliable.
`A total of 455 complete polypeptide chains in the
`NBRF sequence database were found to have intrachain
`disulfide bonds. This figure excludes fragmentary mole-
`cules and cases where considerable uncertainty regarding
`the disulfide connections was reported. Deletion of repeat
`entries and closely related molecules resulted in a popu-
`lation of 186 distinct polypeptides containing disulfide
`bonds. Column 3 of Table 3 reports the occurrences of
`each type of observed pattern in this population.
`When the populations culled from the structure and se-
`quence databases were amalgamated and duplicate entries
`
`were deleted, an aggregate population of 208 distinct
`polypeptides containing disulfide bonds resulted. All oc-
`currences of each type of disulfide bonding pattern in this
`aggregate population were determined. The results are
`given in column 5 of Table 3. Column 2 in this table gives
`the reducibility and symmetry properties of each observed
`pattern.
`Table 4 shows the observed frequencies of disulfide
`bonding patterns having specific reducibility and symme-
`try attributes. The number of distinct occurrences of a
`given pattern is evaluated from the data of Table 3, sep-
`arately for each database and also for the aggregate pop-
`ulation. Also shown is the theoretical probability of each
`type of attribute, calculated using the expressions in the
`previous sections, assuming that each pattern of n bonds
`has equal probability of forming. These data show that,
`in cases where more than three disulfide bonds are
`present, symmetric patterns occur with frequencies that
`greatly exceed what would be predicted from random,
equiprobable bonding. When n ≥ 6, this frequency is an
`order of magnitude greater than random. This disparity
`is greatest for patterns that are both symmetric and reduc-
`ible, which are overrepresented for all values of n. When
n ≥ 6, the prevalence of this type of pattern is two orders
`of magnitude greater than would arise with random bond-
`ing. In contrast, patterns that are irreducible are under-
`rep