`
Post-translational modification: nature’s escape from genetic imprisonment and the basis for dynamic information encoding

Sudhakaran Prabakaran,1 Guy Lippens,2 Hanno Steen3 and Jeremy Gunawardena1∗

We discuss protein post-translational modification (PTM) from an information processing perspective. PTM at multiple sites on a protein creates a combinatorial explosion in the number of potential ‘mod-forms’, or global patterns of modification. Distinct mod-forms can elicit distinct downstream responses, so that the overall response depends partly on the effectiveness of a particular mod-form in eliciting a response and partly on the stoichiometry of that mod-form in the molecular population. We introduce the ‘mod-form distribution’, the relative stoichiometries of each mod-form, as the most informative measure of a protein’s state. Distinct mod-form distributions may summarize information about distinct cellular and physiological conditions and allow downstream processes to interpret this information accordingly. Such information ‘encoding’ by PTMs may facilitate evolution by weakening the need to directly link upstream conditions to downstream responses. Mod-form distributions provide a quantitative framework in which to interpret ideas of ‘PTM codes’ that are emerging in several areas of biology, as we show by reviewing examples of ion channels, GPCRs, microtubules, and transcriptional co-regulators. We focus particularly on examples other than the well-known ‘histone code’, to emphasize the pervasive use of information encoding in molecular biology. Finally, we touch briefly on new methods for measuring mod-form distributions. © 2012 Wiley Periodicals, Inc.

How to cite this article:
WIREs Syst Biol Med 2012, 4:565–583. doi: 10.1002/wsbm.1185
`
∗Correspondence to: jeremy@hms.harvard.edu
1Department of Systems Biology, Harvard Medical School, Boston, MA, USA
2CNRS-Université de Lille, UMR 8576, Villeneuve d’Ascq, France
3Department of Pathology, Children’s Hospital, Boston, MA, USA

INTRODUCTION

Post-translational modification (PTM) is a biochemical mechanism in which amino-acid residues in a protein are covalently modified.1 It is nature’s escape from genetic imprisonment. Gene sequences change on an evolutionary time scale but not on one appropriate for organismal development, adult physiology and the continual battle against disease and disintegration. After exons are chosen and spliced, a protein’s tertiary structure is altered only by conformational fluctuations. PTM allows amino-acid properties to be changed ‘on the fly’, in response to requirements on a developmental or physiological time scale. Multisite PTM leads to a combinatorial explosion in the number of potential molecular states. Such complexity may provide the foundation for sophisticated forms of cellular information processing that are essential for the emergence of organismal complexity. This information-centric perspective provides the basis for this review.
`
Volume 4, November/December 2012
Inari Ex. 1024
Inari Agric. v. Corteva Agriscience
PGR2023-00022
Advanced Review
wires.wiley.com/sysbio

REVERSIBLE PHOSPHORYLATION AS INFORMATION PROCESSING

The ability of PTM to process information can be seen in a simple example of reversible phosphorylation on a single site (Figure 1a). An individual substrate molecule can be either unphosphorylated or phosphorylated. The population of substrate molecules contains a mixture of both molecular states. The state of the population can be summarized in the relative stoichiometry of the phosphorylated state, denoted U in Figure 1b and c. This number varies between 0 (completely unphosphorylated) and 1 (completely phosphorylated). It is easiest to understand the behavior of U when the system has reached steady state and the rates of phosphorylation and dephosphorylation are equal and opposite. Then, U depends on the relative amounts, or effective levels of activity, of kinase and phosphatase (Figure 1c).
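
As a minimal worked sketch of why the steady-state value of U tracks the enzyme ratio, assume (purely for illustration; this is not the enzyme mechanism analysed in the review) that kinase E and phosphatase F both act with first-order, unsaturated kinetics. The rate constants k_E and k_F are hypothetical symbols introduced here.

```latex
\begin{align*}
\frac{dU}{dt} &= k_E [E]\,(1-U) - k_F [F]\,U \\
0 &= k_E [E]\,(1-U^{*}) - k_F [F]\,U^{*} \qquad \text{(steady state)} \\
U^{*} &= \frac{k_E [E]}{k_E [E] + k_F [F]} = \frac{r}{1+r},
\qquad r = \frac{k_E [E]}{k_F [F]}
\end{align*}
```

Under these assumptions U* rises hyperbolically from 0 towards 1 as the ratio r increases, passing through 1/2 at r = 1. Saturable enzyme mechanisms can give steeper curves than this simple case.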
To put it another way, the relative stoichiometry, U, carries information about the amounts or activity levels of the enzymes that are targeting the substrate. If the substrate is itself interacting with other proteins that prefer the phosphorylated state, such as those carrying phospho-specific binding domains,4 these downstream processes will be able to sense information about the upstream enzymes, indirectly through the value of U. We will see in the course of this review how this idea plays out in intricate ways across a broad range of cellular processes.
PTM information processing is highly regulatable. As shown by Goldbeter and Koshland in a classic mathematical analysis,3 and later confirmed experimentally,5 the shape of the U-response curve becomes steeper as the total amount of substrate increases (Figure 1c). If the response is very steep (blue curve), then any changes in enzyme amounts that stay below threshold or above saturation will not be visible through changes in U. The information will have been filtered out. Between threshold and saturation, the dynamic range becomes highly amplified: small changes in enzyme amounts yield large changes in U (‘ultrasensitivity’). Such quantitative details matter: if information processing is to be understood, we need to be able to measure relative stoichiometries and to relate their behavior to the enzyme networks that underlie PTM.
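
The steepening described above can be illustrated numerically. The sketch below uses the standard closed-form steady state for a single-site cycle in which both enzymes follow Michaelis-Menten kinetics and enzyme-substrate complexes are neglected; those are modelling assumptions, and the function and parameter names (`goldbeter_koshland`, `v1`, `v2`, `j1`, `j2`) are illustrative choices rather than anything taken from the review. Raising total substrate corresponds to lowering the dimensionless constants `j1` and `j2`.

```python
import math

def goldbeter_koshland(v1, v2, j1, j2):
    """Steady-state phosphorylated fraction U for a single-site cycle.

    v1, v2: maximal kinase and phosphatase rates (same units).
    j1, j2: Michaelis constants of kinase and phosphatase, each divided
            by total substrate (dimensionless).
    """
    b = v2 - v1 + j1 * v2 + j2 * v1
    disc = b * b - 4.0 * (v2 - v1) * v1 * j2
    return 2.0 * v1 * j2 / (b + math.sqrt(disc))

def response_curve(j, ratios):
    """U as a function of the kinase/phosphatase rate ratio, with v2 fixed at 1."""
    return [goldbeter_koshland(r, 1.0, j, j) for r in ratios]

ratios = [0.5, 0.9, 1.0, 1.1, 2.0]
low_substrate = response_curve(10.0, ratios)    # substrate well below Km: graded
high_substrate = response_curve(0.01, ratios)   # substrate well above Km: steep

for r, graded, steep in zip(ratios, low_substrate, high_substrate):
    print(f"E/F = {r:.1f}  U(low substrate) = {graded:.3f}  U(high substrate) = {steep:.3f}")
```

With abundant substrate, the change in U between ratios of 0.9 and 1.1 is many times larger than in the low-substrate case, which is the amplification between threshold and saturation that the text calls ultrasensitivity.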
The ability to process information, and to do so in a regulatable way, requires continuous expenditure of energy. This comes from hydrolysis of the donor molecule, in this case ATP, and is a dissipative process: a cell’s core biochemical pathways must continually replenish ATP and maintain the chemical ‘voltage’ that drives phosphorylation. In this sense, PTM behaves like a transistor in electronics, expending energy to encode information. Such functionality becomes vastly enhanced with multiple types and multiple sites of modification. The implications of multisite phosphorylation have been discussed in previous reviews,6–8 as has the interplay of different types of modification.9

[Figure 1 panels: (a) Reversible phosphorylation: a site P interconverted by kinase E and phosphatase F. (b) Relative stoichiometry of the phosphorylated state: U = (concentration of phosphorylated substrate) / (total concentration of substrate). (c) Regulatable information transfer: steady state U (from 0 to 1) plotted against the relative amount of E to F, with threshold and saturation marked and curves shown for increasing substrate.]

FIGURE 1 | Reversible phosphorylation as information processing. (a) A single phosphorylated site on a substrate is dynamically regulated by a forward kinase, E, and a reverse phosphatase, F. Not shown are the donor, ATP, its hydrolysis products, ADP and Pi, and the background metabolic pathways that maintain the ATP ‘voltage’ (see Figure 3a). (b) The state of the population of substrate molecules is summarized by the relative stoichiometry of the phosphorylated state, denoted U, and defined by the fraction shown. Note that the denominator may have more contributions than just the free unphosphorylated and phosphorylated states, since, depending on the enzyme mechanisms, substrate may also be bound in enzyme-substrate complexes. (c) The steady-state level of U is shown as a function of the relative amounts of kinase and phosphatase. This is a hypothetical, but typical, illustration; the quantitative details depend on the enzyme mechanisms.2 The value of U contains information about the relative amounts of kinase and phosphatase, which can be sensed and utilized by downstream processes. The response curve can exhibit increasing steepness, from nearly hyperbolic (black) to strongly sigmoidal (blue), as the amount of substrate is increased,3 allowing the information processing characteristics to be regulated.

1939005x, 2012, 6, Downloaded from https://wires.onlinelibrary.wiley.com/doi/10.1002/wsbm.1185 by Duke University Libraries, Wiley Online Library on [27/03/2023]. See the Terms and Conditions (https://onlinelibrary.wiley.com/terms-and-conditions) on Wiley Online Library for rules of use; OA articles are governed by the applicable Creative Commons License

WIREs Systems Biology and Medicine
Information encoding by post-translational modification

METABOLIC AND POLYPEPTIDE MODIFICATIONS

Over 200 types of PTM have been identified.1,10 Several were discovered years ago and their broader significance has emerged only slowly.11 Mass spectrometry has been instrumental in giving a genome-wide and less biased view.10 A recent survey of SwissProt data finds 87,308 experimentally detected modifications of amino-acid residues.12 Phosphorylation on serine/threonine is the most prevalent (Figure 2), although this may reflect the preponderance of phosphorylation studies. Of the other prevalent modifications, some are thought to be irreversible, or, at least, are not known to be reversible. Irreversible modifications have limited information processing capabilities and we focus here only on reversible PTMs (from now on, simply, PTMs), and limit attention further to those occurring in eukaryotes, particularly metazoans (Table 1).

[Figure 2: bar chart titled ‘Top experimentally observed post-translational modifications’, with frequency on a logarithmic axis. Categories shown: phosphoserine, phosphothreonine, N-linked glycosylation, N6-acetyllysine, glycyl lysine isopeptide, phosphotyrosine, O-linked glycosylation, N-acetylalanine, 4-hydroxyproline, pyrrolidone carboxylic acid, tryptophyl-tyrosyl-methionine, D-alanine, N-acetylmethionine, phenylalanine amide, and glycyl lysine isopeptide (interchain with G-Cter in ubiquitin).]

FIGURE 2 | Occurrence of experimentally detected PTMs, as curated from SwissProt. (Reprinted with permission from Ref 12. Copyright 2011 Nature Publishing Group)

What has been said above for phosphorylation holds true for other such PTMs. They are all dissipative mechanisms in which energy is expended to change protein state. However, there are two different kinds of processes that maintain the required ‘voltages’. One kind of modification is based on small molecular groups (phosphoryl, acetyl, ADP-ribosyl, etc.) that are carried by metabolic donors (ATP, acetyl-CoA, NAD, etc.; Table 1). The donor molecules are continuously supplied by the cell’s background metabolic processes, which have the ultimate responsibility for ensuring that the required ‘voltages’ are maintained (Figure 3a). Forward and reverse modifications are each carried out by single enzymes.

TABLE 1 Reversible Post-Translational Modifications

Modification        Modifier             Donor            Residues     References
Phosphorylation     PO3^2-               ATP              S, T, Y¹     21, 22
Acetylation         CH3CO                AcCoA            K            23, 24
GlcNAcylation       C6H12O5(NH)CH3CO     UDP-GlcNAc       S, T         25
Palmitoylation²     CH3(CH2)14CO         palmitoyl-CoA    C            26, 27
Methylation         CH3                  SAM              K³           18, 28
=====================================================================================
ADP-ribosylation    ADP-ribose           NAD+             R, K, E⁴     20, 29
Ubiquitin-like      Ub, SUMO, etc.       -                K            30, 31

The table shows some of the more widely studied PTMs in metazoa but is by no means exhaustive. For each PTM, only those residues thought to be most significant are indicated; for more complete details, see Ref 1. The PTMs above the double line are simple modifications, as in Figure 4, while those below are more complex, as in Figure 5. The citations focus on nonhistone examples.
¹Reversible phosphorylation on histidine and aspartate forms the basis for two-component signaling, which is abundant in eubacteria and is also found in plants and fungi13; acid-labile phosphoramidate attachments to basic residues are also found in eukaryotes.14
²Most lipid modifications are irreversible, S-linked palmitoylation being the exception.15
³Arginine methylation is also widespread16 and is known to cross-talk with other PTMs17 but its reversibility remains in question.18
⁴Mono ADP-ribosylation usually takes place on arginine19 and poly-ADP-ribosylation on lysine or glutamate.20

In contrast to small molecule modifications, ubiquitin and ubiquitin-like modifications (SUMO, NEDD, etc.) are polypeptide modifications.30 The modifying molecules are made by gene transcription and forward modification is undertaken by a chain of enzymes (Figure 3b). ATP is expended to adenylate the modifier to link it to the first activating enzyme (E1) in the chain. The modifier is passed from the E1 to the second conjugating enzyme (E2). The E2 may sometimes act alone, or in concert with an E3 ligase, or the E3 may act independently, to build an isopeptide linkage between the terminal -NH2 group of a lysine residue in the substrate protein and the C-terminal tail of the modifier.34 The modified protein is a branched amino-acid chain and the introduced polypeptide branch can itself become a target for further ubiquitin-like modifications. Single reverse deubiquitinating enzymes cleave the isopeptide linkage and release the modifying polypeptide,35 which may be recycled or degraded. Metabolic processes are not directly involved in maintaining the ‘voltage’, for which responsibility lies with whatever regulates transcription of the modifier genes and recycling and degradation of the resulting polypeptides.36

[Figure 3 panels: (a) Metabolic recharging process: a donor supplies modifier M, which a forward enzyme attaches to residue X to give the modified residue; a reverse enzyme, using H2O, releases the modifier. (b) M gene transcription and proteolytic processing yield modifier M, which is activated by E1 (ATP converted to AMP and PPi), passed to E2 and E3, and attached to lysine K to give the modified lysine; a reverse enzyme, using H2O, releases the modifier for recycling or degradation.]

FIGURE 3 | Metabolic and polypeptide PTMs. The biochemical details may differ depending on the modification; see Ref 1 for more details. (a) Metabolic PTMs. Note that lysine deacetylation by the sirtuins uses NAD+ and releases acetyl-ADP-ribose rather than acetate. (b) Polypeptide PTMs. Ubiquitin-like modifiers are synthesized by gene transcription, which, in the case of ubiquitin, yields tandem repeats or fusion proteins. These must be proteolytically cleaved prior to being used for PTM.32 E2 enzymes can sometimes modify substrates independently of E3s; E2 and E3 enzymes often collaborate and E4 elongation factors can join in.33 Assembly of polymeric chains is not fully understood and ubiquitin chains may be preformed prior to substrate ligation.33

The dissipative character of all PTMs places a burden on the background processes, metabolic or transcriptional, that are responsible for maintaining modifier molecules at the appropriate ‘voltage’. If such a background process is not homeostatic (if it does not maintain modifier concentration when demand fluctuates) then the efficiency of modification may be compromised, potentially affecting all substrates subject to that modification. In the case of phosphorylation, ATP concentration is remarkably robust even in tissues like skeletal muscle, where demand for ATP can change by over two orders of magnitude.37 Because ATP is so widely used for so many different purposes, there may have been sufficient pressure to evolve the circuitry needed to make its supply robust to fluctuations in demand. This may not be so for other modifications, for which much less is known about modifier homeostasis.36,38
`
COMBINATORICS OF MODIFICATION

Phosphorylation is a binary modification; a given serine, threonine, or tyrosine residue is either phosphorylated or not (Figure 4). The same is true for acetylation on lysine, GlcNAcylation on serine or threonine and palmitoylation on cysteine. Up to three methyl groups may bind to the -NH2 group of lysine, so that a given lysine may be mono-, di-, or tri-methylated. For these PTMs, each residue has a small, limited number of discrete modification states (Figure 4).

[Figure 4: chemical structures showing phosphorylation of serine, threonine and tyrosine; GlcNAcylation of serine and threonine; acetylation of lysine; palmitoylation of cysteine; and mono-, di- and tri-methylation of lysine.]

FIGURE 4 | Simple PTMs. The chemistry of those PTMs above the double line in Table 1, which exhibit a small, limited number of modifications, is shown, with the modifications to each residue in red. Chemical formulas were drawn in BKChem, an open source utility.

The possibilities become more intricate for other modifications, such as ADP-ribosylation (Figure 5). Mono-ADP-ribosylation, the transfer of a single ADP-ribose moiety usually to an arginine residue, was first identified in bacterial toxins which inhibit key cellular processes, such as the GTPase activity of G-proteins.19 In contrast, poly-ADP-ribosylation was first discovered in the DNA damage response, although it is now known to affect a wide range of cellular processes.20 Such ‘PARsylation’ is reversibly catalyzed by the PARP and PARG families of enzymes, which can dynamically build, on lysine or glutamate residues, a polymer of ADP-ribose monomers linked by glycosidic bridges. Heterogeneous, linear and branched polymers with more than 200 monomers have been found.39 Instead of a simple modification like those in Figure 4, PARsylation offers a potentially unlimited suite of modification structures on a single residue. In vitro studies show that PAR binding domains can discriminate between polymers of different sizes,40 suggesting that evolution may have been able to exploit this heterogeneity.

[Figure 5: chemical structures of PARsylation on glutamate and ubiquitination on lysine; the human ubiquitin sequence (residues 1 to 76) with its DSSP secondary structure assignment and lysines 6, 11, 27, 29, 33, 48 and 63 marked; and structures of the ubiquitin monomer and of the Lys48 and Lys63 dimers, with Lys6, Lys48 and Lys63 labeled. Sequence: MQIFVKTLTGKTITLEVEPSDTIENVKAKIQDKEGIPPDQQRLIFAGKQLEDGRTLSDYNIQKESTLHLVLRLRGG]

FIGURE 5 | Complex PTMs. The chemistry of those PTMs below the double line in Table 1, which exhibit potentially unlimited numbers of modifications, is summarized, as in Figure 4. The human ubiquitin sequence was obtained from PDB 1UBI, along with the secondary structure assignment through DSSP. The PDB entries of the ubiquitin structures are 1UBI for the monomer, 1AAR for the Lys48 dimer and 2JF5 for the Lys63 dimer. The structures were oriented and annotated in Open Source PyMol 1.2.X.

Ubiquitin-like modification exhibits even more structural diversity than PARsylation. This is best understood for ubiquitin itself, although SUMO2/3 and NEDD8 are also reported to form polymeric chains.41 Lysine residues may be mono-ubiquitinated or poly-ubiquitinated, with one of the lysine residues in ubiquitin itself becoming the attachment point for the next ubiquitin monomer (Figure 5). Polymers with over a dozen monomers are reported. Ubiquitin has seven lysines, K6, K11, K27, K29, K33, K48, and K63, each of which may be involved in chain formation in vivo. During log-phase growth of yeast, mass-spectrometry has shown that these lysines occur as attachment points in the respective proportions 8:24:4:3:1:20:8.31
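
To make the quoted proportions easier to read, the short sketch below converts them into fractional shares. The ratios are those given in the text; treating them as parts of a single whole, and the variable names, are assumptions made here for illustration.

```python
from fractions import Fraction

lysines = ["K6", "K11", "K27", "K29", "K33", "K48", "K63"]
proportions = [8, 24, 4, 3, 1, 20, 8]  # relative usage as attachment points, from the text

total = sum(proportions)
# Exact share of each lysine among all attachment points.
shares = {k: Fraction(p, total) for k, p in zip(lysines, proportions)}

for k, share in shares.items():
    print(f"{k}: {float(share):.1%}")
```

On this reading, K11 and K48 together account for roughly two thirds of the linkages, with K33 the least used.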
Homotypic chains, in which the same position is used for each link, are thought to be most common. For instance, K48-linked chains are associated with proteasomal degradation and may have a compact structure, while K63-linked chains are associated with endocytosis and may have a more open structure41; see Figure 5. There is evidence for heterotypic linking.42 Forked chains, in which some ubiquitin monomers have more than one lysine to which other monomers are attached, have been constructed in vitro43 but have not yet been observed in vivo. Significantly, evolution has found a way to discriminate between structural variants. A variety of ubiquitin-binding domains can distinguish not only between different lengths of polymer but also between different linkages.44
Multiple residues are often modified on the same protein. This may happen through the same type of modification on different sites as well as through different types of modifications on different, or overlapping, sites. If one site has k modification states and another site l, then, in principle, there could be k × l combinatorial states. The possibilities multiply with increasing numbers of sites. If a protein has n sites of phosphorylation then the total number of combinatorial protein states is 2^n. Each of these combinatorial states corresponds to a global pattern of modification across the entire protein. If there are also complex modifications, like PARsylation or ubiquitin-like modification, then the number of global patterns increases even faster with n. For ubiquitin, it may be necessary to keep track not only of the size and shape of the polymer but also of the linkages between the components, giving even higher multiplicative possibilities for heterotypic chains. This enormous combinatorial explosion is one of the most characteristic features of PTM and also one of its most perplexing. The ‘hypothetical computation’ of PTM states by Lonard and O’Malley makes the same point.45 Why is so much state needed? What manner of information processing has evolution been able to implement through having such extraordinary complexity at its disposal? To address this question, we build upon the ideas discussed initially to introduce a basic concept for keeping track of global patterns of modification.
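
The multiplicative counting above can be sketched in a few lines. The example substrate is hypothetical: one site carrying zero, one or two ubiquitin monomers and two binary phosphorylation sites, the same arrangement as the hypothetical substrate of Figure 6. The dictionary keys and function names are illustrative assumptions.

```python
from itertools import product
from math import prod

# Hypothetical substrate: per-site lists of possible modification states.
site_states = {
    "site1_Ub": ["-", "1U", "2U"],  # unmodified, one or two ubiquitin monomers
    "site2_P": ["-", "P"],          # unphosphorylated or phosphorylated
    "site3_P": ["-", "P"],
}

def count_mod_forms(states):
    """Number of global patterns: the product of per-site state counts."""
    return prod(len(s) for s in states.values())

def enumerate_mod_forms(states):
    """Every global pattern, as a tuple with one state per site."""
    return list(product(*states.values()))

def count_phospho_forms(n_sites):
    """Binary phosphorylation on n sites gives 2^n patterns."""
    return 2 ** n_sites

mod_forms = enumerate_mod_forms(site_states)
print(count_mod_forms(site_states), "global patterns")
for form in mod_forms:
    print(form)
```

Enumeration is only practical for small numbers of sites; the count itself grows exponentially, which is the point the paragraph makes.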
`
THE MOD-FORM DISTRIBUTION

From now on, we refer to a global pattern of modification as a ‘mod-form’. To reiterate the meaning of this, a mod-form is a specific pattern of modifications on all modifiable residues in a protein. Each post-translationally modified protein may have many mod-forms, as discussed above. The customary cartoon depiction of a post-translationally modified protein shows it in one particular mod-form, usually the maximally modified one (Figure 6a). This gives the misleading impression that only one mod-form is present, when, in reality, there are combinatorially many possibilities (Figure 6b). Moreover, there is a population of molecules present and each molecule is in one of the potential mod-forms. It is easy to lose sight of the molecular populations behind the cartoons. For instance, it is often said that two modifications that target the same residue, such as GlcNAcylation and phosphorylation on serine and threonine (Figure 4), are ‘mutually exclusive’. This is only true of a single molecule. The population may contain both modifications in any proportion.
Of course, not all potential mod-forms may be present in any particular context. The serine/arginine repetitive matrix factor (Srrm2) has over 300 detected phosphorylations, as reported on Phospho.ELM (Table 2). Since 2^300 exceeds Eddington’s estimate of the number of protons in the Universe, not all mod-forms can ever be present at any one time. However, this only raises the question of which of the many possible mod-forms are present and to what extent. This is a matter of biochemical dynamics.
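
The size comparison can be checked with exact integer arithmetic. The sketch assumes Eddington’s estimate in its usually quoted form, 136 × 2^256; the text does not state the value, so that constant is an assumption introduced here.

```python
# Assumed form of Eddington's estimate of the number of protons in the Universe.
EDDINGTON_NUMBER = 136 * 2**256

# Number of mod-forms for 300 binary phosphorylation sites.
mod_forms = 2**300

ratio = mod_forms // EDDINGTON_NUMBER

print(f"2^300            ~ {mod_forms:.3e}")
print(f"Eddington number ~ {EDDINGTON_NUMBER:.3e}")
print(f"ratio            ~ {ratio:.3e}")
```

Under that assumption the number of potential mod-forms exceeds the estimate by about eleven orders of magnitude, so the conclusion holds with room to spare.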
The pattern of mod-forms in the population is dynamically regulated by the cognate forward and reverse enzymes working collectively. It is sometimes thought that forward and reverse enzymes work in sequence, with the former being activated first to create the modifications and the latter being activated next to downregulate them. This may be useful in some contexts to create a tightly focussed mod-form distribution but any stochastic (noisy) fluctuation in the forward enzyme will precipitate irreversible modification, suggesting that this is not a robust mechanism in general. It is more usually the case that opposing enzymes are constitutively present.56,57 PTM is a highly dynamic business.
For a single site, as discussed initially, enzyme activities can be regulated to set the relative stoichiometry of phosphorylation anywhere between 0 and 1 and can do so in either a graded or ultrasensitive manner (Figure 1c). The situation becomes more complicated with multiple sites58 or complex enzyme mechanisms.2 A kinase may operate processively, phosphorylating a substrate on multiple sites without releasing it;59 intermediate mod-forms may not then appear. Enzyme action may depend on the prior existence of certain mod-forms, as in ‘primed’ or hierarchical phosphorylation;60 some mod-forms may only appear at certain times and with the right sequence of enzymes. In short, the pattern of mod-forms and the way this changes over time depend on the mechanistic details of the network of enzymes that target the substrate. New mathematical methods developed by one of our labs have the potential to analyze such combinatorially complex dynamical behavior.61,62

[Figure 6 panels: (a) PTM cartoon of a substrate with sites 1, 2 and 3. (b) Combinatorial patterns of modification (‘mod-forms’), numbered 1 to 12. (c) Mod-form distribution: histogram over mod-forms 1 to 12. (d) Site-specific summaries: phosphorylation (no P, P on site 2, P on site 3) and ubiquitination (no U, 1U on site 1, 2U on site 1).]

FIGURE 6 | Mod-form distributions. (a) Cartoon depiction of a hypothetical substrate with 3 sites of modification; site 1 is ubiquitinated with a chain of up to two monomers; sites 2 and 3 are phosphorylated. (b) There are 12 = 3 × 2 × 2 global patterns of modification, enumerated as shown. (c) A hypothetical mod-form distribution, showing the proportions in the population of each of the 12 mod-forms, following the numbering used in (b). The mod-form distribution can be viewed as a probability distribution, which gives, for each mod-form, the probability of finding a substrate molecule in that mod-form. The vertical scale has been omitted to focus on qualitative aspects. (d) In current practice, only limited information may be available. The separate phosphoryl- and ubiquityl-modifications calculated from (c) are shown, with the phosphoryl-modifications given as site-specific stoichiometries (the proportion of unphosphorylated substrate and of substrate phosphorylated on each site). Such summaries lose considerable information compared to the underlying mod-form distribution, making it harder to infer correlations between modification states and downstream responses.

TABLE 2 PTM Resources

Name                            URL                                   PTMs             Organisms                 References
UniProt                         www.uniprot.org                       Many             Many                      46
HPRD                            www.hprd.org                          Ph¹              Hs                        47
Phospho.ELM                     phospho.elm.eu.org                    Ph               Eukaryotes                48
PhosphoSitePlus                 www.phosphosite.org                   Ac, Me, Ph, Ub   Hs, Mm                    49
PHOSIDA                         www.phosida.com                       Ac, Ph           Ce, Dm, Hs, Mm, Sc        50
PhosphoPep                      www.phosphopep.org                    Ph               Ce, Dm, Hs, Sc            51
dbPTM                           dbptm.mbc.nctu.edu.tw                 Many²            N/A                       52
CPLA                            cpla.biocuckoo.org                    Ac               N/A                       53
P3DB                            www.p3db.org                          Ph               At, Bn, Gm, Mt, Os, Zm    54
PhosPhAt3.0                     phosphat.mpimp-golm.mpg.de            Ph               At                        55
Phosphorylation Site Database   www.phosphorylation.biochem.vt.edu    Ph³              Bacteria, Archaea         N/A

The table shows online databases of post-translational modifications, focussing on those modifications in Table 1. The list is by no means exhaustive. Ph, phosphorylation; Ac, acetylation; Me, methylation; Ub, ubiquitin; At, Arabidopsis thaliana; Bn, Brassica napus; Ce, Caenorhabditis elegans; Dm, Drosophila melanogaster; Eu, eukaryotes; Gm, Glycine max; Hs, Homo sapiens; Mm, Mus musculus; Mt, Medicago truncatula; Os, Oryza sativa; Sc, Saccharomyces cerevisiae; Zm, Zea mays; N/A, not available.
¹Other PTMs are included but phosphorylation is particularly curated.
²Includes irreversible PTMs but focusses on statistics and motifs.
³Only phosphorylations on serine, threonine, and tyrosine are provided, not histidine and aspartate, as found in two-component signaling.
Downstream processes typically (but not always; see the next paragraph) interact with a modified substrate by sampling all the substrate molecules in the population. They are therefore influenced by whichever mod-forms are present. As we shall see, distinct mod-forms may exert distinct effects on downstream processes. The overall response will depend both on how much effect each mod-form exerts and on how much of each mod-form is present. If a mod-form has high effect but is present only at low stoichiometry, it may have less impact than one of low effect but high stoichiometry. The stoichiometry at which each mod-form is present, or the proportion of total substrate in each mod-form, determines the substrate’s ‘mod-form distribution’. This is a histogram over all the mod-forms that lists, for each mod-form, the effective probability of finding the protein in that state (Figure 6c). The overall effect on a given downstream process can be quantified as an average over the mod-form distribution of the effect of each individual mod-form. The mod-form distribution provides the most comprehensive and quantitative accounting of the combinatorial possibilities.
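
The averaging described above, and the information lost when only site-specific stoichiometries are known, can be sketched as follows. Everything in the example is hypothetical: a substrate with two phosphorylation sites, two invented distributions, and an invented effect in which only the doubly phosphorylated mod-form is active.

```python
from itertools import product

# Mod-forms of a hypothetical two-site substrate:
# (0, 0) unmodified, (1, 0) site 1 only, (0, 1) site 2 only, (1, 1) both.
MOD_FORMS = list(product([0, 1], repeat=2))

def average_response(distribution, effect):
    """Average of per-mod-form effects, weighted by the mod-form distribution."""
    return sum(distribution[m] * effect[m] for m in MOD_FORMS)

def site_stoichiometries(distribution):
    """Fraction of molecules phosphorylated at each site (the marginals)."""
    n_sites = len(MOD_FORMS[0])
    return [sum(p for m, p in distribution.items() if m[i]) for i in range(n_sites)]

# Two hypothetical distributions with identical site-specific stoichiometries.
independent = {(0, 0): 0.25, (1, 0): 0.25, (0, 1): 0.25, (1, 1): 0.25}
correlated = {(0, 0): 0.50, (1, 0): 0.00, (0, 1): 0.00, (1, 1): 0.50}

# Hypothetical downstream effect: only the doubly phosphorylated form is active.
effect = {(0, 0): 0.0, (1, 0): 0.0, (0, 1): 0.0, (1, 1): 1.0}

for name, dist in [("independent", independent), ("correlated", correlated)]:
    print(name, site_stoichiometries(dist), average_response(dist, effect))
```

Both populations show each site half phosphorylated, yet the averaged downstream response differs by a factor of two, which is why site-specific summaries cannot stand in for the full mod-form distribution.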
The role of the substrate population can vary with context. For instance, the carboxy terminal domain (CTD) of the largest subunit of RNA Pol II consists of tandem heptapeptide repeats, 52 in humans, whose differential phosphorylation correlates with the progress of transcription.63 Here, transcription of a particular gene is influenced only by the CTD of the Pol II that is transcribing that gene; the population of
`CTDs is not sampled. In such cases, t