Structural Consensus among Antibodies Defines the Antigen Binding Site
`
`Vered Kunik1, Bjoern Peters2, Yanay Ofran1*
`
`1 The Goodman faculty of life sciences, Nanotechnology building, Bar Ilan University, Ramat Gan, Israel, 2 Division of Vaccine Discovery, La Jolla Institute for Allergy and
`Immunology, La Jolla, California, United States of America
`
`Abstract
`
`The Complementarity Determining Regions (CDRs) of antibodies are assumed to account for the antigen recognition and
`binding and thus to contain also the antigen binding site. CDRs are typically discerned by searching for regions that are
most different, in sequence or in structure, between different antibodies. Here, we show that ~20% of the antibody
`residues that actually bind the antigen fall outside the CDRs. However, virtually all antigen binding residues lie in regions of
`structural consensus across antibodies. Furthermore, we show that these regions of structural consensus which cover the
`antigen binding site are identifiable from the sequence of the antibody. Analyzing the predicted contribution of antigen
`binding residues to the stability of the antibody-antigen complex, we show that residues that fall outside of the traditionally
`defined CDRs are at least as important to antigen binding as residues within the CDRs, and in some cases, they are even
`more important energetically. Furthermore, antigen binding residues that fall outside of the structural consensus regions
`but within traditionally defined CDRs show a marginal energetic contribution to antigen binding. These findings allow for
`systematic and comprehensive identification of antigen binding sites, which can improve the understanding of antigenic
`interactions and may be useful in antibody engineering and B-cell epitope identification.
`
`Citation: Kunik V, Peters B, Ofran Y (2012) Structural Consensus among Antibodies Defines the Antigen Binding Site. PLoS Comput Biol 8(2): e1002388.
`doi:10.1371/journal.pcbi.1002388
`
`Editor: Brian Baker, University of Notre Dame, United States of America
`
`Received August 1, 2011; Accepted December 30, 2011; Published February 23, 2012
Copyright: © 2012 Kunik et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits
`unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
`
Funding: This work was supported by NIAID contract N01-AI-900048C (http://www.niaid.nih.gov/) and by the Israeli Science Foundation, grant No. 511/10
(www.isf.org.il). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
`
`Competing Interests: The authors have declared that no competing interests exist.
`
`* E-mail: Yanay@ofranlab.org
`
`Introduction
`
Antibody-Antigen (Ab-Ag) interactions are based on non-covalent binding between the antibody (Ab) and the antigen
(Ag). Correct identification of the residues that mediate Ag recognition and binding would improve our understanding of
antigenic interactions and may permit the modification and manipulation of Abs. For example, introducing mutations into the
V-genes has been suggested as a way to improve Ab affinity [1–3]. However, mutations in the framework regions (FRs) rather than in
the Ag binding residues themselves are more likely to evoke an undesired immune response [4]. Knowing which residues bind the
Ag can help direct such mutations and be beneficial to Ab engineering [5–7]. It has been shown that Ag binding residues are
primarily located in the so called complementarity determining regions (CDRs) [7–9]. Thus, the attempt to identify CDRs, and
particularly the attempt to define their boundaries, has become the focus of extensive research over the last few decades [7,8,10].
Kabat and co-workers [9,11] attempted to systematically identify CDRs in newly sequenced Abs. Their approach was based on the
assumption that CDRs include the most variable positions in Abs and therefore could be identified by aligning the fairly limited
number of Abs available then. Based on this alignment they introduced a numbering scheme for the residues in the
hypervariable regions and determined which positions mark the beginning and the end of each CDR. The Kabat numbering
scheme was developed when no structural information was available. Chothia et al. [12,13] analyzed a small number of Ab
structures and determined the relationship between the sequences of the Abs and the structures of their CDRs. The boundaries of the
FRs and the CDRs were determined and the latter have been shown to adopt a restricted set of conformations based on the
presence of certain residues at key positions in the CDRs and the flanking FRs. This analysis suggested that the sites of insertions
and deletions in CDRs L1 and H1 differ from those suggested by Kabat. Thus, the Chothia numbering scheme is
almost identical to the Kabat scheme but, based on structural considerations, places the insertions in CDRs L1 and H1 at
different positions. As more experimental data became available, the analysis was performed anew, re-defining the boundaries of the
CDRs. These definitions of CDRs are mostly based on manual analysis and may require adjustments as the structures of more Abs
become available. Abhinandan et al. [14] aligned Ab sequences in the context of structure and found that approximately 10% of the
sequences in the manually annotated Kabat database have erroneous numbering. A more recent attempt to define CDRs is
that of the IMGT database [15], which curates nucleotide sequence information for immunoglobulins (IG), T-cell receptors
(TcR) and Major Histocompatibility Complex (MHC) molecules. It proposes a uniform numbering system for IG and TcR
sequences, based on aligning more than 5000 IG and TcR variable region sequences, taking into account and combining the
Kabat definition of FRs and CDRs [16], structural data [17] and
Chothia’s characterization of the hypervariable loops [12]. Their numbering scheme does not differentiate between the various
immunoglobulins (i.e., IG or TcR), the chain type (i.e., heavy or light) or the species.
A drawback of these numbering schemes is that variability in CDR length is accommodated either by annotating insertions
(Kabat and Chothia) or by providing excess numbers (IMGT). Abs with unusually long insertions may be hard to annotate this way,
and therefore their CDRs may not be identified correctly. Honegger and Pluckthun [18] suggested a structurally improved
version of the IMGT scheme. Instead of introducing unidirectional insertions and deletions as in the IMGT and Chothia
schemes, they placed them symmetrically around a key position.
MacCallum et al. [8] have proposed focusing on the specific notion of Ag binding residues rather than the more vague concept
of CDRs. They suggested that these residues could be identified based on structural analysis of the binding patterns of canonical
loops. Other studies have dubbed those Ag binding residues Specificity Determining Regions (SDRs) [5,7]. Here, we analyze
Ag-Ab complexes and show that virtually all Ag binding residues fall within regions of structural consensus. We refer to these
regions as Ag Binding Regions (ABRs). We show that these regions can be identified from the Ab sequence as well. We used
“Paratome”, an implementation of a structural approach for the identification of structural consensus in Abs [19]. While residues
identified by Paratome cover virtually all the Ag binding sites, the CDRs (as identified by the commonly used CDR identification
tools) miss significant portions of them. We refer to the Ag binding residues which are identified by Paratome but are not identified by
any of the common CDR identification methods as Paratome-unique residues. Similarly, Ag binding residues that are identified
by any of the common CDR identification methods but are not identified by Paratome are referred to as CDRs-unique
residues. We show that Paratome-unique residues make a crucial energetic contribution to Ab-Ag interactions, while CDRs-unique
residues have a rather minor contribution. These results allow for better identification of Ag binding sites and thus for better
identification of B-cell epitopes. They may also help improve vaccine and Ab design.

Structural Consensus Defines Antigen Binding Site
PLoS Computational Biology | www.ploscompbiol.org | February 2012 | Volume 8 | Issue 2 | e1002388
OnCusp Ex. 1012

Author Summary

Antibodies are a primary adaptive defence mechanism against infection, and function by recognizing and binding
to non-self antigens. While most of the sequence of all antibodies of a given individual is identical, relatively small
variations turn each antibody into a specific binder of one antigen. It is widely assumed that antigen binding sites
correspond to the so called Complementarity Determining Regions (CDRs) of the antibody, which are defined as the
elements that are most different between antibodies. We analysed all known antibody-antigen complexes and
found that about 20% of the residues that actually bind the antigen fall outside the CDRs. However, we also found
that virtually all antigen binding residues fall within regions of structural consensus between antibodies.
Moreover, we demonstrate that antigen binding residues that reside within these structural consensus regions but
outside of the traditionally-defined CDRs make a significant energetic contribution to antigen binding. Furthermore,
we show that these regions are organized along the sequence of the antibody chains and are identifiable from
the sequence of the antibody.

Results

`Structural consensus defines ABRs
`The outline of our structure-based ABRs identification method
`is delineated in Figure 1. Briefly, the algorithm structurally aligns
`all known Abs and marks the residues that contact the Ag in each
`of them. We have shown [19,20] that in this multiple structure
`alignment there is a consensus among Abs that some structurally
aligned positions contact the Ag. These positions form six
`sequence stretches along the Ab sequence that roughly correspond
`to the six CDRs. Beyond the edges of these stretches there were no
`structurally aligned positions in which more than 10% of the Abs
`contact the Ag. Thus, we defined the boundaries of the ABRs
`based on these stretches and marked the ABRs in all the Abs in
`our dataset.
`
`Paratome: Automatic sequence based ABRs identification
`Figure 2 depicts the automated ABRs identification tool we
`developed. Given a query sequence (Figure 2A) a BLAST search is
`performed against all Abs in the dataset described above. The best
`hit (i.e., lowest E-value) is used to infer the positions of the ABRs in
`the query sequence, based on its alignment to the annotated Ab
`from the dataset. When the query Ab has a known 3-D structure,
`it can be used to identify the ABRs as described in Figure 2B (see
`Methods).
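
The annotation transfer described above can be illustrated with a minimal sketch. It assumes that the pairwise alignment between the query and its best BLAST hit is already available as two gapped strings, and that the ABRs of the annotated Ab are given as 0-based inclusive residue indices. The function name `transfer_regions` and the data layout are hypothetical and are not taken from the Paratome implementation.

```python
from typing import Dict, List, Optional, Tuple


def transfer_regions(aligned_query: str,
                     aligned_template: str,
                     template_regions: Dict[str, Tuple[int, int]],
                     gap: str = "-") -> Dict[str, Tuple[int, int]]:
    """Map regions annotated on the template onto the query sequence.

    Regions are (start, end) pairs of 0-based, inclusive residue indices
    counted on the ungapped sequences.
    """
    if len(aligned_query) != len(aligned_template):
        raise ValueError("aligned sequences must have equal length")

    # For every template residue, store the query residue index aligned to it,
    # or None when the query has a gap in that alignment column.
    template_to_query: List[Optional[int]] = []
    q_idx = 0
    for q_char, t_char in zip(aligned_query, aligned_template):
        if t_char != gap:
            template_to_query.append(q_idx if q_char != gap else None)
        if q_char != gap:
            q_idx += 1

    transferred: Dict[str, Tuple[int, int]] = {}
    for name, (start, end) in template_regions.items():
        mapped = [template_to_query[i] for i in range(start, end + 1)
                  if template_to_query[i] is not None]
        if mapped:  # regions that align only to gaps are dropped
            transferred[name] = (min(mapped), max(mapped))
    return transferred
```

Because the transferred region spans from the first to the last mapped query residue, residues inserted in the query relative to the annotated Ab are included in the inferred ABR.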
`
`Content statistics
`Figure 3 summarizes the number of residues identified by each
`method on the test set. In all regions except L1 and H2, Paratome
`identified a slightly larger number of residues than any other
`method. The largest differences were recorded in L2 and H2. In
`L2, Paratome had 50% more residues identified than Kabat and
`Chothia and four times the number of residues identified by
`IMGT. For H2, Kabat and Paratome identified twice the number
`of residues suggested by Chothia and IMGT.
`
`Structural consensus regions contain virtually all Ag
`binding residues
For each Ab in our test dataset we recorded the recall of each method, i.e., the fraction of
the residues that actually bind the Ag that the method identified, and averaged it over the Abs. Given the
typical trade-off between recall and precision, in which an increase in one comes at the cost of the other,
we also measured the
average precision of each method. The results are presented in
`Figure 4. The ABRs identified by Paratome included 94% of Ag
`binding residues, followed by Kabat (85%), IMGT (81%) and
`Chothia (79%) CDRs. Precision rates ranged between 48%
`(IMGT) and 41% (Kabat), with Chothia (44%) and Paratome
`(42%) in between.
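
As an illustration of the two measures, the following sketch computes precision and recall for a single Ab and averages them over a set of Abs. It assumes residues are represented by hashable identifiers (for example chain and position); the function names are illustrative and not part of the original analysis code.

```python
from statistics import mean


def precision_recall(predicted, binding):
    """Precision and recall of a predicted residue set against the Ag binding residues."""
    predicted, binding = set(predicted), set(binding)
    hits = len(predicted & binding)
    precision = hits / len(predicted) if predicted else 0.0
    recall = hits / len(binding) if binding else 0.0
    return precision, recall


def average_precision_recall(per_antibody):
    """Average precision and recall over (predicted, binding) pairs, one pair per Ab."""
    pairs = [precision_recall(pred, bind) for pred, bind in per_antibody]
    return mean(p for p, _ in pairs), mean(r for _, r in pairs)
```

A method that labels more residues can only keep or raise its recall, but its precision drops if the extra residues do not contact the Ag, which is the trade-off referred to above.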
`
`ABRs-specific residues cover 10–17% of the Ag binding
`sites
`Table 1 compares the consensus sets and the method specific
`sets of residues. The Paratome-Kabat consensus set is the largest
(3476 residues), covering 83.54% of the Ag binding sites. The Paratome-Chothia consensus set covered 77.08% of the Ag
binding sites (3203 residues), and the Paratome-IMGT consensus set covered 79.47% of the Ag binding sites (3077 residues). In all
consensus sets, approximately 50% of the residues are Ag binding residues. ΔParatome (the residues identified by Paratome but not
by the compared method) contains a substantially larger percentage of Ag binding residues than ΔKabat, ΔChothia and ΔIMGT
(the residues identified only by the respective CDR method): 20.8%, 26.23% and 20.6% respectively, compared with 5.03%, 4.88%
and 6.88% respectively.
`
`Figure 1. Structure-based identification of ABRs. (A) Using the non-redundant set of all Ab-Ag complexes in the PDB, (B) we created a multiple
`structure alignment of the Abs. Residues that are in contact with the Ag were identified by searching for structurally aligned positions that
`systematically create contacts with the Ag (black and grey solid circles) and disregarded positions that contact the Ag only sporadically (open
`shapes). (C) The contacting positions were mapped to the sequence representation of the multiple structure alignment (bold letters). The stretches of
`amino acids in which at least 10% of the Abs are in contact with the Ag were defined as ABRs (white rectangle).
`doi:10.1371/journal.pcbi.1002388.g001
`
Moreover, ΔParatome residues cover a significantly larger portion of the Ag binding sites. ΔParatome residues covered
10.77% of the Ag binding sites while ΔKabat covered merely 1.78% of the Ag binding sites. The coverage of ΔParatome
(14.84%) was 20 times larger than that of ΔIMGT (0.76%). When compared to Chothia, the coverage of ΔParatome (17.23%) was,
again, more than an order of magnitude greater than that of ΔChothia (0.86%). In each comparison, Paratome-specific
residues covered a significantly larger portion of the Ag binding sites than the alternative method-specific residues.
This indicates that structural consensus regions capture more of the Ag
`binding portion of Abs.
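
The pairwise comparison summarized in Table 1 amounts to simple set operations. The sketch below is a minimal illustration, assuming each residue is a hashable identifier such as a (PDB ID, chain, position) tuple; the function name and the keys of the returned dictionary are hypothetical.

```python
def compare_methods(paratome, other, binding):
    """Consensus and method-specific residue sets, with their Ag binding site coverage.

    paratome, other: residues identified by Paratome and by the compared CDR method.
    binding: residues that actually contact the Ag.
    """
    paratome, other, binding = set(paratome), set(other), set(binding)
    subsets = {
        "consensus": paratome & other,       # identified by both methods
        "delta_paratome": paratome - other,  # identified by Paratome only
        "delta_other": other - paratome,     # identified by the other method only
    }
    report = {}
    for name, residues in subsets.items():
        in_contact = residues & binding
        report[name] = {
            "n_residues": len(residues),
            "n_in_contact": len(in_contact),
            # fraction of all Ag binding residues that fall in this subset
            "coverage": len(in_contact) / len(binding) if binding else 0.0,
        }
    return report
```

The three columns of Table 1 correspond to `n_residues`, `n_in_contact` and `coverage` in this sketch.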
`
Figure 2. Automated ABRs identification. (A) Sequence based ABRs identification. A BLAST search is performed using the query Ab
sequence versus the dataset of non-redundant PDB Abs. Using the best hit from the BLAST search, the FRs of the query and annotated Abs are aligned, and
the ABRs of the query sequence are inferred based on the location of the ABRs of the annotated sequence in the multiple structure alignment (MSTA).
(B) Structure based ABRs identification. A BLAST search is performed using the sequence of the query Ab versus our dataset of Abs. Using the best hit
from the BLAST search, the query and annotated Abs are structurally aligned. The ABRs of the query Ab are inferred based on the location of the ABRs
of the annotated Ab in the MSTA.
`doi:10.1371/journal.pcbi.1002388.g002
`
`Figure 3. Total number of residues identified by each method for all Ab-Ag complexes in the test set. L1–L3 are ABR/CDR1-3 of the light
`chain. H1–H3 are ABR/CDR1-3 of the heavy chain. Total light and heavy are the sum of all identified residues in the light and heavy chains
`respectively.
`doi:10.1371/journal.pcbi.1002388.g003
`
`Differences in ratios of Ag binding residues
`Figure 5A shows the average precision for each ABR/CDR on
`the light and heavy chains as defined by each of the methods. L2
`has the lowest precision in all methods. For L3, all the methods
`have a similar precision, with a slightly higher rate for Paratome
`(0.55). IMGT has the highest precision for L1 (0.46), followed by
Paratome (0.38), while Chothia and Kabat have the lowest precision
`(0.27). The largest difference between the methods is in H2 where
`Chothia has the highest precision (0.69), followed by IMGT (0.57),
`then Paratome (0.43) and Kabat (0.37).
`Figure 5B summarizes the average recall of each method for
`each of the six regions. For all methods, L2 has the lowest recall
`(2–7%). This is expected considering L2 has the lowest precision
`(see Figure 5A). For L1, all methods show similar recall (11–12%).
`The same holds for H3, which covers the largest fraction of the Ag
binding sites (24–25%). H2 shows the highest diversity: for
Paratome and Kabat it covers 21% of the Ag binding sites, while
for Chothia and IMGT its recall was between 13% and 15%.
In all cases, Paratome shows the highest recall. Note
`that while the overall recall ranges between 0.7–1 (see Figure 4),
`the recall of each of the six regions ranges between 0–0.3. This is
`due to the fact that the total recall is the accumulation of the recall
`obtained by each of the six regions.
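
The additivity of per-region recall can be checked with a short sketch. It assumes the six regions do not overlap and that recall for each region is computed against all Ag binding residues of the Ab, as the text above implies; the function names are illustrative.

```python
def recall_by_region(predicted_by_region, binding):
    """Per-region recall, each measured against the full set of Ag binding residues."""
    binding = set(binding)
    if not binding:
        raise ValueError("binding set is empty")
    return {region: len(set(residues) & binding) / len(binding)
            for region, residues in predicted_by_region.items()}


def overall_recall(predicted_by_region, binding):
    """Recall of the union of all regions."""
    binding = set(binding)
    predicted = set().union(*predicted_by_region.values())
    return len(predicted & binding) / len(binding)
```

For non-overlapping regions the per-region values sum to the overall recall, which is why each region on its own stays well below the totals shown in Figure 4.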
`
Figure 4. Recall and precision of Ag binding sites identification. Average precision and recall were calculated for the Abs in the test set for
Paratome, Kabat, Chothia and IMGT methods. Error bars represent standard error of the mean.
doi:10.1371/journal.pcbi.1002388.g004

Paratome-unique residues are important for Ag binding
To gain insight into the extent to which Paratome-unique residues contribute to Ag binding, we searched the non-redundant
set of Abs for Ag binding residues residing within structural consensus regions that are not identified by any of the CDR
identification methods. We obtained 153 Paratome-unique residues, originating from 104 Abs (Table S3). Using the FoldX
algorithm [21,22], we performed an in-silico alanine scan in which each Paratome-unique residue and each Ag binding residue
identified by the CDR identification methods (2707 residues) within the 104 Abs was mutated to alanine. Additionally, we
searched the non-redundant set of Abs for Ag binding residues residing within CDRs that are not identified by Paratome (i.e.,
CDRs-unique residues). We found 59 CDRs-unique residues, stemming from 41 Abs (Table S4). Each CDRs-unique residue
was likewise mutated to alanine in an in-silico alanine scan. The distribution of the predicted interaction energy
(ΔΔG) of these mutants is presented in Figure 6A. Destabilizing residues in this analysis (ΔΔG > 0.25) are residues whose mutation
to alanine is predicted to destabilize the Ab-Ag complex. These residues, therefore, are likely to be important for Ag binding.
Paratome-unique residues have a slightly higher percentage of destabilizing residues (49%) than Ag binding residues that fall
within the CDRs according to Kabat, Chothia or IMGT (44.15%). While it is not clear whether the differences between
Paratome-unique and Ag binding residues within the CDRs are significant, the former appear to be at least as important to
stability as the latter. In contrast, CDRs-unique residues have a substantially lower contribution to binding: only 27% of them are
destabilizing and the vast majority of them (70%) are neutral. To demonstrate the importance of Paratome-unique residues we show
a more detailed analysis of the complex of IL-15 with an anti-IL-15 Ab (PDB ID 2xqb). Two Ag binding residues, LEU46 and
TYR49, which were identified by Paratome to be part of ABR L2, were not identified by any of the CDR identification methods
(Table S1). Figure 6B shows these residues relative to the surface of the Ag. It can be seen that TYR49 protrudes into the surface of
the Ag, while LEU46 is located opposite to the antigenic LEU52, forming a hydrophobic interaction. As shown in Figure 6C, only
seven residues from L2 interact with the Ag, and two of them are Paratome-unique residues. TYR49 forms one of the two hydrogen
bonds between the Ag and ABR L2. The results of the FoldX in-silico single-point mutation analysis indicate that mutating
ARG50, ARG53 and TYR49 to alanine has the most significant destabilizing effect (Table S2). Not surprisingly, due to the salt
bridge it forms with antigenic GLU46, mutating ARG50 had the most prominent destabilizing effect. The next most destabilizing
mutation to alanine was of TYR49, which forms a hydrogen bond between ABR L2 and antigenic GLU53. The third most
destabilizing mutation to alanine was of ARG53, which forms a cation-π interaction with TYR49. As expected, mutating LEU46
to alanine has a weak destabilizing effect on the binding energy. Hence, Ag binding residues within the structural consensus regions
that fall outside the CDRs may play a pivotal role in Ag binding and recognition. The amino acid composition of Paratome-unique
`residues is presented in Table S8.
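
The classification of alanine-scan results can be sketched as follows. The destabilizing threshold (ΔΔG > 0.25) and the neutral band (−0.25 to 0.25) come from the text and the Figure 6 caption; the label "stabilizing" for values below −0.25 is an assumption, since the caption is truncated at that point. The ΔΔG values themselves would come from FoldX output, which this sketch does not parse.

```python
from collections import Counter


def classify_ddg(ddg, threshold=0.25):
    """Label a predicted change in interaction energy upon mutation to alanine."""
    if ddg > threshold:
        return "destabilizing"
    if ddg < -threshold:
        return "stabilizing"  # assumed label; not confirmed by the source text
    return "neutral"


def class_fractions(ddg_values, threshold=0.25):
    """Fraction of residues in each class, e.g. for one group of residues in Figure 6A."""
    values = list(ddg_values)
    if not values:
        raise ValueError("no ddG values given")
    counts = Counter(classify_ddg(v, threshold) for v in values)
    return {label: counts[label] / len(values)
            for label in ("destabilizing", "neutral", "stabilizing")}
```

Running `class_fractions` separately on the Paratome-unique, CDRs-unique and CDR Ag binding residues would give the kind of distribution compared in the paragraph above.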
`
`Discussion
`
`Ab-Ag recognition is the basis for the vast usage of Abs for
`molecular identification in research and in the clinic [23–26].
`Thus, identifying Ag binding sites facilitates the understanding of
`the underlying biology as well as Ab design and engineering. In a
`previous study [19], we have shown that structural analysis can
`lead to the identification of residues that roughly correspond to the
`CDRs. Here we further developed this approach, and tried to
determine whether it can be used to identify the Ag binding regions within Abs. To our knowledge, this study is the first to
quantitatively compare the residues identified by the most commonly used CDR identification methods. The residues that
reside within the structural consensus regions cover most of the observed Ag binding residues (94%), a significantly higher
`coverage than with the other methods. The coverage obtained
`by Kabat, Chothia and IMGT stemmed almost entirely from
`the residues that were within the structural consensus regions.
`While CDR residues unique to Kabat, Chothia and IMGT
`comprised less than 2% of the Ag binding sites, ABRs residues
`unique to Paratome covered 10–17% of the Ag binding residues.
`Nevertheless, there are cases in which the structural consensus
`regions did not contain Ag binding residues while a CDR
`identification method identified them. For a detailed example, see
`Figure S1. Approximately 2% of the Ag contacting residues are
`located remotely from the ABRs/CDRs and thus should be
considered as true negatives. Therefore, the actual recall of
`Paratome is 96%.
Table 1. Ag binding sites coverage by consensus and method-specific residues.

Residues set                    # of residues   # of residues in contact   Binding sites coverage
consensus Paratome - Kabat      3476            1664                       83.54%
ΔParatome                       1018            212                        10.77%
ΔKabat                          695             35                         1.78%
consensus Paratome - Chothia    3202            1517                       77.08%
ΔParatome                       1292            339                        17.23%
ΔChothia                        348             17                         0.86%
consensus Paratome - IMGT       3077            1564                       79.47%
ΔParatome                       1417            292                        14.84%
ΔIMGT                           218             15                         0.76%

For each set, we recorded the total number of residues, the number of Ag
contacting residues and the percentage of Ag binding sites coverage. In all of
the comparisons, Paratome-specific residues covered a significantly larger
portion of the Ag binding sites.
doi:10.1371/journal.pcbi.1002388.t001

Interestingly, all Paratome-unique residues come from either L2
or H2. However, when we compare each method separately to
Paratome there are differences in other CDRs as well. Table S9
shows the distribution of CDRs from which the Ag binding
residues that are identified by Paratome but are missed by one of
the other methods originated, in a pairwise comparison.
`MacCallum et al. [8] demonstrated that for some of the CDRs,
`the residues that contact the Ag correspond better with the Kabat
`definition of CDRs than with that of Chothia. This finding may, to
some extent, explain the fact that the ABRs residues have the highest overlap with the residues identified by Kabat, and that for
both methods H2, rather than H3, comprises the largest number of residues.
Attempts to increase Ab affinity have suggested that CDRs L3 and H3 are prevalently responsible for high energy interactions with
the Ag [27,28]. This coincides with our observation that ABRs/CDRs L3 and H3 have the largest fraction of Ag binding residues
for both Paratome and Kabat. For Chothia and IMGT, however, the CDRs with most Ag contacting residues are H2 and H3
(Figure 5A). Notably, for all methods except Chothia, H2 and H3, rather than L3 and H3, cover a significantly larger percentage
of the Ag binding residues (Figure 5B).
This analysis of Ag binding residue recognition demonstrates that relying on structural consensus, rather than on sequence
differences, makes it possible to identify Ag binding residues significantly better than the commonly used CDR identification methods.
Additionally, a detailed in-silico single point mutation analysis of all Ag binding residues demonstrates that Paratome-unique
residues contribute to Ag binding at least as much as residues within the CDRs, and substantially more than Ag binding residues
that are identified by CDR identification methods but not by Paratome. This may prove useful for applications
`aimed at identifying and manipulating Ag binding residues.
`
`Materials and Methods
`
`Extraction of 3D structures
`The outline of our structure-based ABRs identification method
`is delineated in Figure 1. To identify all Ab-Ag structures in the
`PDB [29] we performed a BLAST [30] search against the August
`2009 version of the PDB using an arbitrarily chosen Fab sequence
`as a query. The search was performed separately for the light and
`heavy chains and thus two lists were obtained, a heavy chains list
(2000 chains from 962 structures) and a light chains list (2500 chains from 1047 structures). To obtain an E-value cut-off that would
ensure that the hits for the light chain do not contain any heavy chains and vice versa, we performed a BLAST search using the
heavy chain of the query Fab against the hits of the light chain and another BLAST search using the light chain of the query Fab
against the hits of the heavy chain. Based on these analyses we determined that results with an E-value ≤ 1e-6 should be further
analysed (1280 heavy chains from 855 structures and 1846 light chains from 961 structures remained). To discard all T-cell
receptor or MHC molecule complexes from our lists, we searched for a BLAST E-value that would exclude all T-cell
receptors and MHC molecules from the dataset. We arbitrarily chose MHC-I, MHC-II, TCR type A and TCR type B sequences
and performed a separate BLAST search against the hits of light and heavy chains. Results with an E-value of 1e-6, 1e-6, 1e-12 and
1e-28, respectively, or smaller, were discarded. Furthermore, we removed files that contained the keywords TcR or MHC,
`duplicate chains from the same PDB and complexes that did not
`contain both a heavy Ab chain and a light Ab chain. This resulted
`in a list containing 1568 Ab chains from 784 structures. We then
`screened the list so each complex holds one heavy Ab chain, one
`light Ab chain, and a single Ag chain which is not an Ab and
`contains at least five amino acids. We did not include non-peptide
Ags in the analysis. The final list from which we removed
`redundancy contained 352 structures.
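
The E-value filtering steps above could be scripted along the following lines. This is a sketch only: it assumes the BLAST hits were saved in the standard 12-column tabular format (query, subject, percent identity, alignment length, mismatches, gap opens, query start, query end, subject start, subject end, E-value, bit score), which the source does not state, and the function name is hypothetical.

```python
import csv
import io


def hits_below_cutoff(tabular_text, max_evalue=1e-6):
    """Return the subject IDs of tabular BLAST hits with E-value <= max_evalue."""
    kept = set()
    for row in csv.reader(io.StringIO(tabular_text), delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        subject_id, evalue = row[1], float(row[10])
        if evalue <= max_evalue:
            kept.add(subject_id)
    return kept
```

Under the same assumption, the exclusion of T-cell receptor and MHC chains would be a set difference: the chains kept for the Fab query minus the chains returned by `hits_below_cutoff` for each TCR or MHC query at its own cut-off.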
`
`Redundancy removal
`Redundancy removal was performed using Blastclust [31] with
sequence identity ≥97% and coverage ≥95%. We ran Blastclust separately for the sequences of the light chains and for the
sequences of the heavy chains and obtained 96 clusters, 48 for each. To determine which sequences to remove in each cluster, we
chose the Ab-Ag interactions as the distinguishing criterion for redundancy removal. For each PDB complex in a given Blastclust
cluster, we identified all residue-residue contacts (see below for contact definition). The similarity between any two complexes (i.e.,
lists of Ab-Ag contacts) within each cluster was measured as the number of identical contacts (i.e., the same amino acid and
alignment position within the Ab and the same amino acid and position within the Ag) divided by the total number of contacts in
the shorter of the two lists. Since the similarity score on its own is not sufficient for separating the non-redundant complexes from
the redundant ones, we plotted a histogram of the similarity scores to obtain a discriminating cut-off. Most of the complexes in any
given Blastclust cluster had a similarity score greater than 0.90 while only 25% of all complexes had a similarity score smaller than
0.77. Therefore the latter was chosen as the cut-off, rendering complexes with a similarity score <0.77 non-redundant. For each
group of complexes with a similarity score above the cut-off, the complex with the highest number of interactions was chosen as the
representative complex. This process removed 152 redundant complexes and the resulting non-redundant set included 200
experimentally determined 3-D structures of Ab-Ag complexes
`from the PDB.
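
The contact-based similarity score can be written down directly from its definition. In this sketch each contact is assumed to be a tuple of (Ab amino acid, Ab alignment position, Ag amino acid, Ag position); the function names and the handling of an empty contact list are illustrative choices.

```python
def contact_similarity(contacts_a, contacts_b):
    """Number of identical contacts divided by the size of the shorter contact list."""
    a, b = set(contacts_a), set(contacts_b)
    shorter = min(len(a), len(b))
    if shorter == 0:
        return 0.0  # choice made for this sketch; the source does not cover this case
    return len(a & b) / shorter


def is_redundant(contacts_a, contacts_b, cutoff=0.77):
    """Two complexes are treated as redundant unless their similarity is below the cut-off."""
    return contact_similarity(contacts_a, contacts_b) >= cutoff
```

Within a group of mutually redundant complexes, the representative would then be the one with the longest contact list.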
`
Figure 5. Average Ag binding sites recall and precision of light and heavy chains for all ABRs/CDRs. (A) Average Ag binding sites
precision. (B) Average Ag binding sites recall. Error bars represent standard error of the mean.
doi:10.1371/journal.pcbi.1002388.g005

Ag Binding Regions identification - Paratome
Using a structure-based approach [19] that is presented in Figure 1, we determined the ABRs of the Abs in our dataset of
non-redundant known Ab-Ag complexes from the PDB. The algorithm structurally aligns all Abs whose 3-D structure was
experimentally determined bound to their protein Ag, using the MUSTANG multiple structure alignment algorithm [32]. Next, it
marks the residues in each structure that contact the residues on the Ag. Then, it searches for structurally aligned positions that
create such contacts across at least 10% of the Abs. These positions form six sequence stretches along the Ab sequence that correspond
to the six CDRs. Beyond the edges of these clusters, there were no structurally aligned positions in which more than 10% of the Abs
created contacts with the Ag. Therefore, these were defined as the ABRs edges. Applying this algorithm to the dataset, we
automatically identified all ABRs defined by our method without
`any manual intervention.
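
The consensus step can be sketched as follows, assuming the multiple structure alignment has already been reduced to a 0/1 matrix with one row per Ab and one column per aligned position (1 where the Ab residue at that position contacts the Ag, 0 otherwise or where the Ab has a gap). The `max_gap` parameter, which bridges short runs of positions below the threshold, is an assumption added for illustration and is not described in the source.

```python
def consensus_positions(contact_matrix, min_fraction=0.10):
    """Flag aligned positions at which at least min_fraction of the Abs contact the Ag."""
    if not contact_matrix:
        return []
    n_abs = len(contact_matrix)
    n_pos = len(contact_matrix[0])
    return [sum(row[j] for row in contact_matrix) / n_abs >= min_fraction
            for j in range(n_pos)]


def stretches(flags, max_gap=0):
    """Group flagged positions into (start, end) stretches.

    Flagged positions separated by at most max_gap unflagged positions are merged.
    """
    regions = []
    for j, flagged in enumerate(flags):
        if not flagged:
            continue
        if regions and j - regions[-1][1] - 1 <= max_gap:
            regions[-1][1] = j  # extend the current stretch
        else:
            regions.append([j, j])  # start a new stretch
    return [tuple(region) for region in regions]
```

The first and last positions of each stretch play the role of the ABR edges described above.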
`
`CDR identification - Kabat, Chothia, and IMGT
`Kabat, Chothia and IMGT establish Ab sequence numbering
`schemes that define in a straightforward manner the location of
`the CDRs within the sequence. Applying the various numbering
`schemes to the Ab sequences in our dataset we obtained the
`residues composing the CDRs according to each of the methods.
We used the online AbNum tool [8] to number the Abs in our
dataset according to Kabat and Chothia. The boundaries of the
`CDRs were defined as described in AbNum (see table of CDR
`definitions [33]). To obtain the CDRs according to IMGT, we
`coupled the Kabat numbering obtained by applying AbNum with
`a conversion code available at the IMGT web site.
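
Once a sequence has been numbered, assigning residues to CDRs is a lookup on the position label. The sketch below uses the commonly quoted Kabat CDR boundaries (L1 24–34, L2 50–56, L3 89–97, H1 31–35, H2 50–65, H3 95–102); these values are an assumption of this example and are not taken from the table of CDR definitions cited above [33], which should be consulted for the boundaries actually used. Insertion letters such as 35B or 52A are handled by comparing only the numeric part of the label.

```python
import re

# Assumed Kabat CDR boundaries (inclusive position numbers); not from the source.
KABAT_CDRS = {
    "L": {"L1": (24, 34), "L2": (50, 56), "L3": (89, 97)},
    "H": {"H1": (31, 35), "H2": (50, 65), "H3": (95, 102)},
}


def cdr_of(chain_type, label, boundaries=KABAT_CDRS):
    """Return the CDR name for a numbered position such as '52A', or None for FR positions."""
    match = re.fullmatch(r"(\d+)([A-Z]?)", label)
    if match is None:
        raise ValueError(f"unrecognized position label: {label!r}")
    number = int(match.group(1))
    for name, (start, end) in boundaries[chain_type].items():
        if start <= number <= end:
            return name
    return None
```

Passing a different `boundaries` dictionary would give the CDRs under another definition, provided the labels come from the matching numbering scheme.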
`
`Automatic sequence based ABRs identification
Because the three-dimensional structure of most known Abs has not been determined, the ability to identify the ABRs of an Ab
from its sequence alone is highly desirable. We constructed an automated
`ABRs identification tool capable of identifying the ABRs of an Ab
`from its amino acid sequence. Given a query sequence, the tool
`works as follows: First, a BLAST search is performed using the
`query Ab sequence, against all Abs in our dataset. As described
`above, all Abs in this non-redundant set were annotated and the
`ABRs within each of them were identified based on the multiple
`
Figure 6. Contribution of Paratome-unique and CDR-unique residues to the binding energy in Ab-Ag complexes. (A) The distributions
of ΔΔG values of an in-silico alanine scan analysis of Paratome-unique, CDRs-unique and CDR Ag binding residues. ΔΔG values ranging between
−0.25 and 0.25 were defined as neutral. ΔΔG values < −0.25 were defin