Analytical Biochemistry 285, 33–49 (2000)
doi:10.1006/abio.2000.4744, available online at http://www.idealibrary.com

A Holistic Approach for Protein Secondary Structure Estimation from Infrared Spectra in H₂O Solutions

Ganesh Vedantham,* H. Gerald Sparks,†,¹ Samir U. Sane,†,² Stelios Tzannis,†,³ and Todd M. Przybycien*,⁴
*Applied Biophysics Laboratory, Department of Chemical Engineering, Carnegie Mellon University, Pittsburgh, Pennsylvania 15213; and †Howard P. Isermann Department of Chemical Engineering, Rensselaer Polytechnic Institute, Troy, New York 12180

Received January 31, 2000

We present an improved technique for estimating protein secondary structure content from amide I and amide III band infrared spectra. This technique combines the superposition of reference spectra of pure secondary structure elements with simultaneous aromatic side chain, water vapor, and solvent background subtraction. Previous attempts to generate structural reference spectra from a basis set of reference protein spectra have had limited success because of inaccuracies arising from sequential background subtractions and spectral normalization, arbitrary spectral band truncation, and attempted resolution of spectroscopically degenerate structure classes. We eliminated these inaccuracies by defining a single mathematical function for protein spectra, permitting all subtractions, normalizations, and amide band deconvolution steps to be performed simultaneously using a single optimization algorithm. This approach circumvents many of the problems associated with the sequential nature of previous methods, especially with regard to removing the subjectivity involved in each processing step. A key element of this technique was the calculation of reference spectra for ordered helix, unordered helix, sheet, turns, and unordered structures from a basis set of spectra of well-characterized proteins. Structural reference spectra were generated in the amide I and amide III bands, both of which have been shown to be sensitive to protein secondary structure content. We accurately account for overlaps between amide and nonamide regions and allow different structure types to have different extinction coefficients. The agreement between our structure estimates, for proteins both inside and outside the basis set, and the corresponding determinations from X-ray crystallography is good. © 2000 Academic Press

Key Words: infrared spectroscopy; spectral deconvolution; protein secondary structure; reference spectra.

¹ Current address: DuPont Experimental Station, Route 141 and Henry Clay Road, Wilmington, DE 19880.
² Current address: Genentech, Inc., 1 DNA Way, South San Francisco, CA 94080.
³ Current address: Inhale Therapeutics, 150 Industrial Road, San Carlos, CA 94070.
⁴ To whom correspondence should be addressed. Fax: (412) 268-7139. E-mail: todd@andrew.cmu.edu

0003-2697/00 $35.00
Copyright © 2000 by Academic Press
All rights of reproduction in any form reserved.

Fourier-transformed infrared (FTIR)⁵ spectroscopy is perhaps the most versatile spectroscopic technique for analyzing protein secondary structure in diverse physicochemical environments. FTIR spectroscopy has been applied to investigate protein structure in solution (1, 2), in aggregates and inclusion bodies (3, 4), as well as during lyophilization (5–7) and freeze/thaw processing (8). In addition, attenuated total reflection (ATR) FTIR spectroscopy is ideal for studying protein adsorption onto catheter surfaces (9), chromatographic media (10–12), and a variety of other polymeric surfaces (13–15).

In the past decade, a plethora of methods to estimate protein secondary structure contents via analysis of amide I, II, and III band spectra have been reported. These methods include, but are not limited to, solitary use or combinations of factor analysis (FA) (16–20), singular value decomposition (SVD) (21, 22), Fourier self-deconvolution (FSD; or resolution enhancement) (14, 23–26), second derivative (SD) band identification and fitting (27–29), and the development of spectral correlation coefficients (30, 31). Recent reviews of these techniques by Pelton and McLean (32) and Jackson and Mantsch (33) are instructive. This large body of work devoted to protein secondary structure estimation from infrared spectra has led to a number of discrepancies that persist throughout the literature.

⁵ Abbreviations used: FTIR, Fourier-transformed infrared; ATR, attenuated total reflection; FA, factor analysis; SVD, singular value decomposition; FSD, Fourier self-deconvolution; SD, second derivative; R.H.S., right-hand side; GL, Gaussian–Lorentzian.

Chugai Exhibit 2306
Pfizer, Inc. v. Chugai Pharmaceutical Co. Ltd.
IPR2017-01357
Page 00001

In a classic work cited by virtually every researcher in the field, Byler and Susi (24) used FSD to analyze the spectra of 21 globular proteins in ²H₂O and were able to assign components of amide I band spectra to helices, β-sheet, turns, and random (unordered) structure. By their method, segments with similar structure do not necessarily exhibit peaks with identical frequencies from protein to protein. For example, Byler and Susi (24) reported frequencies varying from 1651 to 1657 cm⁻¹ for helical vibrations in proteins, and frequencies for homopolypeptides in helical conformations have been reported as low as 1634 cm⁻¹ (24). Also, Chirgadze et al. (34) reported that for helical structures, the corresponding peak width increases with decreasing helical order. In light of this, when deconvoluting protein amide bands, many algorithms involve subjective peak assignments or allow the peak positions and widths to vary during the structure estimation procedure. To circumvent these difficulties, many authors have invoked either resolution enhancement or second derivative techniques to help identify the positions of relevant peaks, followed by the assignment of a structure type to each peak and a fit of each peak with Gaussian and/or Lorentzian distribution functions. However, significant bias in the results can still be introduced because the choice of the resolution enhancement factor for FSD and the peak assignments in both methods are subjective.

An alternative to case-specific peak assignment methods is the direct or indirect development of structural reference spectra, or eigenspectra, that theoretically represent either pure motifs, such as α-helix, β-sheet, and turns, or linear combinations of pure motifs (33). These idealized spectra are then fit to the spectrum of a protein of unknown structure by varying the corresponding motif fractions, which serve as the weighting factors in a linear superposition scheme. The reference spectra are generated by the decomposition of a calibration set or basis set of real protein spectra covering a broad range of structural fractions, utilizing methods such as SVD, band fitting, or matrix inversion (17, 19, 21, 35). The reference spectra approach has been successfully applied to both CD spectra (36) and Raman spectra (37), but has had mixed results when applied to FTIR spectra (19, 21). Contrary to the results of Byler and Susi (24), the reference spectra method assigns fixed positions to peaks representing the various structure motifs. However, as the results of this work will demonstrate, the mixed success of past reference spectra methods for protein secondary structure prediction from FTIR spectra stems from the structure class assignments and not from the seeming contradiction with the work of Byler and Susi (24).

In addition to uncertainty in peak positions and assignments, most previous routines share a further shortcoming: they subtract the background solvent and water vapor contributions from the protein solution spectra sequentially and then assign an arbitrary baseline to isolate the amide region of interest. Baseline correction can depend on the operator's experience with the subtraction procedure (38). Obtaining a so-called “flat region” in the 1750–2200 cm⁻¹ frequency range is the typical criterion used for bulk water subtraction. The degree of background subtraction is often determined manually, and “flat” is rarely quantified. After all background subtractions, the amide I region is often isolated for analysis by truncating the spectrum at 1600 and 1700 cm⁻¹, followed by the subtraction of a linear baseline to zero the ends of the spectrum. When the amide I and II regions are examined together, end points of 1480 and 1700 cm⁻¹ are typically used, while the amide III region is often bounded at 1200 and 1300 cm⁻¹ (39). In this subjective approach, an early error in the sequential background and baseline subtractions is carried through to the band fitting or reference spectra routine and can produce erroneous results. Additionally, choosing arbitrary end points for a baseline subtraction ignores any contributions from adjacent vibrational modes that tail into the amide regions and vice versa.

No current algorithm for protein secondary structure estimation from infrared spectra accounts for the impact of solutes on background solvent spectra or the possibility that different secondary structure motifs may absorb with varying extinction coefficients. As demonstrated via Raman spectroscopy, the O–H bending and stretching vibrations of water undergo significant changes in the presence of proteins and other solutes (37, 40, 41). Increasing evidence also supports the idea that different molar extinction coefficients exist for the various structure types contributing to the protein amide vibrations (33, 42, 43). Accurate subtraction of background solvent and assignment of the proper weights to the amide band components are critical for obtaining reliable secondary structure estimates, especially in cases involving low protein concentrations.

Another major discrepancy in current protein structure estimation algorithms concerns the paradox seemingly generated when normalizing spectra. It is common practice during analysis to normalize a spectrum after all background subtractions have been performed and a particular amide band has been isolated. However, to accurately account for all the overlapping regions between peaks that correlate with protein structure and those that do not, the amide region should be normalized before subtraction. In addition, possible variations in secondary structure extinction coefficients imply that the areas of the amide bands also depend on the overall protein secondary structure content. This enigma can be resolved by performing the subtractions, normalization, and deconvolution of the amide band of interest simultaneously.

TABLE 1
List of Proteins Used for FTIR Spectroscopic Studies

Abbreviation  Protein                    Source                  Cat. No.  Lot No.   PDB fileᵇ
ALA           α-Lactalbumin              Bovine milk             L-5385    92H7015   1hfx
BGH           Bovine growth hormone      E. coli (recombinant)             M901-004  1bst
BLB           β-Lactoglobulin            Bovine milk             L-8005    13H7150   1beb
CAL           Conalbumin                 Chicken egg white       C-0755    116H7035  1aiv
CAN           Carbonic anhydrase         Bovine erythrocytes     C-3934    47H1358   2cba
CHYᵃ          α-Chymotrypsin             Bovine pancreas         C-7762    27H7010   5cha
CONᵃ          Concanavalin A             Canavalia ensiformis    C-7275    118F7160  1apn
CYT           Cytochrome c               Horse heart             C-7752    25H7045   1hrc
HSA           Human serum albumin        Human serum             A-9511    24H9314   1bj5
LYSᵃ          Lysozyme                   Chicken egg white       L-6876    65H7025   1azf
MYOᵃ          Myoglobin                  Sperm whale             M-7527    17H6660   104m
PAPᵃ          Papain                     Papaya latex            P-4762    107H7015  1ppn
PEP           Pepsin                     Porcine stomach mucosa  P-6887    120H8095  4pep
RNAᵃ          RNase                      Bovine pancreas         R-5500    86H7046   3rn3
SUBᵃ          Subtilisin BPN′            Bacillus licheniformis  101129    69618     2st1
TPIᵃ          Triosephosphate isomerase  Rabbit muscle           T-6258    96H9554   1ag1

ᵃ Included in the basis set for generation of the reference spectra.
ᵇ Protein Data Bank file listing. URL: http://www.rcsb.org/pdb/.

In this paper, we describe a holistic reference spectra calculation technique for the generation of idealized reference infrared spectra in the amide I and amide III regions, followed by a procedure for the estimation of protein secondary structure for unknown samples. Our prediction technique did not make use of the amide II region because this vibrational mode has been shown to be less sensitive to variations in protein secondary structure content (39). In the calculation of the reference spectra, all subtractions, normalization, and amide band deconvolution steps are performed simultaneously, following the method Sane and co-workers (37) developed for Raman spectral deconvolution. All non-structure-related vibrational peaks are fit using equally weighted Gaussian–Lorentzian product functions; peaks correlating with protein secondary structure are allowed to have different molar extinction coefficients. This method places no restrictions on the frequency ranges analyzed: overlaps between non-structure- and structure-associated peaks are accounted for since all components are fit simultaneously. The introduction of a protein-dependent effective concentration variable solved the normalization problem. The calculation of reference spectra involved multivariate nonlinear least-squares minimization, which was implemented in Matlab 5.0 (MathWorks Inc., Natick, MA). The idealized reference spectra were optimized for internal consistency via a bootstrapping algorithm. FTIR spectra of proteins outside the basis protein set were then analyzed to validate the secondary structure estimation algorithm. The calculated structural reference spectra presented here compare well with those in the literature and provide good secondary structure estimates for proteins.

MATERIALS AND METHODS

Materials

The proteins in and outside the reference set were chosen to cover a broad range of secondary structure motifs; a list of the proteins studied is given in Table 1. A protein's secondary structure assignment depends on the choice of assignment algorithm (44, 45). In this report, all secondary structure assignments were made using the STRIDE algorithm of Frishman and Argos (45). The use of a single assignment algorithm eliminates the discrepancies that ensue from the application of dissimilar criteria and algorithms to crystallographic data (46). The STRIDE secondary structure assignments of the proteins analyzed in this report, both within and outside the reference set, are shown on a triangular diagram in Fig. 1. In this work, we have classed STRIDE-identified 3₁₀ helices, as well as α-helices of three or fewer contiguous residues, as unordered helices. The Protein Data Bank files used to generate the STRIDE estimates are listed in Table 1. All the proteins studied exhibit significant ordered secondary structure content in their native states.
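
The class-assignment rule above can be made concrete with a short Python sketch that tallies class fractions from a per-residue STRIDE string. It is illustrative only: the function name `stride_fractions` is hypothetical, the input is assumed to use STRIDE's one-letter codes (H, α-helix; G, 3₁₀ helix; I, π-helix; E, strand; B, bridge; T, turn; C, coil), and counting bridges as sheet and π-helices and coil as random structure are our assumptions, since the text states only the 3₁₀-helix and short-helix rules.

```python
from itertools import groupby


def stride_fractions(ss: str) -> dict[str, float]:
    """Return fractions of ordered helix (Ho), unordered helix (Hu), sheet (S),
    turn (T), and random coil (R) from a per-residue STRIDE code string."""
    if not ss:
        raise ValueError("empty secondary structure string")
    counts = {"Ho": 0, "Hu": 0, "S": 0, "T": 0, "R": 0}
    for code, run in groupby(ss):
        n = sum(1 for _ in run)  # length of this contiguous run of one code
        if code == "H":
            # alpha-helical runs of three or fewer residues -> unordered helix
            counts["Ho" if n > 3 else "Hu"] += n
        elif code == "G":
            counts["Hu"] += n  # 3-10 helices -> unordered helix
        elif code in ("E", "B"):
            counts["S"] += n  # assumption: isolated bridges counted as sheet
        elif code == "T":
            counts["T"] += n
        else:
            counts["R"] += n  # assumption: I, C, and any other code -> random
    return {k: v / len(ss) for k, v in counts.items()}


frac = stride_fractions("CCHHHHHHCCEEEETTGGGCHHHC")
# The three axes of the triangular diagram: S, Ho, and T + R + Hu
print(frac["S"], frac["Ho"], frac["T"] + frac["R"] + frac["Hu"])
```

Keeping the five fine-grained classes and summing afterward makes it easy to regroup them, for example into the amide I classes Ho, Hu + R, S, and T.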
Subtilisin BPN′ was purchased from ICN Biomedicals Inc. (Irvine, CA). Bovine growth hormone was a gift from Monsanto (St. Louis, MO). All other proteins (see Table 1 for the abbreviations used throughout this work) and buffer reagents were purchased from Sigma Chemical Co. (St. Louis, MO). The final buffer conditions used for all protein solutions are listed in Table 2. Several proteins required processing to remove additives. Lysozyme and papain were dissolved into their respective buffers and then dialyzed with Spectra/Por Biotech 500 MWCO cellulose ester membranes (Cat. No. 08-750-1A), purchased from Fisher Scientific Inc. (Pittsburgh, PA), to remove sodium acetate; dialyses were conducted against 500-mL reservoirs of final buffer solutions for 12 h, with three intermediate reservoir changes. In addition, solids remaining in the lysozyme solution were sedimented in an Eppendorf 5415C microcentrifuge (Brinkmann Instruments, Westbury, NY) at 14,000 rpm for 15 min, and the supernatant was pipetted off for study. Triose phosphate isomerase was dialyzed, as described above, to remove borate and EDTA. Myoglobin was obtained in liquid form at a concentration of 4.8 mg/mL. All other proteins were dissolved directly into the corresponding buffers listed in Table 2. α-Chymotrypsin was centrifuged, as above, to remove residual solids. After dissolution, the concanavalin A solution remained slightly cloudy; however, centrifugation precipitated the protein, so the turbid solution was used for analysis without centrifugation. In addition, myoglobin and papain were concentrated to a final volume of 250 μL with Centricon-3 (3000 MWCO) centrifugal membrane concentrators from Amicon, Inc. (Beverly, MA), spun in a Beckman Instruments, Inc. (Palo Alto, CA), TJ-6 centrifuge at 3000 rpm. Prior to protein dissolution, all buffers were filtered through syringe filters with 0.45-μm nylon membranes to remove dust and undissolved salts. The proteins included in the reference set are CHY, CON, LYS, MYO, PAP, RNA, SUB, and TPI.

FIG. 1. Secondary structure assignments for proteins analyzed in this work: 1, ALA; 2, BGH; 3, BLB; 4, CAL; 5, CAN; 6, CHY; 7, CON; 8, CYT; 9, HSA; 10, LYS; 11, MYO; 12, PAP; 13, PEP; 14, RNA; 15, SUB; 16, TPI. All structure assignments were based on the STRIDE algorithm of Frishman and Argos (45). Symbols: S, total sheet; Ho, ordered helix; T + R + Hu, turn + random coil + unordered helix.

FTIR Spectroscopy

All protein spectra were recorded in H₂O solution. All spectra were collected with a Nicolet Magna 550 Series II FTIR spectrometer (Madison, WI) with a horizontal ATR accessory from SpectraTech, Inc. (Shelton, CT). The ATR accessory used a trapezoidal germanium crystal (7.0 × 1.0 cm; length × width), with ends cut to 45° generating 12 internal reflections, that was mounted into a sample boat/trough. The spectrometer was equipped with a liquid nitrogen-cooled mercury cadmium telluride detector. To reduce the contributions of water vapor and carbon dioxide, the IR system was continuously purged with air from a Balston, Inc. (Haverhill, MA) 75-45 FTIR Purge Gas Generator at 30 standard cubic feet per minute, supplemented with nitrogen gas from the vent of a liquid nitrogen tank. To obtain protein solution and corresponding buffer background spectra, approximately 250 μL of each solution was spread evenly to completely cover the germanium crystal. The crystal was then sealed with parafilm to minimize evaporation during acquisition. Protein concentrations above 20 mg/mL ensured that less than 2% of the FTIR signal derived from molecules adsorbed to the germanium crystal, assuming a worst-case scenario of monolayer coverage attained by random sequential adsorption with a jamming limit of 55%. All ATR-corrected spectra were collected in the 1000 to 4000 cm⁻¹ range as sets of 2048 time-averaged, double-sided interferograms with Happ–Genzel apodization. Spectral resolution was set at 2 cm⁻¹, and a gain of 8 and an aperture of 38 were used. After each experiment, the exposed surface of the germanium crystal was cleaned via a five-step process: (1) rinsing with DI water, (2) soaking in a 1% (w/w) SDS solution for 10 min, (3) rinsing thoroughly with DI water, (4) rinsing thoroughly with a 50% (w/w) aqueous ethanol solution, and (5) drying with compressed air filtered through cotton to remove oils and particulates. Amide I band signal-to-noise (S/N) ratios varied from 877 to 161, whereas amide III band S/N ratios varied from 166 to 12, as shown in Table 2. Amide band S/N ratios were calculated as 2.5 times the maximum intensity of the background-subtracted band divided by 3 times the standard deviation of the intensity between 1850 and 2200 cm⁻¹.

TABLE 2
Protein Solution Conditions and Spectral Quality

Protein  Conc. (mg/mL)  S/N (amide I)  S/N (amide III)  Buffer
ALA      28             209            22               10 mM sodium phosphate, pH 6.0, with 100 mM NaCl
BGH      8              178            33               DI with a trace of HCl, pH 3.8ᵃ
BLB      40             161            65               50 mM sodium phosphate, pH 7.0
CAL      20             167            103              100 mM NaCl, pH 6.0
CAN      24             669            74               DI water
CHY      18             184            42               DI with a trace of HCl, pH 3.8ᵃ
CON      21             439            38               DI with a trace of HCl, pH 3.8ᵃ
CYT      15             186            35               25 mM sodium phosphate, pH 6.0, with 100 mM NaCl
HSA      28             225            167              25 mM sodium phosphate, pH 7.0, with 100 mM NaCl
LYS      32             877            73               DI water
MYO      18             172            25               20 mM Tris–HCl, pH 8.0
PAP      27             195            27               DI with a trace of HCl, pH 3.8ᵃ
PEP      22             172            160              25 mM sodium phosphate, pH 7.0, with 100 mM NaCl
RNA      18             402            52               DI with a trace of HCl, pH 3.8ᵃ
SUB      19             389            48               25 mM sodium phosphate, pH 6.0, with 100 mM NaCl
TPI      18             211            13               DI water

ᵃ Trace is defined as approximately 50 to 100 μL of 2 M HCl in 1 liter of DI water.

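
The S/N definition above can be written as a short Python function. This is a sketch under stated assumptions: the spectrum passed in is already background subtracted, the amide band limits are supplied by the caller (the text does not state the exact window used for locating the band maximum), and the sample standard deviation (`statistics.stdev`) is assumed because the text does not name the estimator. The function name `amide_snr` is hypothetical.

```python
import statistics


def amide_snr(freqs, intensity, band, noise=(1850.0, 2200.0)):
    """S/N as defined in the text: 2.5 x (maximum intensity of the
    background-subtracted band) / (3 x standard deviation of the intensity
    in the noise window, 1850-2200 cm^-1 by default)."""
    lo, hi = band
    peak = max(y for x, y in zip(freqs, intensity) if lo <= x <= hi)
    nlo, nhi = noise
    noise_pts = [y for x, y in zip(freqs, intensity) if nlo <= x <= nhi]
    sd = statistics.stdev(noise_pts)  # sample standard deviation (assumption)
    return 2.5 * peak / (3.0 * sd)
```

For an amide I estimate one might call `amide_snr(freqs, spectrum, (1600, 1700))`, reusing the conventional amide I bounds mentioned in the introduction; any other window can be passed the same way.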
Data Analysis

Mathematical representation of protein FTIR spectra. In addition to the secondary structure-sensitive amide I and amide III bands, there are several other vibrational modes active in the spectral region of interest, including the amide II band. Protein solution FTIR spectra also contain background contributions from buffer and water vapor. In addition, spectra may have a contribution from a sloping baseline. By assuming that the contributions of all underlying spectral components are additive, invoking the principle of superposition, any set of spectra from p proteins (p > 1) can be represented in matrix form as

$$I^{\mathrm{calc}}_{z\times p} = \nu_{z\times 1}\,a_{1\times p} + \mathbf{1}_{z\times 1}\,b_{1\times p} + B_{z\times m}A_{m\times p} + N_{z\times n}D_{n\times p} + \left[S^{\mathrm{I}}_{z\times q}E^{\mathrm{I}}_{q\times r}F^{\mathrm{I}}_{r\times p} + S^{\mathrm{III}}_{z\times s}E^{\mathrm{III}}_{s\times t}F^{\mathrm{III}}_{t\times p}\right]C^{\mathrm{eff}}_{p\times p}, \qquad [1]$$

where $I^{\mathrm{calc}}_{z\times p}$ is the calculated spectral intensity for p proteins at z frequencies. All subscripts in Eq. [1] correspond to the dimensions (rows × columns) of the associated matrices, each of which is elaborated upon below.

The first two terms on the right-hand side (R.H.S.) of Eq. [1] describe a linear baseline for the spectral range of interest, 1000 to 2200 cm⁻¹, during the optimization routine. Here $\nu_{z\times 1}$ and $\mathbf{1}_{z\times 1}$ are vectors of length z containing frequencies and ones, respectively. The baseline slope and intercept for each protein spectrum are compiled in the vectors $a_{1\times p}$ and $b_{1\times p}$, respectively. Background contributions from buffer (or solvent; m = 1), water vapor (m = 2), and, where necessary, an underlying surface (m = 3) are accounted for in the third term on the R.H.S. of Eq. [1]. The matrix $B_{z\times m}$, representing m independently measured background spectra recorded at z frequencies, is multiplied by the matrix of background signal magnitudes (or amplitudes), $A_{m\times p}$, containing the respective background contributions to each protein spectrum.

The fourth term on the R.H.S. of Eq. [1] accounts for the vibrational peaks in the frequency range analyzed that are not correlated with protein secondary structure, hereafter designated nonstructure peaks. These peaks embody vibrations associated with amino acid side chains and the amide II band. We have not included individual side-chain resonances that contribute intensity in the amide I and III bands (47). These resonances typically account for 5 to 15% of the signal intensity in the amide I region (43), but are highly variable in position from protein to protein (33).

Each individual peak i is expressed as a Gaussian–Lorentzian (GL) product function

$$\mathrm{GL}_i = \left[\frac{2\,pw_i}{\pi\left(pw_i^{2} + 4(\bar{\nu}_i - \nu)^{2}\right)}\right]^{Y} \times \left[\frac{2\sqrt{\ln 2}}{\sqrt{\pi}\,pw_i}\exp\left\{-\frac{4\ln 2\,(\bar{\nu}_i - \nu)^{2}}{pw_i^{2}}\right\}\right]^{1-Y}, \qquad [2]$$

each of which has an associated mean frequency position, $\bar{\nu}_i$, and peak width at half-height, $pw_i$. Equation [2] is used to generate n nonstructure peaks at z frequencies across the whole spectral range, forming the matrix $N_{z\times n}$. The matrix $D_{n\times p}$ contains the nonstructure peak magnitudes (or amplitudes) for each corresponding protein in the reference set. In our formulation, the number, associated mean peak positions, and peak widths of nonstructure GL peaks are identical for each protein (i.e., protein independent); however, the amplitudes corresponding to the contribution of each nonstructure peak to an individual protein spectrum vary from protein to protein. The exponent Y in Eq. [2] is a weighting factor that determines the relative Gaussian–Lorentzian character of the nonstructure peaks. Following the results of Sane and co-workers (37), Y was set equal to 0.5 and used for all nonstructure peaks throughout this work, although other values have been used successfully (48).
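
Equation [2] translates directly into code. The Python function below is a sketch (the name `gl_peak` is ours, not the authors'); it evaluates the Lorentzian and Gaussian factors, each normalized to unit area for a full width at half-height `pw`, and combines them with the weighting exponent Y. A useful check on the form of Eq. [2] is that both factors fall to one half of their central value at ν = ν̄ᵢ ± pwᵢ/2, so their product does too, for any Y.

```python
import math


def gl_peak(nu, center, pw, y=0.5):
    """Gaussian-Lorentzian product function of Eq. [2].

    nu: frequency (cm^-1); center: mean position (nu-bar_i);
    pw: full width at half-height (pw_i); y: weighting exponent Y."""
    d2 = (center - nu) ** 2
    lorentz = 2.0 * pw / (math.pi * (pw ** 2 + 4.0 * d2))
    gauss = (2.0 * math.sqrt(math.log(2.0)) / (math.sqrt(math.pi) * pw)
             * math.exp(-4.0 * math.log(2.0) * d2 / pw ** 2))
    return lorentz ** y * gauss ** (1.0 - y)


# One column of N (or S): a hypothetical peak at 1545 cm^-1, 30 cm^-1 wide,
# sampled on a 2 cm^-1 grid over the 1000-2200 cm^-1 fitting range.
freqs = range(1000, 2201, 2)
column = [gl_peak(v, 1545.0, 30.0) for v in freqs]
```

Stacking such columns for every nonstructure peak gives $N_{z\times n}$; the structure-related peaks fill $S^{\mathrm{I}}$ and $S^{\mathrm{III}}$ the same way.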
The final term on the R.H.S. of Eq. [1] represents the amide I and amide III band contributions to the calculated spectral intensities. The columns of the matrices $S^{\mathrm{I}}_{z\times q}$ and $S^{\mathrm{III}}_{z\times s}$ contain Gaussian–Lorentzian peaks, again generated by Eq. [2], with q and s peaks correlated with protein structure in the amide I and III regions, respectively. The molar extinction coefficients for the various amide I and amide III structure classes are contained in the columns of matrices $E^{\mathrm{I}}_{q\times r}$ and $E^{\mathrm{III}}_{s\times t}$, respectively. Multiple GL peaks in the matrices $S^{\mathrm{I}}_{z\times q}$ and $S^{\mathrm{III}}_{z\times s}$ (i.e., multiple columns) may represent a single structure class. As a result, the matrices $E^{\mathrm{I}}_{q\times r}$ and $E^{\mathrm{III}}_{s\times t}$ are block diagonal. The different amide I and III structure class percentages (or fractions) for each protein represented by Eq. [1] are contained in the matrices $F^{\mathrm{I}}_{r\times p}$ and $F^{\mathrm{III}}_{t\times p}$. Finally, the effective protein concentration matrix, $C^{\mathrm{eff}}_{p\times p}$, is diagonal, with one nonzero element for each protein. As with the nonstructure peaks, the number of peaks as well as the mean position and peak width of each structure-related GL component peak are identical for each protein. The molar extinction coefficients for each structure class component peak are protein independent as well.

Reference spectra generation. In this work, four structure classes were associated with each of the amide I and III bands. We performed an SVD analysis on the isolated amide I and III bands using the eight proteins in the reference set. Our analysis of the singular values suggested that we could reliably extract between three and five different linearly independent pieces of information, or secondary structure classes, from the amide band spectra. Based on an eigenvector analysis of the isolated amide band matrix, we restricted our structure classes to four. We decomposed the amide I band into ordered helix (Ho), unordered helix and random (Hu + R), sheet (S), and turn (T) classes (r = 4). The amide III band was decomposed into helix (H), sheet (S), turn (T), and random (R) classes (t = 4). The differing classes in the amide I and III regions reflect the differing overlaps between underlying component peaks in the two regions. Treating these regions separately also aids in determining the goodness of fit to the two regions independently.

Each of the proteins outside the reference set was added to the reference set, one at a time, to determine whether we could confidently deconvolute more than four structure classes. SVD analysis suggested that augmentation of the reference set does not increase the information content of the isolated amide I and III band spectra. The size of our reference protein set is small. However, adding more proteins to the reference set degraded the condition number of the matrix of isolated amide bands as the set of spectra became increasingly linearly dependent. Sarver and Krueger (17) also parsed secondary structure elements into four classes based on an SVD analysis of the amide I bands of 17 proteins in aqueous solution; this is consistent with our analysis. The number of proteins included in the calculation of structural reference spectra is not as important as the structure content space that the set of proteins spans.

For a given basis set of p proteins, the known variables are the spectral intensities ($I^{\mathrm{meas}}_{z\times p}$) measured at z frequencies ($\nu_{z\times 1}$), the corresponding background spectra contributions ($B_{z\times m}$), and the various structure class fractions ($F^{\mathrm{I}}_{r\times p}$ and $F^{\mathrm{III}}_{t\times p}$) for each protein. Because the background and nonstructure peak subtractions as well as the amide region fits are performed simultaneously, it is impossible to calculate the area under the amide bands a priori. In addition, the amide band area is also a function of the relative content of different classes of secondary structure, since we permit the molar extinction coefficients of each structure class to vary. To circumvent this problem, the areas under the amide I and III bands for each protein spectrum are normalized by the effective concentration parameter.

Given the known variables outlined above, the unknown variables can be used as fitting parameters to fit Eq. [1] to the set of measured solution spectra of the basis set proteins. The fitted parameters include the following: all mean peak positions, $\bar{\nu}_i$, and peak widths, $pw_i$; the linear baseline parameters, $a_{1\times p}$ and $b_{1\times p}$; the background and nonstructure peak amplitudes, $A_{m\times p}$ and $D_{n\times p}$; the molar extinction coefficients, $E^{\mathrm{I}}_{q\times r}$ and $E^{\mathrm{III}}_{s\times t}$; and the effective protein concentrations, $C^{\mathrm{eff}}_{p\times p}$. The objective function for optimizing all the fitted parameters is based on the sum of squared differences between the calculated and measured total spectral intensities:

$$\text{objective} = \text{minimize}\;\left\lVert I^{\mathrm{calc}}_{z\times p} - I^{\mathrm{meas}}_{z\times p}\right\rVert^{2}. \qquad [3]$$

The computer code for the optimization routine was written in a format suitable for the Matlab 5.0 (MathWorks Inc., Natick, MA) software package.
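
To make the bookkeeping in Eqs. [1] and [3] concrete, the following dependency-free Python sketch evaluates the forward model and the least-squares objective for given parameter values. It is not the authors' Matlab code, and the helper names (`matmul`, `madd`, `calc_spectra`, `objective`) are hypothetical; matrices are plain lists of rows, and $C^{\mathrm{eff}}_{p\times p}$ is passed as its diagonal because right-multiplying by a diagonal matrix simply scales each protein's column.

```python
def matmul(X, Y):
    """Product of an (r x k) and a (k x c) matrix stored as lists of rows."""
    cols = list(zip(*Y))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in X]


def madd(*mats):
    """Elementwise sum of equally sized matrices."""
    return [[sum(vals) for vals in zip(*rows)] for rows in zip(*mats)]


def calc_spectra(nu, a, b, B, A, N, D, SI, EI, FI, SIII, EIII, FIII, c_eff):
    """Eq. [1]: return the z x p matrix of calculated intensities.

    nu: z frequencies; a, b: p baseline slopes and intercepts;
    c_eff: the p diagonal entries of C^eff."""
    p = len(a)
    baseline = [[v * a[j] + b[j] for j in range(p)] for v in nu]
    amide = madd(matmul(matmul(SI, EI), FI), matmul(matmul(SIII, EIII), FIII))
    # Right-multiplying by the diagonal C^eff scales column j by c_eff[j].
    amide = [[row[j] * c_eff[j] for j in range(p)] for row in amide]
    return madd(baseline, matmul(B, A), matmul(N, D), amide)


def objective(I_calc, I_meas):
    """Eq. [3]: sum of squared differences over all frequencies and proteins."""
    return sum((c - m) ** 2
               for row_c, row_m in zip(I_calc, I_meas)
               for c, m in zip(row_c, row_m))
```

A practical implementation would use array libraries rather than nested lists; the point here is only that every term of Eq. [1] is an ordinary matrix product with the stated dimensions.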
The method used by Sane and co-workers (37) to separate the linear and nonlinear unknown parameters during the optimization is a unique feature of this algorithm, reducing the problem dimensionality in nonlinear space and thus leading to significantly faster convergence. The format of the algorithm is depicted in the flowchart in Fig. 2. The calculated spectral intensities are linearly related to the baseline parameters, $a_{1\times p}$ and $b_{1\times p}$, the background and nonstructure peak amplitudes, $A_{m\times p}$ and $D_{n\times p}$, and the effective protein concentrations, $C^{\mathrm{eff}}_{p\times p}$. The spectra are nonlinear functions of the amide I and III mean peak positions, $\bar{\nu}_i$, and peak
The “nnls” routine solves Eq. [4] subject to the constraint that G is positive semidefinite; all peak amplitudes and effective protein concentrations must be greater than or equal to zero to be physically meaningful. However, the slope and intercept baseline parameters may be less than zero, which violates the “nnls” routine constraint. To resolve this difficulty, an arbitrary but known positive slope and intercept were added to each spectrum prior to each invocation of the “nnls” routine; this arbitrary linear background was subsequently subtracted before continuing with the next iteration of the “constr” routine. At each iteration step, “constr” updated the guesses for the nonlinear parameters by employing an analytical Jacobian of the objective function. Matlab continued the iterative procedure until the objective function, Eq. [3], reached a minimum, indicating that the best fit between the calculated and measured FTIR protein solution spectra had been obtained.
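
The baseline-offset trick described above can be demonstrated on a toy problem. The Python sketch below is illustrative only: Matlab's `nnls` uses the Lawson–Hanson active-set algorithm, whereas this sketch swaps in a simpler projected coordinate-descent solver for the same nonnegative least-squares problem, and the function names, toy spectrum, and offset values are our own assumptions. The offsets only need to be large enough that the shifted slope and intercept are nonnegative.

```python
def nnls_cd(X, y, sweeps=20000, tol=1e-13):
    """Solve min ||X w - y||^2 subject to w >= 0 by projected coordinate
    descent. X is a list of rows; returns the list of coefficients w."""
    k = len(X[0])
    cols = [[row[j] for row in X] for j in range(k)]
    sq = [sum(v * v for v in col) for col in cols]
    w = [0.0] * k
    r = list(y)  # residual y - X w (w starts at zero)
    for _ in range(sweeps):
        biggest = 0.0
        for j, col in enumerate(cols):
            # exact 1-D minimizer along coordinate j, clipped at zero
            new = max(0.0, w[j] + sum(c * ri for c, ri in zip(col, r)) / sq[j])
            step = new - w[j]
            if step != 0.0:
                r = [ri - step * c for ri, c in zip(r, col)]
                w[j] = new
                biggest = max(biggest, abs(step))
        if biggest < tol:
            break
    return w


def fit_with_signed_baseline(nu, y, peak_cols, shift_slope, shift_icpt):
    """Fit y ~ slope*nu + icpt + sum_k amp_k*peak_k with amp_k >= 0 while
    allowing a negative slope or intercept: add a known positive line,
    solve the NNLS problem, then remove the known line from the result."""
    y_shift = [yi + shift_slope * v + shift_icpt for yi, v in zip(y, nu)]
    X = [[v, 1.0] + [col[i] for col in peak_cols] for i, v in enumerate(nu)]
    w = nnls_cd(X, y_shift)
    return w[0] - shift_slope, w[1] - shift_icpt, w[2:]
```

Because the added line is known exactly, subtracting it from the fitted slope and intercept recovers the signed baseline without relaxing the nonnegativity constraint on the amplitudes.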
There are several complications in obtaining idealized reference spectra that the method of Sane and co-workers (37) handles in a distinctive way. For a basis set of p proteins with m background signals, n nonstructure peaks, and q + s structure peaks, the total number of unknowns in Eq. [1] is quite substantial. The number of unknown linear parameters is (3 + m + n)p, and the number of unknown nonlinear parameters is 2n + 3(q + s) − 1. Because the problem of developing a best fit to the measured spectra is nonlinear, multiple solutions can potentially arise. Finally, it is not possible to know a priori how many nonstructure and structure peaks are necessary to describe protein solution spectra.
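
The two parameter counts can be checked with a few lines of arithmetic. In our reading of Eq. [1], the linear count comprises, per protein, the slope, intercept, and effective concentration (the “3”), m background amplitudes, and n nonstructure amplitudes. The peak numbers n, q, and s in the example are hypothetical, since they are not given here; p = 8 matches the reference set, and m = 2 assumes buffer and water vapor backgrounds only.

```python
def parameter_counts(p, m, n, q, s):
    """Numbers of unknown linear and nonlinear parameters in Eq. [1],
    using the expressions quoted in the text."""
    # per protein: slope, intercept, C_eff, m backgrounds, n peak amplitudes
    linear = (3 + m + n) * p
    nonlinear = 2 * n + 3 * (q + s) - 1
    return linear, nonlinear


# p = 8 reference proteins, m = 2 backgrounds; n, q, s are hypothetical
print(parameter_counts(p=8, m=2, n=10, q=8, s=8))  # -> (120, 67)
```

Even with these modest, made-up peak numbers the linear unknowns outnumber the nonlinear ones, which is why solving for the linear parameters separately shrinks the nonlinear search considerably.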
To circumvent these problems, two separate methods were utilized in a bootstrapping fashion to generate two sets of idealized reference spectra. The first method is to generate reference spectra directly from the structure-related Gaussian–Lorentzian product functions fit to each protein solution spectrum via the solution to Eq. [3]. Once all the unknowns have been fit, reference spectra for the amide I and III re
