Detection and quantification of rare mutations with massively parallel sequencing

Isaac Kinde, Jian Wu, Nick Papadopoulos, Kenneth W. Kinzler¹, and Bert Vogelstein¹

The Ludwig Center for Cancer Genetics and Therapeutics and The Howard Hughes Medical Institute, Johns Hopkins Kimmel Cancer Center, Baltimore, MD 21231

Contributed by Bert Vogelstein, April 19, 2011 (sent for review March 21, 2011)

The identification of mutations that are present in a small fraction of DNA templates is essential for progress in several areas of biomedical research. Although massively parallel sequencing instruments are in principle well suited to this task, their error rates are generally too high to allow confident identification of rare variants. Here we describe an approach that can substantially increase the sensitivity of massively parallel sequencing instruments for this purpose. The keys to this approach, called the Safe-Sequencing System (“Safe-SeqS”), are (i) assignment of a unique identifier (UID) to each template molecule, (ii) amplification of each uniquely tagged template molecule to create UID families, and (iii) redundant sequencing of the amplification products. PCR fragments with the same UID are considered mutant (“supermutants”) only if ≥95% of them contain the identical mutation. We illustrate the utility of this approach for determining the fidelity of a polymerase, the accuracy of oligonucleotides synthesized in vitro, and the prevalence of mutations in the nuclear and mitochondrial genomes of normal cells.

diagnostics | early diagnosis | biomarkers | genetics | cancer

Genetic mutations underlie many aspects of life and death, through evolution and disease, respectively. Accordingly, their measurement is critical to several fields of research. Luria and Delbrück’s classic fluctuation analysis is a prototypic example of the insights into biological processes that can be gained simply by counting the number of mutations in carefully controlled experiments (1). Counting de novo mutations in humans, i.e., those not present in their parents, has similarly led to new insights into the rate at which our species can evolve (2, 3). Likewise, counting genetic or epigenetic changes in tumors can inform fundamental issues in cancer biology (4). Mutations lie at the core of current problems in managing patients with viral diseases such as AIDS and hepatitis by virtue of the drug resistance they can cause (5, 6). Detection of such mutations, particularly before they become dominant in the population, will likely be essential to optimize therapy. Detection of donor DNA in the blood of organ transplant patients is an important indicator of graft rejection, and detection of fetal DNA in maternal plasma can be used for prenatal diagnosis in a noninvasive fashion (7, 8). In neoplastic diseases, which are all driven by somatic mutations, the applications of rare mutant detection are manifold: they can be used to help identify residual disease at surgical margins or in lymph nodes, to follow the course of therapy when assessed in plasma, and to identify patients with early, surgically curable disease when evaluated in stool, sputum, plasma, and other bodily fluids (9–11).

These examples highlight the importance of identifying rare mutations for both basic and clinical research. Accordingly, innovative ways to assess them have been devised over the years. The first methods involved biologic assays based on prototrophy, resistance to viral infection or drugs, or biochemical assays (1, 12–18). Molecular cloning and sequencing provided a new dimension to the field, as they allowed the type of mutation, rather than simply its presence, to be identified (19–24). Some of the most powerful of these newer methods are based on digital PCR, in which individual molecules are assessed one by one (25). Digital PCR is conceptually identical to the analysis of individual clones of bacteria, cells, or viruses, but is performed entirely in vitro with defined, inanimate reagents. Several implementations of digital PCR have been described, including the analysis of molecules arrayed in multiwell plates, in polonies, in microfluidic devices, and in water-in-oil emulsions (25–30). In each of these technologies, mutant templates are identified through their binding to oligonucleotides specific for the potentially mutant base.

Massively parallel sequencing represents a particularly powerful form of digital PCR in that hundreds of millions of template molecules can be analyzed one by one. It has an advantage over conventional digital PCR methods in that multiple bases can be queried sequentially and easily in an automated fashion. However, massively parallel sequencing cannot generally be used to detect rare variants because of the high error rate associated with the sequencing process. For example, with the commonly used Illumina sequencing instruments, this error rate varies from ∼1% (31, 32) to ∼0.05% (33, 34), depending on factors such as the read length (35), use of improved base-calling algorithms (36–38), and the type of variants detected (39). Some of these errors presumably result from mutations introduced during template preparation, during the preamplification steps required for library preparation, and during further solid-phase amplification on the instrument itself. Other errors are due to base misincorporation during sequencing and base-calling errors. Advances in base calling can enhance confidence (e.g., refs. 36–39), but instrument-based errors are still limiting, particularly in clinical samples wherein the mutation prevalence can be ≤0.01% (11). In the work described herein, we show how templates can be prepared, and the sequencing data obtained from them interpreted more reliably, so that relatively rare mutations can be identified with commercially available instruments.

Results
Overview. Our approach, called the Safe-Sequencing System (“Safe-SeqS”), involves two basic steps (Fig. 1). The first is the assignment of a unique identifier (UID) to each DNA template molecule to be analyzed. The second is the amplification of each uniquely tagged template, so that many daughter molecules with the identical sequence are generated (defined as a UID family). If a mutation preexisted in the template molecule used for amplification, that mutation should be present in every daughter molecule containing that UID (barring any subsequent replication or sequencing errors). A UID family in which at least 95% of family members have the identical mutation is called a “supermutant.” Mutations not present in the original templates, such as those occurring during the amplification steps or through errors in base calling, should not give rise to supermutants. Conceptual and practical issues related to UID assignment and supermutants are discussed in detail in SI Materials and Methods.

Author contributions: I.K., N.P., K.W.K., and B.V. designed research; I.K., J.W., N.P., and B.V. performed research; I.K., J.W., N.P., K.W.K., and B.V. contributed new reagents/analytic tools; I.K., N.P., K.W.K., and B.V. analyzed data; and I.K. and B.V. wrote the paper.

The authors declare no conflict of interest.

¹To whom correspondence may be addressed. E-mail: kinzlke@jhmi.edu or bertvog@gmail.com.

This article contains supporting information online at www.pnas.org/lookup/suppl/doi:10.1073/pnas.1105422108/-/DCSupplemental.

9530–9535 | PNAS | June 7, 2011 | vol. 108 | no. 23
www.pnas.org/cgi/doi/10.1073/pnas.1105422108

PGDX EX. 1004
Page 1 of 16

[Figure graphics for Figs. 1 and 2 not reproduced. Panel labels: Shear, Ligate Adapters, Solid Phase Capture, Library Amplification, UID Assignment, Amplification, Redundant Sequencing, Mutant, WT.]

GENETICS

Fig. 2. Safe-SeqS with endogenous UIDs plus capture. The sequences of the ends of each fragment produced by random shearing (variously colored bars) serve as the unique identifiers (UIDs). These fragments are ligated to adapters (yellow and orange bars) so they can subsequently be amplified by PCR. One uniquely identifiable fragment is produced from each strand of the double-stranded template; only one strand is shown. Fragments of interest are captured on a solid phase containing oligonucleotides complementary to the sequences of interest. Following PCR amplification to produce UID families with primers containing 5′ “grafting” sequences (black and red bars), sequencing is performed and supermutants are defined as in Fig. 1.

A strategy using endogenous UIDs was also used to reduce false-positive mutations upon deep sequencing of a single region of interest. In this case, a library prepared as described above from ∼1,750 normal cells was used as template for inverse PCR using primers complementary to a gene of interest, so the PCR products could be directly used for sequencing (Fig. S1). With conventional analysis, an average of 2.3 × 10⁻⁴ mutations/bp was observed, similar to the rate observed in the capture experiment (Table 1). Given that only 1,057 independent molecules from normal cells were assessed in this experiment, as determined through Safe-SeqS analysis, all mutations observed with conventional analysis likely represented false positives (Table 1). With Safe-SeqS analysis of the same data, no supermutants were identified at any position.

Table 1. Safe-SeqS with endogenous UIDs

                                            Capture         Inverse PCR
Conventional analysis
  High-quality base pairs                   106,958,863     1,041,346,645
  Mean high-quality base pair read depth    38,620×         2,085,600×
  Mutations identified                      25,563          234,352
  Mutations/bp                              2.4E-04         2.3E-04
Safe-SeqS analysis
  High-quality base pairs                   106,958,863     1,041,346,645
  Mean high-quality base pair read depth    38,620×         2,085,600×
  UID families                              69,505          1,057
  Average no. of members/UID family         40              21,688
  Median no. of members/UID family          19              4
  Supermutants identified                   8               0
  Supermutants/bp                           3.5E-06         0.0

Fig. 1. Essential elements of Safe-SeqS. In the first step, each fragment to be analyzed is assigned a unique identification (UID) DNA sequence (green or blue bars). In the second step, the uniquely tagged fragments are amplified, producing UID families, each member of which has the same UID. A supermutant is defined as a UID family in which ≥95% of family members have the same mutation.

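The supermutant criterion can be illustrated with a short Python sketch. It is a minimal illustration under stated assumptions, not the pipeline used in this study: reads are assumed to be already grouped by UID and aligned without gaps to a reference amplicon of the same length, and the names `call_supermutants`, `families`, and `reference` are hypothetical. A position in a family is reported only when at least 95% of that family's members carry the same non-reference base.

```python
from collections import Counter

SUPERMUTANT_FRACTION = 0.95  # ">=95% of family members" from the definition


def call_supermutants(families, reference, threshold=SUPERMUTANT_FRACTION):
    """Return (uid, position, alt_base) for every supermutant.

    families: dict mapping a UID string to a list of read sequences, each the
        same length as `reference` (hypothetical input; indels not handled).
    reference: expected sequence of the amplicon.
    """
    calls = []
    for uid, reads in families.items():
        n_members = len(reads)
        for pos, ref_base in enumerate(reference):
            # Most common base among all family members at this position.
            base, count = Counter(read[pos] for read in reads).most_common(1)[0]
            # Call a supermutant only if the change is shared by >=95% of members.
            if base != ref_base and count / n_members >= threshold:
                calls.append((uid, pos, base))
    return calls


reference = "ACGTACGT"
families = {
    "UID1": ["ACGTACGT"] * 19 + ["ACGAACGT"],  # one stray error: not called
    "UID2": ["ACGTTCGT"] * 20,                 # change shared by all members
}
print(call_supermutants(families, reference))  # [('UID2', 4, 'T')]
```

The sketch imposes no minimum family size, so a single-member family would pass the 95% test trivially; a practical implementation would presumably require several members per family, which is an assumption here rather than a value given in this section.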
Endogenous UIDs. UIDs, sometimes called barcodes or indexes, can be assigned to nucleic acid fragments using a variety of methods. These methods include the introduction of exogenous sequences through PCR (40, 41) or ligation (42, 43). Even more simply, randomly sheared genomic DNA inherently contains UIDs consisting of the sequences of the two ends of each sheared fragment (Fig. 2 and Fig. S1). Paired-end sequencing of these fragments yields UID families that can be analyzed as described above. To use such endogenous UIDs in Safe-SeqS, we used two separate approaches: one designed to evaluate many genes simultaneously and the other designed to evaluate a single gene fragment in depth (Fig. 2 and Fig. S1, respectively).

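One way to picture endogenous UIDs is to key each read pair on the sequences at the two ends of its fragment. The Python sketch below assumes that the first k bases of read 1 and of read 2 (k = 12 here, an arbitrary choice) are enough to distinguish independently sheared fragments; the function name and input format are hypothetical, and an actual analysis might instead key on mapped fragment coordinates.

```python
from collections import defaultdict


def group_by_endogenous_uid(read_pairs, k=12):
    """Group paired-end reads into UID families keyed on the fragment ends.

    read_pairs: iterable of (read1, read2) sequence tuples (hypothetical input).
    k: bases taken from the start of each read to form the UID
       (an illustrative assumption, not a value from the paper).
    """
    families = defaultdict(list)
    for read1, read2 in read_pairs:
        uid = (read1[:k], read2[:k])  # the two sheared ends act as the UID
        families[uid].append((read1, read2))
    return dict(families)


pairs = [
    ("AAAACCCCGGGGTT", "TTTTGGGGCCCCAA"),
    ("AAAACCCCGGGGTA", "TTTTGGGGCCCCAA"),  # same ends, differs further in
    ("CCCCAAAAGGGGTT", "GGGGTTTTCCCCAA"),  # a different sheared fragment
]
families = group_by_endogenous_uid(pairs)
print(len(families))  # 2
```

Each resulting family could then be passed to a consensus or supermutant step; reads in one family that disagree at internal positions, like the second pair above, are exactly what the 95% rule is meant to adjudicate.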
For the evaluation of multiple genes, we ligated standard Illumina sequencing adapters to the ends of sheared DNA fragments to produce a standard sequencing library and then captured genes of interest on a solid phase (44). In this experiment, a library made from the DNA of ∼15,000 normal cells was used, and 2,594 bp from six genes were targeted for capture. After excluding known single-nucleotide polymorphisms, 25,563 apparent mutations, corresponding to 2.4 × 10⁻⁴ mutations/bp, were identified (Table 1). On the basis of previous analyses of mutation rates in human cells, at least 90% of these apparent mutations were likely to represent mutations introduced during template and library preparation or base-calling errors. Note that the error rate determined here (2.4 × 10⁻⁴ mutations/bp) is considerably lower than usually reported in experiments using the Illumina instrument because we used very stringent criteria for base calling (SI Materials and Methods).

With Safe-SeqS analysis of the same data, we determined that 69,505 original template molecules were assessed in this experiment (i.e., 69,505 UID families, with an average of 40 members per family, were identified) (Table 1). All of the polymorphic variants identified by conventional analysis were also identified by Safe-SeqS. However, only eight supermutants were observed among these families, corresponding to 3.5 × 10⁻⁶ mutations/bp. Thus, Safe-SeqS decreased the presumptive sequencing errors by at least 70-fold.

residual, unused UID assignment primers are removed by digestion with a single-strand-specific exonuclease, without further purification, and two new primers are added. The new primers, complementary to the tails introduced in the UID assignment cycles, contain grafting sequences at their 5′ ends, permitting solid-phase amplification on the Illumina instrument, and phosphorothioate residues at their 3′ ends to make them resistant to any remaining exonuclease. Following 25 additional cycles of PCR, the products are loaded on the Illumina instrument. As shown below, this strategy allowed us to evaluate the majority of input fragments and was used for several illustrative experiments.

Analysis of DNA Polymerase Fidelity. Measurement of the error rates of DNA polymerases is essential for their characterization and dictates the situations in which these enzymes can be used. We chose to measure the error rate of Phusion polymerase, as this polymerase has one of the lowest reported error frequencies of any commercially available enzyme and therefore poses a particular challenge for an in vitro-based approach. We first amplified a single human DNA template molecule, comprising a segment of an arbitrarily chosen human gene, through 19 rounds of PCR. The PCR products from these amplifications, in their entirety, were used as templates for Safe-SeqS as described in Fig. 3. In seven independent experiments of this type, the number of UID families identified by sequencing was 624,678 ± 421,274, which is consistent with an amplification efficiency of 92 ± 9.6% per round of PCR. The error rate of Phusion polymerase, estimated through cloning of PCR products encoding β-galactosidase in plasmid vectors and transformation into bacteria, is reported by the manufacturer to be 4.4 × 10⁻⁷ errors/bp/PCR cycle. Even with very high-stringency base calling, conventional analysis of the Illumina sequencing data revealed an apparent error rate of 9.1 × 10⁻⁶ errors/bp/PCR cycle, more than an order of magnitude higher than the reported Phusion polymerase error rate (Table 2, polymerase fidelity). In contrast, Safe-SeqS analysis of the same data revealed an error rate of 4.5 × 10⁻⁷ errors/bp/PCR cycle, nearly identical to that measured for Phusion polymerase in biological assays (Table 2, polymerase fidelity). The vast majority (>99%) of these errors were single-base substitutions (Table S1, polymerase fidelity), consistent with previous data on the mutation spectra created by other prokaryotic DNA polymerases (15, 46, 47).

Safe-SeqS also allowed a determination of the total number of distinct mutational events and an estimation of the PCR cycle in which each mutation occurred. In these experiments, 19 cycles of PCR were performed in wells containing a single template molecule. If a polymerase error occurred in cycle 19, only one supermutant would be produced (from the strand containing the mutation). If the error occurred in cycle 18, there should be two supermutants (derived from the mutant strands produced in cycle 19), and so on. Accordingly, the cycle in which the error occurred is related to the number of supermutants containing that error. The data from seven independent experiments demonstrate a relatively consistent number of observed total polymerase errors (2.2 ± 1.1 × 10⁻⁶ distinct mutations/bp), in reasonable agreement with the number expected from simulations (1.5 ± 0.21 × 10⁻⁶ distinct mutations/bp, detailed in SI Materials and Methods). The data also show a highly variable timing of occurrence of polymerase errors among experiments (Table S2), as predicted from classic fluctuation analysis (1). This kind of information is difficult to derive using conventional analysis of the same next-generation sequencing data, in part because of the prohibitively high apparent mutation rate noted above.

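The relation between the cycle in which an error arose and the number of supermutants carrying it can be written down directly. The following Python sketch assumes perfect (100%) amplification efficiency, so that an error made in cycle c of n cycles is inherited by 2^(n − c) supermutant families: with n = 19 this gives one family for an error in cycle 19 and two for an error in cycle 18, as stated above. Because the measured efficiency was ∼92% per round, real counts deviate from these powers of two and the inferred cycle is only approximate. The function names are hypothetical.

```python
import math


def expected_supermutants(error_cycle, total_cycles=19):
    """Supermutant families carrying an error made in `error_cycle`,
    assuming every molecule is copied in every cycle (perfect efficiency)."""
    return 2 ** (total_cycles - error_cycle)


def estimate_error_cycle(n_supermutants, total_cycles=19):
    """Invert the relation above for a rough cycle-of-origin estimate."""
    return total_cycles - math.log2(n_supermutants)


print(expected_supermutants(19), expected_supermutants(18))  # 1 2
print(estimate_error_cycle(8))  # 16.0
```

Under this idealization, an error shared by 8 supermutant families points to cycle 16; with imperfect efficiency, non-integer estimates would simply be rounded or reported as a range.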
Analysis of Oligonucleotide Composition. A small number of mistakes during the synthesis of oligonucleotides from phosphoramidite precursors are tolerable for most applications, such as routine PCR or cloning. However, for synthetic biology, wherein many oligonucleotides must be joined together, such mistakes present a major

Exogenous UIDs. Although the results described above show that Safe-SeqS can increase the reliability of massively parallel sequencing, the number of different molecules that can be examined using endogenous UIDs is limited. For fragments sheared to an average size of 150 bp (range 125–175), 36-base paired-end sequencing can evaluate a maximum of ∼7,200 different molecules containing a specific mutation (2 reads × 2 orientations × 36 bases/read × 50-base variation on either end of the fragment). In practice, the actual number of UIDs is smaller because the shearing process is not entirely random.

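The ∼7,200 ceiling is just the product of the factors given in parentheses. The arithmetic is restated below in Python; the variable names are illustrative, and the comments paraphrase each factor as given in the text.

```python
reads_per_fragment = 2    # paired-end sequencing: read 1 and read 2
orientations = 2          # a fragment can be read in either orientation
bases_per_read = 36       # 36-base reads
end_spread = 50           # ~50-base variation in shear ends (125-175 bp)

max_distinct_molecules = (
    reads_per_fragment * orientations * bases_per_read * end_spread
)
print(max_distinct_molecules)  # 7200
```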
To make more efficient use of the original templates, we developed a Safe-SeqS strategy that uses a minimum number of enzymatic steps. This strategy also permits the use of degraded or damaged DNA, such as that found in clinical specimens or after bisulfite treatment for the examination of cytosine methylation (45). As depicted in Fig. 3, this strategy employs two sets of PCR primers. The first set is synthesized with standard phosphoramidite precursors and contains sequences complementary to the gene of interest at the 3′ end and different tails at the 5′ ends of both the forward and reverse primers. The different tails allow universal amplification in the next step. Finally, there is a stretch of 12–14 random nucleotides between the tail and the sequence-specific nucleotides in the forward primer (40). The random nucleotides form the UIDs. An equivalent way to assign UIDs to fragments, not used in this study, would employ 10,000 forward primers and 10,000 reverse primers synthesized on a microarray. Each of these 20,000 primers would have gene-specific sequences at its 3′ end and one of 10,000 specific, predetermined, nonoverlapping UID sequences at its 5′ end, allowing for 10⁸ [i.e., (10⁴)²] possible UID combinations. In either case, two cycles of PCR are performed with the primers and a high-fidelity polymerase, producing a uniquely tagged, double-stranded DNA fragment from each of the two strands of each original template molecule (Fig. 3). The

[Fig. 3 graphic not reproduced. Panel labels: UID Assignment Cycle #1, UID Assignment Cycle #2, Library Amplification, Redundant Sequencing.]

Fig. 3. Safe-SeqS with exogenous UIDs. DNA (sheared or unsheared) is amplified with a set of gene-specific primers. One of the primers has a random DNA sequence (e.g., a set of 14 Ns) that forms the unique identifier (UID) (variously colored bars), located 5′ to its gene-specific sequence, and both have sequences that permit universal amplification in the next step (yellow and orange bars). Two UID assignment cycles produce two fragments, each with a different UID, from each double-stranded template molecule, as shown. Subsequent PCR with universal primers, which also contain “grafting” sequences (black and red bars), produces UID families that are directly sequenced. Supermutants are defined as in the legend to Fig. 1.

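Random exogenous UIDs offer a far larger identifier space than endogenous shear ends. The Python sketch below computes the number of possible 12- and 14-nucleotide random UIDs and the 10⁸ combinations of the microarray design. As an added illustration that is not an analysis from the paper, it also gives the expected number of template pairs that would share a UID by chance, C(n, 2)/M for n templates and M possible UIDs, under the assumption that every UID is drawn independently and uniformly; real synthesized random nucleotides may be biased. The function names and the 100,000-template input are hypothetical.

```python
def uid_space(random_nt):
    """Number of distinct UIDs encoded by `random_nt` fully random bases."""
    return 4 ** random_nt


def expected_shared_uid_pairs(n_templates, n_uids):
    """Expected number of template pairs receiving the same UID, C(n, 2) / M,
    assuming each UID is drawn independently and uniformly (an assumption)."""
    return n_templates * (n_templates - 1) / (2 * n_uids)


print(uid_space(12), uid_space(14))  # 16777216 268435456
print(10_000 * 10_000)               # 100000000 (microarray design, 10^8)
# Arbitrary example: 100,000 templates tagged with 14 random bases.
print(round(expected_shared_uid_pairs(100_000, uid_space(14)), 1))  # 18.6
```

Under these assumptions, two templates that happen to share a UID would be read as a single family; if only one of them carried a mutation, that family would not reach the 95% threshold.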
S3), which were distributed in the expected stochastic pattern among replicate experiments. The number of errors in the oligonucleotides synthesized with phosphoramidites was ∼60 times higher than that in the equivalent products synthesized by Phusion polymerase. These data, in toto, indicate that the vast majority of errors in the former were generated during their synthesis rather than during the Safe-SeqS procedure.

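Among the oligonucleotide supermutants, insertions and deletions accounted for 8.2% and 25% of the total, respectively. A tally of that kind can be sketched in Python as follows. The sketch assumes variants are written as VCF-style reference/alternate alleles in which an indel shares one leading anchor base with the reference; the function names and the four example calls are hypothetical and are not the paper's data.

```python
from collections import Counter


def classify_variant(ref, alt):
    """Classify a variant from VCF-style alleles (assumed convention:
    indels share one leading anchor base with the reference)."""
    if len(ref) == len(alt) == 1:
        return "substitution"
    if len(alt) > len(ref):
        return "insertion"
    if len(alt) < len(ref):
        return "deletion"
    return "complex"  # equal-length, multi-base change


def error_spectrum(variants):
    """Fraction of each variant class among (ref, alt) pairs."""
    counts = Counter(classify_variant(ref, alt) for ref, alt in variants)
    total = sum(counts.values())
    return {kind: n / total for kind, n in counts.items()}


toy_calls = [("A", "G"), ("C", "T"), ("T", "TA"), ("GC", "G")]  # hypothetical
print(error_spectrum(toy_calls))
# {'substitution': 0.5, 'insertion': 0.25, 'deletion': 0.25}
```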
Does Safe-SeqS preserve the ratio of mutant:normal sequences in the original templates? To address this question, we synthesized two 31-base oligonucleotides of identical sequence with the exception of nucleotide 15 (50:50 C/G instead of T) and mixed them at nominal mutant/normal fractions of 3.3% and 0.33%. Through Safe-SeqS analysis of the oligonucleotide mixtures, we found that the ratios were 2.8% and 0.27%, respectively. We conclude that the UID assignment and amplification procedures used in Safe-SeqS do not greatly alter the proportion of variant sequences and thereby provide a reliable estimate of that proportion when unknown. This conclusion is also supported by the reproducibility of variant fractions when analyzed in independent Safe-SeqS experiments (Fig. S2A).

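Because Safe-SeqS counts template molecules rather than reads, the variant fraction can be taken as the number of mutant UID families divided by the total number of UID families. The Python sketch below computes that ratio and, as an addition not described in the paper, a Wilson score interval (95% by default) to convey sampling uncertainty. The function name and the counts are hypothetical.

```python
import math


def variant_fraction(mutant_families, total_families, z=1.96):
    """Mutant fraction with a Wilson score interval (z=1.96 gives ~95%)."""
    n = total_families
    p = mutant_families / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return p, (centre - half_width, centre + half_width)


# Hypothetical example: 280 mutant families out of 10,000.
p, (low, high) = variant_fraction(280, 10_000)
print(f"{p:.2%} (95% CI {low:.2%} to {high:.2%})")
```

The Wilson interval is used here because it stays within [0, 1] and behaves reasonably for the small fractions typical of rare variants; other binomial intervals would serve equally well for illustration.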
Analysis of DNA Sequences from Normal Human Cells. The exogenous UID strategy (Fig. 3) was then used to determine the prevalence of rare mutations in a small region of the CTNNB1 gene isolated from ∼100,000 normal human cells from three unrelated individuals. By comparing the number of input fragments with the number of UID families obtained in the Safe-SeqS experiments (Table 2, CTNNB1 mutations in DNA from normal human cells), we calculated that the majority (78 ± 9.8%) of the input fragments were converted into UID families. There was an average of 68 members/UID family, easily fulfilling the required redundancy for Safe-SeqS (Fig. S3). Conventional analysis of the Illumina sequencing data revealed an average of 118,488 ± 11,357 mutations among the ∼560 Mb of sequence analyzed per sample, corresponding to an apparent mutation prevalence of 2.1 ± 0.16 × 10⁻⁴ mutations/bp (Table 2, CTNNB1 mutations in DNA from normal human cells). On average, only 99 ± 78 supermutants were observed in the Safe-SeqS analysis. The vast majority (>99%) of supermutants were single-base substitutions, and the calculated mutation rate was 9.0 ± 3.1 × 10⁻⁶ mutations/bp (Table S1, CTNNB1 mutations in DNA from normal human cells). Safe-SeqS thereby reduced the apparent frequency of mutations in genomic DNA by at least 24-fold (Fig. 4).

We applied the identical strategy to a short segment of mitochondrial DNA isolated from ∼1,000 cells from each of seven unrelated individuals. Conventional analysis of the Illumina sequencing libraries produced with the Safe-SeqS procedure (Fig. 3) revealed an average of 30,599 ± 12,970 mutations among the ∼150 Mb of sequence analyzed per sample, corresponding to an apparent mutation prevalence of 2.1 ± 0.94 × 10⁻⁴ mutations/bp (Table 2, mitochondrial mutations in DNA from normal human cells). Only 135 ± 61 supermutants were observed in the Safe-SeqS analysis. As with the CTNNB1 gene, the vast majority of mutations were single-base substitutions, although occasional single-base deletions were also observed (Table S1, mitochondrial mutations in DNA from normal human cells). The calculated mutation rate in the analyzed segment of mtDNA was 1.4 ± 0.68 × 10⁻⁵ mutations/bp (Table 2, mitochondrial mutations in DNA from normal human cells). Thus, Safe-SeqS reduced the apparent frequency of mutations in mitochondrial DNA by at least 15-fold.

Discussion
The results described above demonstrate that the Safe-SeqS approach can substantially improve the accuracy of massively parallel sequencing (Tables 1 and 2). It can be implemented through either endogenous or exogenously introduced UIDs and can be applied to virtually any sample preparation workflow or sequencing platform. As demonstrated here, the approach can easily be used to identify
`rare mutants in a population of DNA templates, to measure poly-

Table 2. Safe-SeqS with exogenous UIDs

                                                    Mean            SD
Polymerase fidelity
Conventional analysis of seven replicates
  High-quality base pairs                           996,855,791     64,030,757
  Total mutations identified                        198,638         22,515
  Mutations/bp                                      2.0E-04         1.7E-05
  Calculated Phusion error rate (errors/bp/cycle)   9.1E-06         7.7E-07
Safe-SeqS analysis of seven replicates
  High-quality base pairs                           996,855,791     64,030,757
  UID families                                      624,678         421,274
  Members/UID family                                107             122
  Total supermutants identified                     197             143
  Supermutants/bp                                   9.9E-06         2.3E-06
  Calculated Phusion error rate (errors/bp/cycle)   4.5E-07         1.0E-07
CTNNB1 mutations in DNA from normal human cells
Conventional analysis of three individuals
  High-quality base pairs                           559,334,774     66,600,749
  Total mutations identified                        118,488         11,357
  Mutations/bp                                      2.1E-04         1.6E-05
Safe-SeqS analysis of three individuals
  High-quality base pairs                           559,334,774     66,600,749
  UID families                                      374,553         263,105
  Members/UID family                                68              38
  Total supermutants identified                     99              78
  Supermutants/bp                                   9.0E-06         3.1E-06
Mitochondrial mutations in DNA from normal human cells
Conventional analysis of seven individuals
  High-quality base pairs                           147,673,456     54,308,546
  Total mutations identified                        30,599          12,970
  Mutations/bp                                      2.1E-04         9.4E-05
Safe-SeqS analysis of seven individuals
  High-quality base pairs                           147,673,456     54,308,546
  UID families                                      515,600         89,985
  Members/UID family                                15              6
  Total supermutants identified                     135             61
  Supermutants/bp                                   1.4E-05         6.8E-06

obstacle to success. Clever strategies for making the gene construction process more efficient have been devised (48, 49), but all such strategies would benefit from more accurate synthesis of the oligonucleotides themselves. Determining the number of errors in synthesized oligonucleotides is difficult because the fraction of oligonucleotides containing errors can be lower than the sensitivity of conventional next-generation sequencing analyses.

To determine whether Safe-SeqS could be used for this determination, we used standard phosphoramidite chemistry to synthesize an oligonucleotide containing 31 bases that were designed to be identical to those analyzed in the polymerase fidelity experiment described above. In the synthetic oligonucleotide, the 31 bases were surrounded by sequences complementary to primers that could be used for the UID assignment steps of Safe-SeqS (Fig. 3). By performing Safe-SeqS on ∼300,000 oligonucleotide templates, we found that there were 8.9 ± 0.28 × 10⁻⁴ supermutants/bp and that these errors occurred throughout the sequence of the oligonucleotides (Fig. S2A). The oligonucleotides contained a large number of insertion and deletion errors, representing 8.2 ± 0.63% and 25 ± 1.5% of the total supermutants, respectively. Importantly, both the position and the nature of the errors were highly reproducible among seven independent replicates of this experiment performed on the same batch of oligonucleotides (Fig. S2A). The nature and distribution of these errors had little in common with those of the errors produced by Phusion polymerase (Fig. S2B and Table

UID approaches (Fig. 2 and Fig. S1) and the one described by Travers et al. are not ideally suited for this purpose because of the inevitable losses of template molecules during the ligation and other preparative steps.

How do we know that the mutations identified by conventional analyses in the current study represent artifacts rather than true mutations in the original templates? Strong evidence supporting this is provided by the observation that the mutation prevalence determined by conventional analysis in all but one experiment was similar: 2.0 × 10⁻⁴ to 2.4 × 10⁻⁴ mutations/bp (Tables 1 and 2). The exception was the experiment with oligonucleotides synthesized from phosphoramidites, in which the error rate of the synthetic process was apparently higher than the error rate of conventional Illumina analysis when used with stringent base-calling criteria. In contrast, the mutation prevalence measured by Safe-SeqS varied much more, from 0.0 to 1.4 × 10⁻⁵ mutations/bp, depending on the template and experiment. Moreover, the mutation prevalence measured by Safe-SeqS in the most controlled experiment, in which polymerase fidelity was measured (Table 2, polymerase fidelity), was almost identical to that predicted from previous experiments in which polymerase fidelity was measured by biological assays. Our measurements of mutation prevalence in the DNA from normal cells are consistent with some previous experimental data. However, estimates of these prevalences vary widely and may depend on cell type and sequence analyzed (SI Materials and Methods). We therefore cannot be certain that the relatively low number of mutations revealed by Safe-SeqS represented errors occurring during the sequencing process rather than true mutations present in the original DNA templates. Potential sources of error in the Safe-SeqS process are described in SI Materials and Methods.
Like all techniques, Safe-SeqS has limitations. For example, we have demonstrated that the exogenous UID strategy can be used to analyze a single amplicon in depth. This technology may not be applicable to situations wherein multiple amplicons must be analyzed from a sample containing a limited number of templates. Multiplexing in the UID assignment cycles (Fig. 3) may provide a solution to this challenge. A second limitation is that the efficiency of amplification in the UID assignment cycles is critical for the success of the method. Clinical samples can contain inhibitors that reduce the efficiency of this step. This problem can presumably be overcome by performing more than two cycles in the UID assignment PCR step (Fig. 3), although this would complicate the determination of the number of templates analyzed. The specificity of Safe-SeqS is currently limited by the fidelity of the polymerase used in the UID assignment PCR step, i.e., 8.8 × 10⁻⁷ mutations/bp in its current implementation with two cycles. Increasing the number of cycles in the UID assignment PCR step to five would decrease the overall specificity to ∼2 × 10⁻⁶ mutations/bp. However, this specificity can be increased by requiring more than one supermutant for mutation identification: the probability of introducing the same artifactual mutation twice or three times would be exceedingly low [(2 × 10⁻⁶)² or (2 × 10⁻⁶)³, respectively]. In sum, there are several simple ways to vary the Safe-SeqS procedure and analysis to meet the needs of specific experiments.
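
The specificity figures quoted above follow from simple arithmetic: the manufacturer's Phusion error rate of 4.4 × 10⁻⁷ errors/bp/cycle multiplied by the number of UID assignment cycles and, when k independent supermutants are required, that rate raised to the power k, treating independent artifacts as multiplying probabilities as the text does. A Python sketch with hypothetical function names:

```python
PHUSION_ERROR_RATE = 4.4e-7  # errors/bp/cycle, manufacturer's figure


def uid_step_error(cycles, per_cycle=PHUSION_ERROR_RATE):
    """Artifactual mutations/bp introduced during UID assignment PCR."""
    return cycles * per_cycle


def repeated_artifact_probability(rate, supermutants_required):
    """Chance of the same artifact arising independently k times."""
    return rate ** supermutants_required


print(f"{uid_step_error(2):.1e}")  # 8.8e-07 (current two-cycle design)
print(f"{uid_step_error(5):.1e}")  # 2.2e-06, i.e., ~2 x 10^-6
print(f"{repeated_artifact_probability(2e-6, 2):.0e}")  # 4e-12
```
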
Luria and Delbrück, in their classic paper in 1943, wrote that their “prediction cannot be verified directly, because what we observe, when we count the number of resistant bacteria in a culture, is not the number of mutations which have occurred but the number of resistant bacteria which have arisen by multiplication of those which mutated, the amount of multiplication depending on how far back the mutation occurred” (ref. 1, p. 495). The Safe-SeqS procedure described here can verify such predictions because the number as well as the time of occurrence of each mutation can be estimated from the data, as noted in the experiments on polymerase fidelity. In addition to templates generated by polymerases in vitro, the same approach can be applied to DNA from bacteria, viruses, and mammalian cells. We therefore expect that this

[Fig. 4 graphic not reproduced. Panels A (conventional analysis) and B (Safe-SeqS): frequency per 10,000 bp (y axis) versus mutation number 1–87 (x axis) for Indiv. 1, Indiv. 2, and Indiv. 3.]

Fig. 4. Single-base substitutions identified by conventional and Safe-SeqS analysis. The exogenous UID strategy depicted in Fig. 3 was used to produce PCR fragments from the CTNNB1 gene of three normal, unrelated individuals. Each mutation number represents one of 87 possible single-base substitutions (3 possible substitutions/base × 29 bases analyzed). These fragments were sequenced on an Illumina GA IIx instrument and analyzed in the conventional manner (A) or with Safe-SeqS (B). Safe-SeqS results are displayed on the same scale as conventional analysis for direct comparison; the Inset is a magnified view. Note that most of the variants identified by conventional analysis are likely to represent sequencing errors, as indicated by their high frequency relative to Safe-SeqS and their consistency among unrelated samples.

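The 87 mutation numbers in Fig. 4 index every possible single-base substitution across the 29 analyzed bases. One possible indexing scheme is sketched below in Python; the ordering (by position, then alternate base in A, C, G, T order) is an assumption, since the legend does not state how mutation numbers were assigned, and the example sequence is hypothetical rather than the CTNNB1 segment.

```python
def enumerate_substitutions(sequence):
    """List (mutation_number, position, ref, alt) for every single-base
    substitution, numbered from 1 (the ordering is illustrative only)."""
    substitutions = []
    for pos, ref in enumerate(sequence, start=1):
        for alt in "ACGT":
            if alt != ref:
                substitutions.append((len(substitutions) + 1, pos, ref, alt))
    return substitutions


segment = "ACGTACGTACGTACGTACGTACGTACGTA"  # hypothetical 29-base segment
subs = enumerate_substitutions(segment)
print(len(subs))  # 87
print(subs[0])    # (1, 1, 'A', 'C')
```
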
`merase error rates, and to judge the reliability of oligonucleotide
syntheses. One of the advantages of the strategy is that it yields the number of templates analyzed as well as the fraction of templates containing variant bases. Previously described in vitro methods for the detection of small numbers of template molecules (e.g., refs. 29 and 50) allow the fraction of mutant templates to be determined but cannot d
