The Use of Coded PCR Primers Enables High-Throughput Sequencing of Multiple Homolog Amplification Products by 454 Parallel Sequencing

Jonas Binladen1☯, M. Thomas P. Gilbert1☯, Jonathan P. Bollback2, Frank Panitz3, Christian Bendixen3, Rasmus Nielsen2, Eske Willerslev1*

1 Center for Ancient Genetics, Institute of Biology, University of Copenhagen, Copenhagen, Denmark, 2 Center for Bioinformatics and Institute of Biology, University of Copenhagen, Copenhagen, Denmark, 3 Department of Genetics and Biotechnology, Danish Institute of Agricultural Sciences Research Centre Foulum, Tjele, Denmark
`
Background. The invention of the Genome Sequence 20™ DNA Sequencing System (454 parallel sequencing platform) has enabled the rapid and high-volume production of sequence data. Until now, however, individual emulsion PCR (emPCR) reactions and subsequent sequencing runs have been unable to combine template DNA from multiple individuals, as homologous sequences cannot be subsequently assigned to their original sources. Methodology. We use conventional PCR with 5′-nucleotide tagged primers to generate homologous DNA amplification products from multiple specimens, followed by sequencing through the high-throughput Genome Sequence 20™ DNA Sequencing System (GS20, Roche/454 Life Sciences). Each DNA sequence is subsequently traced back to its individual source through 5′ tag analysis. Conclusions. We demonstrate that this new approach enables the assignment of virtually all the generated DNA sequences to the correct source once sequencing anomalies are accounted for (miss-assignment rate <0.4%). Therefore, the method enables accurate sequencing and assignment of homologous DNA sequences from multiple sources in a single high-throughput GS20 run. We observe a bias in the distribution of the differently tagged primers that is dependent on the 5′ nucleotide of the tag. In particular, primers 5′ labelled with a cytosine are heavily overrepresented among the final sequences, while those 5′ labelled with a thymine are strongly underrepresented. A weaker bias also exists with regard to the distribution of the sequences as sorted by the second nucleotide of the dinucleotide tags. As the results are based on a single GS20 run, the general applicability of the approach requires confirmation. However, our experiments demonstrate that 5′ primer tagging is a useful method by which the sequencing power of the GS20 can be applied to PCR-based assays of multiple homologous PCR products. The new approach will be of value to a broad range of research areas, such as comparative genomics, complete mitochondrial analyses, population genetics, and phylogenetics.
`
Citation: Binladen J, Gilbert MTP, Bollback JP, Panitz F, Bendixen C (2007) The Use of Coded PCR Primers Enables High-Throughput Sequencing of Multiple Homolog Amplification Products by 454 Parallel Sequencing. PLoS ONE 2(2): e197. doi:10.1371/journal.pone.0000197
`
INTRODUCTION
The arrival of the Genome Sequence 20™ DNA Sequencing System (GS20, Roche/454 Life Sciences) and the associated ‘Sequencing-by-Synthesis’ protocol has enabled pyrosequencing of up to 25 million nucleotides in a single four-hour reaction [1]. The method employs single molecule amplification prior to sequencing and therefore eliminates the need for prior cloning. In initial implementations of the technology, random fragments from DNA extracts have been sequenced without a priori selection of specific genetic regions. As such, all DNA that is present in the sample has a chance of being amplified and sequenced that approximately corresponds to its frequency within the DNA extract. The method has proven an efficient tool for use in a number of specific cases, such as the rapid sequencing of relatively small genomes [1,2].
For purposes such as comparative genomics, mitochondrial sequencing, and population genetics, it is of interest to combine the selectivity of primer-based PCR with the sequencing power of the GS20 platform. The simplest way to achieve this is to use the GS20 to emulsion PCR (emPCR) and then pyrosequence the products of individual PCR reactions. Due to the sequencing power of the GS20, this approach results in hundreds of thousands of individual sequences from a single PCR reaction, each derived directly from a single original template within the reaction [1]. As such, the result is similar to the generation of sequence data through conventional cloning. We henceforth refer to the GS20-derived sequences as single molecule sequences. In many studies, however, the number of single molecule sequences produced by a single GS20 run is unnecessary and economically unfeasible, unless several PCR products can be processed simultaneously and correctly assigned.
Academic Editor: Matthew Hahn, Indiana University, United States of America

Received November 27, 2006; Accepted January 16, 2007; Published February 14, 2007

Copyright: © 2007 Binladen et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Funding: JB and EW were supported by the Wellcome Trust, UK, the Carlsberg Foundation, DK, and the National Science Foundation, DK. MTPG acknowledges the Marie Curie Actions FP6-MEIF-CT-2005-025002 ‘FORMAPLEX’ grant for funding his research. JPB and RN were funded by the Danish FSS and the National Science Foundation, DK. None of the sponsors or funders have had any influence on the data or manuscript presented here.

Competing Interests: The authors have declared that no competing interests exist.

* To whom correspondence should be addressed. E-mail: ewillerslev@bi.ku.dk

☯ These authors contributed equally to this work.

PLoS ONE | www.plosone.org | February 2007 | Issue 2 | e197

Ariosa Exhibit 1005, p. 1

5′ Primer Tags on the GS20

Thomas and co-authors [3] recently took the first step in making this possible by pooling together eleven PCR products, each targeting different regions of the genome, into single sequencing-by-synthesis reactions using the GS20. In this case, the authors could easily sort the sequence data due to the unique genetic sequence of each original target. Furthermore, by sequencing the combined PCR products from separate individual specimens on specially partitioned fragments (1/8 sections) of the GS20 PicoTitrePlate™, they were rapidly able to generate large numbers of sequences from each of the eleven PCR products (≈1,000 per product) [3].
While this represents an excellent advance in the exploitation of the GS20, in theory the combined “primer specific PCR/GS20” approach can be enhanced even further. For example, the number of sequences generated in even a 1/8th run of the GS20 using a 40×75 PicoTitrePlate™ (currently the smallest commercially available subdivision of a single GS20 reaction) is large; in our experience such a run routinely generates at least 6,000, and more commonly over 10,000, sequences. With an estimated 10-fold coverage, the method of Thomas et al. [3] could therefore enable the pooling of 600 PCR products in a single reaction. However, the subsequent identification of the sequence reads would require either that the 600 pooled PCR products target unique genetic regions or, if multiple homologous PCR products were to be co-sequenced (i.e. multiple different products amplified using a single identical primer pair), a priori knowledge of the exact sequence of each target.
In this paper we overcome this problem by presenting a method in which the initial PCR primers are 5′-tagged with short nucleotide sequences (tags) in such a way that a unique tagged primer combination can be applied to each specific DNA template source. As sequences generated by the GS20 commence at the very first position of the source DNA fragment, the tags are observed in the generated sequences. Therefore, sequences can rapidly be sorted into their original template source using the tags (Figure 1). Currently, the method provides a means for the simultaneous generation, sequencing, and assignment of short (<120 bp) single molecule sequences from homologous PCR products obtained from multiple individuals. However, as the GS20 sequencing-by-synthesis technologies are developed to increase both the number and the length of the sequences generated, the power of this technique will likewise increase.
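The tag-based sorting described above can be sketched in a few lines of Python. This is a minimal illustration and not the authors' pipeline: the two core primer sequences are taken from Table 1 with their tags removed, while the function names `build_tag_lookup` and `assign_read` are hypothetical. Following the conservative criterion used in this study, a read is only assigned when it begins with an exact copy of a tagged primer.

```python
from itertools import product

# Core (untagged) 16S primer sequences, i.e. the Table 1 primers without their 5' tags.
FORWARD_CORE = "cggttggggtgacctcgga"
REVERSE_CORE = "gctgttatccctagggtaact"


def build_tag_lookup(tag_length=2):
    """Map every tagged primer sequence to an (orientation, tag) label."""
    lookup = {}
    for letters in product("acgt", repeat=tag_length):
        tag = "".join(letters)
        lookup[tag + FORWARD_CORE] = ("forward", tag)
        lookup[tag + REVERSE_CORE] = ("reverse", tag)
    return lookup


def assign_read(read, lookup):
    """Return (orientation, tag) if the read starts with an exact tagged primer, else None."""
    read = read.lower()
    for primer, label in lookup.items():
        if read.startswith(primer):
            return label
    return None


lookup = build_tag_lookup()
print(assign_read("ACCGGTTGGGGTGACCTCGGAGCATT", lookup))
```

Reads for which `assign_read` returns `None` would correspond to the Primer Mismatched Sequences defined in the Methods.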
`
METHODS
In theory, a GS20 reaction that has been performed on a pool of different PCR products at equimolar concentration should generate an equal number of sequences from each PCR product. However, in practice it can be expected that random processes occurring during the procedure will result in a Poisson-distributed relative frequency of the final products. In addition, the different 5′ tags used on the primers for the initial PCR might bias the final sequence distribution. As a result, the
Figure 1. The application of 5′ primer tags to the GS20 sequencing-by-synthesis process.
doi:10.1371/journal.pone.0000197.g001
`
`
the subsequent discrimination of 256 different products. However, under the current status of the sequencing technology, GS20 sequencing reads are limited to approximately 120 bases; thus in this experiment the full sequence (133–141 bp including primers, species dependent) was not returned and our analyses were limited to discriminating using the primer at the sequenced end of the product. Furthermore, during the GS20 process, single DNA fragments are immobilised on beads in either orientation (cf. [1] for details). The implication is that approximately 50% of each PCR product will be sequenced from the orientation of the forward primer, and 50% from the orientation of the reverse primer. Hence, it was necessary to label both the forward and the reverse end of each PCR product.
In addition to the above experiments, three further unique primer pairs containing tetranucleotide tails were designed and used for PCR (Table 1), in order to investigate whether an increased tail length affects the efficiency of the method. Increasing the tag length would exponentially increase the number of possible unique primer combinations, and thus the number of PCR reactions that can be incorporated into a single GS20 sequencing run.
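The exponential growth in primer combinations with tag length can be made concrete with a short calculation. The sketch below assumes that every possible tag of a given length is usable and that any forward tag may be paired with any reverse tag; the function names are hypothetical.

```python
def unique_tag_count(tag_length):
    """Number of distinct tags of the given length over the alphabet A, C, G, T."""
    return 4 ** tag_length


def unique_primer_pairs(tag_length):
    """Number of forward/reverse tag combinations when both primers carry a tag."""
    tags = unique_tag_count(tag_length)
    return tags * tags


for length in (2, 4):
    print(length, unique_tag_count(length), unique_primer_pairs(length))
```

For dinucleotide tags this reproduces the 16 tags and 16 × 16 = 256 combinations described in the text; tetranucleotide tags give 256 tags per primer. Because reads in this experiment did not span the whole product, only the tag at the sequenced end was available in practice.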
`
DNA samples analysed
DNA from thirteen species was used as PCR template (Table 2). The target species and size of the PCR insert (excluding primers) were as follows: impala (Aepyceros melampus) 92 bp; grey wolf (Canis lupus) 91 bp; cheetah (Acinonyx jubatus) 91 bp; hippopotamus (Hippopotamus amphibius) 91 bp; lion (Panthera leo) 95 bp; saiga antelope (Saiga tatarica) 93 bp; Mueller’s Bornean gibbon (Hylobates muelleri) 94 bp; narwhal (Monodon monoceros) 90 bp; domestic mouse (Mus domesticus) 97 bp; musk ox (Ovibos moschatus) 93 bp; human 94 bp; Burchell’s zebra (Equus burchelli) 89 bp; and African buffalo (Syncerus caffer) 94 bp. The DNA was extracted from frozen specimens using the DNeasy tissue extraction kit (Qiagen) following the manufacturer’s protocol. To increase the number of different PCR products that we could pool into the GS20 reaction beyond a single product from each of the thirteen available extractions, we used individual primer pairs on several different extractions each (Table 2).
`
PCR conditions
We generated 64 differently labelled 16S mtDNA PCR fragments (Table 2). PCRs were performed in 25 μl reactions containing 1× PCR Buffer, 2.5 mM MgCl2 solution, 0.2 mM dNTP Mix, 1 U Taq DNA Polymerase, 1 μM each primer and 1 μl DNA extract. Cycling was performed using a Mastercycler Gradient Thermal Cycler (Eppendorf) with the following cycle program: initial denaturation at 94°C for 2 minutes, followed by 25 cycles of 94°C for 30 seconds, 56°C for 30 seconds and 72°C for 30 seconds, followed by a final extension of 8 minutes at 72°C. Five μl of the PCR products were visualised on 2% agarose gels using ethidium bromide staining and UV light transillumination. Positive PCR products were purified using the Invisorb Spin PCRapid kit (Invitek) and quantified using a Nanodrop ND-1000 (Nanodrop Technologies). Quantification was performed directly on the purified PCR products (that is, without dilution). Several duplicate measurements indicated that intrasample measurement variation was negligible. Purified yields were between 3.8 and 26.1 ng/μl (Supplementary Table S1). Subsequently the PCR products were pooled together. The PCR products were at equimolar concentrations (26.1 ng each) with two exceptions: amplification products from the buffalo were added at double concentration (52.2 ng), and PCR products generated from the zebra template used twice the number of
`
incorporation of too many different PCR products in a single GS20 reaction could result in some of them not being sequenced. In contrast, the incorporation of too few PCR products in a single 454 parallel sequencing run reduces the efficiency and cost-effectiveness of the method. Furthermore, as one advantage of the approach is the generation of single molecule sequences, it is useful to empirically determine how many sequences can be expected from each of a set of PCR products that are pooled in equimolar concentration.
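The risk that a pooled product receives no reads at all can be illustrated with a simple Poisson model. This is a sketch under stated assumptions rather than a description of the analysis performed here: it assumes that reads are distributed over products as independent Poisson counts with a common mean, which ignores the tag-dependent bias reported in the Results. The function names are hypothetical.

```python
import math


def prob_no_reads(mean_reads_per_product):
    """Poisson probability that a given product is represented by zero reads."""
    return math.exp(-mean_reads_per_product)


def expected_missing_products(total_reads, n_products):
    """Expected number of pooled products with zero reads under equal Poisson means."""
    mean_reads = total_reads / n_products
    return n_products * prob_no_reads(mean_reads)


# 6,000 reads shared by 600 products (10-fold coverage) versus 2,000 products (3-fold).
for n_products in (600, 2000):
    print(n_products, expected_missing_products(6000, n_products))
```

Under these assumptions, 10-fold mean coverage leaves almost no product unsequenced, whereas 3-fold mean coverage would be expected to miss roughly 5% of the products.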
We performed a test involving the analysis of a single genetic marker in DNA extracts from multiple different individuals to investigate the effectiveness of this method (i.e. how many individual PCR products can be expected to be represented among a set number of sequences). This was achieved using a single conventional pair of mammalian mitochondrial DNA (mtDNA) 16S primers [4]. The primers were originally designed as mammalian generic, and amplify an 89–97 bp fragment (133–141 bp including primers) that is discriminatory between mammalian species. The study is thus an analogue of a likely use of the technique: the PCR amplification and sequencing of specific genetic regions from multiple individuals of a single species.
`
5′ primer tagging
The original primers were modified into sixteen unique forward and sixteen unique reverse primers through the addition of 5′ dinucleotide tags (Table 1). In contrast to most conventional sequencing platforms, pyrosequencing methods (such as that used by the GS20) generate data from the first base of the fragment sequenced. Thus, the 5′ tags on each primer will be apparent in the final sequence. The sixteen unique forward and reverse primers can be combined to make 16 × 16 = 256 unique sequence tags. In this way, an investment of thirty-two initial primers could in theory enable
Table 1. 5′ tagged PCR primers

Forward primers                         Reverse primers
Name      Sequence (5′–3′)              Name      Sequence (5′–3′)
16Faa     aacggttggggtgacctcgga         16Raa     aagctgttatccctagggtaact
16Fac     accggttggggtgacctcgga         16Rac     acgctgttatccctagggtaact
16Fag     agcggttggggtgacctcgga         16Rag     aggctgttatccctagggtaact
16Fat     atcggttggggtgacctcgga         16Rat     atgctgttatccctagggtaact
16Fca     cacggttggggtgacctcgga         16Rca     cagctgttatccctagggtaact
16Fcc     cccggttggggtgacctcgga         16Rcc     ccgctgttatccctagggtaact
16Fcg     cgcggttggggtgacctcgga         16Rcg     cggctgttatccctagggtaact
16Fct     ctcggttggggtgacctcgga         16Rct     ctgctgttatccctagggtaact
16Fga     gacggttggggtgacctcgga         16Rga     gagctgttatccctagggtaact
16Fgc     gccggttggggtgacctcgga         16Rgc     gcgctgttatccctagggtaact
16Fgg     ggcggttggggtgacctcgga         16Rgg     gggctgttatccctagggtaact
16Fgt     gtcggttggggtgacctcgga         16Rgt     gtgctgttatccctagggtaact
16Fta     tacggttggggtgacctcgga         16Rta     tagctgttatccctagggtaact
16Ftc     tccggttggggtgacctcgga         16Rtc     tcgctgttatccctagggtaact
16Ftg     tgcggttggggtgacctcgga         16Rtg     tggctgttatccctagggtaact
16Ftt     ttcggttggggtgacctcgga         16Rtt     ttgctgttatccctagggtaact
16SF4a    gctacggttggggtgacctcgga       16SR4a    gtacgctgttatccctagggtaact
16SF4b    tcagcggttggggtgacctcgga       16SR4b    tgacgctgttatccctagggtaact
16SF4c    ctagcggttggggtgacctcgga       16SR4c    tagcgctgttatccctagggtaact

doi:10.1371/journal.pone.0000197.t001
`
`
`0.0000
`
`0.0000
`
`0.0000
`
`0.0417
`
`0.0000
`
`0.0000
`
`0.0039
`
`0.0114
`
`0.0000
`
`0.0034
`
`0.0061
`
`0.0000
`
`0.0043
`
`0.0042
`
`0.0000
`
`0.0000
`
`0.0000
`
`0.0000
`
`0.0000
`
`0.0000
`
`0.0062
`
`0.0053
`
`0.0100
`
`0.0000
`
`0.0000
`
`0.0000
`
`0.0090
`
`0.0088
`
`0.0116
`
`0.0000
`
`0.0000
`
`0.0105
`
`0.0000
`
`0.0000
`
`0.0088
`
`0.0000
`
`0.0000
`
`0.0071
`
`1
`
`0
`
`0
`
`1
`
`0
`
`0
`
`1
`
`0
`
`0
`
`1
`
`1
`
`1
`
`0
`
`0
`
`0
`
`1
`
`1
`
`1
`
`0
`
`0
`
`0
`
`0
`
`0
`
`0
`
`1
`
`1
`
`0
`
`2
`
`1
`
`0
`
`3
`
`1
`
`0
`
`0
`
`2
`
`0
`
`0
`
`0
`
`326
`
`305
`
`231
`
`237
`
`161
`
`202
`
`172
`
`191
`
`104
`
`118
`
`162
`
`190
`
`100
`
`AssignmentError
`
`assigned
`Incorrectly
`
`100
`
`114
`
`100
`
`114
`
`50
`
`48
`
`60
`
`45
`
`258
`
`263
`
`286
`
`291
`
`110
`
`87
`
`105
`
`111
`
`113
`
`86
`
`50
`
`50
`
`60
`
`45
`
`259
`
`266
`
`286
`
`292
`
`328
`
`305
`
`232
`
`238
`
`161
`
`202
`
`172
`
`191
`
`104
`
`118
`
`163
`
`191
`
`101
`
`110
`
`87
`
`105
`
`112
`
`114
`
`87
`
`127
`
`127
`
`93
`
`95
`
`65
`
`68
`
`113
`
`179
`
`115
`
`141
`
`93
`
`96
`
`65
`
`68
`
`114
`
`179
`
`115
`
`142
`
`74
`
`84
`
`108
`
`112
`
`101
`
`106
`
`81
`
`80
`
`2
`
`5
`
`4
`
`6
`
`3
`
`3
`
`102
`
`93
`
`82
`
`99
`
`108
`
`93
`
`96
`
`86
`
`1
`
`1
`
`assigned
`TotalCorrectly
`
`melampus
`Aepyceros
`
`caffer
`Syncerus
`
`Impala
`
`Buffalo
`African
`
`54
`
`57
`
`64
`
`21
`
`15
`
`19
`
`12
`
`32
`
`28
`
`7
`
`13
`
`19
`
`23
`
`28
`
`36
`
`20
`
`23
`
`16
`
`23
`
`1
`
`56
`
`44
`
`17
`
`17
`
`72
`
`98
`
`58
`
`69
`
`43
`
`51
`
`25
`
`29
`
`18
`
`28
`
`20
`
`15
`
`21
`
`58
`
`41
`
`49
`
`leo
`Panthera
`
`amphibius
`Hippopotamus
`
`jubatus
`Acinonyx
`
`lupus
`Canis
`
`Saiga
`
`CheetahHippopotamusLion
`
`Wolf
`
`16SR4C
`
`16SF4C
`
`16SR4B
`
`16SF4B
`
`16SR4A
`
`16SF4A
`
`16RCT
`
`16FCT
`
`16RCG
`
`16FCG
`
`16RCC
`
`16FCC
`
`16RCA
`
`16FCA
`
`16RGT
`
`16FGT
`
`16RGG
`
`16FGG
`
`16RGC
`
`16FGC
`
`16RGA
`
`16FGA
`
`16RTT
`
`16FTT
`
`16RTG
`
`16FTG
`
`16RTC
`
`16FTC
`
`16RTA
`
`16FTA
`
`16RAT
`
`16FAT
`
`16RAG
`
`16FAG
`
`16RAC
`
`16FAC
`
`16RAA
`
`16FAA
`
`Primer
`
Table 2. Assigned sequence distribution
`
`
`54
`
`60
`
`19
`
`15
`
`55
`
`43
`
`82
`
`86
`
`96
`
`80
`
`117
`
`106
`
`54
`
`71
`
`5
`
`1
`
`6
`
`4
`
`14
`
`19
`
`1
`
`0
`
`2
`
`3
`
`1
`
`2
`
`9
`
`24
`
`26
`
`25
`
`11
`
`9
`
`4
`
`19
`
`1
`
`1
`
`1
`
`35
`
`45
`
`24
`
`31
`
`25
`
`42
`
`34
`
`43
`
`1
`
`46
`
`43
`
`61
`
`51
`
`63
`
`64
`
`54
`
`42
`
`71
`
`90
`
`61
`
`84
`
`8
`
`5
`
`65
`
`86
`
`1
`
`1
`
`1
`
`27
`
`35
`
`12
`
`31
`
`7
`
`19
`
`40
`
`49
`
`1
`
`1
`
`48
`
`55
`
`58
`
`47
`
`39
`
`burchelli
`Equus
`
`sapiens
`Homo
`
`moschatus
`Ovibos
`
`domesticus
`Mus
`
`monoceros
`Monodon
`
`muelleri
`Hylobates
`
`tartarica
`Saiga
`
`HumanZebra
`
`MuskOx
`
`Mouse
`Domestic
`
`Narwhal
`
`Gibbon
`
`
different tags. The pooled PCR products were subsequently analysed on the GS20 platform using the complete sample preparation and analytical process, as recommended by the manufacturer (Roche). The initial sample concentration was 9.33 ng/μl and 21 ng (23 μl) was used for the reaction. No nebulization was performed and the average concentration of the single stranded library was 75 pg/μl. The calculated dilution factor was 5.25 and sequencing was performed as a full titration run without bead enrichment, i.e. the run was performed on a 40×75 plate divided into 8 sectors (a titration run uses 4 of these sectors with different numbers of DNA molecules per bead, i.e. 1, 4, 16, and 64 respectively).
`
Conventional sequencing of the targets
Although the complete 16S mtDNA sequences for most of the species analysed are available in the public domain, we regenerated the target sequences for the thirteen mammal species using conventional dye-labelled sequencing (sequencing reactions and analyses performed on Applied Biosystems platforms by Macrogen, Korea). This was to ensure that subsequent analyses did not mistake natural sequence variation for sequencing errors. The thirteen individual 16S mtDNA sequences are deposited in GenBank under the accession numbers EF152485–EF152497.
`
Initial assignment of the sequence data
As the correct association of tags and sequences is crucial to the approach, we followed very conservative post-sequencing criteria with regard to acceptance of the sequence data. Initially, we discarded all sequence reads without an exact match to any of the primers used in the study (Primer Mismatched Sequences). Subsequently, the remaining sequences were globally aligned to the thirteen reference sequences (generated by Sanger sequencing) in both direct and reverse-complement orientation. The global alignment was performed using ClustalW [5] with the following scoring scheme: matches (+5), mismatches (−4), gap penalty (−10), and gap extension penalty (−10). The gap penalties were not applied to end gaps. For each alignment a percent identity score was calculated to determine the best match in the following way: the number of mismatches was counted, excluding end gaps, ambiguous states (Ns) in the 454 sequence, and gaps introduced in the reference sequence during alignment.
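The mismatch count described above can be sketched as follows. This is one possible reading of the stated exclusions, not the authors' code: the function name `count_mismatches` is hypothetical, both inputs are assumed to be rows of the same pairwise alignment with `-` marking gaps, and a gap inside the 454 read is treated as a mismatch because the text does not list it among the exclusions.

```python
def count_mismatches(aligned_read, aligned_ref):
    """Count mismatching columns between two gapped, equal-length alignment rows.

    Excluded from the count: end gaps, ambiguous bases (N) in the read,
    and gaps introduced into the reference.
    """
    if len(aligned_read) != len(aligned_ref):
        raise ValueError("aligned sequences must have equal length")
    read = aligned_read.upper()
    ref = aligned_ref.upper()
    # Columns where both rows carry a base delimit the region free of end gaps.
    paired = [i for i, (r, f) in enumerate(zip(read, ref)) if r != "-" and f != "-"]
    if not paired:
        return 0
    start, end = paired[0], paired[-1]
    mismatches = 0
    for r, f in zip(read[start:end + 1], ref[start:end + 1]):
        if r == "N" or f == "-":
            continue  # ambiguous read base or gap in the reference: not counted
        if r != f:
            mismatches += 1
    return mismatches


print(count_mismatches("--ACNTA-GT--", "GGACGTACGATT"))
```

A read would then be retained as an Assigned sequence when its best alignment yields at most one mismatch.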
If a sequence differed at more than one nucleotide from the highest scoring alignment, it was discarded into a separate dataset. We refer to these sequences as Non-Assigned sequences, and the remaining sequences are referred to as Assigned sequences. The per nucleotide error rate estimated for this type of data is 7.5×10⁻⁴ [6]. With reads of a length of ≈100 bp excluding primers, and primers of length 22 bp, the expected proportion of Non-Assigned sequences is then 2.7×10⁻³ and the expected proportion of Primer Mismatched Sequences is 1.6×10⁻². Any excess of Non-Assigned or Primer Mismatched Sequences above these levels is then due to experimental errors other than sequencing errors, such as contamination.
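The two expected proportions follow from the per nucleotide error rate if errors are assumed to occur independently at each position. The sketch below makes that assumption explicit with a binomial model; the exact calculation used in the study is not stated, and the function name `prob_at_least` is hypothetical.

```python
from math import comb


def prob_at_least(k_errors, length, per_base_error):
    """Binomial probability of at least k_errors errors in `length` independent bases."""
    p = per_base_error
    below = sum(
        comb(length, i) * p ** i * (1.0 - p) ** (length - i)
        for i in range(k_errors)
    )
    return 1.0 - below


ERROR_RATE = 7.5e-4  # per nucleotide error rate cited in the text [6]

# A read is Non-Assigned when it carries two or more errors in about 100 bp.
expected_non_assigned = prob_at_least(2, 100, ERROR_RATE)
# A read is Primer Mismatched when it carries one or more errors in the 22 bp primer.
expected_primer_mismatch = prob_at_least(1, 22, ERROR_RATE)

print(f"{expected_non_assigned:.2e} {expected_primer_mismatch:.2e}")
```

Under this model the two values agree with the quoted 2.7×10⁻³ and 1.6×10⁻² to within rounding.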
The identity of the Non-Assigned sequences is of some interest, as it may provide information regarding these other sources of experimental error. Thus the Non-Assigned sequences were subsequently subjected to BLAST [7] analyses against the NCBI GenBank DNA database in order to determine their identity. During this (and other) BLAST analyses, when two or more hits with identical E-values were reported, we prioritised any that matched our 13 target sequences over others.
`
doi:10.1371/journal.pone.0000197.t002
Italic numbers indicate miss-assigned sequences.
`
`0.0083
`
`0.0000
`
`0.0026
`
`0.0000
`
`0.0866
`
`0.0000
`
`0.0024
`
`0.0043
`
`0.0045
`
`0.0047
`
`0.0000
`
`0.0000
`
`0.0023
`
`0.0000
`
`1.5385
`
`20
`
`0
`
`2
`
`0
`
`11
`
`5622432.46
`
`746
`
`782
`
`988
`
`127
`
`0
`
`279
`
`1
`
`424
`
`2
`
`1
`
`2
`
`0
`
`0
`
`1
`
`0
`
`470
`
`220
`
`422
`
`147
`
`188
`
`431
`
`398
`
`SUMMEAN
`
`0.003557453
`
`rate
`assignment
`Overallmiss-
`
`83.1
`
`83.4
`
`0.0040
`
`0.1525
`
`0.5263
`
`148.5147.9474
`
`20
`
`56425622
`
`sequences
`GS20
`Percent
`
`Mean
`
`Total
`
`error
`assignment
`Species
`
`assigned
`Incorrect
`
`Assigned
`Correctly
`
`Column:
`Analysisby
`
`AssignmentError
`
`assigned
`Incorrectly
`
`assigned
`TotalCorrectly
`
`melampus
`Aepyceros
`
`caffer
`Syncerus
`
`burchelli
`Equus
`
`sapiens
`Homo
`
`moschatus
`Ovibos
`
`domesticus
`Mus
`
`monoceros
`Monodon
`
`muelleri
`Hylobates
`
`tartarica
`Saiga
`
`leo
`Panthera
`
`amphibius
`Hippopotamus
`
`jubatus
`Acinonyx
`
`lupus
`Canis
`
`Primer
`
`Impala
`
`Buffalo
`African
`
`HumanZebra
`
`MuskOx
`
`Mouse
`Domestic
`
`Narwhal
`
`Gibbon
`
`Saiga
`
`CheetahHippopotamusLion
`
`Wolf
`
Table 2. cont.
`
RESULTS
GS20 sequences generated
A total of 6765 DNA sequences were generated using the GS20 platform (Sequence Data S1 in the supplemental information). The sequencing was performed as a titration run with no bead enrichment and different DNA/bead ratios; therefore the number of sequences is lower than previously reported for PCR products (8,000–12,000 [3]). As such, the calculations of sequencing efficiency in this study provide a conservative estimate of the potential power of the technique.
`
Sequence analysis
Primer Mismatch Sequences. Due to the stringent screening criteria applied in this study, 458 (6.8%) of the 6765 initial sequences generated from a 1/8th plate run on the GS20 were identified as Primer Mismatch Sequences (see above). These grouped as follows: 377 sequences (5.6%) have sequencing errors in the primer sequence, 54 sequences (0.8%) have the primer sequence starting one position off, 3 sequences (0.04%) have the primer sequence starting two positions off, and 24 sequences (0.4%) have the primer sequence starting more than two positions off. As the theoretically expected proportion of mismatches based on the sequencing error rate is 1.6%, other sources of error (such as damage to the original DNA template, sequencing errors or mtDNA heteroplasmy) may be affecting the results.
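The grouping of Primer Mismatch Sequences can be illustrated with a small classifier. This is a simplified sketch and not the procedure used in the study: the function name `classify_primer_start` is hypothetical, and it only detects an intact tagged primer that begins later in the read than expected, treating every other case as an error within the primer.

```python
def classify_primer_start(read, tagged_primer):
    """Classify a read by where an intact copy of the tagged primer begins."""
    offset = read.lower().find(tagged_primer.lower())
    if offset == 0:
        return "exact"
    if offset == 1:
        return "one position off"
    if offset == 2:
        return "two positions off"
    if offset > 2:
        return "more than two positions off"
    return "errors in primer"  # no intact copy of the primer was found


primer = "aacggttggggtgacctcgga"  # 16Faa from Table 1
print(classify_primer_start("g" + primer + "ttt", primer))
```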
The 458 Primer Mismatch Sequences were identified using BLAST, revealing that 395 of the sequences (86.2%) match the reference sequences of the study. This includes 81 sequences where the primers are as expected, but positioned one or more base positions off the 5′ end of the sequence. Among these, 80 sequences match DNA sequences from species used in this study (Supplementary Table S2). Of the sequences containing errors in the primers, 313 of 377 (83.0%) matched species used in this study (Supplementary Table S3).
That so many sequences contained sequencing errors in the primers (n = 377) was surprising, and warranted further investigation. The sequences could be divided into four broad categories as follows: those that failed to show any match to the primer sequences in general (n = 2); those that were an exact match to the core primer but lacked the 5′ tag sequence (n = 121); those that contained at least one mismatch and no indels (insertions/deletions) (n = 53); and those that contained at least one indel (n = 201), 21 of which also contained a mismatch. We subsequently investigated whether the errors may have arisen during the primer synthesis itself, and not during the sequencing-by-synthesis process. This was tested under the assumptions that a) errors arising during the primer synthesis process would be randomly distributed along the primer sequence, and that b) primers containing errors in the four 3′ nucleotides would bind poorly to the template DNA, and thus not enable PCR amplification. If this was the case, then although prior to PCR a random distribution of sequence errors should be observed across the primer sequences, post PCR significantly fewer errors should be observed at the 3′ end of the primer. A χ² test on the distribution of the sequencing errors between the five 3′ terminal nucleotides, the next five (middle) nucleotides, and the remaining 5′ nucleotides confirms that there are significantly fewer sequencing errors in the five terminal 3′ nucleotides of the primers (Pearson’s χ² test, χ² = 17.506, p = 0.00001). Therefore the data suggest that at least some of the primer-related errors can be explained by errors during the primer synthesis itself. (We note however that this test was only performed on the primers that contained mismatches without indels, due to the difficulty of accurately aligning the primers that contained indels.)
Assigned Sequences. The remaining 6307 sequences were identified through a global alignment to the 13 reference sequences. Of these, 5642 sequences (89%) diverged by no more than 1 bp from one of the reference sequences, and could thus be assigned to one of the taxa analysed in the study (Table 2). Twenty sequences (0.4%) were miss-assigned to an incorrect identity. Strikingly, more than half of the miss-assigned sequences (n = 11) are of human origin. Given the omnipresent nature of human DNA in most laboratory settings, this bias is likely due to contamination during extractions and/or PCR setup. Ignoring all human sequences (n = 138), only 9 of a total of 5504 non-human GS20 sequences were miss-assigned (a proportion of 0.00163, i.e. 0.16%). Based on a GS20 sequencing error rate of 7×10⁻⁴ [6], the expected number of miss-assignments due to sequencing errors in the dinucleotide tag is 2×5504×(7×10⁻⁴) ≈ 7.7. Thus, the obtained result is remarkably close to the expected one, and miss-assignments of non-human sequences can be explained by sequencing errors alone. This result shows that despite the possibility of sequencing (and other) errors, the assignment based on 5′ tagging is remarkably reliable.
Non-Assigned Sequences. Of the 6307 sequences that did not contain a primer error, 665 diverge from the reference sequences by more than 1 bp. However, the expected number of such sequences based on the known sequencing error rate is only 6307×(2.7×10⁻³) ≈ 17, suggesting a significant impact of other factors. Obvious candidates include the amplification of non-targeted genomic sequences (for example laboratory contamination), DNA damage or heteroplasmy in the original template, and errors introduced into the DNA during the initial PCR stage (where a non-proofreading polymerase was used). Of the 665 sequences, 491 (73.8%) match DNA sequences from one of the 13 original taxa amplified, according to the highest BLAST hit (Supplementary Table S4).
Sequence distribution. On average each of the 64 amplicons (grouping forward and reverse reads) had 85× coverage with a standard deviation of 32 (Table 2). The coverage variation is very large. At the extremes, the zebra DNA amplified with the TA tag generated a single forward read and no reverse read, while the zebra DNA amplified with the CC tag resulted in more than 100 forward and reverse reads. There is no evidence that forward or reverse strands are unequally distributed within the sequence dataset (Pearson’s χ² test, χ² = 27.2793, df = 18, p = 0.0739).
5′ tag distribution. A Pearson’s χ² test strongly rejects an equal
distribution among the different tags (χ² = 1725.28, df = 18,
p = 0.0). The divergence from the expected numbers is primarily
caused by an excess of 5′CN (N representing A, T, G or C) tagged
amplicons and a depletion of 5′TN tags (Table 3), with a small
surplus of 5′GN and a small depletion of 5′AN tags. We also
investigated whether the identity of the second nucleotide within
each tag led to a non-random distribution of the final sequences.
This was achieved using χ² analysis on the 4 independent datasets
constituted by the 5′ nucleotide A, C, G and T respectively (i.e. the
4 primer groups AN, CN, GN and TN). The results indicate that, with
the exception of the 5′ T labelled tags, the sequences were
non-randomly distributed (AN, χ² = 60.0, d.f. = 3, p = 0.0; CN,
χ² = 10.0, d.f. = 3, p = 0.0186; GN, χ² = 16.3, d.f. = 3, p = 0.0009;
TN, χ² = 2.35, d.f. = 3, p = 0.5039). Due to the limited number of
tetranucleotide tags analysed, we were unable to investigate the
effect of the identity of the 3rd and 4th position nucleotides.
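The excess of 5′CN tags can be illustrated from the totals listed in Table 3. The following Python sketch is only a worked example: it recomputes each deviation as total minus the expected frequency of 274.75 and averages the deviations per 5′ nucleotide. It uses only the rows available in this copy of the table, the function names are hypothetical, and it does not attempt to reproduce the χ² statistics reported above.

```python
# Total sequences per 5' dinucleotide tag, as listed in Table 3
# (only the rows available in this copy of the table).
TOTALS = {
    "AA": 256, "AC": 292, "AG": 133, "AT": 188,
    "CA": 468, "CC": 631, "CG": 577, "CT": 521,
    "GA": 324, "GC": 207,
}
EXPECTED = 274.75  # expected sequence frequency per tag (Table 3)


def deviations(totals, expected):
    """Return observed minus expected for every tag."""
    return {tag: n - expected for tag, n in totals.items()}


def mean_deviation_by_first_base(totals, expected, group_size=4):
    """Average the deviation over tags sharing a 5' nucleotide.

    Groups with fewer than group_size tags are skipped, because an
    incomplete group (here GN) would give a misleading average.
    """
    groups = {}
    for tag, n in totals.items():
        groups.setdefault(tag[0], []).append(n - expected)
    return {
        base: sum(devs) / len(devs)
        for base, devs in groups.items()
        if len(devs) == group_size
    }


if __name__ == "__main__":
    for tag, dev in deviations(TOTALS, EXPECTED).items():
        print(f"{tag}: {dev:+.2f}")
    print(mean_deviation_by_first_base(TOTALS, EXPECTED))
```

On these rows the AN tags fall below the expected frequency on average while every CN tag lies well above it, in line with the pattern described in the text.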
`
Table 3. Observed and Expected sequence distributions sorted by 5′ tag composition

5′ Tag   Sequences from   Sequences from   Total       Expected sequence   Deviation
         forward primer   reverse primer   sequences   frequency
AA       141              115              256         274.75              −18.75
AC       179              113              292         274.75              17.25
AG       68               65               133         274.75              −141.75
AT       95               93               188         274.75              −86.75
CA       237              231              468         274.75              193.25
CC       305              326              631         274.75              356.25
CG       291              286              577         274.75              302.25
CT       263              258              521         274.75              246.25
GA       171              153              324         274.75              49.25
GC       114              93               207         274.75              −67.75
                                                                           37.25

(Table 3). Although the small number of tetranucleotide tagged
primers tested makes statistically supported comparisons difficult,
our observations on the data indicate that overall the rate of
sequence miss-assignment for these primers was lower than for the
dinucleotide tags.

5′ Primer Tags on the GS20

DISCUSSION
Caveats
In this study we present data describing the viability and limitations
of a pooled-PCR based approach to GS20 sequencing. Naturally, the
specific results of this study may be dependent on the genetic
region targeted and the PCR primers/target
