IPR2013-00276, No. 1039 Exhibit - Exhibit 1039 (P.T.A.B. Mar. 31, 2014)

ARTICLES
`
`Vol 437|15 September 2005|doi:10.1038/nature03959
`
`Genome sequencing in microfabricated
`high-density picolitre reactors
`
`Marcel Margulies1*, Michael Egholm1*, William E. Altman1, Said Attiya1, Joel S. Bader1, Lisa A. Bemben1,
`Jan Berka1, Michael S. Braverman1, Yi-Ju Chen1, Zhoutao Chen1, Scott B. Dewell1, Lei Du1, Joseph M. Fierro1,
`Xavier V. Gomes1, Brian C. Godwin1, Wen He1, Scott Helgesen1, Chun He Ho1, Gerard P. Irzyk1,
`Szilveszter C. Jando1, Maria L. I. Alenquer1, Thomas P. Jarvie1, Kshama B. Jirage1, Jong-Bum Kim1,
`James R. Knight1, Janna R. Lanza1, John H. Leamon1, Steven M. Lefkowitz1, Ming Lei1, Jing Li1, Kenton L. Lohman1,
`Hong Lu1, Vinod B. Makhijani1, Keith E. McDade1, Michael P. McKenna1, Eugene W. Myers2,
`Elizabeth Nickerson1, John R. Nobile1, Ramona Plant1, Bernard P. Puc1, Michael T. Ronan1, George T. Roth1,
`Gary J. Sarkis1, Jan Fredrik Simons1, John W. Simpson1, Maithreyan Srinivasan1, Karrie R. Tartaro1,
`Alexander Tomasz3, Kari A. Vogt1, Greg A. Volkmer1, Shally H. Wang1, Yong Wang1, Michael P. Weiner4,
`Pengguang Yu1, Richard F. Begley1 & Jonathan M. Rothberg1
`
`The proliferation of large-scale DNA-sequencing projects in recent years has driven a search for alternative methods to
`reduce time and cost. Here we describe a scalable, highly parallel sequencing system with raw throughput signiﬁcantly
`greater than that of state-of-the-art capillary electrophoresis instruments. The apparatus uses a novel ﬁbre-optic slide of
`individual wells and is able to sequence 25 million bases, at 99% or better accuracy, in one four-hour run. To achieve an
`approximately 100-fold increase in throughput over current Sanger sequencing technology, we have developed an
`emulsion method for DNA ampliﬁcation and an instrument for sequencing by synthesis using a pyrosequencing protocol
`optimized for solid support and picolitre-scale volumes. Here we show the utility, throughput, accuracy and robustness
`of this system by shotgun sequencing and de novo assembly of the Mycoplasma genitalium genome with 96% coverage
`at 99.96% accuracy in one run of the machine.
`
`DNA sequencing has markedly changed the nature of biomedical
`research and medicine. Reductions in the cost, complexity and time
`required to sequence large amounts of DNA, including improve
`ments in the ability to sequence bacterial and eukaryotic genomes,
`will have signiﬁcant scientiﬁc, economic and cultural impact. Large
`scale sequencing projects, including whole genome sequencing, have
`usually required the cloning of DNA fragments into bacterial vectors,
`ampliﬁcation and puriﬁcation of individual templates, followed by
`Sanger sequencing1 using ﬂuorescent chain terminating nucleotide
`analogues2 and either slab gel or capillary electrophoresis. Current
`estimates put the cost of sequencing a human genome between $10
`million and $25 million3. Alternative sequencing methods have been
`described4 8; however, no technology has displaced the use of
`bacterial vectors and Sanger sequencing as the main generators of
`sequence information.
`Here we describe an integrated system whose throughput routinely
`enables applications requiring millions of bases of sequence infor
`mation, including whole genome sequencing. Our focus has been on
`the co development of an emulsion based method9 11 to isolate and
`amplify DNA fragments in vitro, and of a fabricated substrate and
`instrument that performs pyrophosphate based sequencing (pyro
`sequencing5,12) in picolitre sized wells.
`In a typical run we generate over 25 million bases with a Phred
`quality score of 20 or better (predicted to have an accuracy of 99% or
`higher). Although this Phred 20 quality throughput is signiﬁcantly
`
`higher than that of Sanger sequencing by capillary electrophoresis, it
`is currently at the cost of substantially shorter reads and lower
`average individual read accuracy. Sanger based capillary electrophor
`esis sequencing systems produce up to 700 bases of sequence
`information from each of 96 DNA templates at an average read
`accuracy of 99.4% in 1 h, or 67,000 bases per hour, with substantially
`all of the bases having Phred 20 or better quality23. We further
`characterize the performance of the system and demonstrate that it
`is possible to assemble bacterial genomes de novo from relatively
`short reads by sequencing a known bacterial genome, Mycoplasma
`genitalium (580,069 bases), and comparing our shotgun sequencing
`and de novo assembly with the results originally obtained for this
`genome13. The results of shotgun sequencing and de novo assembly of
`a larger bacterial genome, that of Streptococcus pneumoniae14
`(2.1 megabases (Mb)), are presented in Supplementary Table 4.
`
`Emulsion-based sample preparation
`We generate random libraries of DNA fragments by shearing an
`entire genome and isolating single DNA molecules by limiting
`dilution (Supplementary Methods). Speciﬁcally, we randomly frag
`ment the entire genome, add specialized common adapters to the
`fragments, capture the individual fragments on their own beads and,
`within the droplets of an emulsion, clonally amplify the individual
`fragment (Fig. 1a, b). Unlike in current sequencing technology, our
`approach does not require subcloning in bacteria or the handling of
`
`1454 Life Sciences Corp., 20 Commercial Street, Branford, Connecticut 06405, USA. 2University of California, Berkeley, California 94720, USA. 3Laboratory of Microbiology,
`The Rockefeller University, New York, New York 10021, USA. 4The Rothberg Institute for Childhood Diseases, 530 Whitﬁeld Street, Guilford, Connecticut 06437, USA.
`*These authors contributed equally to this work.
`
`376
`
`© 2005 Nature Publishing Group
`
`Ariosa Exhibit 1039, pg. 1
`IPR2013-00276
`
`

`NATURE|Vol 437|15 September 2005
`
`ARTICLES
`
`individual clones; the templates are handled in bulk within the
`emulsions9 11.
`
`Sequencing in fabricated picolitre-sized reaction vessels
`We perform sequencing by synthesis simultaneously in open wells of
`a ﬁbre optic slide using a modiﬁed pyrosequencing protocol that is
`designed to take advantage of the small scale of the wells. The ﬁbre
`optic slides are manufactured by slicing of a ﬁbre optic block that is
`obtained by repeated drawing and fusing of optic ﬁbres. At each
`iteration, the diameters of the individual ﬁbres decrease as they are
`hexagonally packed into bundles of increasing cross sectional sizes.
`Each ﬁbre optic core is 44 m m in diameter and surrounded by
`2 3 m m of cladding; etching of each core creates reaction wells
`approximately 55 m m in depth with a centre to centre distance of
`50 m m (Fig. 1c), resulting in a calculated well size of 75 pl and a well
`density of 480 wells mm 2. The slide, containing approximately 1.6
`million wells15, is loaded with beads and mounted in a ﬂow chamber
`designed to create a 300 m m high channel, above the well openings,
`through which the sequencing reagents ﬂow (Fig. 2a, b). The
`unetched base of the slide is in optical contact with a second ﬁbre
`optic imaging bundle bonded to a charge coupled device (CCD)
`
`sensor, allowing the capture of emitted photons from the bottom of
`each individual well (Fig. 2c; see also Supplementary Methods).
`We developed a three bead system, and optimized the components
`to achieve high efﬁciency on solid support. The combination of
`picolitre sized wells, enzyme loading uniformity allowed by the small
`beads and enhanced solid support chemistry enabled us to develop a
`method that extends the useful read length of sequencing by syn
`thesis to 100 bases (Supplementary Methods).
`In the ﬂow chamber cyclically delivered reagents ﬂow perpendi
`cularly to the wells. This conﬁguration allows simultaneous exten
`sion reactions on template carrying beads within the open wells and
`relies on convective and diffusive transport to control the addition or
`removal of reagents and by products. The timescale for diffusion into
`and out of the wells is on the order of 10 s in the current conﬁguration
`and is dependent on well depth and ﬂow channel height. The
`timescales for the signal generating enzymatic reactions are on the
`order of 0.02 1.5 s (Supplementary Methods). The current reaction
`is dominated by mass transport effects, and improvements based on
`faster delivery of reagents are possible. Well depth was selected on the
`basis of a number of competing requirements: (1) wells need to be
`deep enough for the DNA carrying beads to remain in the wells in the
`presence of convective transport past the wells; (2) they must be
`sufﬁciently deep to provide adequate isolation against diffusion of
`by products from a well in which incorporation is taking place to a
`well where no incorporation is occurring; and (3) they must be
`shallow enough to allow rapid diffusion of nucleotides into the wells
`and rapid washing out of remaining nucleotides at the end of each
`ﬂow cycle to enable high sequencing throughput and reduced reagent
`use. After the ﬂow of each nucleotide, a wash containing apyrase is
`used to ensure that nucleotides do not remain in any well before the
`next nucleotide being introduced.
`
`Base calling of individual reads
`Nucleotide incorporation is detected by the associated release of
`inorganic pyrophosphate and the generation of photons5,12. Wells
`containing template carrying beads are identiﬁed by detecting a
`known four nucleotide ‘key’ sequence at the beginning of the read
`
`Figure 1 | Sample preparation. a, Genomic DNA is isolated, fragmented,
`ligated to adapters and separated into single strands (top left). Fragments are
`bound to beads under conditions that favour one fragment per bead, the
`beads are captured in the droplets of a PCR reaction mixture in oil
`emulsion and PCR ampliﬁcation occurs within each droplet, resulting in
`beads each carrying ten million copies of a unique DNA template (top right).
`The emulsion is broken, the DNA strands are denatured, and beads carrying
`single stranded DNA clones are deposited into wells of a ﬁbre optic slide
`(bottom right). Smaller beads carrying immobilized enzymes required for
`pyrophosphate sequencing are deposited into each well (bottom left).
`b, Microscope photograph of emulsion showing droplets containing a bead
`and empty droplets. The thin arrow points to a 28 m m bead; the thick arrow
`points to an approximately 100 m m droplet. c, Scanning electron
`micrograph of a portion of a ﬁbre optic slide, showing ﬁbre optic cladding
`and wells before bead deposition.
`
`Figure 2 | Sequencing instrument. The sequencing instrument consists of
`the following major subsystems: a ﬂuidic assembly (a), a ﬂow chamber that
`includes the well containing ﬁbre optic slide (b), a CCD camera based
`imaging assembly (c), and a computer that provides the necessary user
`interface and instrument control.
`
`© 2005 Nature Publishing Group
`
`377
`
`Ariosa Exhibit 1039, pg. 2
`IPR2013-00276
`
`

`ARTICLES
`
`NATURE|Vol 437|15 September 2005
`
`(Supplementary Methods). Raw signals are background subtracted,
`normalized and corrected. The normalized signal intensity at each
`nucleotide ﬂow, for a particular well, indicates the number of
`nucleotides, if any, that were incorporated. This linearity in signal
`is preserved to at least homopolymers of length eight (Supplemen
`tary Fig. 6). In sequencing by synthesis a very small number of
`templates on each bead lose synchronism (that is, either get ahead of,
`or fall behind, all other templates in sequence16). The effect is
`primarily due to leftover nucleotides in a well (creating ‘carry
`forward’) or to incomplete extension. Typically, we observe a carry
`forward rate of 1 2% and an incomplete extension rate of 0.1 0.3%.
`Correction of these shifts is essential because the loss of synchronism
`is a cumulative effect that degrades the quality of sequencing at
`longer read lengths. We have developed algorithms, based on detailed
`models of the underlying physical phenomena, that allow us to
`determine, and correct for, the amounts of carry forward and
`incomplete extension occurring in individual wells (Supplementary
`Methods). Figure 3 shows the processed result, a 113 bases long read
`generated in the M. genitalium run discussed below. To assess
`sequencing performance and the effectiveness of the correction
`algorithms, independently of artefacts introduced during the emul
`sion based sample preparation, we created test fragments with
`difﬁcult to sequence stretches of identical bases of increasing length
`(homopolymers) (Supplementary Methods and Supplementary Fig.
`4). Using these test fragments, we have veriﬁed that at the individual
`read level we achieve base call accuracy of approximately 99.4%, at
`read lengths in excess of 100 bases (Table 1).
`
`High-quality reads and consensus accuracy
`Before base calling or aligning reads, we select high quality reads
`without relying on a priori knowledge of the genome or template
`being sequenced (Supplementary Methods). This selection is based
`on the observation that poor quality reads have a high proportion of
`signals that do not allow a clear distinction between a ﬂow during
`which no nucleotide was incorporated and a ﬂow during which one
`or more nucleotide was incorporated. When base calling individual
`reads, errors can occur because of signals that have ambiguous values
`(Supplementary Fig. 5). To improve the usability of our reads, we also
`developed a metric that allows us to estimate ab initio the quality (or
`probability of correct base call) of each base of a read, analogous to
`the Phred score17 used by current Sanger sequencers (Supplementary
`Methods and Supplementary Fig. 8).
`
`Figure 3 | Flowgram of a 113-bases read from an M. genitalium
`run. Nucleotides are ﬂowed in the order T, A, C, G. The sequence is shown
`above the ﬂowgram. The signal value intervals corresponding to the various
`homopolymers are indicated on the right. The ﬁrst four bases (in red, above
`the ﬂowgram) constitute the ‘key’ sequence, used to identify wells containing
`a DNA carrying bead.
`
`Higher quality sequence can be achieved by taking advantage of
`the high over sampling that our system affords and building a
`consensus sequence. Sequences are aligned to one another using
`the signal strengths at each nucleotide ﬂow, rather than individual
`base calls, to determine optimal alignment (Supplementary
`Methods). The corresponding signals are then averaged, after
`which base calling is performed. This approach greatly improves
`the accuracy of the sequence (Supplementary Fig. 7) and provides an
`estimate of the quality of the consensus base. We refer to that quality
`measure as the Z score
`it is a measure of the spread of signals in all
`the reads at one location and the distance between the average signal
`and the closest base calling threshold value. In both re sequencing
`and de novo sequencing, as the minimum Z score is raised the
`consensus accuracy increases, while coverage decreases; approxi
`mately half of the excluded bases, as the Z score is increased, belong
`to homopolymers of length four and larger. Sanger sequencers
`usually require a depth of coverage at any base of three or more in
`order to achieve a consensus accuracy of 99.99%. To achieve a
`minimum of threefold coverage of 95% of the unique portions of a
`typical genome requires approximately seven to eightfold over
`sampling. Owing to our higher error rate, we have observed that
`comparable consensus accuracies, over a similar fraction of a gen
`ome, are achieved with a depth of coverage of four or more, requiring
`approximately ten to twelve times over sampling.
`
`Mycoplasma genitalium
`Mycoplasma genomic DNA was fragmented and prepared into a
`sequencing library as described above. (This was accomplished by a
`single individual in 4 h.) After emulsion polymerase chain reaction
`
`(PCR) and bead deposition onto a 60 £ 60 mm2 ﬁbre optic slide, a
`
`process which took one individual 6 h, 42 cycles of four nucleotides
`were ﬂowed through the sequencing system in an automated 4 h run
`of the instrument. The results are summarized in Table 2. In order to
`measure the quality of individual reads, we aligned each high quality
`read to the reference genome at 70% stringency using ﬂow space
`mapping and criteria similar to those used previously in assessing the
`accuracy of other base callers17. When assessing sequencing quality,
`only reads that mapped to unique locations in the reference genome
`were included. Because this process excludes repeat regions (parts of
`the genome for which corresponding ﬂowgrams are 70% similar to
`one another), the selected reads did not cover the genome comple
`tely. Figure 4a illustrates the distribution of read lengths for this run.
`The average read length was 110 bases, the resulting over sample 40
`fold, and 84,011 reads (27.4%) were perfect. Figure 4b summarizes
`the average error as a function of base position. Coverage of non
`repeat regions was consistent with the sample preparation and
`emulsion not being biased (Supplementary Fig. 8). At the individual
`read level, we observe an insertion and deletion error rate of
`approximately 3.3%; substitution errors have a much lower rate,
`on the order of 0.5%. When using these reads without any Z score
`restriction, we covered 99.94% of the genome in ten contiguous
`regions with a consensus accuracy of 99.97%. The error rate in
`homopolymers is signiﬁcantly reduced in the consensus sequence
`(Supplementary Fig. 7). Of the bases not covered by this consensus
`
`Table 1 | Summary of sequencing statistics for test fragments
`60 £ 60 mm2
`Size of ﬁbre optic slide
`Run time/number of cycles
`243 min/42
`Test fragment reads
`497,893
`Average read length (bases)
`108
`Number of bases in test fragments
`53,705,267
`Bases with a Phred score of 20 and above
`47,181,792
`Individual read insertion error rate
`0.44%
`Individual read deletion error rate
`0.15%
`Individual read substitution error rate
`0.004%
`All errors
`0.60%
`
`378
`
`© 2005 Nature Publishing Group
`
`Ariosa Exhibit 1039, pg. 3
`IPR2013-00276
`
`

`NATURE|Vol 437|15 September 2005
`
`ARTICLES
`
`sequence (366 bases), all belonged to excluded repeat regions. Setting
`a minimum Z score equal to 4, coverage was reduced to 98.1% of the
`genome, while consensus accuracy increased to 99.996%. We further
`demonstrated the reproducibility of the system by repeating the
`whole genome sequencing of M. genitalium an additional eight
`times, achieving a 40 fold coverage of the genome in each of the
`eight separate instrument runs (Supplementary Table 3).
`We assembled the M. genitalium reads from a single run into 25
`contigs with an average length of 22.4 kb. One of these contigs was
`misassembled due to a collapsed tandem repeat region of 60 bases,
`and was corrected by hand. The original sequencing of M. genitalium
`resulted in 28 contigs before directed sequencing used for ﬁnishing
`the sequence13. Our assembly covered 96.54% of the genome and
`attained a consensus accuracy of 99.96%. Non resolvable repeat
`regions amount to 3% of the genome: we therefore covered 99.5%
`of the unique portions of the genome. Sixteen of the breaks between
`contigs were due to non resolvable repeat regions, two were due to
`missed overlapping reads (our read ﬁlter and trimmer are not perfect
`and the algorithms we use to perform the pattern matching of
`ﬂowgrams occasionally miss valid overlaps) and the remainder to
`thin read coverage. Setting a minimum Z score of 4, coverage was
`reduced to 95.27% of the genome (98.2% of the resolvable part of the
`genome) with the consensus accuracy increasing to 99.994%.
`
`Discussion
`We have demonstrated the simultaneous acquisition of hundreds of
`thousands of sequence reads, 80 120 bases long, at 96% average
`accuracy in a single run of the instrument using a newly developed in
`vitro sample preparation methodology and sequencing technology.
`With Phred 20 as a cutoff, we show that our instrument is able to
`produce over 47 million bases from test fragments and 25 million
`bases from genomic libraries. We used test fragments to de couple
`our sample preparation methodology from our sequencing technol
`ogy. The decrease in single read accuracy from 99.4% for test
`fragments to 96% for genomic libraries is primarily due to a lack
`of clonality in a fraction of the genomic templates in the emulsion,
`and is not an inherent limitation of the sequencing technology. Most
`of the remaining errors result from a broadening of signal distri
`butions, particularly for large homopolymers (seven or more),
`leading to ambiguous base calls. Recent work on the sequencing
`
`Table 2 | Summary statistics for M. genitalium
`Sequencing summary
`Number of instrument runs
`Size of ﬁbre optic slide
`Run time/number of cycles
`High quality reads
`Average read length (bases)
`Number of bases in high quality reads
`Bases with a Phred score of 20 and above
`Re sequencing
`Reads mapped to single locations
`Number of bases in mapped reads
`Individual read insertion error rate
`Individual read deletion error rate
`Individual read substitution error rate
`Re sequencing consensus
`Average over sampling
`Coverage, all (Z $ 4)
`Consensus accuracy, all (Z $ 4)
`Consensus insertion error rate, all (Z $ 4)
`Consensus deletion error rate, all (Z $ 4)
`Consensus substitution error rate, all (Z $ 4)
`Number of contigs
`De novo assembly
`Coverage, all (Z $ 4)
`Consensus accuracy, all (Z $ 4)
`Number of contigs
`Average contig size (kb)
`
`1
`60 £ 60 mm2
`243 min/42
`306,178
`110
`33,655,553
`26,753,540
`
`238,066
`27,687,747
`1.67%
`1.60%
`0.68%
`£ 40
`99.9% (98.2%)
`99.97% (99.996%)
`0.02% (0.003%)
`0.01% (0.002%)
`0.001% (0.0003%)
`10
`
`96.54% (95.27%)
`99.96% (99.994%)
`25
`22.4
`
`The individual read error rates are referenced to the total number of bases in mapped reads.
`
`chemistry and algorithms that correct for crosstalk between wells
`suggests that the signal distributions will narrow, with an attendant
`reduction in errors and increase in read lengths. In preliminary
`experiments with genomic libraries that also include improvements
`in the emulsion protocol, we are able to achieve, using 84 cycles, read
`lengths of 200 bases with accuracies similar to those demonstrated
`here for 100 bases. On occasion, at 168 cycles, we have generated
`individual reads that are 100% accurate over greater then 400 bases.
`Using M. genitalium we demonstrate that short fragments a priori
`do not prohibit the de novo assembly of bacterial genomes. In fact, the
`larger over sampling afforded by the throughput of our system
`resulted in a draft sequence having fewer contigs than with Sanger
`reads, with substantially less effort. By taking advantage of the over
`sampling, consensus accuracies greater then 99.96% were achieved
`for this genome. Further quality ﬁltering of the assembly results in
`the selection of a consensus sequence with accuracy exceeding
`99.99% while incurring only a minor loss of genome coverage.
`Comparable results were seen when we shotgun sequenced and de
`novo assembled the 2.1 Mb genome of Streptococcus pneumoniae14
`(Supplementary Table 4). The de novo assembly of genomes more
`complex than bacteria, including mammalian genomes, may require
`the development of methods similar to those developed for Sanger
`sequencing, to prepare and sequence paired end libraries that can
`span repeats in these genome. To facilitate the use of paired end
`libraries we have developed methods to sequence, in an individual
`well, from both ends of genomic template, and plan to add paired end
`read capabilities to our assembler (Supplementary Methods).
`Future increases in throughput, and a concomitant reduction in
`cost per base, may come from the continued miniaturization of the
`ﬁbre optic reactors, allowing more sequence to be produced per unit
`area
`a scaling characteristic similar to that which enabled the
`
`Figure 4 | M. genitalium data. a, Read length distribution for the 306,178
`high quality reads of the M. genitalium sequencing run. This distribution
`reﬂects the base composition of individual sequencing templates. b, Average
`read accuracy, at the single read level, as a function of base position for the
`238,066 mapped reads of the same run.
`
`© 2005 Nature Publishing Group
`
`379
`
`Ariosa Exhibit 1039, pg. 4
`IPR2013-00276
`
`

`ARTICLES
`
`NATURE|Vol 437|15 September 2005
`
`prediction of signiﬁcant improvements in the integrated circuit at the
`start of its development cycle18.
`
`Received 6 May; accepted 10 June 2005.
`Published online 31 July 2005.
`
`1.
`
`2.
`
`METHODS
`Emulsion based clonal ampliﬁcation. The simultaneous ampliﬁcation of frag
`ments is achieved by isolating individual DNA carrying beads in separate ,100
`m m aqueous droplets (on the order of 2 £ 106 ml
`1) made through the creation
`of a PCR reaction mixture in oil emulsion. (Fig. 1b; see also Supplementary
`Methods). The droplets act as separate microreactors in which parallel DNA
`ampliﬁcations are performed, yielding approximately 107 copies of a template
`per bead; 800 m l of emulsion containing 1.5 million beads are prepared in a
`standard 2 ml tube. Each emulsion is aliquoted into eight PCR tubes for
`ampliﬁcation. After PCR, the emulsion is broken to release the beads, which
`include beads with ampliﬁed, immobilized DNA template and empty beads
`(Supplementary Methods). We then enrich for template carrying beads (Sup
`plementary Methods). Typically, about 30% of the beads will have DNA,
`producing 450,000 template carrying beads per emulsion reaction. The number
`of emulsions prepared depends on the size of the genome and the expected
`number of runs required to achieve adequate over sampling. The 580 kb M.
`
`genitalium genome, sequenced on one 60 £ 60 mm2 ﬁbre optic slide, required
`
`1.6 ml of emulsion. A human genome, over sampled ten times, would require
`approximately 3,000 ml of emulsion.
`Bead loading into picolitre wells. The enriched template carrying beads are
`deposited by centrifugation into open wells (Fig. 1c), arranged along one face of a
`
`60 £ 60 mm2 ﬁbre optic slide. The beads (diameter , 28 m m) are sized to ensure
`
`that no more than one bead ﬁts in most wells (we observed that 2 5% of ﬁlled
`wells contain more than one bead). Loading 450,000 beads (from one emulsion
`
`preparation) onto each half of a 60 £ 60 mm2 plate was experimentally found to
`
`limit bead occupancy to approximately 35% of all wells, thereby reducing
`chemical and optical crosstalk between wells. A mixture of smaller beads that
`carry immobilized ATP sulphurylase and luciferase necessary to generate light
`from free pyrophosphate are also loaded into the wells to create the individual
`sequencing reactors (Supplementary Methods).
`Image capture. A bead carrying 10 million copies of a template yields
`approximately 10,000 photons at the CCD sensor, per incorporated nucleotide.
`The generated light is transmitted through the base of the ﬁbre optic slide and
`
`detected by a large format CCD (4,095 £ 4,096 pixels). The images are processed
`
`to yield sequence information simultaneously for all wells containing template
`carrying beads. The imaging system was designed to accommodate a large
`number of small wells and the large number of optical signals being generated
`from individual wells during each nucleotide ﬂow. Once mounted, the ﬁbre
`optic slide’s position does not shift; this makes it possible for the image analysis
`software to determine the location of each well (whether or not it contains a
`DNA carrying bead), based on light generation during the ﬂow of a pyropho
`sphate solution, which precedes each sequencing run. A single well is imaged by
`approximately nine 15 m m pixels. For each nucleotide ﬂow, the light intensities
`collected by the pixels covering a particular well are summed to generate a signal
`for that particular well at that particular nucleotide ﬂow. Each image captured by
`the CCD produces 32 megabytes of data. In order to perform all of the necessary
`signal processing in real time, the control computer is ﬁtted with an accessory
`board (Supplementary Methods), hosting a 6 million gate Field Programmable
`Gate Array (FPGA)19,20.
`De novo shotgun sequence assembler. A de novo ﬂow space assembler was
`developed to capture all of the information contained in the original ﬂow based
`signal trace. It also addresses the fact that existing assemblers are not optimized
`for 80 120 bases reads, particularly with respect to memory management due to
`the increased number of sequencing reads needed to achieve equivalent genome
`coverage. (A completely random genome covered with 100 bases reads requires
`approximately 50% more reads to yield the same number of contiguous regions
`(contigs) as achieved with 700 bases reads, assuming the need for a 30 bases
`overlap between reads21.) This assembler consists of a series of modules: the
`Overlapper, which ﬁnds and creates overlaps between reads; the Unitigger, which
`constructs larger contigs of overlapping sequence reads; and the Multialigner,
`which generates consensus calls and quality scores for the bases within each
`contig (Supplementary Methods). (The names of the software modules are based
`on those performing related functions in other assemblers developed pre
`viously22.)
`
`5.
`
`6.
`
`7.
`
`8.
`
`9.
`
`Sanger, F., Nicklen, S. & Coulson, A. R. DNA sequencing with chain terminating
`inhibitors. Proc. Natl Acad. Sci. USA 74, 5463 5467 (1977).
`Prober, J. M. et al. A system for rapid DNA sequencing with ﬂuorescent chain
`terminating dideoxynucleotides. Science 238, 336 341 (1987).
`3. NIH News Release. NHGRI seeks next generation of sequencing technologies.
`14 October 2004 khttp://www.genome.gov/12513210l.
`4. Nyren, P., Pettersson, B. & Uhlen, M. Solid phase DNA minisequencing by an
`enzymatic luminometric inorganic pyrophosphate detection assay. Anal.
`Biochem. 208, 171 175 (1993).
`Ronaghi, M. et al. Real time DNA sequencing using detection of pyrophosphate
`release. Anal. Biochem. 242, 84 89 (1996).
`Jacobson, K. B. et al. Applications of mass spectrometry to DNA sequencing.
`GATA 8, 223 229 (1991).
`Bains, W. & Smith, G. C. A novel method for nucleic acid sequence
`determination. J. Theor. Biol. 135, 303 307 (1988).
`Jett, J. H. et al. High speed DNA sequencing: an approach based upon
`ﬂuorescence detection of single molecules. Biomol. Struct. Dynam. 7, 301 309
`(1989).
`Tawﬁk, D. S. & Grifﬁths, A. D. Man made cell like compartments for molecular
`evolution. Nature Biotechnol. 16, 652 656 (1998).
`10. Ghadessy, F. J., Ong, J. L. & Holliger, P. Directed evolution of polymerase
`function by compartmentalized self replication. Proc. Natl Acad. Sci. USA 98,
`4552 4557 (2001).
`11. Dressman, D., Yan, H., Traverso, G., Kinzler, K. W. & Vogelstein, B.
`Transforming single DNA molecules into ﬂuorescent magnetic particles for
`detection and enumeration of genetic variations. Proc. Natl Acad. Sci. USA 100,
`8817 8822 (2003).
`12. Ronaghi, M., Uhlen, M. & Nyren, P. A sequencing method based on real time
`pyrophosphate. Science 281, 363 365 (1998).
`13. Fraser, C. M. et al. The minimal gene complement of Mycoplasma genitalium.
`Science 270, 397 403 (1995).
`14. Tettelin, H. et al. Complete genome sequence of a virulent isolate of
`Streptococcus pneumoniae. Science 293, 498 506 (2001).
`15. Leamon, J. H. et al. A massively parallel PicoTiterPlate based platform for
`discrete picoliter scale polymerase chain reactions. Electrophoresis 24,
`3769 3777 (2003).
`16. Ronaghi, M. Pyrosequencing sheds light on DNA sequencing. Genome Res. 11,
`3 11 (2001).
`17. Ewing, B., Hillier, L., Wendl, M. C. & Green, P. Base calling of automated
`sequencer traces using phred. I. Accuracy assessment. Genome Res. 8, 175 185
`(1998).
`18. Moore, G. E. Cramming more components onto integrated circuits. Electronics
`38(8) (1965).
`19. Mehta, K., Rajesh, V. A. & Veeraswamy, S. FPGA implementation of VXIbus
`interface hardware. Biomed. Sci. Instrum. 29, 507 513 (1993).
`20. Fagin, B., Watt, J. G. & Gross, R. A special purpose processor for gene
`sequence analysis. Comput. Appl. Biosci. 9, 221 226 (1993).
`21. Lander, E. S. & Waterman, M. S. Genomic mapping by ﬁngerprinting random
`clones: a mathematical analysis. Genomics 2, 231 239 (1988).
`22. Myers, E. W. Toward simplifying and accurately formulating fragment
`assembly. J Comput. Biol. 2, 275 290 (1995).
`23. Ogawa, T. et al. Increased Productivity For Core Labs Using One Polymer and One
`Array Length for Multiple Applications. Poster P108 T. ABRF ’05: Biomolecular
`Technologies: Discovery to Hypotheses (Savannah, Georgia, 5 8 February
`2005); also available as Applied Biosystems 3730x/DNA Analyzer
`Speciﬁcation Sheet (2004).
`
`Supplementary Information is linked to the online version of the paper at
`www.nature.com/nature.
`
`Acknowledgements We acknowledge P. Dacey and the support of the
`Operations groups of 454 Life Sciences. This research was supported in part by
`the US Department of Health and Human Services under NIH grants.
`
`Author Information Sequences for M. genitalium and S. pneumoniae are
`deposited at DDBJ/EMBL/GenBank under accession numbers AAGX01000000
`and AAGY01000000, respectively. Reprints and permissions information is
`available at npg.nat

This document is available on Docket Alarm but you must sign up to view it.

Or .

Accessing this document will incur an additional charge of $.

After purchase, you can access this document again without charge.

Accept $ Charge

Still Working On It

This document is taking longer than usual to download. This can happen if we need to contact the court directly to obtain the document and their servers are running slowly.

Give it another minute or two to complete, and then try the refresh button.

A few More Minutes ... Still Working

It can take up to 5 minutes for us to download a document if the court servers are running slowly.

Thank you for your continued patience.

This document could not be displayed.

We could not find this document within its docket. Please go back to the docket page and check the link. If that does not work, go back to the docket and refresh it to pull the newest information.

Your account does not support viewing this document.

You need a Paid Account to view this document. Click here to change your account type.

Your account does not support viewing this document.

Set your membership status to view this document.

With a Docket Alarm membership, you'll get a whole lot more, including:

Up-to-date information for this case.
Email alerts whenever there is an update.
Full text search for other cases.
Get email alerts whenever a new case matches your search.

Become a Member

One Moment Please

The filing “” is large (MB) and is being downloaded.

Please refresh this page in a few minutes to see if the filing has been downloaded. The filing will also be emailed to you when the download completes.

Your document is on its way!

If you do not receive the document in five minutes, contact support at support@docketalarm.com.

Sealed Document

We are unable to display this document, it may be under a court ordered seal.

If you have proper credentials to access the file, you may proceed directly to the court's system using your government issued username and password.

Access Government Site

We are redirecting you
to a mobile optimized page.

Document Unreadable or Corrupt

Refresh this Document
Go to the Docket

We are unable to display this document.

Refresh this Document
Go to the Docket

Supplemental Search

Search for PTAB Motions

PTAB Analytics

TTAB Analytics

Basic Search

Filters

Party Search

Advanced

Selected Courts

Recently Selected Courts

Find PTAB Decisions

PTAB Analytics

Special PTAB Alerts

Orange Book

Directly Search Federal Courts

Search Trademark ...

This document is available on Docket Alarm but you must sign up to view it.

Accessing this document will incur an additional charge of $.

Still Working On It

A few More Minutes ... Still Working

This document could not be displayed.

Your account does not support viewing this document.

You need a Paid Account to view this document. Click here to change your account type.

Your account does not support viewing this document.

One Moment Please

Your document is on its way!

Sealed Document

We are redirecting youto a mobile optimized page.

Document Unreadable or Corrupt

We are unable to display this document.

STEP 2 of 2

Choose your membership type

Flat-Fee

Pay-As-You-Go Monthly

Add your payment information

Login or Join

Enter your corporate Email

Thousands of your peers are saving time and gaining a competitive advantage with Docket Alarm.

Join Docket Alarm to perform smarter legal research.

Download this document and millions of others instantly with a Docket Alarm membership.

Join Docket Alarm and start performing smarter legal research.

Start tracking this docket instantly with a Docket Alarm membership.

Join thousands of your peers and start performing smarter legal research.

STEP 1 of 2

Millions of Documents | 15 Seconds to Signup

Hi !

Welcome to Docket Alarm

Welcome to Docket Alarm!

Explore Litigation Insights andManage Your Cases

Reset Password

What is PACER?

Why do I need it?

What will I be charged?

Do other courts have fees?

Basic Free Access

Welcome

Thank you

Check Firm Account

We are redirecting you
to a mobile optimized page.

Explore Litigation Insights and
Manage Your Cases