REVIEW

Oligonucleotide Arrays:
New Concepts and Possibilities
Alexander B. Chetverin and Fred Russell Kramer¹

Institute of Protein Research, Russian Academy of Sciences, 142292 Pushchino, Moscow Region, Russia (e-mail: chetverin@vax.ipr.serpukhov.su).
¹Department of Molecular Genetics, Public Health Research Institute, 455 First Ave, New York, NY 10016 (e-mail: kramer@phri.nyu.edu).
`
`Advances in solid-phase oligonucleotide synthesis and hybridization techniques have led to an incipient
`technology based on the use of oligonucleotide arrays. The inclusion of a large number of oligonucleotide
`probes within a single array greatly reduces the cost of their synthesis and allows thousands of hybridiza-
`tions to be carried out simultaneously. The range of potential applications of oligonucleotide arrays was
`expanded by the realization that nucleic acids can be sequenced by hybridizing them to all possible
`oligonucleotides of a given length. Additional possibilities are offered by novel types of oligonucleotide
`arrays that are capable of parallel sorting, isolating, and manipulating thousands, and even millions, of
`nucleic acid species. Fields, such as site-directed mutagenesis, protein engineering, and recombinant DNA
`technology, would benefit from using these arrays. Further, these approaches could enable the analysis of
`entire genomes by preparing ordered fragment libraries, and by sequencing complex pools of nucleic
`acids, in a novel approach that provides long-range sequence information by generating nested nucleic
`acids and then surveying the oligonucleotides contained in the nested strands. This would allow large
`diploid genomes to be sequenced directly in a completely automated procedure that does not require
`fragment cloning or chromosome mapping.
`
This paper outlines the prospects of an emerging technology based on the use of oligonucleotide
arrays. The main components of this technology are solid-phase oligonucleotide synthesis and
nucleic acid hybridization.
Hybridization is a hydrogen-bonding interaction between two nucleic acid strands that obey the
Watson-Crick complementarity rules. All other base pairs are mismatches that destabilize hybrids.
Since a single mismatch decreases the melting temperature (Tm) of a hybrid by up to 10°C (ref. 1),
conditions can be found in which only perfect hybrids survive. Hybridization comprises contacting
the strands, one of which is usually immobilized on a solid support while the other usually bears a
radioactive or fluorescent label, and then separating the resulting hybrids from the unreacted
labeled strands by washing the support. Hybrids are recognized by detecting the label bound to the
surface of the support.
Oligonucleotide hybridization is widely used to determine the presence in a nucleic acid of a
sequence that is complementary to the oligonucleotide probe. In many cases, this provides a simple,
fast, and inexpensive alternative to conventional sequencing methods. Hybridization does not require
nucleic acid cloning and purification, carrying out base-specific reactions, or tedious
electrophoretic separations. Hybridization of oligonucleotide probes has been successfully used for
various purposes, such as the analysis of genetic polymorphisms, diagnosis of genetic diseases5,
cancer diagnostics, detection of viral and microbial pathogens, screening of clones9, genome
mapping10, and the ordering of fragment libraries. Hybridization is often used in combination with
ligation of hybridized probes by a DNA ligase12 or their extension by a DNA polymerase, which
increases the sensitivity and signal-to-noise ratio, mainly by overcoming the mismatches at hybrid
termini, which are the most difficult to discriminate against.
Informationally, the difference between conventional sequencing methods and oligonucleotide
hybridization is analogous to the difference between reading a text by letters and reading it by
words. The latter is faster, but requires knowledge of all the words. That is why the nucleic acids
that are currently analyzed by hybridization are those whose sequence is known and whose presence in
a sample is expected. The analysis of unknown sequences or unknown sequence variants requires that
they be hybridized to all possible oligonucleotides, whose number N is an exponential function of
their length n: N = 4^n (for example, N = 65,536 when n = 8, and N = 1,048,576 when n = 10). Such
large-scale hybridizations would not be feasible if each oligonucleotide had to be synthesized and
hybridized individually. However, this approach is now a real possibility because of the invention
of oligonucleotide arrays.
An oligonucleotide array comprises a number of individual oligonucleotide species tethered to the
surface of a solid support in a regular pattern, each one in a different area, so that the location
of each oligonucleotide is known. Oligonucleotide arrays can be prepared by synthesizing all the
oligonucleotides, in parallel, directly on the support, employing the methods of solid-phase
chemical synthesis in combination with site-directing masks. Such masks direct a particular
nucleotide monomer (A, T, G or C) to react with a predetermined exposed area on the surface of the
support. Four masks with non-overlapping windows and four coupling reactions are required to
increase the length of the tethered oligonucleotides by one. In each subsequent round of synthesis a
different set of four masks is used, and this determines the unique sequence of the oligonucleotides
synthesized in each particular area. The total number of coupling reactions needed to synthesize all
possible n-mers is 4 × n. Thus, all possible octamers can be synthesized on an array in only 32
reactions, whereas as many as 524,288 reactions (8 × 4^8) are needed to synthesize them
individually. Chemistries have been developed17 so that the growing end of the oligonucleotides can
be either the 5’ or the 3’ end. An efficient photolithographic technique has been invented18 for
manufacturing miniature arrays containing as many as 10^5 individual oligonucleotide
areas per cm2, and there is no fundamental problem in increasing the density up to the 10^10 areas
per cm2 that is now achievable in semiconductor fabrication.
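
The counting argument above (4^n probes, 4 × n masked coupling reactions on an array, n × 4^n couplings when each probe is made separately) can be checked with a few lines of Python. This is only an arithmetic sketch; the function names are illustrative and do not come from the paper.

```python
def probe_count(n: int) -> int:
    """Number of distinct n-mers over the four nucleotides A, C, G, T."""
    return 4 ** n


def couplings_on_array(n: int) -> int:
    """Masked parallel synthesis: four coupling reactions per added position."""
    return 4 * n


def couplings_individually(n: int) -> int:
    """Separate synthesis: n couplings for each of the 4**n oligonucleotides."""
    return n * 4 ** n


if __name__ == "__main__":
    for n in (8, 10):
        print(n, probe_count(n), couplings_on_array(n), couplings_individually(n))
```

For n = 8 this reproduces the figures quoted in the text: 65,536 probes, 32 reactions on an array, and 524,288 reactions for individual synthesis.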
`
Thus, a miniature array can contain a large number of oligonucleotide probes, and all of them can be simultaneously
`hybridized to a nucleic acid sample in one experiment, thereby
`greatly reducing the time required for analysis and eliminating
`the need for the costly synthesis of individual oligonucleotides.
`
`BIO/TECHNOLOGY VOL. 12 NOVEMBER 1994 1093
`
`Ariosa Exhibit 1028, pg. 1
IPR2013-00276
`
[Figure 1 panels. (A) Sequence: ACGGTTATCC. Constituent n-mers: ACG CGG GGT GTT TTA TAT ATC TCC. The n-mers are aligned at their (n-1)-long subsequences to give the sequence. (B) Sequence with repeated (n-1)-mers: GCATGCCAGT. Constituent n-mers: GCA CAT ATG TGC GCC CCA CAG AGT. Some variants of sequence reconstruction from the subfragments: GCATGCCAGT, GCCATGCAGT.]
`
`. © 1994 Nature Publishing Group http://www.nature.com/naturebiotechnology
each area at a predetermined time when the temperature reaches the Tm value of that particular
hybrid, and adjusting the surface concentration of the immobilized probes so that the rates of
hybrid dissociation are independent of base composition.
An array can contain a chosen collection of oligonucleotides, e.g., probes specific for all known
clinically important pathogens or specific for all known sequence markers of genetic diseases. Such
an array can satisfy the needs of a diagnostic laboratory. Alternatively, an array can contain all
possible oligonucleotides25 of a given length n. Hybridization of a nucleic acid with such a
“comprehensive” array results in a list of all its constituent n-mers, which can be used for
unambiguous gene identification (e.g., in forensic studies), for determination of unknown gene
variants and mutations (including the sequencing of related genomes once the sequence of one of them
is known), for finding overlapping clones, and for checking sequences determined by conventional
methods. Finally, surveying the n-mers by hybridization to a comprehensive array can provide
sufficient information to determine the sequence of a totally unknown nucleic acid, as discussed
below.
`
[Figure 1B, continued. Further reconstructions consistent with the same n-mers: GCATGCCATGCCAGT, GCCATGCATGCAGT, CATGCATGCCAGT, CATGCCATGCAGT.]
FIGURE 1. Principle of sequencing by hybridization.
(A) Reconstruction of a nucleic acid sequence from the list of its constituent n-mers identified by
its hybridization to a comprehensive set of oligonucleotide probes. (B) If repeated (n-1)-mers are
present in a sequence, it cannot be reconstructed unambiguously. Assembly of the n-mers results in
multiple subfragments that can be permuted and/or repeated in different ways. In these examples,
n = 3 in order to simplify the illustration. Crosshatches and boxes indicate repeated (n-1)-mers.
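
The ambiguity described in panel B can be reproduced with a small exhaustive search. The sketch below is an illustration only, not the authors' procedure: it assumes that every n-mer occurs exactly once in the strand (a simplification; the figure also shows variants in which subfragments are repeated), and the function names are invented for this example.

```python
def spectrum(seq: str, n: int) -> set:
    """Set of distinct n-mers contained in seq."""
    return {seq[i:i + n] for i in range(len(seq) - n + 1)}


def reconstructions(nmers) -> list:
    """All sequences that use every n-mer exactly once, chained by (n-1)-overlaps."""
    nmers = frozenset(nmers)
    n = len(next(iter(nmers)))
    results = set()

    def extend(seq: str, remaining: frozenset) -> None:
        if not remaining:
            results.add(seq)
            return
        tail = seq[-(n - 1):]
        for m in remaining:
            if m[:-1] == tail:  # next n-mer must start with the current (n-1)-mer
                extend(seq + m[-1], remaining - {m})

    for first in nmers:
        extend(first, nmers - {first})
    return sorted(results)


if __name__ == "__main__":
    print(reconstructions(spectrum("ACGGTTATCC", 3)))  # unique overlaps
    print(reconstructions(spectrum("GCATGCCAGT", 3)))  # repeated GC and CA
```

Under the single-copy assumption, the first sequence has one reconstruction, whereas the second yields both GCATGCCAGT and GCCATGCAGT, the two shortest variants listed in the figure.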
`
[Figure 2 panels. (A) A well of a preparative array. (B) Digesting DNA with PstI; extending the 5’ termini; melting and hybridizing to the preparative array.]
`
FIGURE 2. An example of sorting nucleic acid strands on a preparative array. (A) Structure of the
oligonucleotides immobilized in a well of the preparative array. (B) Principle of sorting the
strands from a restriction digest of human DNA by virtue of the n-mer (n = 8) located immediately
upstream from the endonuclease recognition sequence.
`
Of course, different probes contained in the same array would produce hybrids with different
stabilities because of their different base composition. This problem can be overcome in a number of
ways, including carrying out hybridization in the presence of tetraalkylammonium salts that largely
eliminate the difference in the stability of G:C and A:T base pairs, washing the array at steadily
increasing temperature and collecting the signal from
`
`Sequencing by Hybridization
Surveying the n-mers in a nucleic acid is analogous to listing the words contained in a text. This
would not make much sense unless we know how the words are connected. Fortunately, unlike common
words, the n-mers in a nucleic acid strand overlap one another so that each non-terminal n-mer
includes the last n-1 nucleotides of the preceding n-mer and the first n-1 nucleotides of the next
n-mer. This allows the surveyed n-mers to be assembled by overlapping them at their (n-1)-long
subsequences [(n-1)-mers] and, thus, to reconstruct the sequence of the analyzed nucleic acid
(Fig. 1A). This strategy for sequence determination has independently been proposed by groups in
Great Britain, Yugoslavia, and Russia28 and is called “sequencing by hybridization” (SBH). SBH can
surpass conventional sequencing procedures in a number of parameters, including speed, cost, quality
of the results, and ease of automation. Test sequencing of ≈100 nucleotide-long DNA strands by
hybridization with octamers has demonstrated that the method is feasible and is tolerant of
occasional hybridization errors. It has also been shown that costs can be reduced further by
employing combinatorial arrays in which the individual areas contain groups of selected n-mers. This
saves array space by an order of magnitude, with only a slight loss in resolution.
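
The overlap-based assembly described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the published algorithm: it assumes an error-free list of n-mers in which every (n-1)-mer overlap is unique, and it raises an error as soon as that assumption fails. The function names are hypothetical.

```python
def spectrum(seq: str, n: int) -> set:
    """Set of distinct n-mers contained in seq (what an ideal array would report)."""
    return {seq[i:i + n] for i in range(len(seq) - n + 1)}


def reconstruct(nmers) -> str:
    """Chain n-mers by their (n-1)-long overlaps; fail if any overlap is ambiguous."""
    nmers = set(nmers)
    suffixes = {m[1:] for m in nmers}
    # The first n-mer is the only one whose leading (n-1)-mer ends no other n-mer.
    starts = [m for m in nmers if m[:-1] not in suffixes]
    if len(starts) != 1:
        raise ValueError("no unique starting n-mer")
    by_prefix = {}
    for m in nmers:
        by_prefix.setdefault(m[:-1], []).append(m)

    current = starts[0]
    seq, used = current, {current}
    while True:
        candidates = by_prefix.get(current[1:], [])
        if not candidates:
            break
        if len(candidates) > 1:
            raise ValueError("repeated (n-1)-mer: reconstruction is ambiguous")
        current = candidates[0]
        if current in used:
            raise ValueError("n-mer reused: reconstruction is ambiguous")
        used.add(current)
        seq += current[-1]
    if used != nmers:
        raise ValueError("some n-mers could not be placed")
    return seq


if __name__ == "__main__":
    print(reconstruct(spectrum("ACGGTTATCC", 3)))
```

With the Figure 1A sequence and n = 3, the eight constituent n-mers chain back into ACGGTTATCC; a sequence with repeated (n-1)-mers, such as the one in Figure 1B, makes the function raise instead of guessing.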
However, there is an inherent flaw in the SBH method that undermines its advantages. SBH relies
exclusively on short-range information provided by the sequences of the surveyed n-mers, and success
in assembling the n-mers is absolutely dependent on whether or not their (n-1)-long overlaps are
unique. Put another way, success is dependent on whether or not there are repeated (n-1)-mers in the
nucleic acid being analyzed. Strand reconstruction terminates at non-unique (n-1)-mers and the
resulting subfragments can be permuted and/or repeated in many different ways without conflicting
with the hybridization data (Fig. 1B). When n = 8 only 94%, 32% and 0.9% of random sequences of 50,
200 and 400 nucleotides in length, respectively, can be reconstructed unambiguously. (The remaining
sequences contain repeated heptamers.) The situation is even worse with natural nucleic acids, since
they usually contain more repeats than do random sequences. Utilizing longer probes would reduce the
ambiguities, but would result in an exponential increase in cost. For example, if n is increased
from 8 to 12, then the length of random strands that can be sequenced with 95% success increases
from 47 to 666 bases (14-fold), whereas the number of probes required increases 256-fold. Computer
simulations demonstrate that the resolvable strand length can be increased by a factor of 4 if
additional information
is known, such as the sequences at the strand termini, the approximate strand length, or the copy
number of each n-mer in the strand. However, as the intensity of the hybridization signal is
influenced by a number of factors, such as the base composition, sequence context, and strand
secondary structure, measurements of the n-mer copy number are fraught with difficulties.
Furthermore, if it were necessary to estimate the strand length by gel electrophoresis, the
advantage of SBH over conventional sequencing methods would be greatly diminished.

[Figure 3 panels: first-type fragments of a DNA (sequenced); linked intersite segment signatures (occurring together in one fragment): Cc-gC, Ca-tA, Ac-tG, Ta-gA; second-type fragments (sorted on a preparative array); linked signatures (found together in two wells); ordering the fragments.]

Several methods have been proposed to increase the readable strand length without increasing the
array size: additional strand hybridizations with longer oligonucleotides that extend over putative
subfragment junctions; the hybridization of additional oligonucleotides that stack with the hybrids
formed at ambiguously positioned n-mers, in order to increase the effective hybrid length; and the
analysis of multiple clones of densely overlapping random subfragments that have different sets of
repeated (n-1)-mers27. However, all these modifications are cumbersome and time-consuming, and
deprive SBH of its inherent beauty: the ability to provide an instant result and to be easily
automated. They also do not overcome its main weakness, the inability to obtain long-range sequence
information. As will be seen below, this problem can be solved by utilizing novel oligonucleotide
arrays and a novel strategy that combines the power of SBH with the advantages of classical
sequencing methods.
`
`Sorting Nucleic Acids
In the applications discussed above, oligonucleotide arrays are exclusively used as an analytical
tool. Recently, we have proposed the use of oligonucleotide arrays for preparative purposes. A
preparative array is larger than an analytical array, and its individual oligonucleotide areas are
physically separated from one another in the same manner as are the wells in a microtiter plate. The
most obvious application for these preparative arrays is the sorting of nucleic acids by the
identity of their constituent oligonucleotides.
One well of such an array is shown in Figure 2A. In this example, the oligonucleotides are tethered
to the surface of the well by their 5’ ends, and are “binary”, in the sense that they consist of two
sequence segments. The 3’-terminal segment (of length n) is variable (i.e., its sequence is
different in each of the 4^n wells in the array), whereas the 5’-terminal segment is constant (i.e.,
its sequence is the same in every well in the array). The constant segments are longer than the
variable segments, and are pre-hybridized to complementary masking oligonucleotides. As shown in
Figure 2B, such an array is capable of sorting nucleic acids by their 3’-termini.
For example, when human DNA (≈3 × 10^9 base pairs) is digested with restriction endonuclease PstI
the result is ≈1 million double-stranded fragments of ≈3,000 base pairs mean length. These fragments
are modified by ligating an oligonucleotide adapter to their 5’ ends in order to restore the
restriction recognition sequence and to generate an additional 5’-terminal extension. The fragments
are then denatured to release single strands (whose number is twice that of the fragments), and the
mixture is hybridized to the entire array. Because of the large size of the preparative array, a
temperature gradient can be applied across its surface resulting in the temperature in each well
being close to the hybrid Tm value. After washing away unbound strands, the array is incubated with
a DNA ligase in order to join the 3’ ends of the hybridized strands to the masking oligonucleotides.
This restores the restriction recognition sequence at the 3’ end, and generates an additional 3’
extension. The array is then washed at a higher temperature to remove all non-ligated strands. At
this step, strands hybridized to the immobilized oligonucleotides at any other site than the 3’
terminus are not ligated (Fig. 2B, inset) and are therefore washed
`FIGURE 3. Principle of ordering sequenced restriction frag-
`ments by sorting the strands from an alternate restriction
`digest on a preparative array, amplifying the strands to pro-
`duce both direct and complementary copies, and then survey-
`ing the restriction site-tagged n-mers in each well of the array.
`In the diagram, the first-type restriction sites are shown as
`black rectangles, and the second-type restriction sites as
`cross-hatched rectangles. The signature of an intersite seg-
`ment consists of a combination of two n-mers, one being
`tagged to the first-type restriction site (upper-case letters)
`and the other being tagged to the second-type restriction
`site (lower-case letters). In this example n = 1 for simplicity.
`Linkages between the sequenced fragments are identified
`by noting which pairs of intersite segment signatures are
`found together in more than one well.
`
away. Thus, each strand species from the PstI digest occupies a single well in the array, whose
immobilized oligonucleotides contain a variable segment that is complementary to the n-mer located
in that strand immediately upstream from the PstI recognition sequence. Since every possible n-mer
occurs among the variable segments in a comprehensive array, no strand species is lost.
Consequently, this sorting procedure results in a complete library of human genome fragments. If
n = 8, the strands will be distributed among 65,536 wells, with a mean of 30 strand species in each
well (the expected extremes are 10 and 60 species). With 1 mm × 1 mm wells, the size of the array
would be approximately 1 square foot.
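
The occupancy figures in this paragraph follow from simple arithmetic, which the sketch below reproduces. The mean is taken directly from the numbers in the text; the Poisson model used to estimate how many wells fall at the extremes is an assumption made here for illustration (it treats the 3’-proximal n-mers as independent and uniformly distributed), not something stated in the paper.

```python
import math


def poisson_pmf(k: int, lam: float) -> float:
    """Poisson probability of exactly k events, computed in log space for stability."""
    return math.exp(-lam + k * math.log(lam) - math.lgamma(k + 1))


def poisson_cdf(k: int, lam: float) -> float:
    """Poisson probability of at most k events."""
    return sum(poisson_pmf(i, lam) for i in range(k + 1))


genome_bp = 3e9            # approximate size of the human genome (from the text)
mean_fragment_bp = 3000    # mean PstI fragment length (from the text)
n = 8                      # length of the variable segment

strand_species = 2 * genome_bp / mean_fragment_bp   # two strands per fragment
wells = 4 ** n
mean_per_well = strand_species / wells

# Expected number of wells at the extremes, under the assumed Poisson model.
wells_with_at_most_10 = wells * poisson_cdf(10, mean_per_well)
wells_with_at_least_60 = wells * (1.0 - poisson_cdf(59, mean_per_well))

if __name__ == "__main__":
    print(f"{strand_species:.0f} strand species in {wells} wells")
    print(f"mean per well: {mean_per_well:.1f}")
    print(f"wells with <= 10 species: {wells_with_at_most_10:.2f}")
    print(f"wells with >= 60 species: {wells_with_at_least_60:.2f}")
```

The mean comes out at about 30.5 species per well, consistent with the value of 30 given above; the tail estimates depend entirely on the Poisson assumption and should be read only as an order-of-magnitude check.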
One may wonder whether, given the complexity of a human genome digest, conditions can be found that
prevent the hybridization of strands in wrong wells. The answer is yes. Experiments with
whole-genome hybridization demonstrate the ability of allele-specific oligonucleotides to
discriminate against single-base mismatches. Furthermore, the specificity of hybridization can be
increased by orders of magnitude by employing a reversible hybridization procedure. In a preparative
array, the hybridized strands can be released and rehybridized without intermixing the contents of
different wells. The unbound strands can then be washed away, and the entire process of release and
rehybridization can be repeated. In each cycle the only strands that are available for hybridization
are those that were hybridized in the previous cycle. Thus, the ratio of perfect hybrids to
mismatched hybrids will increase exponentially as the number of cycles increases. The reversible
hybridization procedure can be carried out both before and after ligation of the sorted strands to
the masking oligonucleotides.
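
The exponential enrichment can be made explicit with a toy model. The sketch assumes, purely for illustration, that in every cycle a fixed fraction of perfect hybrids and a smaller fixed fraction of mismatched hybrids re-form; these per-cycle fractions are hypothetical parameters, not measured values from the paper.

```python
def perfect_to_mismatch_ratio(initial_ratio: float,
                              keep_perfect: float,
                              keep_mismatch: float,
                              cycles: int) -> float:
    """Ratio of perfect to mismatched hybrids after repeated release/rehybridization.

    keep_perfect and keep_mismatch are the assumed fractions of each hybrid
    type that rehybridize in one cycle (hypothetical values).
    """
    return initial_ratio * (keep_perfect / keep_mismatch) ** cycles


if __name__ == "__main__":
    for k in range(4):
        print(k, perfect_to_mismatch_ratio(1.0, 0.9, 0.09, k))
```

With these made-up fractions each cycle improves the ratio tenfold, so three cycles give a thousandfold enrichment; any pair of fractions with keep_perfect greater than keep_mismatch gives the same geometric behaviour.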
Of course, the final amount of each strand will be very low. For further use, the sorted strands
should be amplified in situ, utilizing a polymerase chain reaction (PCR)37, which can be initiated
by as few as 100 molecules of a template. First, a complementary copy of each strand is synthesized
by a DNA polymerase, utilizing the immobilized oligonucleotide as a primer (Fig. 2B). The array is
then washed vigorously under
[Figure 4 panels. (A) Random strand fragmentation; sorting the fragments by their 3’-terminal n-mers. (B) Sorting the strands by their internal n-mers.]

immediate neighbors. This can be accomplished by cleaving the same DNA at different sites with
another restriction endonuclease, in order to produce fragments whose sequence overlaps the
sequences of neighboring fragments from the first digest, and then ascertaining which segments of
different fragments from the first digest are contained in the same fragment from the second digest
(Fig. 3). Since these segments are bounded by two types of restriction sites, we refer to them as
“intersite segments”. The problem of determining neighboring fragments can thus be reduced to
identifying the intersite segments that are linked to each other by being present in the same
fragment from a second digest.
Strands from the second digest are sorted by their 3’ termini on a preparative array, as described
above, with the cleaved restriction sites being restored by terminal extensions. After amplification
of the strands by PCR to produce both direct and complementary copies, the intersite segments are
identified by determining their 3’-terminal n-mers. This is achieved by hybridizing the strands in
each well to analytical arrays that contain two types of binary oligonucleotides whose constant
segments (of length m) are complementary to either the first- or the second-type of restriction
site, and whose variable n-mers are located at their 3’ ends. The strands are hybridized to the
analytical arrays by their (m+n)-long sequences, consisting of one of the restriction sites plus the
n-mer adjacent to that restriction site (a tagged n-mer). There are two such n-mers in each
double-stranded intersite segment, and their combination constitutes its unique “signature.” The
signatures of the intersite segments are known in advance, since the fragments from the first digest
have already been sequenced.
The intersite segments that are linked to each other in the fragments from the second digest are
identified by listing, for each well, all those pairwise combinations of tagged n-mers that match
known intersite segment signatures, and noting “linked” signatures, i.e., those found together in at
least two wells. These are the wells where the complementary strands of the corresponding fragment
from the second digest have been sorted. Statistical estimates show that more than 90% of the PstI
human
[Figure 4 panels, continued: synthesis of immobilized templates; amplification of the immobilized copies by asymmetric PCR; transcription of the immobilized templates.]
`FIGURE 4. Two strategies for generating nested strands on
`preparative arrays. The first strategy (A) employs a limited
`random fragmentation of the parental strands, and subse-
`quent sorting of the fragments by their 3’-terminal n-mers
`(n = 8). The second strategy (B) involves sorting the parental
`strands by their internal n-mers, and subsequent synthesis of
`complementary shortened copies, utilizing the immobilized
`oligonucleotides as primers.
`
strong denaturing conditions to remove all non-covalently bound material, including the original
sorted strands. PCR is then carried out simultaneously in each well, utilizing the immobilized
strand copies as templates and two universal primers, one being identical to the 5’-terminal
extension (introduced into each strand prior to sorting), and the other being identical to the
constant segment of the immobilized oligonucleotides. The sorting of nucleic acids on preparative
oligonucleotide arrays can be used in a number of applications, two of which are discussed below.
Isolation of individual strands. The pools of strands can be sorted further to isolate individual
strands. Since the added terminal extensions restore the restriction sites at the strand termini,
the double-stranded fragments generated by PCR can be re-digested with the same restriction
endonuclease to remove these extensions, and the sorting procedure can be repeated with each pool of
strands. Of course, the direct copies of strands that were originally sorted into a well will all
have identical 3’-terminal n-mers. However, the complementary copies (generated by PCR) will have
different 3’-termini and will therefore be sorted into different wells. Since the maximum number of
complementary strand species in a pool is only 60, the oligonucleotides in the second-round sorting
arrays may have shorter variable segments (e.g., n = 4, corresponding to only 256 wells) and yet
ensure that no more than one strand species will occur in most wells.
`
`Preparative arrays can also be used for the isolation of all
`cellular mRNAs, after their conversion into cDNAs. As the
`mean number of mRNA species in a human cell is between
`10,000 and 30,000 (ref. 39), most of their cDNAs can be
`isolated by a single round of sorting when n = 8. Although there
`is a high disproportion in the amount of different mRNA species
`in a cell, the amounts of individual cDNAs will be equalized
`upon sorting as a result of PCR amplification.
Fragment ordering. The ability to prepare complete fragment libraries with the aid of comprehensive
arrays makes them an ideal tool for genome analysis, and in particular, for ordering fragments that
have already been sequenced. Sequencing of a long DNA, by whatever method, includes digesting it,
usually with a restriction endonuclease, into fragments of no more than a few thousand base pairs in
length, and then determining the sequence of each fragment. The fragment sequences are then put into
the correct order by determining, for each fragment, its
`
genome fragments can be ordered if n = 8. To order the remaining fragments, the procedure is
repeated with other restriction endonucleases.
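
The bookkeeping behind “linked” signatures can be sketched as follows. The data layout is an assumption made for this example: each well is represented by the set of tagged n-mers detected in it, and each known intersite segment signature by a pair of tagged n-mers. All labels (s1, A1, b1, w1 and so on) are hypothetical placeholders, not data from the paper.

```python
from collections import Counter
from itertools import combinations


def linked_signatures(wells: dict, signatures: dict, min_wells: int = 2) -> list:
    """Pairs of signatures that are found together in at least min_wells wells.

    wells:      well id -> set of tagged n-mers detected in that well
    signatures: signature name -> pair of tagged n-mers that define it
    """
    together = Counter()
    for tagged in wells.values():
        # A signature is present in a well when both of its tagged n-mers are.
        present = sorted(name for name, pair in signatures.items()
                         if set(pair) <= tagged)
        together.update(combinations(present, 2))
    return sorted(pair for pair, count in together.items() if count >= min_wells)


if __name__ == "__main__":
    signatures = {"s1": ("A1", "b1"), "s2": ("A2", "b2"), "s3": ("A3", "b3")}
    wells = {
        "w1": {"A1", "b1", "A2", "b2"},
        "w2": {"A1", "b1", "A2", "b2", "A3"},
        "w3": {"A2", "b2", "A3", "b3"},
    }
    print(linked_signatures(wells, signatures))
```

In this toy data set s1 and s2 co-occur in two wells and are reported as linked, whereas s2 and s3 co-occur in only one well and are not; requiring two wells mirrors the text's criterion that both complementary strands of a second-digest fragment are sorted.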
`
`Preparation of Nested Strands
Another possibility provided by preparative oligonucleotide arrays is the generation of a nested set
of shortened strands that are truncated from one end. These nested strands can be used to obtain
long-range sequence information that overcomes the inherent ambiguity in SBH. There are two basic
nesting strategies (Fig. 4). In both cases, the parental nucleic acid is modified to contain a
universal 5’-terminal extension, and the generated nested strands consist of parental strands whose
3’ ends are truncated. The 3’ ends of the nested strands are thus variable and their 5’ ends are
constant.
`
One nesting strategy includes a limited random fragmentation of a parental nucleic acid by nuclease
digestion or chemical treatment (Fig. 4A). The resulting fragments are then sorted by their 3’
termini, essentially as described above for the sorting of full-length strands. The biochemical
steps employed by this strategy (ligation of the 3’ terminus of a fragment strand to a masking
oligonucleotide, and copying the ligated strand by extending the immobilized oligonucleotide) have
recently been realized. To obtain direct copies of the nested strands, they are amplified in situ by
asymmetric PCR in which a primer that is identical to the 5’-terminal extension is present in
excess. The only fragments that are amplified are those that have not lost their 5’-terminal
extensions during the fragmentation step.
`The other strategy requires that full—length strands be hybrid
`
[Figure 5A panels. Sequence: ATGCTAATCTAACACTAT. Nested strands. 3’-terminal n-mers of the nested strands and the n-mers identified in the wells: ATG TGC GCT CTA TAA AAT ATC TCT AAC ACA CAC ACT TAT. Position numbers 1 to 13; range of CTA.]

FIGURE 5. Principle of sequencing by nested strand hybridization. (A) Reconstruction of a nucleic
acid sequence that contains repeated (n-1)-mers. The data are obtained by generating all possible
nested strands of the nucleic acid on a preparative array, and then surveying the n-mers in each
well of the array (n = 3). Uniquely occurring n-mers are ordered according to the number of the
n-mers found in their respective wells. The range of a repeated n-mer is delimited, from the 5’ end
by the last n-mer whose well does not contain that repeated n-mer, and from the 3’ end by the first
n-mer that is not contained in the well of the repeated n-mer. (B) Separation of the n-mers that
belong to different strands in a mixture of nucleic acids by identifying maximal sets of n-mers that
are connected with each other. Two n-mers are connected if one of them is contained in the other’s
well.
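
The ordering rule in panel A can be tried out on the sequence shown in the figure. The sketch below is a simulation under idealized assumptions, not the experimental procedure: it takes the sequence as known, builds the n-mer content that an error-free survey of each well would report, and then checks that sorting the uniquely occurring n-mers by the size of their wells recovers their order along the strand. Which n-mers are unique is read off the simulated sequence here; in a real experiment it would have to be inferred from the data. Function names are illustrative.

```python
from collections import Counter


def simulate_wells(seq: str, n: int) -> dict:
    """Map each n-mer to the set of n-mers an ideal survey of its well would find.

    A well holds every nested strand that ends in its n-mer, so its content is
    the set of n-mers in the longest such strand (ending at the last occurrence).
    """
    wells = {}
    seen = set()
    for i in range(len(seq) - n + 1):
        nmer = seq[i:i + n]
        seen.add(nmer)
        wells[nmer] = set(seen)   # a later occurrence overwrites with a superset
    return wells


def order_unique_nmers(seq: str, n: int) -> list:
    """Order the uniquely occurring n-mers by the number of n-mers in their wells."""
    counts = Counter(seq[i:i + n] for i in range(len(seq) - n + 1))
    wells = simulate_wells(seq, n)
    unique = [m for m, c in counts.items() if c == 1]
    return sorted(unique, key=lambda m: len(wells[m]))


if __name__ == "__main__":
    print(order_unique_nmers("ATGCTAATCTAACACTAT", 3))
```

For the figure's sequence the unique n-mers come out in strand order, with well sizes 1, 2, 3, 6, 7, ... that match the position numbers in the figure; the repeated n-mers CTA and TAA are left out, since the caption assigns them ranges rather than single positions.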
`
[Figure 5 panels, continued. (A) Assembling the n-mers. (B) Mixture of sequences; connections between n-mers; sets of connected n-mers.]
`
`ized to a preparative array whose oligonucleotides consist of
only variable sequences (Fig. 4B). In this case, the strands hybridize to the immobilized
oligonucleotides by any complementary n-mer along their length, whether it is terminal or not.
Therefore, each strand species hybridizes to many wells in the array, each containing an immobilized
oligonucleotide that is complementary to a different n-mer in the sequence. The strands are then
copied by a DNA polymerase, beginning from the location where they have hybridized and continuing up
to their 5’-terminal extension, utilizing the immobilized oligonucleotides as primers. The resulting
immobilized templates are used to produce in situ multiple DNA or RNA copies (in the latter case, a
5’-terminal extension is utilized that encodes an RNA polymerase promoter).
Whatever strategy is used, the fragments amplified in each well correspond to a truncated parental
strand that contains the region between the 5’ end and the n-mer that is complementary to the
variable sequence of the immobilized oligonucleotide. If some n-mer occurs in the strand at several
locations, all corresponding fragments will be generated in the well. Nesting nucleic acids on
preparative arrays resembles the procedures employed in classical sequencing methods. The difference
is that nested strands are sorted here according to the identity of their 3’-terminal n-mers, rather
than being separated by gel electrophoresis according to their lengths.
`
`Sequencing by Nested Strand Hybridization
`Unlike standard SBH, where the data are collected in one
`step by hybridizing a nucleic acid strand to all possible n-mers,
`sequencing by nested strand hybridization (SNSH