REVIEW
`
`Oligonucleotide Arrays:
`New Concepts and Possibilities
Alexander B. Chetverin and Fred Russell Kramer¹

Institute of Protein Research, Russian Academy of Sciences, 142292 Pushchino, Moscow Region, Russia (e-mail: chetverin@vax.ipr.serpukhov.su).
¹Department of Molecular Genetics, Public Health Research Institute, 455 First Ave., New York, NY 10016 (e-mail: kramer@phri.nyu.edu).
`
`Advances in solid-phase oligonucleotide synthesis and hybridization techniques have led to an incipient
`technology based on the use of oligonucleotide arrays. The inclusion of a large number of oligonucleotide
`probes within a single array greatly reduces the cost of their synthesis and allows thousands of hybridiza-
`tions to be carried out simultaneously. The range of potential applications of oligonucleotide arrays was
`expanded by the realization that nucleic acids can be sequenced by hybridizing them to all possible
`oligonucleotides of a given length. Additional possibilities are offered by novel types of oligonucleotide
`arrays that are capable of parallel sorting, isolating, and manipulating thousands, and even millions, of
`nucleic acid species. Fields, such as site-directed mutagenesis, protein engineering, and recombinant DNA
`technology, would benefit from using these arrays. Further, these approaches could enable the analysis of
`entire genomes by preparing ordered fragment libraries, and by sequencing complex pools of nucleic
`acids, in a novel approach that provides long-range sequence information by generating nested nucleic
`acids and then surveying the oligonucleotides contained in the nested strands. This would allow large
`diploid genomes to be sequenced directly in a completely automated procedure that does not require
`fragment cloning or chromosome mapping.
`
This paper outlines the prospects of an emerging technology based on the use of
oligonucleotide arrays. The main components of this technology are solid-phase
oligonucleotide synthesis and nucleic acid hybridization.
Hybridization is a hydrogen-bonding interaction between two nucleic acid strands that
obey the Watson-Crick complementarity rules. All other base pairs are mismatches that
destabilize hybrids. Since a single mismatch decreases the melting temperature (Tm) of a
hybrid by up to 10°C, conditions can be found in which only perfect hybrids survive.
Hybridization comprises contacting the strands, one of which is usually immobilized on a
solid support while the other usually bears a radioactive or fluorescent label, and then
separating the resulting hybrids from the unreacted labeled strands by washing the
support. Hybrids are recognized by detecting the label bound to the surface of the support.
Oligonucleotide hybridization is widely used to determine the presence in a nucleic acid
of a sequence that is complementary to the oligonucleotide probe. In many cases, this
provides a simple, fast, and inexpensive alternative to conventional sequencing methods.
Hybridization does not require nucleic acid cloning and purification, carrying out
base-specific reactions, or tedious electrophoretic separations. Hybridization of
oligonucleotide probes has been successfully used for various purposes, such as the
analysis of genetic polymorphisms, diagnosis of genetic diseases (ref. 5), cancer
diagnostics, detection of viral and microbial pathogens, screening of clones, genome
mapping, and the ordering of fragment libraries. Hybridization is often used in
combination with ligation of hybridized probes by a DNA ligase or their extension by a
DNA polymerase, which increases the sensitivity and signal-to-noise ratio, mainly by
overcoming the mismatches at hybrid termini that are the most difficult to discriminate
against.
Informationally, the difference between conventional sequencing methods and
oligonucleotide hybridization is analogous to the difference between reading a text by
letters and reading it by words. The latter is faster, but requires knowledge of all the
words. That is why the nucleic acids that are currently analyzed by hybridization are
those whose sequence is known and whose presence in a sample is expected. The analysis of
unknown sequences or unknown sequence variants requires that they be hybridized to all
possible oligonucleotides, whose number N is an exponential function of their length n:
N = 4^n (for example, N = 65,536 when n = 8, and N = 1,048,576 when n = 10). Such
large-scale hybridizations would not be feasible if each oligonucleotide had to be
synthesized and hybridized individually. However, this approach is now a real possibility
because of the invention of oligonucleotide arrays.
An oligonucleotide array comprises a number of individual oligonucleotide species
tethered to the surface of a solid support in a regular pattern, each one in a different
area, so that the location of each oligonucleotide is known. Oligonucleotide arrays can be
prepared by synthesizing all the oligonucleotides, in parallel, directly on the support,
employing the methods of solid-phase chemical synthesis in combination with
site-directing masks. Such masks direct a particular nucleotide monomer (A, T, G or C) to
react with a predetermined exposed area on the surface of the support. Four masks with
non-overlapping windows and four coupling reactions are required to increase the length
of the tethered oligonucleotides by one. In each subsequent round of synthesis a
different set of four masks is used, and this determines the unique sequence of the
oligonucleotides synthesized in each particular area. The total number of coupling
reactions needed to synthesize all possible n-mers is 4 × n. Thus, all possible octamers
can be synthesized on an array in only 32 reactions, whereas as many as 524,288 reactions
(8 × 4^8) are needed to synthesize them individually. Chemistries have been developed so
that the growing end of the oligonucleotides can be either the 5’ or the 3’ end. An
efficient photolithographic technique has been invented for manufacturing miniature
arrays containing as many as 10^5 individual oligonucleotide
`areas per cm2, and there is no fundamental problem in increas-
`ing the density to up to the 10”‘ areas per cm2 that is now
`achievable in semiconductor fabrication”-'9.
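
The arithmetic behind these figures can be checked with a few lines of Python. This is only an illustrative sketch; the function names are hypothetical and are not taken from the paper.

```python
def probe_count(n: int) -> int:
    """Number of distinct oligonucleotides of length n (N = 4^n)."""
    return 4 ** n


def array_coupling_reactions(n: int) -> int:
    """Masked parallel synthesis: four coupling reactions per added nucleotide."""
    return 4 * n


def individual_coupling_reactions(n: int) -> int:
    """Separate synthesis of every n-mer: n coupling reactions per probe."""
    return n * probe_count(n)


if __name__ == "__main__":
    for n in (8, 10):
        print(n, probe_count(n), array_coupling_reactions(n),
              individual_coupling_reactions(n))
```

For n = 8 this gives 65,536 probes, 32 coupling reactions on an array, and 524,288 reactions for individual synthesis, matching the numbers quoted in the text.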
`
`Thus, a miniature array can contain a large number of oli-
`gonucleotide probes, and all of them can be simultaneously
`hybridized to a nucleic acid sample in one experiment, thereby
`greatly reducing the time required for analysis and eliminating
`the need for the costly synthesis of individual oligonucleotides.
`
BIO/TECHNOLOGY VOL. 12 NOVEMBER 1994 1093
`
`Ariosa Exhibit 1028, pg. 1
`IPR2013-00277
`
`
`
© 1994 Nature Publishing Group http://www.nature.com/naturebiotechnology
each area at a predetermined time when the temperature reaches the Tm value of that
particular hybrid, and adjusting the surface concentration of the immobilized probes so
that the rates of hybrid dissociation are independent of base composition.
`An array can contain a chosen collection of oligonucleotides,
e.g., probes specific for all known clinically important pathogens
or specific for all known sequence markers of genetic diseases.
Such an array can satisfy the needs of a diagnostic laboratory.
Alternatively, an array can contain all possible oligonucleotides
of a given length n. Hybridization of a nucleic
`acid with such a “comprehensive” array results in a list of all its
`constituent n-mers, which can be used for unambiguous gene
`identification (e.g., in forensic studies), for determination of
`unknown gene variants and mutations (including the sequencing
`of related genomes once the sequence of one of them is known),
`for finding overlapping clones, and for checking sequences
`determined by conventional methods. Finally, surveying the
`n-mers by hybridization to a comprehensive array can provide
`sufficient information to determine the sequence of a totally
`unknown nucleic acid, as discussed below.
`
[Figure 1. Panel A: the constituent n-mers ACG, CGG, GGT, GTT, TTA, TAT, ATC and TCC are
aligned at their (n-1)-long subsequences to give the sequence ACGGTTATCC. Panel B: a
sequence with repeated (n-1)-mers has the constituent n-mers GCA, CAT, ATG, TGC, GCC, CCA,
CAG and AGT; some variants of sequence reconstruction from the subfragments are
GCATGCCAGT, GCCATGCAGT, GCATGCCATGCCAGT, GCCATGCATGCAGT, CATGCATGCCAGT and
CATGCCATGCAGT.]
FIGURE 1. Principle of sequencing by hybridization. (A) Reconstruction of a nucleic acid
sequence from the list of its constituent n-mers identified by its hybridization to a
comprehensive set of oligonucleotide probes. (B) If repeated (n-1)-mers are present in a
sequence, it cannot be reconstructed unambiguously. Assembly of the n-mers results in
multiple subfragments that can be permuted and/or repeated in different ways. In these
examples, n = 3 in order to simplify the illustration. Crosshatches and boxes indicate
repeated (n-1)-mers.
`
[Figure 2. Panel A: a well of a preparative array. Panel B: digesting DNA with PstI;
extending the 5’ termini; melting and hybridizing to the preparative array;
hybridization leading to ligation, and synthesis of an immobilized copy.]
FIGURE 2. An example of sorting nucleic acid strands on a preparative array.
(A) Structure of the oligonucleotides immobilized in a well of the preparative array.
(B) Principle of sorting the strands from a restriction digest of human DNA by virtue of
the n-mer (n = 8) located immediately upstream from the endonuclease recognition
sequence.
`
Of course, different probes contained in the same array would produce hybrids with
different stabilities because of their different base composition. This problem can be
overcome in a number of ways, including carrying out hybridization in the presence of
tetraalkylammonium salts that largely eliminate the difference in the stability of G:C
and A:T base pairs, washing the array at steadily increasing temperature and collecting
the signal from
`
`Sequencing by Hybridization
Surveying the n-mers in a nucleic acid is analogous to listing the words contained in a
text. This would not make much sense unless we know how the words are connected.
Fortunately, unlike common words, the n-mers in a nucleic acid strand overlap one another
so that each non-terminal n-mer includes the last n-1 nucleotides of the preceding n-mer
and the first n-1 nucleotides of the next n-mer. This allows the surveyed n-mers to be
assembled by overlapping them at their (n-1)-long subsequences [(n-1)-mers] and, thus, to
reconstruct the sequence of the analyzed nucleic acid (Fig. 1A). This strategy for
sequence determination has independently been proposed by groups in Great Britain,
Yugoslavia, and Russia and is called “sequencing by hybridization” (SBH). SBH can surpass
conventional sequencing procedures in a number of parameters, including speed, cost,
quality of the results, and ease of automation. Test sequencing of ≈100-nucleotide-long
DNA strands by hybridization with octamers has demonstrated that the method is feasible
and is tolerant of occasional hybridization errors. It has also been shown that costs can
be reduced further by employing combinatorial arrays in which the individual areas
contain groups of selected n-mers. This saves array space by an order of magnitude, with
only a slight loss in resolution.
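
The overlap assembly described above can be sketched in Python for the simplest case, in which every (n-1)-long overlap is unique. The function name and the error handling are illustrative assumptions, not part of the published method; the sketch also assumes an error-free list of n-mers and simply refuses to assemble ambiguous input.

```python
def reconstruct(nmers):
    """Assemble a sequence from its n-mers by overlapping them at (n-1)-mers.

    Raises ValueError if an (n-1)-long overlap is not unique, because simple
    chain-following cannot then give an unambiguous answer.
    """
    nmers = list(nmers)
    by_prefix = {}   # leading (n-1)-mer -> the n-mer that starts with it
    suffixes = set() # trailing (n-1)-mers seen so far
    for m in nmers:
        if m[:-1] in by_prefix or m[1:] in suffixes:
            raise ValueError("repeated (n-1)-mer in " + m)
        by_prefix[m[:-1]] = m
        suffixes.add(m[1:])
    # The first n-mer is the only one whose leading (n-1)-mer ends no other n-mer.
    starts = [m for m in nmers if m[:-1] not in suffixes]
    if len(starts) != 1:
        raise ValueError("no unique starting n-mer")
    seq = starts[0]
    used = 1
    overlap = len(seq) - 1
    while seq[-overlap:] in by_prefix and used < len(nmers):
        seq += by_prefix[seq[-overlap:]][-1]  # extend by one nucleotide
        used += 1
    if used != len(nmers):
        raise ValueError("n-mers do not form a single chain")
    return seq


if __name__ == "__main__":
    # The n-mers of Fig. 1A (n = 3), in arbitrary order.
    print(reconstruct(["TTA", "ACG", "ATC", "CGG", "TAT", "GGT", "TCC", "GTT"]))
```

With the n-mers of Fig. 1A the sketch returns ACGGTTATCC, whereas the n-mer list of Fig. 1B is rejected because the (n-1)-mer GC begins two different n-mers.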
However, there is an inherent flaw in the SBH method that undermines its advantages. SBH
relies exclusively on short-range information provided by the sequences of the surveyed
n-mers, and success in assembling the n-mers is absolutely dependent on whether or not
their (n-1)-long overlaps are unique. Put another way, success is dependent on whether or
not there are repeated (n-1)-mers in the nucleic acid being analyzed. Strand
reconstruction terminates at non-unique (n-1)-mers and the resulting subfragments can be
permuted and/or repeated in many different ways without conflicting with the
hybridization data (Fig. 1B). When n = 8, only 94%, 32% and 0.9% of random sequences of
50, 200 and 400 nucleotides in length, respectively, can be reconstructed unambiguously.
(The remaining sequences contain repeated heptamers.) The situation is even worse with
natural nucleic acids, since they usually contain more repeats than do random sequences.
Utilizing longer probes would reduce the ambiguities, but would result in an exponential
increase in cost. For example, if n is increased from 8 to 12, then the length of random
strands that can be sequenced with 95% success increases from 47 to 666 bases (14-fold),
whereas the number of probes required increases 256-fold. Computer simulations
demonstrate that the resolvable strand length can be increased by a factor of 4 if
additional information
`
[Figure 3 panel labels: second-type fragments (sorted on a preparative array); linked
signatures (found together in two wells).]
is known, such as the sequences at the strand termini, the approximate strand length, or
the copy number of each n-mer in the strand. However, as the intensity of the
hybridization signal is influenced by a number of factors, such as the base composition,
sequence context, and strand secondary structure, measurements of the n-mer copy number
are fraught with difficulties. Furthermore, if it were necessary to estimate the strand
length by gel electrophoresis, the advantage of SBH over conventional sequencing methods
would be greatly diminished.
[Figure 3 panel labels: first-type fragments of a DNA (sequenced); intersite segment
signatures (occurring together in one fragment).]
Several methods have been proposed to increase the readable strand length without
increasing the array size: additional strand hybridizations with longer oligonucleotides
that extend over putative subfragment junctions; the hybridization of additional
oligonucleotides that stack with the hybrids formed at ambiguously positioned n-mers, in
order to increase the effective hybrid length; and the analysis of multiple clones of
densely overlapping random subfragments that have different sets of repeated (n-1)-mers.
However, all these modifications are cumbersome and time-consuming, and deprive SBH of
its inherent beauty: the ability to provide an instant result and to be easily automated.
They also do not overcome its main weakness: the inability to obtain long-range sequence
information. As will be seen below, this problem can be solved by utilizing novel
oligonucleotide arrays and a novel strategy that combines the power of SBH with the
advantages of classical sequencing methods.
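
The success rates quoted in this section for random sequences can be approximated with a simple "birthday problem" estimate. The sketch below is an assumption introduced here for illustration, not the authors' own calculation: it treats the (n-1)-mers of a random strand as independent draws from the 4^(n-1) possible (n-1)-mers and estimates the probability that none of them is repeated.

```python
import math


def p_no_repeated_overlap(length: int, n: int) -> float:
    """Approximate probability that a random strand has no repeated (n-1)-mer.

    Poisson (birthday) approximation: exp(-pairs / 4^(n-1)), where `pairs`
    is the number of pairs of (n-1)-mer positions in the strand.
    """
    k = length - n + 2            # number of (n-1)-mers in a strand of this length
    pairs = k * (k - 1) / 2
    return math.exp(-pairs / 4 ** (n - 1))


if __name__ == "__main__":
    for length in (50, 200, 400):
        print(length, round(p_no_repeated_overlap(length, 8), 3))
    print(666, round(p_no_repeated_overlap(666, 12), 3))
```

Under this approximation the values for n = 8 are about 0.94, 0.32 and 0.009 for strands of 50, 200 and 400 nucleotides, and about 0.95 for a 47-nucleotide strand with n = 8 or a 666-nucleotide strand with n = 12, which agrees with the figures given above.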
`
[Figure 3 panel label: ordering the fragments.]
FIGURE 3. Principle of ordering sequenced restriction fragments by sorting the strands
from an alternate restriction digest on a preparative array, amplifying the strands to
produce both direct and complementary copies, and then surveying the restriction
site-tagged n-mers in each well of the array. In the diagram, the first-type restriction
sites are shown as black rectangles, and the second-type restriction sites as
cross-hatched rectangles. The signature of an intersite segment consists of a combination
of two n-mers, one being tagged to the first-type restriction site (upper-case letters)
and the other being tagged to the second-type restriction site (lower-case letters). In
this example n = 1 for simplicity. Linkages between the sequenced fragments are
identified by noting which pairs of intersite segment signatures are found together in
more than one well.
`
away. Thus, each strand species from the PstI digest occupies a single well in the array,
whose immobilized oligonucleotides contain a variable segment that is complementary to
the n-mer located in that strand immediately upstream from the PstI recognition sequence.
Since every possible n-mer occurs among the variable segments in a comprehensive array,
no strand species is lost. Consequently, this sorting procedure results in a complete
library of human genome fragments. If n = 8, the strands will be distributed among 65,536
wells, with a mean of 30 strand species in each well (the expected extremes are 10 and 60
species). With 1 mm × 1 mm wells, the size of the array would be approximately 1 square
foot.
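
The sorting rule and the occupancy estimate above can be illustrated with a short Python sketch. Several things here are assumptions made for the illustration: strands are written 5’ to 3’ and taken to end exactly at the restored PstI recognition sequence (CTGCAG), the added 3’ extension is ignored, wells are labelled by the strand's own n-mer rather than by its immobilized complement, and the function names are hypothetical.

```python
from collections import defaultdict

PSTI_SITE = "CTGCAG"  # PstI recognition sequence


def well_for_strand(strand, site=PSTI_SITE, n=8):
    """Return the n-mer immediately upstream of the 3'-terminal site, or None."""
    if not strand.endswith(site) or len(strand) < len(site) + n:
        return None
    return strand[-len(site) - n:-len(site)]


def sort_strands(strands, site=PSTI_SITE, n=8):
    """Group strands by the well (n-mer) into which they would be sorted."""
    wells = defaultdict(list)
    for s in strands:
        key = well_for_strand(s, site, n)
        if key is not None:
            wells[key].append(s)
    return dict(wells)


def mean_strands_per_well(fragments, n=8):
    """Two strands per double-stranded fragment, spread over 4^n wells."""
    return 2 * fragments / 4 ** n


if __name__ == "__main__":
    print(well_for_strand("AAACCGGTTACTGCAG"))
    print(round(mean_strands_per_well(1_000_000), 1))
```

With about one million fragments and n = 8, the mean occupancy comes out at roughly 30 strand species per well, as stated in the text. A square array of 65,536 wells of 1 mm × 1 mm measures 256 mm on a side, which is of the same order as the square foot quoted above.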
One may wonder whether, given the complexity of a human genome digest, conditions can be
found that prevent the hybridization of strands in the wrong wells. The answer is yes.
Experiments with whole-genome hybridization demonstrate the ability of allele-specific
oligonucleotides to discriminate against single-base mismatches. Furthermore, the
specificity of hybridization can be increased by orders of magnitude by employing a
reversible hybridization procedure. In a preparative array, the hybridized strands can be
released and rehybridized without intermixing the contents of different wells. The
unbound strands can then be washed away, and the entire process of release and
rehybridization can be repeated. In each cycle the only strands that are available for
hybridization are those that were hybridized in the previous cycle. Thus, the ratio of
perfect hybrids to mismatched hybrids will increase exponentially as the number of cycles
increases. The reversible hybridization procedure can be carried out both before and
after ligation of the sorted strands to the masking oligonucleotides.
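
The exponential enrichment can be made concrete with a toy calculation. The retention fractions used below (0.9 for perfect hybrids and 0.09 for mismatched hybrids per cycle) are purely hypothetical values chosen for illustration; the paper gives no such numbers.

```python
def perfect_to_mismatch_ratio(ratio0, keep_perfect, keep_mismatched, cycles):
    """Ratio of perfect to mismatched hybrids after repeated release and rehybridization.

    ratio0          -- ratio of perfect to mismatched hybrids before the first cycle
    keep_perfect    -- fraction of perfect hybrids retained in each cycle (assumed)
    keep_mismatched -- fraction of mismatched hybrids retained in each cycle (assumed)
    """
    perfect, mismatched = float(ratio0), 1.0
    for _ in range(cycles):
        perfect *= keep_perfect
        mismatched *= keep_mismatched
    return perfect / mismatched


if __name__ == "__main__":
    for cycles in range(4):
        print(cycles, perfect_to_mismatch_ratio(1.0, 0.9, 0.09, cycles))
```

With these assumed values each cycle improves the ratio tenfold, so three cycles give a thousandfold enrichment, at the cost of losing some of the perfectly hybridized strands in every cycle.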
`Of course, the final amount of each strand will be very low.
`For further use, the sorted strands should be amplified in situ,
utilizing a polymerase chain reaction (PCR; ref. 37), which can be
initiated by as few as 100 molecules of a template. First, a
`complementary copy of each strand is synthesized by a DNA
`polymerase, utilizing the immobilized oligonucleotide as a
`primer (Fig. 2B). The array is then washed vigorously under
`
`
`Sorting Nucleic Acids
In the applications discussed above, oligonucleotide arrays are exclusively used as an
analytical tool. Recently, we have proposed the use of oligonucleotide arrays for
preparative purposes. A preparative array is larger than an analytical array, and its
individual oligonucleotide areas are physically separated from one another in the same
manner as are the wells in a microtiter plate. The most obvious application for these
preparative arrays is the sorting of nucleic acids by the identity of their constituent
oligonucleotides.
One well of such an array is shown in Figure 2A. In this example, the oligonucleotides
are tethered to the surface of the well by their 5’ ends, and are “binary”, in the sense
that they consist of two sequence segments. The 3’-terminal segment (of length n) is
variable (i.e., its sequence is different in each of the 4^n wells in the array), whereas
the 5’-terminal segment is constant (i.e., its sequence is the same in every well in the
array). The constant segments are longer than the variable segments, and are
pre-hybridized to complementary masking oligonucleotides. As shown in Figure 2B, such an
array is capable of sorting nucleic acids by their 3’ termini.
For example, when human DNA (≈3 × 10^9 base pairs) is digested with restriction
endonuclease PstI, the result is ≈1 million double-stranded fragments of ≈3,000 base
pairs mean length. These fragments are modified by ligating an oligonucleotide adapter to
their 5’ ends in order to restore the restriction recognition sequence and to generate an
additional 5’-terminal extension. The fragments are then denatured to release single
strands (whose number is twice that of the fragments), and the mixture is hybridized to
the entire array. Because of the large size of the preparative array, a temperature
gradient can be applied across its surface, resulting in the temperature in each well
being close to the hybrid Tm value. After washing away unbound strands, the array is
incubated with a DNA ligase in order to join the 3’ ends of the hybridized strands to the
masking oligonucleotides. This restores the restriction recognition sequence at the 3’
end, and generates an additional 3’ extension. The array is then washed at a higher
temperature to remove all non-ligated strands. At this step, strands hybridized to the
immobilized oligonucleotides at any other site than the 3’ terminus are not ligated
(Fig. 2B, inset) and are therefore washed
`
[Figure 4, upper panels: (A) random strand fragmentation; sorting the fragments by their
3’-terminal n-mers. (B) Sorting the strands by their internal n-mers; synthesis of
immobilized templates.]
`immediate neighbors. This can be accomplished by cleaving the
`same DNA at different sites with another restriction endonu-
`clease, in order to produce fragments whose sequence overlaps
`the sequences of neighboring fragments from the first digest,
`and then ascertaining which segments of different fragments
`from the first digest are contained in the same fragment from the
`second digest (Fig. 3). Since these segments are bounded by two
`types of restriction sites, we refer to them as “intersite seg-
`ments”. The problem of determining neighboring fragments
`can thus be reduced to identifying the intersite segments that are
`linked to each other by being present in the same fragment from
`a second digest.
`Strands from the second digest are sorted by their 3’ termini
`on a preparative array, as described above, with the cleaved
`restriction sites being restored by terminal extensions. After
`amplification of the strands by PCR to produce both direct and
`complementary copies, the intersite segments are identified by
`determining their 3’-terminal n-mers. This is achieved by
`hybridizing the strands in each well to analytical arrays that
`contain two types of binary oligonucleotides whose constant
segments (of length m) are complementary to either the first- or
the second-type of restriction site, and whose variable n-mers
are located at their 3’ ends. The strands are hybridized to the
analytical arrays by their (m+n)-long sequences, consisting of
one of the restriction sites plus the n-mer adjacent to that restriction
site (a tagged n-mer). There are two such n-mers in each
`double-stranded intersite segment, and their combination con-
`stitutes its unique “signature.” The signatures of the intersite
`segments are known in advance, since the fragments from the
`first digest have already been sequenced.
`The intersite segments that are linked to each other in the
`fragments from the second digest are identified by listing, for
`each well, all those pairwise combinations of tagged n-mers that
`match known intersite segment signatures, and noting “linked”
signatures, i.e., those found together in at least two wells. These
are the wells where the complementary strands of the corresponding
fragment from the second digest have been sorted.
`Statistical estimates show that more than 90% of the PstI human
`
[Figure 4, lower panels: (A) amplification of the immobilized copies by asymmetric PCR.
(B) Transcription of the immobilized templates.]
`FIGURE 4. Two strategies for generating nested strands on
`preparative arrays. The first strategy (A) employs a limited
`random fragmentation of the parental strands, and subse-
`quent sorting of the fragments by their 3’-terminal n-mers
`(n = 8). The second strategy (B) involves sorting the parental
`strands by their internal n-mers, and subsequent synthesis of
`complementary shortened copies, utilizing the immobilized
`oligonucleotides as primers.
`
`strong denaturing conditions to remove all non-covalently bound
`material, including the original sorted strands. PCR is then
`carried out simultaneously in each well, utilizing the immobi-
`lized strand copies as templates and two universal primers, one
being identical to the 5’-terminal extension (introduced into
each strand prior to sorting), and the other being identical to
the constant segment of the immobilized oligonucleotides. The
sorting of nucleic acids on preparative oligonucleotide arrays
can be used in a number of applications, two of which are
`discussed below.
Isolation of individual strands. The pools of strands can be sorted further to isolate
individual strands. Since the added terminal extensions restore the restriction sites at
the strand termini, the double-stranded fragments generated by PCR can be re-digested
with the same restriction endonuclease to remove these extensions, and the sorting
procedure can be repeated with each pool of strands. Of course, the direct copies of
strands that were originally sorted into a well will all have identical 3’-terminal
n-mers. However, the complementary copies (generated by PCR) will have different 3’
termini and will therefore be sorted into different wells. Since the maximum number of
complementary strand species in a pool is only 60, the oligonucleotides in the
second-round sorting arrays may have shorter variable segments (e.g., n = 4,
corresponding to only 256 wells) and yet ensure that no more than one strand species will
occur in most wells.
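
The claim about second-round sorting can be checked with a Poisson estimate. The sketch assumes, as a simplification introduced here, that the 3’-terminal n-mers of the strand species are independent and uniformly distributed over the 4^n wells; the function names are hypothetical.

```python
import math


def fraction_wells_at_most_one(species: int, n: int) -> float:
    """Poisson estimate of the fraction of wells holding zero or one strand species."""
    lam = species / 4 ** n          # mean number of species per well
    return math.exp(-lam) * (1 + lam)


def fraction_occupied_wells_with_one(species: int, n: int) -> float:
    """Among wells that received at least one species, the fraction with exactly one."""
    lam = species / 4 ** n
    return lam * math.exp(-lam) / (1 - math.exp(-lam))


if __name__ == "__main__":
    print(round(fraction_wells_at_most_one(60, 4), 3))
    print(round(fraction_occupied_wells_with_one(60, 4), 3))
```

For 60 species and n = 4 (256 wells), about 98% of all wells hold at most one species, and close to 89% of the occupied wells hold exactly one, which is consistent with the statement above.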
`
`Preparative arrays can also be used for the isolation of all
`cellular mRNAs, after their conversion into cDNAs. As the
`mean number of mRNA species in a human cell is between
`10,000 and 30,000 (ref. 39), most of their cDNAs can be
`isolated by a single round of sorting when n = 8. Although there
`is a high disproportion in the amount of different mRNA species
`in a cell, the amounts of individual cDNAs will be equalized
`upon sorting as a result of PCR amplification.
`Fragment ordering. The ability to prepare complete frag-
`ment libraries with the aid of comprehensive arrays makes them
`an ideal tool for genome analysis, and in particular, for ordering
fragments that have already been sequenced. Sequencing of a
`long DNA, by whatever method, includes digesting it, usually
`with a restriction endonuclease, into fragments of no more than
`a few thousand base pairs in length, and then determining the
`sequence of each fragment. The fragment sequences are then
`put into the correct order by determining, for each fragment, its
`
`
`genome fragments can be ordered if n = 8. To order the
`remaining fragments,
`the procedure is repeated with other
`restriction endonucleases.
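
The identification of linked signatures can be sketched as a small set computation. The data layout is an assumption made for illustration: each well is represented by the set of tagged n-mers detected in it, and each known intersite segment signature is a pair (n-mer tagged to the first-type site, n-mer tagged to the second-type site). The tag strings in the example are hypothetical and follow the upper-case/lower-case convention of Figure 3.

```python
from collections import Counter
from itertools import combinations


def linked_signatures(wells, signatures, min_wells=2):
    """Return pairs of signatures that occur together in at least `min_wells` wells.

    wells      -- dict: well id -> set of tagged n-mers detected in that well
    signatures -- iterable of (first_type_tag, second_type_tag) tuples known
                  from the sequenced first-digest fragments
    """
    signatures = list(signatures)
    counts = Counter()
    for tags in wells.values():
        # A signature is present in a well if both of its tagged n-mers are found there.
        present = sorted(s for s in signatures if s[0] in tags and s[1] in tags)
        counts.update(combinations(present, 2))
    return {pair for pair, c in counts.items() if c >= min_wells}


if __name__ == "__main__":
    known = [("A", "c"), ("C", "a"), ("T", "g")]
    observed = {
        "well_1": {"A", "c", "C", "a"},
        "well_2": {"A", "c", "C", "a", "T"},
        "well_3": {"T", "g"},
    }
    print(linked_signatures(observed, known))
```

In this toy example only the signatures ("A", "c") and ("C", "a") are found together in two wells, so the fragments carrying them would be taken as neighbors. Requiring two wells reflects the fact that the two complementary strands of a second-digest fragment are sorted into different wells.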
`
`Preparation of Nested Strands
Another possibility provided by preparative oligonucleotide arrays is the generation of a
nested set of shortened strands that are truncated from one end. These nested strands can
be used to obtain long-range sequence information that overcomes the inherent ambiguity
in SBH. There are two basic nesting strategies (Fig. 4). In both cases, the parental
nucleic acid is modified to contain a universal 5’-terminal extension, and the generated
nested strands consist of parental strands whose 3’ ends are truncated. The 3’ ends of
the nested strands are thus variable and their 5’ ends are constant.
`
One nesting strategy includes a limited random fragmentation of a parental nucleic acid
by nuclease digestion or chemical treatment (Fig. 4A). The resulting fragments are then
sorted by their 3’ termini, essentially as described above for the sorting of
full-length strands. The biochemical steps employed by this strategy (ligation of the 3’
terminus of a fragment strand to a masking oligonucleotide, and copying the ligated
strand by extending the immobilized oligonucleotide) have recently been realized. To
obtain direct copies of the nested strands, they are amplified in situ by asymmetric PCR
in which a primer that is identical to the 5’-terminal extension is present in excess.
The only fragments that are amplified are those that have not lost their 5’-terminal
extensions during the fragmentation step.
The other strategy requires that full-length strands be hybrid-
`
[Figure 5. Panel A: the sequence ATGCTAATCTAACACTAT, its nested strands with their
3’-terminal n-mers, the n-mers identified in the wells, and the assembly of the n-mers.
Panel B: a mixture of sequences, the connections between n-mers, and the sets of
connected n-mers.]
FIGURE 5. Principle of sequencing by nested strand hybridization. (A) Reconstruction of a
nucleic acid sequence that contains repeated (n-1)-mers. The data are obtained by
generating all possible nested strands of the nucleic acid on a preparative array, and
then surveying the n-mers in each well of the array (n = 3). Uniquely occurring n-mers
are ordered according to the number of the n-mers found in their respective wells. The
range of a repeated n-mer is delimited, from the 5’ end by the last n-mer whose well does
not contain that repeated n-mer, and from the 3’ end by the first n-mer that is not
contained in the well of the repeated n-mer. (B) Separation of the n-mers that belong to
different strands in a mixture of nucleic acids by identifying maximal sets of n-mers
that are connected with each other. Two n-mers are connected if one of them is contained
in the other’s well.

`
ized to a preparative array whose oligonucleotides consist of only variable sequences
(Fig. 4B). In this case, the strands hybridize to the immobilized oligonucleotides by any
complementary n-mer along their length, whether it is terminal or not. Therefore, each
strand species hybridizes to many wells in the array, each containing an immobilized
oligonucleotide that is complementary to a different n-mer in the sequence. The strands
are then copied by a DNA polymerase, beginning from the location where they have
hybridized and continuing up to their 5’-terminal extension, utilizing the immobilized
oligonucleotides as primers. The resulting immobilized templates are used to produce in
situ multiple DNA or RNA copies (in the latter case, a 5’-terminal extension is utilized
that encodes an RNA polymerase promoter).
Whatever strategy is used, the fragments amplified in each well correspond to a truncated
parental strand that contains the region between the 5’ end and the n-mer that is
complementary to the variable sequence of the immobilized oligonucleotide. If some n-mer
occurs in the strand at several locations, all corresponding fragments will be generated
in the well. Nesting nucleic acids on preparative arrays resembles the procedures
employed in classical sequencing methods. The difference is that nested strands are
sorted here according to the identity of their 3’-terminal n-mers, rather than being
separated by gel electrophoresis according to their lengths.
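
The outcome of nesting can be sketched in Python as an idealized model. The function name is hypothetical, wells are labelled by the strand's own n-mer rather than by the complementary immobilized oligonucleotide, and the 5’-terminal extension is omitted. Every occurrence of an n-mer in the parental strand contributes one truncated strand to that n-mer's well.

```python
from collections import defaultdict


def nested_strands(seq, n):
    """Map each n-mer (well) to the truncated strands that end with that n-mer.

    Each truncated strand runs from the 5' end of the parental strand to one
    occurrence of the well's n-mer.
    """
    wells = defaultdict(list)
    for i in range(len(seq) - n + 1):
        wells[seq[i:i + n]].append(seq[:i + n])
    return dict(wells)


if __name__ == "__main__":
    wells = nested_strands("ATGCTAATCTAACACTAT", 3)  # sequence of Fig. 5A, n = 3
    for nmer, strands in wells.items():
        print(nmer, strands)
```

For the sequence of Figure 5A with n = 3, the strands fall into 13 wells; the well for the repeated n-mer CTA receives three nested strands of different lengths, whereas the well for a uniquely occurring n-mer such as ATG receives only one.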
`
`Sequencing by Nested Strand Hybridization
`Unlike standard SBH, where the data are collected in one
`step by hybridizing a nucleic acid