Counting individual DNA molecules by the
`stochastic attachment of diverse labels
`
`Glenn K. Fu, Jing Hu, Pei-Hua Wang, and Stephen P. A. Fodor1
`
`Affymetrix, Inc., 3420 Central Expressway, Santa Clara, CA 95051
`
`Edited* by Ronald W. Davis, Stanford Genome Technology Center, Palo Alto, CA, and approved March 22, 2011 (received for review November 27, 2010)
`
`We implement a unique strategy for single molecule counting
`termed stochastic labeling, where random attachment of a diverse
`set of labels converts a population of identical DNA molecules
`into a population of distinct DNA molecules suitable for threshold
`detection. The conceptual framework for stochastic labeling is
`developed and experimentally demonstrated by determining the
`absolute and relative number of selected genes after stochastically
`labeling approximately 360,000 different fragments of the human
`genome. The approach does not require the physical separation of
`molecules and takes advantage of highly parallel methods such as
`microarray and sequencing technologies to simultaneously count
`absolute numbers of multiple targets. Stochastic labeling should
`be particularly useful for determining the absolute numbers of
`RNA or DNA molecules in single cells.
`
`absolute counting ∣ digital PCR ∣ next-generation sequencing ∣
`single molecule detection
`
Determining small numbers of biological molecules and their
changes is essential when unraveling mechanisms of cellular
`response, differentiation or signal transduction, and in perform-
`ing a wide variety of clinical measurements. Although many ana-
`lytical methods have been developed to measure the relative
`abundance of different molecules through sampling (e.g., micro-
`arrays and sequencing), the only practical method available to
`determine the absolute number of molecules in a sample is digital
`PCR (1–3), a powerful analytical technique typically limited to
`examining only a few different molecules at a time.
`In 2003, a theoretical approach to measure the number of
`molecules of a single mRNA species in a complex mRNA pre-
`paration was proposed (4). To our knowledge no experimental
`demonstration of this idea has been published. We have general-
`ized this idea and have expanded it to a highly parallel method
`capable of absolute counting of many different molecules simul-
`taneously. The concept is illustrated in Fig. 1. Each copy of a
`molecule randomly captures a label by choosing from a large,
`nondepleting reservoir of diverse labels. The subsequent diversity
`of the labeled molecules is governed by the statistics of random
`choice, and depends on the number of copies of identical mole-
`cules in the collection compared to the number of kinds of labels.
`Once the molecules are labeled, they can be amplified so that
`simple present/absent threshold detection methods can be used
`for each. Counting the number of distinctly labeled targets
`reveals the original number of molecules of each species.
`We can generalize the stochastic labeling process as follows.
Consider a given set of copies of a single target sequence
$T = \{t_1, t_2, \ldots, t_n\}$, where n is the number of copies of T. A set of
labels is defined as $L = \{l_1, l_2, \ldots, l_m\}$, where m is the number of
different labels. T reacts stochastically with L, such that each t
becomes attached to one l. If the ls are in nondepleting excess,
each t will choose one l randomly, and will take on a new identity
$l_i t_j$, where $l_i$ is chosen from L and j is the jth copy from the set
of n molecules. We identify each new molecule $l_i t_j$ by its label
subscript and drop the subscript for the copies of T because they
are identical. The new collection of molecules becomes
$T = \{l_1 t, l_2 t, \ldots, l_i t\}$, where $l_i$ is the ith choice from the set of m labels.
At this point, the subscripts of l refer only to the ith choice and
`
[Fig. 1 image: identical DNA target molecules {t1, t2, ..., tn} are randomly labeled from a pool of labels {l1, l2, ..., lm}; in the example, t1, t2, t3, and t4 become t1l20, t2l107, t3l477, and t4l9, followed by amplification and detection of k distinctly labeled molecules.]
`
`Fig. 1. A schematic representation of the labeling process. An example
`showing four identical target molecules in solution. Each DNA molecule ran-
`domly captures and joins with a label by choosing from a large, nondepleting
`reservoir of m labels. Each resulting labeled DNA molecule takes on a new
`identity and is amplified to detect the number of k distinct labels.
`
provide no information about the identity of each l. In fact, $l_1$ and
$l_2$ will have some probability of being identical, depending upon
the diversity m of the set of labels. Overall, T will contain a set
of k unique labels resulting from n targets choosing from the
nondepleting reservoir of m labels. Or, $T(m,n) = \{l_k t\}$, where k
represents the number of unique labels that have been captured.
In all cases, k will be smaller than m, approaching m only when
n becomes very large. We can define the stochastic attachment
of the set of labels on a target using a stochastic operator S with
m members, acting upon a target population of n, such that
$S(m)T(n) = T(m,n)$, generating the set $\{l_k t\}$. Furthermore,
because S operates on all molecules independently, it can act on
many different targets. Hence, by combining the information
of target sequence and label, we can simultaneously count copies
of multiple target sequences. The probability of the number of
labels generated by the number of trials n, from a diversity
of m, can be approximated by the Poisson equation,
$P_x = \frac{(n/m)^x}{x!} e^{-(n/m)}$. Then $P_0$ is the probability that a label will
not be chosen in n trials; therefore, $1 - P_0$ is the probability that
a label will occur at least once. It follows that the expected
number of unique labels captured is given by:
`
`Author contributions: G.K.F. and S.P.A.F. designed research; G.K.F. and P.-H.W. performed
`research; G.K.F., J.H., and S.P.A.F. analyzed data; and G.K.F., J.H., and S.P.A.F. wrote
`the paper.
`
`Conflict of interest statement: The authors are employees of Affymetrix, Inc. and the
`subject matter of this article may be a future commercial product.
`
`*This Direct Submission article had a prearranged editor.
`
`Freely available online through the PNAS open access option.
`
`1To whom correspondence should be addressed. E-mail: steve_fodor@affymetrix.com.
`
`This article contains supporting information online at www.pnas.org/lookup/suppl/
`doi:10.1073/pnas.1017621108/-/DCSupplemental.
`
`9026–9031 ∣ PNAS ∣ May 31, 2011 ∣ vol. 108 ∣ no. 22
`
`www.pnas.org/cgi/doi/10.1073/pnas.1017621108
`
`label, and counting k is equivalent to counting n. As n increases, k
`increases more slowly as given by Eq. 1. For example, when n∕m
`is approximately 0.01, the counting efficiency, which is defined as
`the ratio of unique labels to molecules k∕n is approximately 0.99,
`and we expect that an increase of 10 molecules will generate 10
`new labels. As n∕m approaches 0.5 (i.e., 480 molecules reacted
`with 960 labels), k∕n becomes approximately 0.79 and six new
`labels are expected with an increase of 10 molecules. At high
`n∕m, k increases more slowly as labels in the set are more likely
`to be captured more than once. The green curve in Fig. 2 shows
`the number of labels chosen exactly once, and the black curve
`shows the number of labels chosen exactly twice as n increases.
`A more complete description of the number of times a label is
`chosen and of the counting efficiency as a function of n is shown
`in Figs. S1 and S2.
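The figures quoted in this paragraph can be checked numerically. The sketch below assumes the Poisson approximation of Eq. 1 throughout. The function names are illustrative, and the exactly-once and exactly-twice counts use the Poisson terms P1 and P2 rather than the supplementary Eqs. S1 to S4, which are not reproduced here.

```python
import math

M_LABELS = 960  # label diversity used in the paper


def expected_labels(n, m=M_LABELS):
    """Expected number of distinct labels captured, k = m * (1 - exp(-n/m))."""
    return m * (1.0 - math.exp(-n / m))


def counting_efficiency(n, m=M_LABELS):
    """Ratio k/n of distinct labels to molecules."""
    return expected_labels(n, m) / n


def new_labels_for_extra(n, extra=10, m=M_LABELS):
    """Expected additional labels when `extra` molecules are added to n."""
    return expected_labels(n + extra, m) - expected_labels(n, m)


def labels_chosen_exactly(x, n, m=M_LABELS):
    """Expected number of labels captured exactly x times (Poisson term P_x)."""
    lam = n / m
    return m * lam ** x * math.exp(-lam) / math.factorial(x)


if __name__ == "__main__":
    for n in (10, 480, 960):
        print(
            n,
            round(counting_efficiency(n), 3),
            round(new_labels_for_extra(n), 1),
            round(labels_chosen_exactly(1, n), 1),
            round(labels_chosen_exactly(2, n), 1),
        )
```

With n = 480 and m = 960 the efficiency evaluates to about 0.79 and ten extra molecules add about six labels, in line with the values stated in the text.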
`To demonstrate stochastic labeling, we performed an experi-
`ment to count small numbers of nucleic acid molecules in solu-
`tion. We used genomic DNA from a male individual with Trisomy
`21 to determine the absolute and relative number of DNA copies
`of chromosomes X, 4, and 21, representing one, two, and three
`target copies of each chromosome, respectively. The DNA con-
`centration in the stock solution was measured by quantitative
`staining with PicoGreen fluorescent dye, and dilutions containing
3.62, 1.45, 0.36, and 0.036 ng were prepared. In each dilution, the
number of copies of target molecules in the sample was calculated
from a total DNA mass of 3.5 pg per haploid nucleus
(5); the dilutions represent approximately 1,000, 400, 100, and 10 haploid
genome equivalents, respectively. As outlined in Fig. 3A, the genomic DNA
`sample was first digested to completion with the BamHI restric-
`tion endonuclease to produce 360,679 DNA fragments. A diverse
`set of labels consisting of 960 14-nt sequences was synthesized
`as adaptors harboring BamHI overhangs (Table S1). This set
`of labels adequately addresses a broad dynamic range and was
`chosen for favorable thermodynamic properties as described in
`Materials and Methods. For the stochastic labeling reaction, each
`DNA fragment end randomly attaches to a single label by means
`
[Fig. 2 plot: number of labels (k), 0 to 1,000, versus number of target molecules (n), 0 to 2,000; curves for labels captured at least once, exactly once, and exactly twice.]
`
`Fig. 2. The number of stochastically captured labels for a given number of
`target molecules calculated using a nondepleting reservoir of 960 diverse
`labels. The red curve represents the average number of labels observed at
`least once (calculated from Eq. S1); the green and black curves represent
`the number of labels observed exactly once and twice (calculated from
`Eq. S3), respectively. Error bars indicate one standard deviation (calculated
`from Eqs. S2 and S4) away from the corresponding mean values.
`
$$k = m(1 - P_0) = m\left[1 - e^{-(n/m)}\right] \qquad [1]$$
`
`Given k, we can calculate n. In addition to using the Poisson
`approximation, the relationship for k, n, and m can be described
`using the binomial distribution, or simulated using a random
`number generator, each yielding similar results (SI Text).
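As a minimal sketch of the three routes mentioned here, the code below evaluates Eq. 1, the exact expectation for n independent uniform choices among m labels, and a random-number simulation, and it inverts Eq. 1 to recover n from an observed k. The function names, trial count, and seed are assumptions made for illustration; the SI Text derivations are not reproduced.

```python
import math
import random


def expected_unique_labels(n, m):
    """Eq. 1 (Poisson approximation): k = m * (1 - exp(-n/m))."""
    return m * (1.0 - math.exp(-n / m))


def expected_unique_labels_binomial(n, m):
    """Exact expectation when each of n molecules picks one of m labels uniformly."""
    return m * (1.0 - (1.0 - 1.0 / m) ** n)


def molecules_from_labels(k, m):
    """Invert Eq. 1: n = -m * ln(1 - k/m). Only defined for 0 <= k < m."""
    if not 0 <= k < m:
        raise ValueError("k must satisfy 0 <= k < m")
    return -m * math.log(1.0 - k / m)


def simulate_unique_labels(n, m, trials=2000, seed=0):
    """Average number of distinct labels over repeated random labelings."""
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        total += len({rng.randrange(m) for _ in range(n)})
    return total / trials


if __name__ == "__main__":
    m = 960
    for n in (10, 100, 480, 1000):
        k = expected_unique_labels(n, m)
        print(n, round(k, 1), round(expected_unique_labels_binomial(n, m), 1),
              round(simulate_unique_labels(n, m, trials=200), 1),
              round(molecules_from_labels(k, m), 1))
```

The inversion fails by construction when k equals m, which reflects the saturation of the label set at very large n.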
`
`Results
`The outcome of stochastic labeling is illustrated by examining the
`graph of k (the red curve in Fig. 2) calculated using a label diver-
`sity (m) of 960. The expected number of unique labels captured
`depends on the ratio of molecules to labels, n∕m. When n is much
`smaller than m, each molecule almost always captures a unique
`
[Fig. 3 image. Panel A workflow: genomic DNA; BamHI fragments; label ligation; universal PCR and circularization; gene-specific inverse PCR; fragmentation and array hybridization to the array probe; ligation to a short 5′ biotin oligonucleotide. Panel B, number of labels detected among the 960 label probes (n.s. = nonspecific control probes): 3.62 ng, 525; 1.45 ng, 256; 0.36 ng, 107; 0.036 ng, 14; 0 ng, 0.]
`
Fig. 3. (A) A schematic drawing of the method used to attach labels to fragments of DNA in the genome. Red bars represent a pool of synthetic deoxyoligonucleotide
adaptors incorporating a collection of 960 labels used as counting sequences. A common primer sequence flanks each unique label adaptor,
allowing universal amplification of all fragments with PCR. Circularization of amplified DNA molecules simplifies the selection and amplification of label-
ligated DNA fragments through inverse PCR with gene-specific primers. The identity of labels that have been ligated to the genomic DNA fragment is
determined using microarray hybridization or DNA sequencing. (B) Microarray scan images of the 960 tiled probes for chromosome 4 corresponding to
the labels used, as well as an additional 192 nonspecific (n.s.) probes serving as negative controls. The amount of genomic DNA used in each experiment
is given on the left side of each image and the number of labels detected on microarrays is provided on the right side.
`
`of enzymatic ligation of compatible cohesive DNA ends. High
`coupling efficiency is achieved through incubation with a large
molar excess of labels and DNA ligase enzyme ($>10^{13}$ molecules
`each). At this stage, the stochastic labeling process is complete,
`and the samples can be amplified as desired for detection. A
`universal primer is added, and the entire population of labeled
`DNA fragments is PCR amplified. The PCR reaction preferen-
`tially amplifies approximately 80,000 fragments in the 150 bp–
`2 kb size range. After circularization of the amplified products,
`three test target fragments were isolated using gene-specific
`PCR; one on each of chromosomes X, 4, and 21, and prepared
`for detection.
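The expected molecule counts for each dilution follow from the 3.5 pg per haploid nucleus figure given above. The sketch below makes that arithmetic explicit. It assumes two haploid genome equivalents per cell and uses the one, two, and three copies per cell stated for chromosomes X, 4, and 21; the function and constant names are illustrative.

```python
PG_PER_HAPLOID_GENOME = 3.5  # pg of DNA per haploid nucleus, as used in the text

# Copies of each target per cell in the male Trisomy 21 sample (from the text).
COPIES_PER_CELL = {"ChrX": 1, "Chr4": 2, "Chr21": 3}


def haploid_genome_equivalents(mass_ng):
    """Convert a DNA mass in nanograms to haploid genome equivalents."""
    return mass_ng * 1000.0 / PG_PER_HAPLOID_GENOME


def expected_target_copies(mass_ng, copies_per_cell):
    """Expected copies of a target, assuming two haploid equivalents per cell."""
    cells = haploid_genome_equivalents(mass_ng) / 2.0
    return cells * copies_per_cell


if __name__ == "__main__":
    for mass in (3.62, 1.45, 0.36, 0.036):
        row = {name: round(expected_target_copies(mass, c))
               for name, c in COPIES_PER_CELL.items()}
        print(mass, round(haploid_genome_equivalents(mass)), row)
```

For the 0.036 ng dilution this gives roughly 5, 10, and 15 copies for the chromosome X, 4, and 21 targets, matching the values quoted elsewhere in the text.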
`The three labeled targets were counted using two sampling
`techniques: DNA microarrays and next-generation sequencing.
`For the array counting, a custom DNA array detector capable
`of distinguishing the set of labels bound to the targets was con-
`structed by dedicating one array element for each of the 960
`target-label combinations. Each array element consists of a com-
`plementary target sequence attached to one of the complements
`of the 960 label sequences (Fig. 3A, Fig. S3). To maximize the
`specificity of target-label hybridization and scoring, we employed
`a ligation labeling procedure on the captured sequences (Fig. S3).
`We set thresholds to best separate the intensity data from the
`array into two clusters, one of low intensity and one of high
`intensity (Fig. S4A). We scored a label as “present” if its signal
`intensity exceeded the threshold. The number of labels detected
on microarrays is summarized in Table S2. Fig. 3B shows examples
of microarray scan images where bright spots/features were
`counted as present. As an alternate form of detection, sequencing
`adaptors were added (Fig. S5) and the samples were subjected
to two independent DNA sequencing runs. Between several hundred
thousand and several million high-quality reads were used to
score the captured labels (Table S3). Similarly, we set thresholds
`for the number of sequencing reads observed for each label, and
`scored a label as present if the number of sequencing reads ex-
`ceeded the threshold (Fig. S4B). The number of attached labels,
`k, detected for each target in each dilution either by microarray
`counting or sequence counting is presented in Table S4, and
`plotted in Fig. 4 A and B.
`The counting results span a range of approximately 1,500 to 5
`molecules, and it is useful to consider the results in two counting
`regimes, below and above 200 molecules. There is a striking
`agreement between the experimentally observed number of mo-
`lecules and that expected from dilution in the first regime where
`the ratio of molecules to labels ðn∕mÞ < 0.2 (Table S4). Below
`200 molecules the data are in tight agreement, including the data
`from the lowest number of molecules—5, 10, and 15—where the
`counting results are all within the expected sampling error for the
experiment. (The sampling error for 10 molecules is 10 ± 6.4,
`where 10 and 6.4 are the mean and two standard deviations from
`10,000 independent simulation trials.)
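The size of this sampling error can be approximated with a short simulation. The sketch below assumes that the number of molecules drawn into an aliquot is Poisson distributed with a mean of 10, which is an assumption made here and not necessarily the model used for the reported simulations. Under it, two standard deviations come to about 6.3, close to the reported 6.4.

```python
import math
import random
import statistics


def poisson_draw(rng, mean):
    """Draw one Poisson-distributed integer using Knuth's multiplication method."""
    limit = math.exp(-mean)
    count, product = 0, rng.random()
    while product > limit:
        count += 1
        product *= rng.random()
    return count


def sampling_error(mean_molecules, trials=10_000, seed=1):
    """Return (mean, two standard deviations) of simulated aliquot counts."""
    rng = random.Random(seed)
    draws = [poisson_draw(rng, mean_molecules) for _ in range(trials)]
    return statistics.mean(draws), 2 * statistics.pstdev(draws)


if __name__ == "__main__":
    mean, two_sd = sampling_error(10)
    print(f"{mean:.2f} +/- {two_sd:.2f}")
```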
`In the second regime above 200 molecules, there is an approx-
`imate 10–25% undercounting of molecules, increasing as the
`number of molecules increases. We attribute this deviation to be
`
[Fig. 4 plots. Panels A and B: number of labels versus number of target molecules (0 to 1,500, with expanded views of 0 to 50). Panels C and D: copy number ratios (axis 0 to 4) for Chr4, Chr21, and ChrX at 3.62, 1.45, 0.36, and 0.036 ng, with off-scale values labeled 6.67 and 4.5.]
`
`Fig. 4. Absolute counting results for DNA molecules. 3.62, 1.45, 0.36, and 0.036 ng dilutions of DNA isolated from cultured lymphoblasts of a Trisomy 21 male
`individual were processed for microarray hybridization and DNA sequencing. Three gene targets were tested, one from each of chromosomes X, 4, and 21, and
`the numbers of detected labels (blue curve) are shown for microarray (A) and DNA sequencing (B). The number of target molecules for each sample was
`determined from the amount of DNA used, assuming a single haploid nucleus corresponds to 3.5 pg. For comparison, the calculated number of labels expected
`from a stochastic model is also plotted in red. Numerical values are provided in Table S2. Copy number ratios of the three gene targets ChrX (red bar), Chr4 (blue
`bar), and Chr21 (green bar) representing one, two, and three copies per cell, respectively, are shown in (C) and (D). The calculated number of target molecules
`was determined from the number of labels detected on microarrays (Table S2, column 9) or from DNA sequencing. For each sample dilution, the copy number
`ratio of each gene target relative to ChrX is shown for microarray (C) and DNA sequencing (D). For comparison, copy number ratios obtained from in silico
`sampling simulations are also shown; where circles indicate the median values from 10,000 independent trials and error bars indicate the 10th and 90th
`percentiles. The 90th percentile values of the ratios at the lowest concentration (0.036 ng) are explicitly labeled in the plots.
`
`due to a distortion in the amplification reaction. PCR-introduced
`distortion occurs from small amounts of any complex template
`due to the differences in amplification efficiency between indivi-
`dual templates (6–8). In the present case, stochastic labeling will
`produce only one (at low n∕m ratios), and increasingly several
`copies (at higher n∕m ratios) of each template. Modeling suggests
`that simple random dropout of sequences (PCR efficiencies
`under 100%) generates significant distortion in the final numbers
`of each molecule after amplification. At any labeling ratio, ran-
`dom dropout of sequences because of PCR efficiency will result
`in an undercount of the original number of molecules. At high
`n∕m ratios, the number of labels residing on multiple targets will
`increase and have a statistical survival advantage through the
`PCR reaction causing greater distortion. In support of this argu-
`ment, we observe a wide range of intensities on the microarray
`and a wide range in the number of occurrences of specific
`sequences in the sequencing experiments (Fig. S4 A and B). This
`effect can be reduced by carrying out the reaction at n∕m ratios
`near or less than 0.2, increasing the number of labels m, further
`optimization of the amplification reaction, or by employing a
`linear amplification method.
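The undercounting caused by random dropout can be illustrated with a toy simulation. This is not the authors' model: it assumes only that each labeled molecule is independently lost with a fixed probability before detection, and that a label is scored present if at least one molecule carrying it survives. The survival value of 0.85 and all names are hypothetical.

```python
import math
import random


def detected_labels_after_dropout(n, m, survival, rng):
    """Label n molecules at random, drop each with probability 1 - survival,
    and return the number of distinct labels still present."""
    surviving = set()
    for _ in range(n):
        label = rng.randrange(m)       # stochastic labeling step
        if rng.random() < survival:    # molecule survives to detection
            surviving.add(label)
    return len(surviving)


def naive_molecule_estimate(k, m):
    """Invert k = m * (1 - exp(-n/m)) while ignoring dropout."""
    return -m * math.log(1.0 - k / m)


def mean_estimate(n, m, survival, trials=300, seed=3):
    """Estimate n from the average detected label count over many trials."""
    rng = random.Random(seed)
    mean_k = sum(
        detected_labels_after_dropout(n, m, survival, rng) for _ in range(trials)
    ) / trials
    return naive_molecule_estimate(mean_k, m)


if __name__ == "__main__":
    for survival in (1.0, 0.85):
        print(survival, round(mean_estimate(800, 960, survival)))
```

In this simplified model the estimate falls to roughly the surviving fraction of the true count. It does not capture the additional distortion from unequal amplification efficiencies between templates described in the text.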
`The lymphoblast cell line used in this study provides an inter-
`nal control for the relative measurement of copy number for
`genes residing on chromosomes X, 4, and 21. Fig. 4 C and D pre-
`sents the ratio of the absolute number of molecules from all three
`chromosomes normalized to copy number 1 for the X chromo-
`some. As shown, the measurements above 50 molecules all yield
`highly precise relative copy number values. At low numbers of
`molecules (0.036 ng) uncertainty results because the error asso-
`ciated with sampling an aliquot for dilution is significant. Numer-
`ical simulations were performed to estimate the sampling error,
`and summarized medians along with the 10th and 90th percen-
`tiles of the copy number ratios are shown in Fig. 4 C and D as
`circles and range bars, respectively. At the most extreme dilu-
`tions, where approximately 5, 10, and 15 molecules are expected
`for the chromosome X, 4, and 21 genes, the deviation in copy
`number ratio is within the expected sampling error.
`Overall, the identity of labels detected on the microarrays and
`in sequencing are in good agreement, with only a small subset of
`labels unique to each process (Fig. S4C). Despite a high sequen-
`cing sampling depth (Table S3), a small number of labels with
`high microarray intensity appear to be missing or underrepre-
`sented in the sequencing results. In contrast, labels that appear
`in high numbers in the sequencing reaction always correlate with
`high microarray intensities. No trivial explanation could be found
`for the labels that are missing from any given sequencing experi-
`ment. Although underrepresented in some experiments, the same
`labels appear as present with high sequence counts in other
`experiments, suggesting that the sequences are compatible with
`the sequencing reactions. We used PCR as an independent meth-
`od to investigate isolated cases of disagreement, and demon-
`strated that the labels were present in the samples used for the
`sequencing runs (Table S5). Although we can clearly confirm
`their presence in the sequencing libraries, it is unclear as to why
`these labels are missing or underrepresented in the sequencing
`reads.
`To test the stochastic behavior of label selection, we pooled the
`results of multiple reactions at low target concentrations (0.36
`and 0.036 ng), where the probability that a label will be chosen
`more than once is small. Fig. S6 shows that the number of times
`each label is used closely follows modeling for 1,064 label obser-
`vations from microarray counting. Furthermore, because each
`end of a target sequence chooses a label independently, we
`can compare the likelihood of the same label occurring on both
`ends of a target at high copy numbers. Table S2, columns 10–11
present the experimentally observed frequency of labels occurring
in common across both ends of a target and their expected
`frequency from numerical simulations. No evidence of nonsto-
`chastic behavior is observed in these data.
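The both-ends comparison can be sketched as follows, assuming each end of each of n molecules picks one of m labels uniformly and independently. Under that assumption a given label appears on one end with probability p = 1 - (1 - 1/m)^n, so the expected number of labels seen on both ends is m p^2. The names, trial count, and seed are illustrative, and the simulations behind Table S2 are not reproduced here.

```python
import random


def expected_shared_labels(n, m):
    """Expected number of labels observed on both ends under independence."""
    p = 1.0 - (1.0 - 1.0 / m) ** n
    return m * p * p


def simulate_shared_labels(n, m, trials=500, seed=2):
    """Average overlap between the label sets captured at the two ends."""
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        end_a = {rng.randrange(m) for _ in range(n)}
        end_b = {rng.randrange(m) for _ in range(n)}
        total += len(end_a & end_b)
    return total / trials


if __name__ == "__main__":
    n, m = 500, 960
    print(round(expected_shared_labels(n, m), 1),
          round(simulate_shared_labels(n, m), 1))
```

An observed overlap well above this expectation would suggest that some labels are preferentially ligated, which is the kind of nonstochastic behavior the comparison is meant to detect.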
`
`Discussion
`It is interesting to contrast the attributes of stochastic labeling
`with other quantitative methods. Microarray and sequencing
`technologies are commonly used to obtain the relative abundance
`of multiple targets in a sample. In the case of microarray analysis,
`intensity values reflect the amount of hybridization bound target
`and can be used to compare to the intensity of other targets in
`the sample. In the case of sequencing, the number of times a
`sequence is found is compared to the number of times other
`sequences are found. Although the techniques differ by using
`intensity in one case and a digital count in the other, they both
`provide relative comparisons of the number of molecules in
`solution. To obtain absolute numbers, quantitative capture of all
`sequences would need to be assured, and distortions due to am-
`plification biases understood; however, in practice the efficiency
`of capture and/or distortions due to amplification biases with
`sequencing or other counting approaches (9–12) are unknown.
`With stochastic labeling, high-efficiency enzymatic reactions
`coupled with a large molar excess of labels ensures quantitative
`labeling, and after amplification, threshold detection diminishes
`the effects of distortions due to amplification bias.
`Digital PCR is an absolute counting method where solutions
`are stochastically partitioned into multiwell containers, typically
`until there is an average probability of less than one molecule
`per two containers, then detected by PCR (3). This condition
is satisfied when $1 - P_0 = 1 - e^{-n/c} = \frac{1}{2}$, where $P_0$ is the
probability that a container does not contain any molecule, n is the
number of molecules, and c is the number of containers, or n/c
is 0.693. If quantitative partitioning is assumed, the dynamic
`range is governed by the number of containers available for
`stochastic separation. Once the molecules are partitioned, high-
`efficiency PCR detection gives the yes/no answer and absolute
`counting is enabled. To vary dynamic range, microfabrication (13)
`or picoliter droplets (14) can be used to substantially increase the
`number of containers. Similarly, in stochastic labeling, the same
statistical conditions are met when $1 - P_0 = 1 - e^{-n/m} = \frac{1}{2}$,
where m is the number of labels, and one half of the labels will
be used at least once when $n/m = 0.693$. The dynamic range is
`governed by the number of labels used, and the number of labels
`can be easily increased to extend the dynamic range. The number
`of containers in digital PCR plays the same role as the number
`of labels in stochastic labeling and by substituting containers for
`labels we can write identical statistical equations. Using the prin-
`ciples of physical separation, digital PCR stochastically expands
`identical molecules into physical space, whereas the principle
`governing stochastic labeling is chemically based and expands
`identical molecules into chemical space.
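The value 0.693 quoted for both methods follows in one step from the half-occupancy condition. As a worked restatement of the relation in the text, under the same Poisson approximation:

```latex
\begin{align*}
1 - P_0 = 1 - e^{-n/m} &= \tfrac{1}{2} \\
e^{-n/m} &= \tfrac{1}{2} \\
\frac{n}{m} &= \ln 2 \approx 0.693
\end{align*}
```

For the 960-label set used here, this corresponds to roughly 665 molecules; replacing m with the number of containers c gives the digital PCR condition.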
`We have shown that a population of indistinguishable mole-
`cules can be stochastically expanded to a population of uniquely
`identifiable and countable molecules. High-sensitivity threshold
`detection of single molecules is demonstrated, and the process
`can be used to count both the absolute and relative number
`of molecules in a sample. The method should be well-suited for
`determining the absolute number of multiple target molecules in
`a specified container, such as high-sensitivity clinical assays, or for
`determining the number of transcripts in single cells. For exam-
`ple, counting on the order of 300,000 molecules of the approxi-
`mately 30,000 gene transcripts in the human genome in any given
`cell could be achieved with high efficiency using several thousand
`labels. We estimate that this experiment should require about
`10–30 million sequencing reads, falling within the capacity of
`modern sequencing devices (the number of reads required using
`sequencing technology depends on the number of molecules, not
`the diversity of labels). The number of array elements required
`depends on the number of different types of molecules times the
diversity of labels, or $\sim 10^7$ array elements in this example, also
`within range of current technology. The approach should also
`be compatible with other molecular assay systems. For example,
`antibodies could be stochastically labeled with DNA fragments
`and those that bind antigen harvested. After amplification, the
`number of labels detected will reveal the original number of anti-
`gens in solutions. In the examples shown here, DNA is used as a
`chemical label because of the great diversity of sequences avail-
`able, it can be amplified, and because it is easily detectable. In
`principle, any stochastic chemical change could be used as long
`as it can be easily detected and generates sufficient diversity for
`the desired application.
`
`Materials and Methods
`DNA Samples. Genomic DNA isolated from cultured B-Lymphocytes of a male
`Caucasian with Trisomy 21 was purchased from Coriell Institute for Medical
`Research (Catalog no. GM01921). The DNA quantity was determined by
`PicoGreen (Invitrogen) measurements using the lambda phage DNA pro-
`vided in the kit as reference standard. DNA quality was assessed by agarose
`gel electrophoresis.
`
`BamHI Digestion and Ligation to Labels. Genomic DNA was digested to
`completion with BamHI [New England BioLabs (NEB)] and ligated to a pool
`of adaptors consisting of an equal concentration of 960 distinct labels
`(Fig. 3A). Each adaptor consists of a universal PCR priming site, a 14-nt long
`label sequence, and a BamHI overhang (Fig. S3). The sequence of the labels
(Table S1) was selected from an all-possible $4^{14}$ nucleotide combination to be
`of similar melting temperature, minimal self-complementation, and maximal
`differences between one another. Homopolymer runs and the sequence of
`the BamHI restriction site were avoided. Oligonucleotides were synthesized
`(Integrated DNA Technologies) and annealed to form double-stranded adap-
`tors prior to pooling. For ligation, the digested DNA was diluted to the
desired quantity and added to 100 pmol (equivalent to $6 \times 10^{13}$ molecules)
of pooled label adaptors, and $2 \times 10^3$ units (equivalent to $1 \times 10^{16}$ molecules)
of T4 DNA ligase (NEB) in a 30 μL reaction. The reaction was incubated at
20 °C for 3 h, followed by inactivation at 65 °C for 20 min.
`
`Adaptor PCR. Adaptor-ligated fragments were amplified in a 50 μL reaction
`containing 1X TITANIUM Taq PCR buffer (Clontech), 1M betaine (Sigma-
`Aldrich), 0.3 mM dNTPs, 4 μM PCR004StuA primer (Fig. S3), 2.5 units Taq
`DNA Polymerase (Affymetrix), and 1X TITANIUM Taq DNA polymerase (Clon-
`tech). An initial PCR extension was performed at 72 °C for 5 min, 94 °C for
`3 min, followed by 5 cycles of 94 °C for 30 s, 45 °C for 45 s, and 68 °C for
`15 s. This step was followed by 25 cycles of 94 °C for 30 s, 60 °C for 45 s,
`and 68 °C for 15 s and a final extension step of 68 °C for 7 min. PCR products
`were assessed with agarose gel electrophoresis (Fig. S4) and purified using
`the QIAquick PCR purification kit (Qiagen).
`
`Circularization. The purified PCR product was denatured at 95 °C for 3 min
`prior to phosphorylation with T4 polynucleotide kinase (NEB). The phos-
`phorylated DNA was ethanol precipitated and circularized using the CircLi-
`gase™ II ssDNA Ligase Kit (Epicentre). Circularization was performed at 60 °C
`for 2 h followed by 80 °C inactivation for 10 min in a 40 μL reaction consisting
`of 1X CircLigase™ II reaction buffer, 2.5 mM MnCl2, 1M betaine, and 200U
`CircLigase™ II ssDNA ligase. Noncircularized DNAs were removed by treat-
`ment with 20U Exonuclease I (Epicentre) at 37 °C for 30 min. Remaining
`DNA was purified with ethanol precipitation and quantified with OD260
`measurement.
`
`Amplification of Gene Targets. Three assay regions were tested, one on each
`of chromosomes 4, 21, and X. Table S1 lists the genomic location, length, and
`sequences of these selected fragments. The circularized DNA was amplified
`with gene-specific primers in a multiplex inverse PCR reaction. PCR primers
`were picked using Primer3 (http://frodo.wi.mit.edu/primer3) to yield ampli-
`cons ranging between 121 and 168 bp. PCR was carried out with 1X TITA-
`NIUM Taq PCR buffer (Clontech), 0.3 mM dNTPs, 0.4 μM each primer, 1X
`TITANIUM Taq DNA Polymerase (Clontech), and approximately 200 ng of
`the circularized DNA. After denaturation at 94 °C for 2 min, reactions were
`cycled 30 times as follows: 94 °C for 20 s, 60 °C for 20 s, 68 °C for 20 s, and a
`68 °C final hold for 4 min. PCR products were assessed on a 4–20% gradient
`polyacrylamide gel (Invitrogen) and precipitated with ethanol.
`
`Array Design. For each gene target assayed, the array probes consist of
all possible combinations of the 960 label sequences connected to the two
BamHI genomic fragment ends (Fig. S3). An additional 192 label sequences
`that were not included in the adaptor pool were also included to serve as
`nonspecific controls. This strategy enables label detection separately at each
`paired end, because each target fragment is ligated to two independent
`labels (one on either end).
`
`Array Synthesis. Arrays were synthesized following standard Affymetrix
`GeneChip manufacturing methods utilizing contact lithography and phos-
`phoramidite nucleoside monomers bearing photolabile 5′-protecting groups.
`Array probes were synthesized with 5′ phosphate ends to allow for ligation.
`Fused silica wafer substrates were prepared by standard methods with trialk-
`oxy aminosilane, as previously described (15). After the final lithographic
`exposure step, the wafer was deprotected in an ethanolic amine solution
`for a total of 8 h prior to dicing and packaging.
`
`Hybridization to Arrays. PCR products were digested with Stu I (NEB), and
`treated with lambda exonuclease (Affymetrix). Five micrograms of the di-
`gested DNA was hybridized to a GeneChip array in 112.5 μL of hybridization
`solution containing 80 μg denatured Herring sperm DNA (Promega), 25% for-
`mamide, 2.5 pM biotin-labeled gridding oligo, and 70 μL hybridization buffer
`(4.8M TMACl, 15 mM Tris pH 8, and 0.015% Triton X-100). Hybridizations
`were carried out in ovens at 50 °C for 16 h with rotation at 30 rpm. Following
`hybridization, arrays were washed in 30 mM NaCl, 2 mM NaH2PO4, 0.2 mM
EDTA, pH 7.4 containing 0.005% Triton X-100 at 37 °C for 30 min, and
`with 10 mM Tris/1 mM EDTA, pH 8 (TE) at 37 °C for 15 min. A short
`biotin-labeled oligonucleotide (Fig. S3) was annealed to the hybridized
`DNAs, and ligated to the array probes with Escherichia coli DNA ligase
`(Affymetrix). Excess unligated oligonucleotides were removed with TE wash
`at 50 °C for 10 min. The arrays were stained with streptavidin, R-phycoery-
`thrin conjugate (Invitrogen), and scanned on the GCS3000 instrument
`(Affymetrix).
`
`Counting Labels. We set thresholds for the array intensity, or the number
`of sequencing reads to classify labels as either being used or not (Fig. S4 A
`and B). Appropriate thresholds were straightforward to determine when
`used and unused labels fall into two distinct clusters separated by a signifi-
`cant gap. In situations where a gap was not obvious, the function normal-
`mixEM in the R package mixtools was used to classify labels. This function
`uses the expectation maximization (EM) algorithm to fit the data by mixtures
`of two normal distributions iteratively. The two normal distributions corre-
`spond to the two clusters to be identified. The cluster of labels with a high
`value is counted as used, and the other as not used. The average of the
minimum and maximum of the two clusters, $(I_{\min} + I_{\max})/2$, was applied as
`the threshold for separating the two clusters.
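For the simple case where used and unused labels are separated by an obvious gap, the thresholding step can be sketched as below. This is an illustration, not the authors' code: it places the threshold at the midpoint of the widest gap between sorted values, which assumes that the minimum and maximum in the formula above refer to the lowest value of the high cluster and the highest value of the low cluster. The EM mixture fit performed with normalmixEM is not reproduced, and the example intensities are made up.

```python
def gap_threshold(values):
    """Return the midpoint of the widest gap between consecutive sorted values."""
    ordered = sorted(values)
    if len(ordered) < 2:
        raise ValueError("need at least two values to find a gap")
    best_gap, threshold = -1.0, None
    for low, high in zip(ordered, ordered[1:]):
        if high - low > best_gap:
            best_gap = high - low
            threshold = (low + high) / 2.0
    return threshold


def count_present(values, threshold):
    """Count labels whose signal exceeds the threshold."""
    return sum(1 for v in values if v > threshold)


if __name__ == "__main__":
    # Hypothetical intensities: four background features and three bright ones.
    intensities = [120, 95, 4100, 130, 3900, 110, 5200]
    cutoff = gap_threshold(intensities)
    print(cutoff, count_present(intensities, cutoff))
```

The same two functions apply to per-label sequencing read counts in place of array intensities.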
`
`Sampling Error Calculation. A sam
