`aneuploidy by massively parallel genomic sequencing
`of DNA in maternal plasma
`
`Rossa W. K. Chiuᵃ,ᵇ, K. C. Allen Chanᵃ,ᵇ, Yuan Gaoᶜ,ᵈ, Virginia Y. M. Lauᵃ,ᵇ, Wenli Zhengᵃ,ᵇ, Tak Y. Leungᵉ,
`Chris H. F. Fooᶠ, Bin Xieᶜ, Nancy B. Y. Tsuiᵃ,ᵇ, Fiona M. F. Lunᵃ,ᵇ, Benny C. Y. Zeeᶠ, Tze K. Lauᵉ, Charles R. Cantorᵍ,¹,
`and Y. M. Dennis Loᵃ,ᵇ,¹
`
`ᵃCentre for Research into Circulating Fetal Nucleic Acids, Li Ka Shing Institute of Health Sciences, Departments of ᵇChemical Pathology and ᵉObstetrics and
`Gynaecology, and ᶠCentre for Clinical Trials, The Chinese University of Hong Kong, Shatin, New Territories, Hong Kong SAR, China; ᶜCenter for the Study of
`Biological Complexity and ᵈDepartment of Computer Science, Virginia Commonwealth University, Richmond, VA 23284; and ᵍSequenom, Inc., San Diego, CA
`92121
`
`Contributed by Charles R. Cantor, October 22, 2008 (sent for review September 29, 2008)
`
`Chromosomal aneuploidy is the major reason why couples opt for
`prenatal diagnosis. Current methods for definitive diagnosis rely on
`invasive procedures, such as chorionic villus sampling and amniocen-
`tesis, and are associated with a risk of fetal miscarriage. Fetal DNA has
`been found in maternal plasma but exists as a minor fraction among
`a high background of maternal DNA. Hence, quantitative perturba-
`tions caused by an aneuploid chromosome in the fetal genome to the
`overall representation of sequences from that chromosome in ma-
`ternal plasma would be small. Even with highly precise single mole-
`cule counting methods such as digital PCR, a large number of DNA
`molecules and hence maternal plasma volume would need to be
`analyzed to achieve the necessary analytical precision. Here we
`reasoned that instead of using approaches that target specific gene
`loci, the use of a locus-independent method would greatly increase
`the number of target molecules from the aneuploid chromosome that
`could be analyzed within the same fixed volume of plasma. Hence, we
`used massively parallel genomic sequencing to quantify maternal
`plasma DNA sequences for the noninvasive prenatal detection of fetal
`trisomy 21. Twenty-eight first and second trimester maternal plasma
`samples were tested. All 14 trisomy 21 fetuses and 14 euploid fetuses
`were correctly identified. Massively parallel plasma DNA sequencing
`represents a new approach that is potentially applicable to all preg-
`nancies for the noninvasive prenatal diagnosis of fetal chromosomal
`aneuploidies.
`
`Down syndrome | Solexa sequencing | trisomy 21
`
`The testing of fetal chromosomal aneuploidies is the predominant
`reason why many pregnant women opt for prenatal
`diagnosis. Conventional methods for definitive prenatal diagnosis
`of these disorders involve the invasive sampling of fetal materials
`through amniocentesis and chorionic villus sampling, with a risk for
`the fetus (1). Many workers tried to develop noninvasive ap-
`proaches. Methods based on ultrasound scanning and maternal
`serum biochemical markers (2) have proved to be useful screening
`tests. However, they detect epiphenomena instead of the core
`pathology of chromosomal abnormalities. They have limitations
`such as a narrow gestational window of applicability and the need
`to combine multiple markers, even over different time points, to
`arrive at a clinically useful sensitivity and specificity profile.
`For the direct detection of fetal chromosomal and genetic
`abnormalities from maternal blood, early work focused on the
`relatively difficult isolation of the rare fetal nucleated cells from
`maternal blood (3–5). The discovery of cell-free fetal nucleic acids
`in maternal plasma in 1997 opened up new possibilities (6, 7).
`However, the fact that fetal DNA represents only a minor fraction
`of total DNA in maternal plasma (8), with the majority being
`contributed by the pregnant woman herself, has posed a considerable
`challenge. Recently, a number of approaches have been
`developed. One strategy targets a fetal-specific subset of nucleic
`acids in maternal plasma, e.g., placental mRNA (9–11) and DNA
`molecules bearing a placental-specific DNA methylation signature
`(12–14). The fetal chromosomal dosage is then assessed by allelic
`ratio analysis of SNPs within the targeted molecules. These strat-
`egies are called the RNA–SNP allelic ratio approach (11) and the
`epigenetic allelic ratio approach (14). These allelic ratio-based
`methods can be used only for fetuses heterozygous for the analyzed
`SNPs. Thus, multiple markers are needed to enhance the popula-
`tion coverage of the methods.
`To develop a polymorphism-independent method for the detec-
`tion of fetal chromosomal aneuploidies from maternal plasma, our
`group has recently outlined the principles for the measurement of
`relative chromosome dosage (RCD) using digital PCR (15). Digital
`RCD aims to measure the total (maternal plus fetal) amount of a
`specific locus on a potentially aneuploid chromosome in maternal
`plasma, e.g., chromosome 21 (chr21) in trisomy 21 (T21), and
`compares it to that on a reference chromosome. Hence, fetal T21
`is diagnosed by detecting the small increment in the total amount
`of the chr21 gene locus contributed by the trisomic chr21 in the fetus
`as compared with a gene locus on a reference chromosome. The
`proportional increment in chr21 sequences is expected to be small
`because fetal DNA contributes only a minor fraction of DNA in
`maternal plasma (8). To reliably detect the small increase, a large
`absolute number of chr21 and reference chromosome sequences of
`the loci targeted by the digital PCR assays need to be analyzed and
`quantified with high precision. The number of molecules required
`for RCD increases fourfold for every twofold reduction in the
`fractional concentration of circulating fetal DNA. Thus, for cases in
`which the fractional concentration for circulating fetal DNA is low,
`e.g., during early gestation, relatively large volumes of maternal
`plasma may be needed. One way around this is to perform multiplex analysis of
`multiple genetic loci. However, the optimization of highly multi-
`plexed digital PCR might be challenging. If fluorescence reporters
`are used, one would also quickly run out of reporters for distin-
`guishing the products from the various loci.
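The fourfold-per-halving rule stated above can be illustrated with a short calculation. This is a minimal sketch, assuming that the number of molecules that must be counted scales as 1/f², where f is the fractional fetal DNA concentration (as would follow if counting noise grows with the square root of the number of molecules while the trisomy signal grows in proportion to f). The function name and the example fractions are illustrative, not taken from the paper.

```python
def relative_molecules_needed(fetal_fraction: float, baseline_fraction: float) -> float:
    """Molecules needed at fetal_fraction, relative to the number needed at baseline_fraction.

    Assumption of this sketch: the required number of counted molecules
    scales as 1 / f**2, where f is the fractional fetal DNA concentration.
    """
    if not (0 < fetal_fraction <= 1 and 0 < baseline_fraction <= 1):
        raise ValueError("fractions must be in (0, 1]")
    return (baseline_fraction / fetal_fraction) ** 2


# Halving the fetal fraction (0.25 -> 0.125) quadruples the requirement.
print(relative_molecules_needed(0.125, 0.25))  # 4.0
```

Under the same assumption, a fourfold drop in fetal fraction would call for sixteen times as many molecules, which is why low fetal fractions in early gestation translate into larger plasma volumes for locus-specific assays.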
`
`Author contributions: R.W.K.C., K.C.A.C., and Y.M.D.L. designed research; R.W.K.C.,
`K.C.A.C., Y.G., V.Y.M.L., W.Z., B.X., N.B.Y.T., and F.M.F.L. performed research; T.Y.L. and
`T.K.L. collected clinical samples; R.W.K.C., K.C.A.C., V.Y.M.L., C.H.F.F., B.C.Y.Z., C.R.C., and
`Y.M.D.L. analyzed data; and R.W.K.C. and Y.M.D.L. wrote the paper.
`
`Conflict of interest statement: R.W.K.C., K.C.A.C., N.B.Y.T., F.M.F.L., B.C.Y.Z., C.R.C., and
`Y.M.D.L. have filed patent applications on the detection of fetal nucleic acids in maternal
`plasma for noninvasive prenatal diagnosis. Part of this patent portfolio has been licensed
`to Sequenom. C.R.C. is Chief Scientific Officer of and holds equities in Sequenom. Y.M.D.L.
`is a consultant to and holds equities in Sequenom.
`
`Freely available online through the PNAS open access option.
`
`¹To whom correspondence may be addressed. E-mail: loym@cuhk.edu.hk or ccantor@sequenom.com.
`
`This article contains supporting information online at www.pnas.org/cgi/content/full/0810641105/DCSupplemental.
`
`© 2008 by The National Academy of Sciences of the USA
`
`20458–20463 | PNAS | December 23, 2008 | vol. 105 | no. 51
`
`www.pnas.org/cgi/doi/10.1073/pnas.0810641105
`
`Ariosa Exhibit 1033, pg. 1
`IPR2013-00276
`
`
`
`MEDICAL SCIENCES
`
`Fig. 1. Schematic illustration of the procedural framework for using massively
`parallel genomic sequencing for the noninvasive prenatal detection of
`fetal chromosomal aneuploidy. Fetal DNA (thick red fragments) circulates in
`maternal plasma as a minor population among a high background of mater-
`nal DNA (black fragments). A sample containing a representative profile of
`DNA molecules in maternal plasma is obtained. In this study, one end of each
`plasma DNA molecule was sequenced for 36 bp using the Solexa sequencing-
`by-synthesis approach. The chromosomal origin of each 36-bp sequence was
`identified through mapping to the human reference genome by bioinformat-
`ics analysis. The number of unique (U0–1–0–0, see text) sequences mapped to
`each chromosome was counted and then expressed as a percentage of all
`unique sequences generated for the sample, termed % chrN for chromosome
`N. Z-scores for each chromosome and each test sample were calculated using
`the formula shown. The z-score of a potentially aneuploid chromosome is
`expected to be higher for pregnancies with an aneuploid fetus (cases E–H
`shown in green) than for those with a euploid fetus (cases A–D shown in blue).
`
`per chromosome should in turn bear correlation with the relative
`size of each chromosome in the human genome. If the % chrN
`values could be determined precisely enough by sequencing and
`counting a large enough pool of plasma DNA sequences, we
`hypothesize that we would be able to discriminate perturbations in
`the quantitative representation of sequences mapped to the aneu-
`ploid chromosomes in a maternal plasma sample from a pregnancy
`involving a fetus with the said aneuploidy. We set out to test each
`of these assumptions.
`
`Detection of Fetal DNA in Maternal Plasma. If MPGS could sequence
`fetal DNA in maternal plasma, one should be able to detect chrY
`DNA from plasma of women carrying male fetuses. Plasma samples
`obtained from four pregnant women carrying euploid fetuses (three
`males and one female) were processed using the beta ChIP-Seq
`protocol from Illumina, which included amplification of the adap-
`tor-ligated DNA fragments both before and after (i.e., two rounds
`of amplification) a gel electrophoresis-based size fractionation step
`as described in supporting information (SI) Text.
`
`To overcome the above limitations, we propose to use a method
`independent of any particular gene locus to quantify the amount of
`chr21 sequences in maternal plasma. When a locus-independent
`method is used, potentially every DNA fragment originating from
`the aneuploid chromosome could contribute to the measurement of
`the amount of that chromosome. Therefore, for any fixed volume
`of maternal plasma, the number of quantifiable sequences would be
`much greater than the number of DNA molecules that could serve
`as templates for detection by gene locus-specific assays. Hence,
`precise detection of the over- or underrepresentation of sequences
`from an aneuploid chromosome could be more readily achieved.
`We previously (15) proposed that the recently available massively
`parallel genomic sequencing (MPGS) platforms (16, 17) might be
`adaptable as an approach to quantify DNA sequences for the
`noninvasive prenatal diagnosis of fetal chromosomal aneuploidy. In
`this study, we demonstrate the use of the “Solexa” sequencing
`technique (Illumina) (18) for this purpose.
`
`Results
`Procedural Framework. The procedural framework of using MPGS
`for noninvasive fetal chromosomal aneuploidy detection in mater-
`nal plasma is schematically illustrated in Fig. 1. In this study, we used
`the sequencing-by-synthesis Solexa method (18). As the maternal
`plasma DNA (maternal and fetal) molecules were already frag-
`mented in nature (19), no further fragmentation was required. One
`end of the clonally expanded copies of each plasma DNA fragment
`was sequenced and processed by standard postsequencing bioin-
`formatics alignment analysis for the Illumina Genome Analyzer,
`which uses the Efficient Large-Scale Alignment of Nucleotide
`Databases (ELAND) software. The purpose of the alignment was
`to simply determine the chromosomal origin of the sequenced
`plasma DNA fragments and details about their gene-specific loca-
`tion were not required. The number of sequence reads originating
`from any particular chromosome was then counted and tabulated
`for each human chromosome. In this study, we counted only
`sequences that could be mapped to just one location in the
`repeat-masked reference human genome with no mismatch, i.e.,
`deemed as a “unique” sequence in the human genome. We termed
`these sequences U0–1–0–0 on the basis of values in a number of
`fields in the data output files of the ELAND sequence alignment
`software (Illumina) (see Materials and Methods).
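The counting step described above can be sketched as follows. This is a minimal illustration and does not parse the actual ELAND output format: the `AlignedRead` record and its `n_exact_hits` field are hypothetical simplifications of the "one location, no mismatch" criterion, and the reads are made-up examples.

```python
from collections import Counter
from typing import Iterable, NamedTuple


class AlignedRead(NamedTuple):
    """Hypothetical, simplified alignment record (not the ELAND file format)."""
    chromosome: str    # e.g. "chr21"
    n_exact_hits: int  # number of genome locations matched with 0 mismatches


def count_unique_reads(reads: Iterable[AlignedRead]) -> Counter:
    """Count reads per chromosome, keeping only reads with exactly one perfect match."""
    return Counter(r.chromosome for r in reads if r.n_exact_hits == 1)


reads = [
    AlignedRead("chr1", 1),
    AlignedRead("chr21", 1),
    AlignedRead("chr1", 3),   # several perfect matches: discarded
    AlignedRead("chr21", 0),  # no perfect match: discarded
    AlignedRead("chr1", 1),
]
print(count_unique_reads(reads))  # Counter({'chr1': 2, 'chr21': 1})
```

Only the chromosome label of each retained read is used; the position within the chromosome plays no role in the tally.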
`We then determined the percentage contribution of unique
`sequences mapped to each chromosome by dividing the U0–1–0–0
`count of a specific chromosome by the total number of U0–1–0–0
`sequence reads generated in the sequencing run for the tested
`sample to generate a value termed % chrN, when the chromosome
`of interest is chrN. To determine if a tested maternal plasma sample
`belonged to a T21 pregnancy, we calculated the z-score of % chr21
`of the tested sample. The z-score refers to the number of standard
`deviations from the mean of a reference data set. Hence, for a T21
`fetus, a high z-score for % chr21 was expected when compared with
`the mean and standard deviation of % chr21 values obtained from
`maternal plasma of euploid pregnancies.
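The two quantities defined above, % chrN and its z-score, can be sketched in a few lines. The function names are illustrative, the reference values are invented for the example, and the use of the sample standard deviation (`statistics.stdev`) is an assumption of this sketch, since the text does not say which estimator was used.

```python
from statistics import mean, stdev


def percent_per_chromosome(unique_counts: dict[str, int]) -> dict[str, float]:
    """Express each chromosome's unique-read count as a percentage of all unique reads."""
    total = sum(unique_counts.values())
    if total == 0:
        raise ValueError("no unique reads counted")
    return {chrom: 100.0 * n / total for chrom, n in unique_counts.items()}


def z_score(test_pct: float, reference_pcts: list[float]) -> float:
    """Number of reference SDs by which test_pct deviates from the reference mean."""
    return (test_pct - mean(reference_pcts)) / stdev(reference_pcts)


# Illustrative % chr21 values for a euploid reference set (not data from the study).
reference_chr21 = [1.20, 1.21, 1.19, 1.20, 1.22, 1.18]
print(z_score(1.28, reference_chr21))
```

A test sample whose % chr21 lies well above the euploid reference mean, in units of the reference SD, yields a large positive z-score.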
`For this procedure to be effective for noninvasive prenatal fetal
`chromosomal aneuploidy detection, a number of assumptions need
`to be met. First, MPGS needs to be sensitive enough to capture and
`generate sequence reads for the small fraction of fetal DNA in
`maternal plasma alongside the background maternal DNA. Sec-
`ond, the pool of plasma DNA fragments captured for sequencing
`needs to be a representative sample of the total DNA pool with
`similar interchromosomal distribution to that in the original ma-
`ternal plasma. Third, there should be no major bias in the ability to
`sequence DNA fragments originating from each chromosome.
`When these assumptions hold, then the % chrN values should be
`reflective of the genomic representation of the maternal and fetal
`DNA fragments in maternal plasma. Furthermore, if both the
`maternal and the fetal genomes are evenly represented in maternal
`plasma, the proportional contribution of plasma DNA sequences
`
`
`
`
`Fig. 2. Bar chart of % U0–1–0–0 sequences per chromosome for a maternal plasma sample involving a female fetus (sample 1), a maternal plasma sample
`involving a male fetus (sample 2), and a mixture of plasma from two adult males (sample 3) processed using the new (protocol A) and original (protocol B)
`protocols. The percentage of genomic representation of each chromosome as expected for a repeat-masked reference haploid female genome was plotted as
`a reference (black bars).
`
`The clinical information and sequenced counts for these four
`samples are shown in Table S1. The total number of sequences
`obtained from each sample was ≈9 × 10⁶. The total U0–1–0–0
`counts ranged from ≈1.8 × 10⁶ to 2.0 × 10⁶ per case. The
`percentages of the U0–1–0–0 counts mapped to each chromosome
`are shown in Fig. S1. For the three pregnancies with male fetuses,
`i.e., cases 3009, 3034, and 3143, the absolute and fractional (in
`parentheses) U0–1–0–0 counts mapped to chrY were 636
`(0.032%), 858 (0.048%), and 1,054 (0.056%), respectively.
`Unexpectedly, however, 177 (0.009%) sequences were also
`mapped to chrY in the sample involving a female fetus. Real-time
`PCR for the SRY gene (8) was negative for this latter plasma
`sample. We next considered that contamination from male se-
`quences might occur during the gel electrophoresis.
`
`Sequencing Protocol for Plasma DNA. We developed a new protocol to
`prepare plasma DNA samples for MPGS whereby the gel electro-
`phoresis and second amplification steps were omitted. The new and
`original protocols were compared and denoted as protocols A and
`B, respectively. To minimize the chance of bias in the sequencing
`results caused by low DNA input, 100 ng of DNA were extracted
`from three plasma samples. Half (50 ng) of each plasma sample was
`processed by either protocol and sequenced in the same manner.
`The tested plasma samples included one from a pregnant woman
`carrying a female fetus, one from a pregnant woman carrying a male
`fetus, and one that was a mixture of plasma from two male
`individuals. A mixture was required for the last sample so that 100
`ng of DNA could be obtained. The three samples were named
`samples 1, 2, and 3, respectively.
`The clinical details and sequencing counts for each sample and
`each protocol are shown in Table S2. The total U0–1–0–0 counts
`ranged from 2.0 × 10⁶ to 2.2 × 10⁶. The absolute and fractional
`U0–1–0–0 counts (in parentheses) mapped to chrY for samples 1,
`2, and 3 using the new protocol were 184 (0.009%), 1,444 (0.066%),
`and 3,523 (0.175%), respectively. The corresponding numbers for
`the original protocol were 218 (0.011%), 1,615 (0.077%), and 3,468
`(0.169%), respectively. Thus, contamination attributable predom-
`inantly to the gel purification and the second amplification steps
`could not be substantiated.
`We next explored if there might be a bioinformatic explanation.
`We used the Basic Local Alignment Search Tool (BLAST) to
`analyze each of the U0–1–0–0 sequences mapped to chrY for each
`of the three samples and for both protocols. We assessed the
`proportion of those DNA sequences that could genuinely be
`aligned just to chrY using BLAST. The proportion of sequences
`aligned uniquely to chrY by BLAST was comparable for both the
`
`new and the original protocols (Table S3). For the plasma sample
`obtained from the pregnancy with a female fetus, only ~30% of
`sequences mapped to chrY by ELAND were confirmed to map just
`to chrY by BLAST. This was in contrast to samples 2 and 3, where
`>90% of the sequences mapped to chrY by ELAND could be
`confirmed by BLAST. Nonetheless, the chrY sequences detected in
`the plasma sample from a pregnancy with a male fetus confirmed
`that fetal DNA in maternal plasma could be sequenced by MPGS.
`To confirm that there was little mapping error among the
`U0–1–0–0 sequences aligned by the ELAND software, we per-
`formed a BLAST analysis on 120 randomly selected U0–1–0–0
`sequences for each of the other chromosomes for the three plasma
`DNA samples processed by the new protocol. As shown in Table S4,
`among the selected test sequences, >99% of U0–1–0–0 sequences
`mapped by ELAND to the autosomes were confirmed to align only
`to the corresponding chromosome by BLAST. All 120 chrX se-
`quences mapped by ELAND were confirmed by BLAST in sample
`1, which was composed of female DNA only. More than 97% of
`chrX sequences mapped by ELAND were confirmed by BLAST in
`samples 2 and 3, which contained male DNA. These data suggested
`that U0–1–0–0 sequences mapped by the ELAND software were
`generally accurate with chrY being the exception.
`
`Distribution of Maternal Plasma DNA Sequences Among the Human
`Chromosomes. The percentage contributions of U0–1–0–0 count by
`each chromosome among the total U0–1–0–0 sequences were
`calculated for samples 1, 2, and 3. To investigate if maternal plasma
`DNA sequences were evenly distributed across the human genome,
`we compared the plasma DNA data with the expected genomic
`contribution of each chromosome. Our main goal was to analyze
`maternal plasma DNA in which the predominant DNA background
`was female. Thus, we calculated the relative genomic representa-
`tion, i.e., size, of each chromosome, on the basis of the nucleotide
`content of each chromosome within a repeat-masked haploid
`reference human genome of a female. The relative size of each
`chromosome was plotted alongside the percentage of chromosomal
`contribution of U0–1–0–0 sequences of the sequenced plasma
`DNA samples.
`As shown in Fig. 2, aliquots of plasma DNA processed by the new
`protocol, i.e., samples 1A, 2A, and 3A, bore closer resemblances to
`the expected genomic representation of each human chromosome
`than the corresponding aliquot processed by the original protocol,
`i.e., samples 1B, 2B, and 3B. We performed linear regression
`analyses to compare the % U0–1–0–0 per chromosome obtained
`from both the new and the original protocols against the expected
`genomic representation of each chromosome in the human ge-
`
`
`Fig. 3. Plot of (A) % U0–1–0–0 counts and (B) z-scores for chromosome 21
`and chromosome X for 28 maternal plasma samples. The sample numbers
`correspond to the cases described in Table S5.
`
`chrX counts between female and male fetuses were expected to be
`much larger than the difference in chr21 counts between T21 and
`euploid fetuses. This was because the dosage of chrX is 2-fold higher
`in a female than in a male individual, but the dosage of chr21 is just
`1.5-fold higher in a T21 than in a euploid individual.
`Furthermore, chrX is much larger than chr21 and contributed a
`mean of 9.5 × 10⁴ U0–1–0–0 counts in male fetuses compared with
`a mean of 3.2 × 10⁴ for chr21 in all samples.
`As the z-score reflects the extent of differences in a measurement
`expressed as the number of SDs from the mean of a reference data
`set, we postulated that the SD was small for the measurement of %
`chr21 but large for the measurement of % chrX. As the SD of a data
`set was in fact reflecting the precision of its measurement, we used
`the data from the 10 euploid male fetuses to calculate the coefficient
`of variation (CV = SD/mean × 100%) of measuring the
`percentage of representation of each chromosome. As shown in
`Table S6, chr21 had the third lowest CV (0.54%) among all
`chromosomes while the CV for the % chrX measurement was
`3.10%. As the absolute number of U0–1–0–0 sequences counted
`for chrX was threefold higher than that for chr21, the number of
`sequences counted could not explain the variation in the precision.
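The coefficient of variation used above as a measure of precision can be sketched as below. The input values are illustrative only, and the use of the sample standard deviation is an assumption of this sketch rather than a detail given in the text.

```python
from statistics import mean, stdev


def coefficient_of_variation(values: list[float]) -> float:
    """CV = SD / mean x 100%, using the sample standard deviation (an assumption)."""
    return stdev(values) / mean(values) * 100.0


# Illustrative % chrN values for a reference group (not data from the study).
pct_chr21 = [1.20, 1.21, 1.19, 1.20, 1.22]
print(f"CV = {coefficient_of_variation(pct_chr21):.2f}%")
```

Because the CV is normalized by the mean, it allows the precision of chromosomes with very different read counts to be compared on the same scale.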
`We therefore explored the relationship between the CV in %
`U0–1–0–0 counts and the GC content of each chromosome (Fig.
`S5). Human chromosomes can be distributed into five groups with
`different levels of GC content (20). Group I chromosomes have the
`lowest levels while group V chromosomes have the highest levels of
`GC content. Interestingly, there was a statistically significant
`difference (P < 0.001, ANOVA) in the CVs among the five groups of
`chromosomes. A Bonferroni t-test further showed that the CV for
`group V was significantly higher (P < 0.05) than that for the other
`four groups. The CVs for group IV and group I were each significantly
`higher (P < 0.05) than those for groups II and III.
`
`Discussion
`We have demonstrated that MPGS can be used as a diagnostic tool
`in noninvasive prenatal diagnosis. We have shown that differences
`in amounts of chr21 DNA sequences in maternal plasma contrib-
`
`nome. As shown in Fig. S2, the slopes of the lines obtained from
`samples 1A, 2A, and 3A were >0.95, while those for samples 1B,
`2B, and 3B were 0.755, 0.795, and 0.859, respectively. R² was >0.980
`for samples 1A, 2A, and 3A but was 0.803, 0.840, and 0.910 for
`samples 1B, 2B, and 3B, respectively. These data objectively con-
`firmed that the DNA processing protocol with just one PCR
`amplification step and the omission of the gel electrophoresis
`procedure produced a quantitative profile of sequences that better
`resembled the genomic content of each human chromosome than
`the original protocol. More importantly, these data suggested that
`the overall distribution of DNA molecules in maternal plasma
`(inclusive of maternal and fetal DNA) across the human genome
`was quite even. The chromosomal distribution of DNA molecules
`in the maternal plasma samples (1A and 2A) was also similar to that
`of adult male plasma (sample 3A). This observation suggested that
`it would be unlikely for the maternal and fetal DNA sequences in
`maternal plasma to bear significant discrepancies among their
`genomic distributions. Otherwise, if the genomic representation of
`the maternal DNA differed substantially from that of fetal DNA,
`one would expect the overall genomic representation to be dis-
`crepant from that of a nonpregnant human plasma DNA sample.
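The regression comparison described above (observed % U0–1–0–0 per chromosome against the expected genomic representation) can be sketched with an ordinary least-squares fit. The helper name and the numbers are illustrative assumptions, not the values plotted in Fig. S2; a slope and R² close to 1 would indicate that the observed read distribution tracks chromosome size.

```python
def fit_line(x: list[float], y: list[float]) -> tuple[float, float, float]:
    """Ordinary least-squares fit y = slope * x + intercept.

    Returns (slope, intercept, r_squared).
    """
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (>= 2)")
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    r_squared = (sxy * sxy) / (sxx * syy)
    return slope, intercept, r_squared


# Illustrative values only: expected genomic share (%) vs. observed % of unique reads.
expected = [8.0, 6.0, 4.0, 2.0, 1.2]
observed = [7.9, 6.1, 4.0, 2.1, 1.2]
slope, intercept, r2 = fit_line(expected, observed)
print(f"slope={slope:.3f}, R^2={r2:.3f}")
```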
`
`Fetal Trisomy 21 Detection from Maternal Plasma. We proceeded to test
`if fetal chromosomal aneuploidy would lead to quantitative per-
`turbations in the percentage contribution in aligned sequences for
`the aneuploid chromosome. Plasma samples were obtained in the
`first and second trimesters of pregnancies from 14 women each
`pregnant with a euploid fetus and 14 women each pregnant with a
`T21 fetus. The chromosomal status of the fetuses was confirmed by
`full karyotyping. Plasma DNAs from the 28 pregnancies (median
`gestational age: 14.1 weeks) were processed by the new protocol and
`sequenced. The clinical details and sequencing counts for each
`sample are shown in Table S5. The 28 samples were processed as
`two batches on dates 6 weeks apart and sequenced in four flow cells.
`The mean number of sequence reads generated per sample was
`10.8 × 10⁶. The mean U0–1–0–0 count was 2.5 × 10⁶. The
`percentage contributions of U0–1–0–0 sequences to each chro-
`mosome were plotted against the percentage of genomic represen-
`tation per chromosome of the human genome as described above
`and are shown in Fig. S3. The data for chr21 and chrX are further
`shown in Fig. 3A. The percentage of U0–1–0–0 sequences aligned
`to chr21 was slightly higher for all T21 than for euploid cases. The
`% chrX was much higher and the % chrY was much lower for all
`female than male fetuses.
`To objectively quantify the degree of overrepresentation in chr21
`sequences of the T21 fetuses, we used the data from the 10 euploid
`male fetuses as a reference population to calculate the mean and SD
`in % U0–1–0–0 per chromosome. The reference population was
`restricted to euploid male fetuses so that an expected increase in %
`chrX could also be explored in female fetuses. Using these refer-
`ence values, we calculated the z-scores for each of the chromo-
`somes, except the Y chromosome, for each of the 28 cases, as shown
`in Fig. S4. The z-scores for chr21 and chrX are further shown in Fig.
`3B. All of the T21 cases had a z-score of >3 (range 5.03–25.11) for
`chr21, i.e., more than 3 standard deviations above the reference mean established
`from the euploid male fetuses. The cases with female fetuses had
`a z-score >1.67 for chrX. All of the other chromosomes had z-scores
`within ±3 for all 28 cases.
`
`Reproducibility of Measuring Percentage of Chromosome Representa-
`tion. Among the 28 tested maternal plasma samples, we expected a
`difference in % chr21 representation between the T21 and euploid
`fetuses and the % chrX representation between the female and
`male fetuses. However, it was interesting to observe a small absolute
`difference in % chr21 representation, which translated to a large
`z-score difference but a large absolute difference in % chrX
`representation that translated to a less impressive z-score difference
`among the respective cases (Fig. 3). The absolute differences in
`
`
`
`
`uted by T21 fetuses compared with euploid fetuses can be unam-
`biguously detected. Absolute differences in amounts of chrX and
`chrY DNA sequences in maternal plasma contributed by male
`fetuses compared with female fetuses can also be observed robustly.
`The ability of MPGS to differentiate small quantitative perturba-
`tions in genomic distributions of chromosomes lies in the very large
`number of molecules analyzed, which minimizes the imprecision of
`the quantitative measurement. As no specific gene locus was
`targeted, all plasma DNA fragments together provide an unprec-
`edented number of molecules analyzed per plasma sample.
`This approach is in marked contrast to previous methods that
`quantified only DNA molecules that could serve as templates for
`locus-specific PCR assays, for example, SRY on chrY (8). The gene
`locus-specific DNA templates represent only an extremely small
`proportion of DNA fragments present in maternal plasma. In fact,
`MPGS is such a powerful tool for quantifying the relative genomic
`representation of plasma DNA molecules that only an amount
`corresponding to just a representative fraction of the human
`genome would need to be sequenced. For example, ~10 million
`36-bp reads were generated for each plasma sample, which was
`equivalent to just one-tenth of the human genome. Furthermore, in
`this study, only the U0–1–0–0 sequences, representing just ≈20%
`of all of the reads sequenced from each plasma DNA sample, were
`used to generate a quantitative profile of chromosomal distribution.
`Thus, this is quite unlike some previously described sequencing-
`based methods for quantitative nucleic acid profiling that relied on
`sequencing at high fold coverage (21), for example, to determine
`the relative abundance of RNA species in transcriptome analysis
`(21). On the contrary, our present method simply sequences a
`random representative fraction of the human genome. The majority
`of DNA fragments are sequenced once, if at all. The relative
`chromosome size is then deduced by counting the relative number
`of sequences aligned to the chromosome. Each of the counted DNA
`fragments would be of a different nucleotide sequence. In fact, the
`pool of DNA sequenced for a sample would vary from run to run.
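As a rough check of the one-tenth figure quoted above, the arithmetic can be written out. The haploid genome size of about 3 × 10⁹ bp is an assumed round number and is not stated in the text.

```latex
\[
10^{7}\ \text{reads} \times 36\ \text{bp/read} = 3.6 \times 10^{8}\ \text{bp},
\qquad
\frac{3.6 \times 10^{8}\ \text{bp}}{3 \times 10^{9}\ \text{bp}} \approx 0.12 .
\]
```

Since only about 20% of those reads (the U0–1–0–0 subset) enter the chromosome counts, the fraction of the genome that actually contributes to the quantitative profile is smaller still.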
`Despite the randomness of the sequencing, the quantitative
`estimation of % chr21 sequences was so precise and robust that the
`z-scores for chr21 of the T21 pregnancies were markedly different
`from the mean of a reference euploid sample set. In this study, the
`median gestational age of the T21 pregnancies (14.1 weeks) was
`comparable with the median of the euploid group (15.4 weeks). All
`samples from the euploid group were collected before any invasive
`procedure in the present pregnancy. Blood samples from 11 of the
`T21 pregnancies were collected immediately before pregnancy
`termination at a median of 6 days (range: 2–22 days) after invasive
`prenatal diagnostic procedure. Our previous study (22) indicated
`that there would be no substantial difference in the fetal DNA
`concentration in samples collected days after amniocentesis. None-
`theless, blood samples from 3 T21 pregnancies, cases 17, 19, and 25
`(Table S5), were collected in the first trimester before chorionic
`villus sampling. Increases in their z-scores for chr21 were readily
`identifiable (Fig. 3B).
`Theoretically, the determination of the presence of quantitative
`perturbations in any particular chromosome could be achieved
`more precisely, for example, by taking into account the fetal DNA
`concentration to estimate the expected degree of chromosomal
`perturbation. Fetal DNA concentrations can be readily measured
`using either fetal epigenetic markers (23) or paternally inherited
`polymorphic markers (24). In this study, the fetal DNA concen-
`tration of each case was not required to derive cutoff values for
`determining the disease status of each case. First, according to
`Table S6, chr21 is one of the chromosomes whose percentage of
`representation could be measured at very low imprecision with our
`current protocol. Second, when compared to methods like digital
`RCD whereby disease cutoff values related to the fetal DNA
`concentration were required (15), many more chr21 sequences were
`measured by sequencing. For digital RCD, we reported that for a
`sample with 25% fetal DNA concentration, 7,680 digital PCRs
`
`would need to be performed to achieve a correct classification rate
`of 97%. Our previous data also showed that ≈20% of the total
`number of digital PCRs analyzed, equal to 1,536 chr21 molecules
`for a 7,680-well experiment, would contain only the chr21 gene
`target and hence be counted as informative. Thus, the number of
`chr21 molecules (mean: 3.2 × 10⁴) analyzed by the sequencing
`method is ~20-fold that of the digital RCD method. Hence, the
`measurement would be significantly more precise than the present
`scale of digital PCR analyses. However, by taking the fetal DNA
`concentration into account, the measurement of the percentage of
`chromosomal representation could be made more precisely for
`some of the other chromosomes or across batches and hence
`minimize false diagnoses.
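The roughly 20-fold comparison in the paragraph above can be reproduced from the figures it quotes. The variable names are illustrative; the numbers are those given in the text.

```python
digital_pcr_wells = 7_680       # wells quoted for 97% correct classification at 25% fetal DNA
informative_percent = 20        # approximate share of wells containing only the chr21 target
chr21_reads_sequencing = 3.2e4  # mean chr21 U0-1-0-0 count per sample

chr21_molecules_digital = digital_pcr_wells * informative_percent / 100
fold = chr21_reads_sequencing / chr21_molecules_digital

print(chr21_molecules_digital)  # 1536.0
print(round(fold, 1))           # 20.8
```

The ratio of about 20.8 matches the approximately 20-fold figure stated in the text.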
`In fact, the precision and accuracy of MPGS for determining the
`genomic representation of maternal plasma DNA could be further
`improved by a number of postsequencing analysis strategies. For
`example, sequences occurring in regions of known copy number
`variations (25) could be adjusted for so that the reference range for
`euploid pregnancies might be even tighter. Sequences other than
`U0–1–0–0, for example, with one or two mismatches to the
`reference genome that, in some instances, may represent a poly-
`morphic difference between the tested sample and the reference
`human genome, may also be used to increase the number of usable
`sequence counts. We have also shown that the reproducibility of
`measuring the percentage contribution of plasma DNA sequences
`varied between chromosomes and the GC c