`
`SYSTEMS AND METHODS TO DETECT RARE MUTATIONS AND COPY
`
`NUMBER VARIATION
`
`WSGR Docket No. 42534-704102
`
`Inventor(s):
`
`AmirAli Talasaz,
`a citizen of the United States,
`2181 Camino a Los Cerros
`
`Menlo Park, CA 94025
`
`Assignee:
`
`Guardant Health, Inc.
`
`Entity:
`
`Large
`
`Wilson Sonsini Goodrich & Rosati
`PROFESSIONAL CORPORATION
`
`650 Page Mill Road
`Palo Alto, CA 94304
`
`(650) 493-9300 (Main)
`(650) 493-6811 (Facsimile)
`
`Filed Electronically on: September 21, 2012
`
`PGDX EX. 1011
`
`Page 1 of 52
`
`
`
`SYSTEMS AND METHODS TO DETECT RARE MUTATIONS AND COPY NUMBER
`
`VARIATION
`
`BACKGROUND OF THE INVENTION
`
`[0001] The detection and quantification of polynucleotides is important for molecular biology and
`
`medical applications such as diagnostics. Genetic testing is particularly useful for a number of
`
`diagnostic methods. For example, disorders that are caused by mutations, copy number variation, or
`
`changes in epigenetic markers, such as cancer and partial or complete aneuploidy, may be detected or
`
`more accurately characterized with DNA sequence information.
`
`[0002] Early detection and monitoring of genetic diseases, such as cancer, is often useful and needed
`
`in the successful treatment or management of the disease. One approach may include the monitoring
`
`of a sample derived from cell free nucleic acids, a population of polynucleotides that can be found in
`
`different types of bodily fluids. In some cases, disease may be characterized or detected based on
`
`detection of genetic aberrations, such as a change in copy number variation and/or mutation of one or
`
`more nucleic acid sequences, or the development of certain rare mutations. Cell free DNAs have
`
`been known in the art for decades, and may contain genetic aberrations associated with a particular
`
`disease. With improvements in sequencing and techniques to manipulate nucleic acids, there is a
`
`need in the art for improved methods and systems for using cell free DNA to detect and monitor
`
`disease.
`
`SUMMARY OF THE INVENTION
`
`[0003] The disclosure provides for a method for detecting copy number variation comprising: a)
`
`sequencing extracellular polynucleotides from a bodily sample from a subject, wherein each of the
`
`extracellular polynucleotides is optionally attached to unique barcodes; b) filtering out reads that fail
`
`to meet a set threshold; c) mapping sequence reads obtained from step (a) to a reference sequence; d)
`
`quantifying/counting mapped reads in two or more predefined regions of the reference sequence; e)
`
`determining a copy number variation in one or more of the predefined regions by (i) normalizing the
`
`number of reads in the predefined regions to each other and/or the number of unique barcodes in the
`
`predefined regions to each other; and (ii) comparing the normalized numbers obtained in step (i) to
`
`normalized numbers obtained from a control sample.
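By way of non-limiting illustration, the counting, normalizing and comparing steps of the preceding paragraph might be sketched as follows. This is a minimal sketch under stated assumptions, not the disclosed implementation: the function names, the fixed bin size and the toy read positions are hypothetical, reads are represented only by their mapped start positions, and normalization is simplified to each bin's share of total reads.

```python
from collections import Counter


def bin_counts(read_positions, bin_size):
    """Count mapped read start positions per fixed-size bin (a predefined region)."""
    return Counter(pos // bin_size for pos in read_positions)


def normalized_ratios(sample_counts, control_counts):
    """Normalize each bin to its own sample total, then divide sample by control.

    A ratio near 1.0 suggests no copy number change in that bin; values well
    above or below 1.0 suggest gain or loss. Bins absent from the control are skipped.
    """
    sample_total = sum(sample_counts.values())
    control_total = sum(control_counts.values())
    ratios = {}
    for region, control_n in control_counts.items():
        if control_n == 0:
            continue  # no control signal, so the bin is left uncalled
        sample_frac = sample_counts.get(region, 0) / sample_total
        control_frac = control_n / control_total
        ratios[region] = sample_frac / control_frac
    return ratios


# Toy data (hypothetical): the middle bin holds twice as many reads in the sample.
control = bin_counts([5, 15, 25, 105, 115, 125, 205, 215, 225], bin_size=100)
sample = bin_counts(
    [5, 15, 25, 105, 115, 125, 135, 145, 155, 205, 215, 225], bin_size=100
)
ratios = normalized_ratios(sample, control)
```

Because each bin is expressed as a fraction of its own sample's reads, a gain in one bin lowers the ratios of the remaining bins slightly; a practical pipeline would likely normalize against a robust baseline instead.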
`
`[0004] The disclosure also provides for a method for detecting a rare mutation in a cell-free or
`
`substantially cell-free sample obtained from a subject comprising: a) sequencing extracellular
`
`polynucleotides from a bodily sample from a subject, wherein each of the extracellular
`
`polynucleotides generates a plurality of sequencing reads; b) filtering out reads that fail
`
`to meet a set threshold; c) mapping sequence reads derived from the sequencing onto a reference
`
`sequence; d) identifying a subset of mapped sequence reads that align with a variant of the reference
`
`sequence at each mappable base position; e) for each mappable base position, calculating a ratio of (i)
`
`a number of mapped sequence reads that include a variant as compared to the reference sequence, to
`
`(ii) a number of total sequence reads for each mappable base position; f) normalizing the ratios or
`
`frequency of variance for each mappable base position and determining potential rare variant(s) or
`
`mutation(s); and g) comparing the resulting number for each of the regions with potential rare
`
`variant(s) or mutation(s) to similarly derived numbers from a reference sample.
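As a non-limiting illustration of the per-position ratio described above, the following minimal sketch computes, for each covered base position, the number of reads carrying a non-reference base divided by the total number of reads covering that position, and then compares the result with a reference sample. The function names, the gap-free alignment representation and the `min_excess` cutoff are assumptions made for illustration only.

```python
from collections import Counter


def variant_ratios(reference, aligned_reads):
    """Return {position: variant_reads / total_reads} for every covered position.

    aligned_reads is a list of (start, sequence) pairs, assumed to be
    gap-free alignments against the reference string.
    """
    total = Counter()
    variant = Counter()
    for start, seq in aligned_reads:
        for offset, base in enumerate(seq):
            pos = start + offset
            if pos >= len(reference):
                break  # read runs past the end of the reference
            total[pos] += 1
            if base != reference[pos]:
                variant[pos] += 1
    return {pos: variant[pos] / total[pos] for pos in total}


def candidate_variants(ratios, reference_ratios, min_excess):
    """Positions whose variant ratio exceeds the reference sample's by min_excess."""
    return sorted(
        pos
        for pos, ratio in ratios.items()
        if ratio - reference_ratios.get(pos, 0.0) >= min_excess
    )


# Toy data (hypothetical): two of four reads covering position 3 carry an A instead of T.
reference = "ACGTACGT"
reads = [(0, "ACGT"), (0, "ACGA"), (2, "GTAC"), (2, "GAAC")]
ratios = variant_ratios(reference, reads)
calls = candidate_variants(ratios, reference_ratios={}, min_excess=0.1)
```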
`
`[0005] Additionally, the disclosure also provides for a method of characterizing the heterogeneity of
`
`an abnormal condition in a subject, the method comprising generating a genetic profile of
`
`extracellular polynucleotides in the subject, wherein the genetic profile comprises a plurality of data
`
`resulting from copy number variation and rare mutation analyses.
`
`[0006]
`
`In some embodiments, the prevalence/concentration of each rare variant identified in the
`
`subject is reported and quantified simultaneously. In other embodiments, a confidence score,
`
`regarding the prevalence/concentrations of rare variants in the subject, is reported.
`
`[0007]
`
`In some embodiments, extracellular polynucleotides comprise DNA. In other embodiments,
`
`extracellular polynucleotides comprise RNA. Polynucleotides may be fragments or fragmented after
`
`isolation. Additionally, the disclosure provides for a method for circulating nucleic acid isolation and
`
`extraction.
`
`[0008]
`
`In some embodiments, extracellular polynucleotides are isolated from a bodily sample which
`
`may be selected from a group consisting of blood, plasma, serum, urine, saliva, mucosal excretions,
`
`sputum, stool and tears.
`
`[0009]
`
`In some embodiments, the methods of the disclosure also comprise a step of determining the
`
`percent of sequences having copy number variation or rare mutation or variant in said bodily sample.
`
`[0010]
`
`In some embodiments, the percent of sequences having copy number variation in said bodily
`
`sample is determined by calculating the percentage of predefined regions with an amount of
`
`polynucleotides above or below a predetermined threshold.
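For illustration only, the percentage calculation described above might be sketched as follows, assuming each predefined region has already been reduced to a single normalized amount. The function name and the threshold values are hypothetical.

```python
def percent_regions_with_cnv(normalized_amounts, low, high):
    """Percentage of predefined regions whose amount falls below low or above high."""
    altered = [a for a in normalized_amounts if a < low or a > high]
    return 100.0 * len(altered) / len(normalized_amounts)


# Toy data (hypothetical): two of ten regions fall outside the 0.8 to 1.2 band.
amounts = [1.0, 1.02, 0.98, 1.6, 0.4, 1.0, 1.01, 0.99, 1.0, 1.0]
percent = percent_regions_with_cnv(amounts, low=0.8, high=1.2)
```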
`
`[0011]
`
`In some embodiments, bodily fluids are drawn from a subject suspected of having an
`
`abnormal condition which may be selected from the group consisting of mutations, rare mutations,
`
`indels, copy number variations, transversions, translocations, inversions, deletions, aneuploidy,
`
`partial aneuploidy, polyploidy, chromosomal instability, chromosomal structure alterations, gene
`
`fusions, chromosome fusions, gene truncations, gene amplification, gene duplications, chromosomal
`
`lesions, DNA lesions, abnormal changes in nucleic acid chemical modifications, abnormal changes in
`
`epigenetic patterns, abnormal changes in nucleic acid methylation, infection and cancer.
`
`[0012]
`
`In some embodiments, the subject may be a pregnant female in which the abnormal condition
`
`may be a fetal abnormality selected from the group consisting of mutations, rare mutations, indels,
`
`copy number variations, transversions, translocations, inversions, deletions, aneuploidy, partial
`
`aneuploidy, polyploidy, chromosomal instability, chromosomal structure alterations, gene fusions,
`
`chromosome fusions, gene truncations, gene amplification, gene duplications, chromosomal lesions,
`
`DNA lesions, abnormal changes in nucleic acid chemical modifications, abnormal changes in
`
`epigenetic patterns, abnormal changes in nucleic acid methylation, infection and cancer.
`
`[0013]
`
`In some embodiments, the method may comprise attaching one or more barcodes
`
`to the extracellular polynucleotides or fragments thereof prior to sequencing, in which the barcodes
`
`are unique. In other embodiments, barcodes attached to extracellular polynucleotides or
`
`fragments thereof prior to sequencing are not unique.
`
`[0014]
`
`In some embodiments, the methods of the disclosure may comprise selectively enriching
`
`regions from the subject’s genome or transcriptome prior to sequencing. In other embodiments the methods of the disclosure comprise
`
`non-selectively enriching regions from the subject’s genome or transcriptome prior to sequencing.
`
`[0015] Further, the methods of the disclosure comprise attaching one or more barcodes to the
`
`extracellular polynucleotides or fragments thereof prior to any amplification or enrichment step.
`
`[0016]
`
`In some embodiments, the barcode is a polynucleotide, which may further comprise random
`
`sequence or a fixed or semi-random set of oligonucleotides that in combination with the diversity of
`
`molecules sequenced from a select region enables identification of unique molecules, and which may be at least
`
`3, 5, 10, 15, 20, 25, 30, 35, 40, 45, or 50 base pairs in length.
`
`[0017]
`
`In some embodiments, extracellular polynucleotides or fragments thereof may be amplified.
`
`In some embodiments amplification comprises global amplification or whole genome amplification.
`
`[0018]
`
`In some embodiments, sequence reads of unique identity may be detected based on sequence
`
`information at the beginning (start) and end (stop) regions of the sequence read and the length of the
`
`sequence read. In other embodiments sequence molecules of unique identity are detected based on
`
`sequence information at the beginning (start) and end (stop) regions of the sequence read, the length
`
`of the sequence read and attachment of a barcode.
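As a non-limiting sketch of the identification described above, reads may be grouped under a key built from their start, stop and length, optionally extended with an attached barcode. The sketch assumes that mapped start and end coordinates stand in for the sequence information at the beginning and end of each read; the function name and the dictionary field names are hypothetical.

```python
from collections import defaultdict


def group_reads_by_molecule(reads, use_barcode=True):
    """Group reads that appear to derive from the same original molecule.

    reads is an iterable of dicts with 'start' and 'end' (mapped coordinates)
    and, optionally, 'barcode'. Reads sharing a key are treated as copies of
    one molecule.
    """
    groups = defaultdict(list)
    for read in reads:
        length = read["end"] - read["start"]
        key = (read["start"], read["end"], length)
        if use_barcode:
            key += (read.get("barcode"),)
        groups[key].append(read)
    return groups


# Toy data (hypothetical): two reads share coordinates but carry different barcodes.
reads = [
    {"start": 100, "end": 250, "barcode": "ACGT"},
    {"start": 100, "end": 250, "barcode": "ACGT"},
    {"start": 100, "end": 250, "barcode": "TTGA"},
    {"start": 120, "end": 250, "barcode": "ACGT"},
]
with_barcode = group_reads_by_molecule(reads, use_barcode=True)
without_barcode = group_reads_by_molecule(reads, use_barcode=False)
```

In this toy input the barcode separates two molecules that happen to share the same start and end, which is the situation in which combining the two sources of identity is useful.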
`
`[0019]
`
`In some embodiments, amplification comprises selective amplification, non-selective
`
`amplification, suppression amplification or subtractive enrichment.
`
`[0020]
`
`In some embodiments, the methods of the disclosure comprise removing a subset of the reads
`
`from further analysis prior to quantifying or enumerating reads.
`
`[0021]
`
`In some embodiments, the method may comprise filtering out reads with an accuracy or
`
`quality score of less than a threshold, e.g., 90%, 99%, 99.9%, or 99.99% and/or mapping score less
`
`than a threshold, e.g., 90%, 99%, 99.9% or 99.99%. In other embodiments, methods of the
`
`disclosure comprise filtering reads with a quality score lower than a set threshold.
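For illustration, the filtering step might be sketched as follows. The sketch assumes, as an interpretation not stated in the disclosure, that the percentage thresholds are accuracies convertible to Phred-scaled scores via Q = -10 log10(1 - accuracy), so that 99% corresponds to Q20. The function names and the read field names `mean_base_quality` and `mapq` are hypothetical.

```python
import math


def accuracy_to_phred(accuracy):
    """Convert an accuracy such as 0.99 to a Phred-scaled score (0.99 gives 20.0)."""
    # Rounded to avoid floating-point noise around exact thresholds.
    return round(-10.0 * math.log10(1.0 - accuracy), 6)


def filter_reads(reads, min_accuracy=0.99, min_mapping_accuracy=0.99):
    """Keep reads whose base quality and mapping score both meet their thresholds."""
    min_q = accuracy_to_phred(min_accuracy)
    min_mapq = accuracy_to_phred(min_mapping_accuracy)
    return [
        r
        for r in reads
        if r["mean_base_quality"] >= min_q and r["mapq"] >= min_mapq
    ]


# Toy data (hypothetical): only the first read passes both thresholds.
reads = [
    {"name": "r1", "mean_base_quality": 35.0, "mapq": 60},
    {"name": "r2", "mean_base_quality": 15.0, "mapq": 60},
    {"name": "r3", "mean_base_quality": 30.0, "mapq": 10},
]
kept = filter_reads(reads)
```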
`
`[0022]
`
`In some embodiments, predefined regions are uniform or substantially uniform in size, about
`
`10kb, 20kb, 30kb, 40kb, 50kb, 60kb, 70kb, 80kb, 90kb, or 100kb in size. In some embodiments, at
`
`least 50, 100, 200, 500, 1000, 2000, 5000, 10,000, 20,000, or 50,000 regions are analyzed.
`
`[0023]
`
`In some embodiments, a genetic variant, rare mutation or copy number variation occurs in a
`
`region of the genome selected from the group consisting of gene fusions, gene duplications, gene
`
`deletions, gene translocations, microsatellite regions, gene fragments or combination thereof. In
`
`other embodiments a genetic variant, rare mutation or copy number variation occurs in a region of the
`
`genome selected from the group consisting of genes, oncogenes, tumor suppressor genes, promoters,
`
`regulatory sequence elements, or combination thereof. In some embodiments the variant is a
`
`nucleotide variant, single base substitution, or small indel, transversion, translocation, inversion,
`
`deletion, truncation or gene truncation about 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 15 or 20 nucleotides in length.
`
`[0024]
`
`In some embodiments, the method comprises correcting/normalizing/adjusting the quantity of
`
`mapped reads using the barcodes or unique properties of individual reads.
`
`[0025]
`
`In some embodiments, enumerating the reads is performed through enumeration of unique
`
`barcodes in each of the predefined regions and normalizing those numbers across at least a subset of
`
`predefined regions that were sequenced. In some embodiments, samples at succeeding time intervals
`
`from the same subject are analyzed and compared to previous sample results. The method of the
`
`disclosure may further comprise determining partial copy number variation frequency, loss of
`
`heterozygosity, gene expression analysis, epigenetic analysis and hypermethylation analysis after
`
`amplifying the barcode-attached extracellular polynucleotides.
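As a non-limiting illustration of enumerating unique barcodes per predefined region and normalizing across regions, consider the following minimal sketch. The function names are hypothetical, and normalizing to the median region is an assumed choice; the disclosure does not prescribe a particular normalization.

```python
from collections import defaultdict
from statistics import median


def unique_barcodes_per_region(tagged_reads, region_size):
    """Count distinct barcodes among reads mapped to each fixed-size region.

    tagged_reads is an iterable of (mapped_position, barcode) pairs. Repeated
    barcodes within a region are counted once, so amplification copies do
    not inflate the count.
    """
    seen = defaultdict(set)
    for position, barcode in tagged_reads:
        seen[position // region_size].add(barcode)
    return {region: len(barcodes) for region, barcodes in seen.items()}


def normalize_to_median(counts):
    """Express each region's count relative to the median region."""
    center = median(counts.values())
    return {region: n / center for region, n in counts.items()}


# Toy data (hypothetical): region 1 contains a single molecule read twice.
tagged = [
    (10, "AA"), (20, "AA"), (30, "CC"),
    (110, "GG"), (120, "GG"),
    (210, "TT"), (220, "AC"), (230, "TT"),
]
counts = unique_barcodes_per_region(tagged, region_size=100)
relative = normalize_to_median(counts)
```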
`
`[0026]
`
`In some embodiments, copy number variation and rare mutation analysis is determined in a
`
`cell-free or substantially cell free sample obtained from a subject using multiplex sequencing,
`
`comprising performing over 10,000 sequencing reactions; simultaneously sequencing at least 10,000
`
`different reads; or performing data analysis on at least 10,000 different reads across the genome. The
`
`method may comprise multiplex sequencing comprising performing data analysis on at least 10,000
`
`different reads across the genome. The method may further comprise enumerating sequenced reads
`
`that are uniquely identifiable.
`
`[0027]
`
`In some embodiments, normalizing and detection are
`
`performed using one or more of hidden Markov model, dynamic programming, support vector machine,
`
`Bayesian network, trellis decoding, Viterbi decoding, expectation maximization, Kalman filtering, or
`
`neural network methodologies.
`
`[0028]
`
`In some embodiments the methods of the disclosure comprise monitoring disease progression,
`
`monitoring residual disease, monitoring therapy, diagnosing a condition, prognosing a condition, or
`
`selecting a therapy based on discovered variants.
`
`[0029]
`
`In some embodiments, a therapy is modified based on the most recent sample analysis.
`
`Further, the methods of the disclosure comprise inferring the genetic profile of a tumor, infection or
`
`other tissue abnormality. In some embodiments growth, remission or evolution of a tumor, infection
`
`or other tissue abnormality is monitored. In some embodiments the subject’s immune system is
`
`analyzed and monitored at single instances or over time.
`
`[0030]
`
`In some embodiments, the methods of the disclosure comprise identification of a variant that
`
`is followed up through an imaging test (e.g., CT, PET-CT, MRI, X-ray, ultrasound) for localization
`
`of the tissue abnormality suspected of causing the identified variant.
`
`[0031]
`
`In some embodiments, the methods of the disclosure comprise use of genetic data obtained
`
`from a tissue or tumor biopsy from the same patient. In some embodiments, the
`
`phylogenetics of a tumor, infection or other tissue abnormality is thereby inferred.
`
`[0032]
`
`In some embodiments, the methods of the disclosure comprise performing population-based
`
`no-calling and identification of low-confidence regions. In some embodiments, obtaining the
`
`measurement data for the sequence coverage comprises measuring sequence coverage depth at every
`
`position of the genome. In some embodiments correcting the measurement data for the sequence
`
`coverage bias comprises calculating window-averaged coverage. In some embodiments correcting
`
`the measurement data for the sequence coverage bias comprises performing adjustments to account
`
`for GC bias in the library construction and sequencing process. In some embodiments correcting the
`
`measurement data for the sequence coverage bias comprises performing adjustments based on
`
`additional weighting factor associated with individual mappings to compensate for bias.
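By way of illustration, window-averaged coverage and a simple GC adjustment might be sketched as follows. This is a sketch under stated assumptions and not the disclosed correction: the function names are hypothetical, and the GC adjustment shown (rescaling each window so that the median depth of its GC stratum matches the global median) is one common approach substituted here for concreteness.

```python
from statistics import median


def window_average(depth, window):
    """Mean per-base depth over consecutive non-overlapping windows."""
    averages = []
    for i in range(0, len(depth), window):
        chunk = depth[i:i + window]
        averages.append(sum(chunk) / len(chunk))
    return averages


def gc_fraction(seq):
    """Fraction of G and C bases in a sequence."""
    return sum(base in "GC" for base in seq.upper()) / len(seq)


def gc_correct(window_depths, window_gc, n_strata=10):
    """Rescale each window so its GC stratum's median depth equals the global median."""
    def stratum(gc):
        return min(int(gc * n_strata), n_strata - 1)

    global_median = median(window_depths)
    by_stratum = {}
    for d, gc in zip(window_depths, window_gc):
        by_stratum.setdefault(stratum(gc), []).append(d)
    stratum_median = {k: median(v) for k, v in by_stratum.items()}
    corrected = []
    for d, gc in zip(window_depths, window_gc):
        m = stratum_median[stratum(gc)]
        corrected.append(d * global_median / m if m > 0 else d)
    return corrected


# Toy data (hypothetical): higher-GC windows show systematically higher depth.
averaged = window_average([2, 4, 6, 8, 10, 12], window=3)
corrected = gc_correct([10, 10, 20, 20], [0.2, 0.2, 0.6, 0.6])
```

After correction the toy windows all sit at the global median, since their depth differences were explained entirely by GC content.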
`
`[0033]
`
`In some embodiments, the methods of the disclosure comprise extracellular polynucleotide
`
`derived from a diseased cell origin. In some embodiments, the extracellular polynucleotide is derived
`
`from a healthy cell origin.
`
`[0034] The disclosure also provides for a system comprising a computer readable medium for
`
`performing the following steps: selecting predefined regions in a genome; enumerating number of
`
`sequence reads in the predefined regions; normalizing the number of sequence reads across the
`
`predefined regions; and determining percent of copy number variation in the predefined regions. In
`
`some embodiments, the entirety of the genome or at least 85% of the genome is analyzed. In some
`
`embodiments, computer readable medium provides data on percent cancer DNA or RNA in plasma or
`
`serum to the end user.
`
`BRIEF DESCRIPTION OF THE DRAWINGS
`
`[0035] The novel features of a system and methods of this disclosure are set forth with particularity
`
`in the appended claims. A better understanding of the features and advantages of this disclosure will
`
`be obtained by reference to the following detailed description that sets forth illustrative embodiments,
`
`in which the principles of the systems and methods of this disclosure are utilized, and the
`
`accompanying drawings of which:
`
`[0036] Fig. 1 is a flow chart representation of a method of detection of copy number variation using a
`
`single sample.
`
`[0037] Fig. 2 is a flow chart representation of a method of detection of copy number variation using
`
`paired samples.
`
`[0038] Fig. 3 is a flow chart representation of a method of rare mutation detection.
`
`[0039] Fig. 4A is a graphical copy number variation detection report generated from a normal,
`
`non-cancerous subject.
`
`[0040] Fig. 4B is a graphical copy number variation detection report generated from a subject with
`
`prostate cancer.
`
`[0041] Fig. 4C is a schematic representation of internet-enabled access of reports generated from copy
`
`number variation analysis of a subject with prostate cancer.
`
`[0042] Fig. 5A is a graphical copy number variation detection report generated from a subject with
`
`prostate cancer in remission.
`
`[0043] Fig. 5B is a graphical copy number variation detection report generated from a subject with
`
`prostate cancer recurrence.
`
`[0044] Fig. 6A is a graphical rare mutation detection report generated from various mixing
`
`experiments using DNA samples containing both wildtype and mutant copies of MET and TP53.
`
`[0045] Fig. 6B is a logarithmic graphical representation of rare mutation detection results. Observed
`
`vs. expected percent cancer measurements are shown for various mixing experiments using DNA
`
`samples containing both wildtype and mutant copies of MET, HRAS and TP53.
`
`[0046] Fig. 7A is a graphical report of the percentage of two rare mutations in two genes, MET and TP53,
`
`in a subject with prostate cancer as compared to a reference (control).
`
`[0047] Fig. 7B is a schematic representation of internet-enabled access of reports generated from rare
`
`mutation analysis of a subject with prostate cancer.
`
`DETAILED DESCRIPTION OF THE INVENTION
`
`I. General Overview
`
`[0048] The present disclosure provides a system and method for the detection of rare mutations and
`
`copy number variations in cell free polynucleotides. Generally, the systems and methods comprise
`
`sample preparation, or the extraction and isolation of cell free polynucleotide sequences from a
`
`bodily fluid; subsequent sequencing of cell free polynucleotides by techniques known in the art; and
`
`application of bioinformatics tools to detect rare mutations and copy number variations as compared
`
`to a reference. The systems and methods also may contain a database or collection of different rare
`
`mutations or copy number variation profiles of different diseases, to be used as additional references
`
`in aiding detection of rare mutations, copy number variation profiling or general genetic profiling of
`
`a disease.
`
`[0049] The systems and methods may be particularly useful in the analysis of cell free DNAs. In
`
`some cases, cell free DNAs are extracted and isolated from a readily accessible bodily fluid such as
`
`blood. For example, cell free DNAs can be extracted using a variety of methods known in the art,
`
`including but not limited to isopropanol precipitation and/or silica based purification. Cell free
`
`DNAs may be extracted from any number of subjects, such as subjects without cancer, subjects at
`
`risk for cancer, or subjects known to have cancer (e.g., through other means).
`
`[0050] Following the isolation/extraction step, any of a number of different sequencing operations
`
`may be performed on the cell free polynucleotide sample. Samples may be processed before
`
`sequencing with one or more reagents (e.g., enzymes, unique identifiers (e.g., barcodes), probes,
`
`etc.). In some cases if the sample is processed with a unique identifier such as a barcode, the samples
`
`or fragments of samples may be tagged individually or in subgroups with the unique identifier. The
`
`tagged sample may then be used in a downstream application such as a sequencing reaction by which
`
`individual molecules may be tracked to parent molecules.
`
`[0051] After sequencing data of cell free polynucleotide sequences is collected, one or more
`
`bioinformatics processes may be applied to the sequence data to detect genetic features or aberrations
`
`such as copy number variation, rare mutations or changes in epigenetic markers, including but not
`
`limited to methylation profiles. In some cases, in which copy number variation analysis is desired,
`
`sequence data may be: 1) aligned with a reference genome; 2) filtered and mapped; 3) partitioned
`
`into windows or bins of sequence; 4) coverage reads counted for each window; 5) coverage reads can
`
`then be normalized using a stochastic or statistical modeling algorithm; 6) and an output file can be
`
`generated reflecting discrete copy number states at various positions in the genome. In other cases, in
`
`which rare mutation analysis is desired, sequence data may be 1) aligned with a reference genome; 2)
`
`filtered and mapped; 3) frequency of variant bases calculated based on coverage reads for that
`
`specific base; 4) variant base frequency normalized using a stochastic, statistical or probabilistic
`
`modeling algorithm; 5) and an output file can be generated reflecting mutation states at various
`
`positions in the genome.
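The copy number workflow above ends with discrete copy number states assigned along the genome. As a non-limiting sketch of one way a statistical model could produce such states, the following applies Viterbi decoding to normalized per-window ratios. All modeling choices here are assumptions made for illustration: the function name, the candidate states, a diploid baseline in which state c has an expected ratio of c/2, Gaussian-shaped emission scores with a shared `sigma`, and a single `stay_prob` for remaining in the same state between adjacent windows.

```python
import math


def viterbi_copy_number(ratios, states=(1, 2, 3), stay_prob=0.9, sigma=0.15):
    """Decode the most likely copy number state per window from normalized ratios."""
    n_states = len(states)
    switch_prob = (1.0 - stay_prob) / (n_states - 1)

    def log_emit(ratio, state):
        # Gaussian log-likelihood up to a constant shared by all states.
        mean = state / 2.0
        return -((ratio - mean) ** 2) / (2.0 * sigma ** 2)

    def log_trans(i, j):
        return math.log(stay_prob if i == j else switch_prob)

    # Uniform prior over states, so the prior term is a constant and is omitted.
    score = [log_emit(ratios[0], s) for s in states]
    back = []
    for ratio in ratios[1:]:
        new_score, pointers = [], []
        for j, s in enumerate(states):
            best_i = max(range(n_states), key=lambda i: score[i] + log_trans(i, j))
            new_score.append(score[best_i] + log_trans(best_i, j) + log_emit(ratio, s))
            pointers.append(best_i)
        score = new_score
        back.append(pointers)

    # Trace back from the best final state.
    idx = max(range(n_states), key=lambda i: score[i])
    path = [idx]
    for pointers in reversed(back):
        idx = pointers[idx]
        path.append(idx)
    path.reverse()
    return [states[i] for i in path]


# Toy data (hypothetical): three consecutive windows with a ratio near 1.5.
calls = viterbi_copy_number([1.0, 1.02, 0.97, 1.48, 1.55, 1.5, 1.01, 0.99])
# A single mildly elevated window is smoothed over rather than called as a gain.
smoothed = viterbi_copy_number([1.0, 1.0, 1.3, 1.0, 1.0])
```

The transition penalty is what distinguishes this from calling each window independently: a sustained shift is reported as a state change, while an isolated noisy window is not.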
`
`[0052] A variety of different reactions and/or operations may occur within the systems and methods
`
`disclosed herein, including but not limited to: nucleic acid sequencing, nucleic acid quantification,
`
`sequencing optimization, detecting gene expression, quantifying gene expression, genomic profiling,
`
`cancer profiling, or analysis of expressed markers. Moreover, the systems and methods have
`
`numerous medical applications. For example, it may be used for the identification, detection,
`
`diagnosis, treatment, staging of, or risk prediction of various genetic and non-genetic diseases and
`
`disorders including cancer. It may be used to assess subject response to different treatments of said
`
`genetic and non-genetic diseases, or provide information regarding disease progression and
`
`prognosis.
`
`II. Sample Preparation
`
`A. Polynucleotide Isolation and Extraction
`
`[0053] The systems and methods of this disclosure may have a wide variety of uses in the
`
`manipulation, preparation, identification and/or quantification of cell free polynucleotides. Examples
`
`of polynucleotides include but are not limited to: DNA, RNA, amplicons, cDNA, dsDNA, ssDNA,
`
`plasmid DNA, cosmid DNA, high Molecular Weight (MW) DNA, chromosomal DNA, genomic
`
`DNA, viral DNA, bacterial DNA, mtDNA (mitochondrial DNA), mRNA, rRNA, tRNA, nRNA,
`
`siRNA, snRNA, snoRNA, scaRNA, microRNA, dsRNA, ribozyme, riboswitch and viral RNA (e.g.,
`
`retroviral RNA).
`
`[0054] Cell free polynucleotides may be derived from a variety of sources including human,
`
`mammal, non-human mammal, ape, monkey, chimpanzee, reptilian, amphibian, or avian sources.
`
`Further, samples may be extracted from a variety of animal fluids containing cell free sequences,
`
`including but not limited to blood, serum, plasma, vitreous, sputum, urine, tears, perspiration, saliva,
`
`semen, mucosal excretions, mucus, spinal fluid, amniotic fluid, lymph fluid and the like. Cell free
`
`polynucleotides may be fetal in origin (via fluid taken from a pregnant subject), or may be derived
`
`from tissue of the subject itself.
`
`[0055]
`
`Isolation and extraction of cell free polynucleotides may be performed through collection of
`
`bodily fluids using a variety of techniques. In some cases, collection may comprise aspiration of a
`
`bodily fluid from a subject using a syringe. In other cases collection may comprise pipetting or direct
`
`collection of fluid into a collecting vessel.
`
`[0056] After collection of bodily fluid, cell free polynucleotides may be isolated and extracted using
`
`a variety of techniques known in the art. In some cases, cell free DNA may be isolated, extracted and
`
`prepared using commercially available kits such as the Qiagen QIAamp® Circulating Nucleic Acid Kit
`
`protocol. In other examples, Qiagen Qubit™ dsDNA HS Assay kit protocol, Agilent™ DNA 1000
`
`kit, or TruSeq™ Sequencing Library Preparation; Low-Throughput (LT) protocol may be used.
`
`[0057] Generally, cell free polynucleotides are extracted and isolated from bodily fluids through a
`
`partitioning step in which cell free DNAs, as found in solution, are separated from cells and other
`
`non-soluble components of the bodily fluid. Partitioning may include, but is not limited to, techniques
`
`such as centrifugation or filtration. In other cases, cells are not partitioned from cell free DNA first,
`
`but rather lysed. In this example, the genomic DNA of intact cells is partitioned through selective
`
`precipitation. Cell free polynucleotides, including DNA, may remain soluble and may be separated
`
`from insoluble genomic DNA and extracted. Generally, after addition of buffers and other wash
`
`steps specific to different kits, DNA may be precipitated using isopropanol precipitation. Further
`
`clean up steps may be used such as silica based columns to remove contaminants or salts. General
`
`steps may be optimized for specific applications. Non specific bulk carrier polynucleotides, for
`
`example, may be added throughout the reaction to optimize certain aspects of the procedure such as
`
`yield.
`
`[0058]
`
`Isolation and purification of cell free DNA may be accomplished using any means, including,
`
`but not limited to, the use of commercial kits and protocols provided by companies such as Sigma
`
`Aldrich, Life Technologies, Promega, Affymetrix, IBI or the like. Kits and protocols may also be
`
`non-commercially available.
`
`[0059] After isolation, in some cases, the cell free polynucleotides are pre-mixed with one or more
`
`additional materials, such as one or more reagents (e.g., ligase, protease, polymerase) prior to
`
`sequencing.
`
`B. Molecular Bar Coding of Cell Free Polynucleotides
`
`[0060] The systems and methods of this disclosure may also enable the cell free polynucleotides to
`
`be tagged or tracked in order to permit subsequent identification and origin of the particular
`
`polynucleotide. This feature is in contrast with other methods that use pooled or multiplex reactions
`
`and that only provide measurements or analyses as an average of multiple samples. Here, the
`
`assignment of an identifier to individual or subgroups of polynucleotides may allow for a unique
`
`identity to be assigned to individual sequences or fragments of sequences. This may allow
`
`acquisition of data from individual samples and is not limited to averages of samples.
`
`[0061]
`
`In some examples, nucleic acids or other molecules derived from a single strand may share a
`
`common tag or identifier and therefore may be later identified as being derived from that strand.
`
`Similarly, all of the fragments from a single strand of nucleic acid may be tagged with the same
`
`identifier or tag, thereby permitting subsequent identification of fragments from the parent strand.
`
`In other cases, gene expression products (e.g., mRNA) may be tagged in order to quantify expression,
`
`by which the barcode, or the barcode in combination with sequence to which it is attached can be
`
`counted. In still other cases, the systems and methods can be used as a PCR amplification control. In
`
`such cases, multiple amplification products from a PCR reaction can be tagged with the same tag or
`
`identifier. If the products are later sequenced and demonstrate sequence differences, differences
`
`among products with the same identifier can then be attributed to PCR error.
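As a non-limiting illustration of the amplification control described above, reads sharing a tag can be collapsed to a per-position majority consensus, with disagreements from that consensus counted as putative PCR or sequencing errors. The function name is hypothetical, and the sketch assumes that reads sharing a tag are already aligned to one another and of equal length.

```python
from collections import Counter, defaultdict


def consensus_by_tag(tagged_reads):
    """Return {tag: (consensus_sequence, mismatch_count)} for each tag family.

    tagged_reads is an iterable of (tag, sequence) pairs. Mismatches are bases
    that differ from the family's majority base at the same position.
    """
    families = defaultdict(list)
    for tag, seq in tagged_reads:
        families[tag].append(seq)
    result = {}
    for tag, seqs in families.items():
        consensus = "".join(
            Counter(column).most_common(1)[0][0] for column in zip(*seqs)
        )
        mismatches = sum(
            a != b for seq in seqs for a, b in zip(seq, consensus)
        )
        result[tag] = (consensus, mismatches)
    return result


# Toy data (hypothetical): one of three copies tagged T1 carries a G-to-T change.
families = consensus_by_tag(
    [("T1", "ACGT"), ("T1", "ACGT"), ("T1", "ACTT"), ("T2", "GGGA")]
)
```

A difference seen in only one copy within a family is attributed to error, whereas a difference shared by every copy in a family would survive into the consensus and remain a candidate variant.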
`
`[0062] Additionally, individual sequences may be identified based upon characteristics of the sequence
`
`data of the reads themselves. For example, the detection of unique sequence data at the beginning
`
`(start) and end (stop) portions of individual sequencing reads may be used, alone or in combination
`
`with the length, or number of base pairs, of each sequence read, to assign unique
`
`identities to individual molecules. Fragments from a single strand of nucleic acid, having been
`
`assigned a unique identity, may thereby permit subsequent identification of fragments from the parent
`
`strand. This can be used in conjunction with bottlenecking the initial starting genetic material to limit
`
`diversity.
`
`[0063] Further, unique sequence data at the beginning (start) and end (stop) portions of
`
`individual sequencing reads and sequencing read length may be used, alone or in combination with the
`
`use of barcodes. In some cases, the barcodes may be unique as described herein. In other cases, the
`
`barcodes themselves may not be unique. In this case, the use of non-unique barcodes, in combination
`
`with sequence data at the beginning (start) and end (stop) portions of individual sequencing reads and
`
`sequencing read length may allow for the assignment of a unique identity to individual sequences.
`
`Similarly, fragments from a single strand of nucleic acid having been assigned a unique identity, may
`
`thereby permit subsequent identification of fragments from the parent strand.
`
`[0064] Generally, the methods and systems provided herein are useful for preparation of cell free
`
`polynucleotide sequences for a downstream sequencing reaction. Often, a sequencing
`
`method is classic Sanger sequencing. Sequencing methods may include, but are not limited to: high-
`
`throughput sequencing, pyrosequencing, sequencing-by-synthesis, single-molecule sequencing,
`
`nanopore sequencing, semiconductor sequencing, sequencing-by-ligation, sequencing-by-
`
`hybridization, RNA-Seq (Illumina), Digital Gene Expression (Helicos), Next generation sequencing,
`
`Single Molecule Sequencing by Synthesis (SMSS)(Helicos), massively-parallel sequencing, Clonal
`
`Single Molecule Array (Solexa), shotgun sequencing, Maxam-Gilbert sequencing, primer walking,
`
`and any other sequencing methods known in the art.
`
`C. Assignment of Barcodes to Cell Free Polynucleotide Sequences
`
`[0065] The systems and methods disclosed herein may be used in applications that involve the
`
`assignment of unique or non-unique identifiers, or molecular barcodes, to cell free polynucleotides.
`
`Often, the identifier is a bar-code oligonucleotide that is used to tag the polynucleotide; but, in some
`
`cases, different unique identifiers are used. For example, in some cases, the unique identifier is a
`
`hybridization probe. In other cases, the unique identifier is a dye, in which case the attachment may
`
`comprise intercalation of the dye into the analyte molecule (such as intercalation into DNA or RNA)
`
`or binding to a probe labeled with the dye. In still other cases, the unique identifier may be a nucleic
`
`acid oligonucleotide, in which case the attachment to the polynucleotide sequences may comprise a
`
`ligation reaction between the oligonucleotide and the sequences or incorporation through PCR. In
`
`other cases, the reaction may comprise addition of a metal isotope, either directly to the analyte or by
`
`a probe labeled with the isotope. Generally, assignment of unique or non-unique identifiers, or
`
`molecular barcodes in reactions of this disclosure may follow methods and systems described by US
`
`patent applications 20010053519, 20030152490, 20110160078 and US patent 6,582,908.
`
`[0066] Often, the method c