`
`GUARDANT - EXHIBIT 2030
`Foundation Medicine, Inc. v. Guardant Health, Inc.
`IPR2019-00634
`
`
`
`Atty Docket No. 42534-704601
`
`sequence; d) quantifying/counting mapped reads in two or more predefined regions of the
`
`reference sequence; e) determining a copy number variation in one or more of the predefined
`
`regions by (i) normalizing the number of reads in the predefined regions to each other and/or the
`
`number of unique barcodes in the predefined regions to each other; and (ii) comparing the
`
`normalized numbers obtained in step (i) to normalized numbers obtained from a control sample.
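Steps (i) and (ii) can be illustrated with a minimal sketch. It assumes that read (or barcode) counts per predefined region are already available as dictionaries; the function names, region labels, and the choice to normalize by dividing each count by the sample total are illustrative assumptions, not details taken from the disclosure.

```python
def normalize(counts):
    """Scale per-region counts so the regions are expressed relative to each other."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no reads counted")
    return {region: n / total for region, n in counts.items()}


def copy_number_ratios(sample_counts, control_counts):
    """Compare normalized sample counts with normalized control counts, region by region."""
    sample = normalize(sample_counts)
    control = normalize(control_counts)
    # Regions missing from the control, or with zero control signal, are skipped.
    return {r: sample[r] / control[r] for r in sample if control.get(r)}


# Hypothetical counts: regionA is over-represented in the sample.
sample = {"regionA": 300, "regionB": 100, "regionC": 100}
control = {"regionA": 100, "regionB": 100, "regionC": 100}
ratios = copy_number_ratios(sample, control)
```

A ratio well above or below 1 for a region would then be a candidate copy number variation under these assumptions.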
`
`[0005] The disclosure also provides for a method for detecting a rare mutation in a cell-free or
`
`substantially cell free sample obtained from a subject comprising: a) sequencing extracellular
`
`polynucleotides from a bodily sample from a subject, wherein each of the extracellular
`
`polynucleotides generates a plurality of sequencing reads; b) filtering out reads that fail to meet a
`
`set threshold; c) mapping sequence reads derived from the sequencing onto a reference sequence;
`
`d) identifying a subset of mapped sequence reads that align with a variant of the reference
`
`sequence at each mappable base position; e) for each mappable base position, calculating a ratio
`
`of (a) a number of mapped sequence reads that include a variant as compared to the reference
`
`sequence, to (b) a number of total sequence reads for each mappable base position; f) normalizing
`
`the ratios or frequency of variance for each mappable base position and determining potential rare
`
`variant(s) or mutation(s); and g) comparing the resulting number for each of the regions with
`
`potential rare variant(s) or mutation(s) to similarly derived numbers from a reference sample.
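The per-position ratio described in this paragraph (variant-supporting reads over total reads at each mappable base position) can be sketched as follows. The pileup representation, a dictionary from position to the list of base calls observed there, and the function name are assumptions made for illustration only.

```python
def variant_ratios(pileup, reference):
    """Return, per position, the fraction of mapped reads whose call differs from the reference.

    pileup:    dict mapping position -> list of base calls from reads covering it
    reference: dict mapping position -> reference base
    """
    ratios = {}
    for pos, calls in pileup.items():
        if not calls:
            continue  # no coverage, so no ratio can be computed
        variant_reads = sum(1 for base in calls if base != reference[pos])
        ratios[pos] = variant_reads / len(calls)
    return ratios


# Hypothetical data: one of four reads at position 100 carries a variant.
pileup = {100: ["A", "A", "A", "T"], 101: ["G", "G", "G", "G"]}
reference = {100: "A", 101: "G"}
ratios = variant_ratios(pileup, reference)
```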
`
`[0006] Additionally, the disclosure also provides for a method of characterizing the heterogeneity
`
`of an abnormal condition in a subject, the method comprising generating a genetic profile of
`
`extracellular polynucleotides in the subject, wherein the genetic profile comprises a plurality of
`
`data resulting from copy number variation and/or other rare mutation (e.g., genetic alteration)
`
`analyses.
`
`[0007] In some embodiments, the prevalence/concentration of each rare variant identified in the
`
`subject is reported and quantified simultaneously. In other embodiments, a confidence score,
`
`regarding the prevalence/concentrations of rare variants in the subject, is reported.
`
`[0008] In some embodiments, extracellular polynucleotides comprise DNA. In other
`
`embodiments, extracellular polynucleotides comprise RNA. Polynucleotides may be fragments or
`
`fragmented after isolation. Additionally, the disclosure provides for a method for circulating
`
`nucleic acid isolation and extraction.
`
`[0009] In some embodiments, extracellular polynucleotides are isolated from a bodily sample that
`
`may be selected from a group consisting of blood, plasma, serum, urine, saliva, mucosal
`
`excretions, sputum, stool and tears.
`
`[0010] In some embodiments, the methods of the disclosure also comprise a step of determining
`
`the percent of sequences having copy number variation or other rare genetic alteration (e.g.,
`
`sequence variants) in said bodily sample.
`
`[0011] In some embodiments, the percent of sequences having copy number variation in said
`
`bodily sample is determined by calculating the percentage of predefined regions with an amount
`
`of polynucleotides above or below a predetermined threshold.
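As a rough illustration of this percentage, the sketch below counts the predefined regions whose normalized amount falls outside a pair of thresholds. The threshold values 0.8 and 1.2 and the function name are hypothetical; the disclosure does not specify them.

```python
def percent_regions_altered(region_ratios, low=0.8, high=1.2):
    """Percentage of regions whose normalized amount is below `low` or above `high`."""
    if not region_ratios:
        raise ValueError("no regions supplied")
    altered = [r for r, v in region_ratios.items() if v < low or v > high]
    return 100.0 * len(altered) / len(region_ratios)


# Hypothetical normalized amounts for four regions.
percent = percent_regions_altered({"a": 1.0, "b": 1.5, "c": 0.5, "d": 1.1})
```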
`
`[0012] In some embodiments, bodily fluids are drawn from a subject suspected of having an
`
`abnormal condition which may be selected from the group consisting of mutations, rare
`
`mutations, single nucleotide variants, indels, copy number variations, transversions,
`
`translocations, inversion, deletions, aneuploidy, partial aneuploidy, polyploidy, chromosomal
`
`instability, chromosomal structure alterations, gene fusions, chromosome fusions, gene
`
`truncations, gene amplification, gene duplications, chromosomal lesions, DNA lesions, abnormal
`
`changes in nucleic acid chemical modifications, abnormal changes in epigenetic patterns,
`
`abnormal changes in nucleic acid methylation, infection and cancer.
`
`[0013] In some embodiments, the subject may be a pregnant female in which the abnormal
`
`condition may be a fetal abnormality selected from the group consisting of single nucleotide
`
`variants, indels, copy number variations, transversions, translocations, inversion, deletions,
`
`aneuploidy, partial aneuploidy, polyploidy, chromosomal instability, chromosomal structure
`
`alterations, gene fusions, chromosome fusions, gene truncations, gene amplification, gene
`
`duplications, chromosomal lesions, DNA lesions, abnormal changes in nucleic acid chemical
`
`modifications, abnormal changes in epigenetic patterns, abnormal changes in nucleic acid
`
`methylation, infection and cancer.
`
`[0014] In some embodiments, the method may comprise attaching one or more
`
`barcodes to the extracellular polynucleotides or fragments thereof prior to sequencing, in which
`
`the barcodes are unique. In other embodiments barcodes attached to extracellular
`
`polynucleotides or fragments thereof prior to sequencing are not unique.
`
`[0015] In some embodiments, the methods of the disclosure may comprise selectively enriching
`
`regions from the subject’s genome or transcriptome prior to sequencing. In other embodiments
`
`the methods of the disclosure comprise selectively enriching regions from the subject’s genome or
`
`transcriptome prior to sequencing. In other embodiments the methods of the disclosure comprise
`
`non-selectively enriching regions from the subject’s genome or transcriptome prior to sequencing.
`
`[0016] Further, the methods of the disclosure comprise attaching one or more barcodes to the
`
`extracellular polynucleotides or fragments thereof prior to any amplification or enrichment step.
`
`[0017] In some embodiments, the barcode is a polynucleotide, which may further comprise
`
`random sequence or a fixed or semi-random set of oligonucleotides that in combination with the
`
`diversity of molecules sequenced from a select region enables identification of unique molecules
`
`and be at least a 3, 5, 10, 15, 20, 25, 30, 35, 40, 45, or 50mer base pairs in length.
`
`[0018] In some embodiments, extracellular polynucleotides or fragments thereof may be
`
`amplified. In some embodiments amplification comprises global amplification or whole genome
`
`amplification.
`
`[0019] In some embodiments, sequence reads of unique identity may be detected based on
`
`sequence information at the beginning (start) and end (stop) regions of the sequence read and the
`
`length of the sequence read. In other embodiments sequence molecules of unique identity are
`
`detected based on sequence information at the beginning (start) and end (stop) regions of the
`
`sequence read, the length of the sequence read and attachment of a barcode.
`
`[0020] In some embodiments, amplification comprises selective amplification, non-selective
`
`amplification, suppression amplification or subtractive enrichment.
`
`[0021] In some embodiments, the methods of the disclosure comprise removing a subset of the
`
`reads from further analysis prior to quantifying or enumerating reads.
`
`[0022] In some embodiments, the method may comprise filtering out reads with an accuracy or
`
`quality score of less than a threshold, e.g., 90%, 99%, 99.9%, or 99.99% and/or mapping score
`
`less than a threshold, e.g., 90%, 99%, 99.9% or 99.99%. In other embodiments, methods of the
`
`disclosure comprise filtering reads with a quality score lower than a set threshold.
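A minimal sketch of this kind of filter is given below. It assumes each read carries per-base Phred quality scores and a mapping score expressed as a fraction between 0 and 1; converting Phred scores to an accuracy by averaging the per-base error probabilities is one possible convention, chosen here for illustration rather than taken from the disclosure.

```python
def read_accuracy(phred_scores):
    """Mean per-base accuracy, using the Phred relation error = 10 ** (-Q / 10)."""
    error_probs = [10 ** (-q / 10) for q in phred_scores]
    return 1 - sum(error_probs) / len(error_probs)


def filter_reads(reads, min_accuracy=0.99, min_mapping=0.99):
    """Keep reads that meet both the accuracy threshold and the mapping-score threshold."""
    return [
        r for r in reads
        if read_accuracy(r["quals"]) >= min_accuracy
        and r["mapping_score"] >= min_mapping
    ]


# Hypothetical reads: r2 fails on base quality, r3 fails on mapping score.
reads = [
    {"name": "r1", "quals": [30, 30, 30], "mapping_score": 0.999},
    {"name": "r2", "quals": [10, 10, 10], "mapping_score": 0.999},
    {"name": "r3", "quals": [30, 30, 30], "mapping_score": 0.5},
]
kept = filter_reads(reads)
```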
`
`[0023] In some embodiments, predefined regions are uniform or substantially uniform in size,
`
`about 10kb, 20kb, 30kb, 40kb, 50kb, 60kb, 70kb, 80kb, 90kb, or 100kb in size. In some
`
`embodiments, at least 50, 100, 200, 500, 1000, 2000, 5000, 10,000, 20,000, or 50,000 regions are
`
`analyzed.
`
`[0024] In some embodiments, a genetic variant, rare mutation or copy number variation occurs in
`
`a region of the genome selected from the group consisting of gene fusions, gene duplications,
`
`gene deletions, gene translocations, microsatellite regions, gene fragments or combination thereof.
`
`In other embodiments a genetic variant, rare mutation, or copy number variation occurs in a
`
`region of the genome selected from the group consisting of genes, oncogenes, tumor suppressor
`
`genes, promoters, regulatory sequence elements, or combination thereof. In some embodiments
`
`the variant is a nucleotide variant, single base substitution, or small indel, transversion,
`
`translocation, inversion, deletion, truncation or gene truncation about 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 15
`
`or 20 nucleotides in length.
`
`[0025] In some embodiments, the method comprises correcting/normalizing/adjusting the
`
`quantity of mapped reads using the barcodes or unique properties of individual reads.
`
`[0026] In some embodiments, enumerating the reads is performed through enumeration of unique
`
`barcodes in each of the predefined regions and normalizing those numbers across at least a subset
`
`of predefined regions that were sequenced. In some embodiments, samples at succeeding time
`
`intervals from the same subject are analyzed and compared to previous sample results. The
`
`method of the disclosure may further comprise determining partial copy number variation
`
`frequency, loss of heterozygosity, gene expression analysis, epigenetic analysis and
`
`hypermethylation analysis after amplifying the barcode-attached extracellular polynucleotides.
`
`[0027] In some embodiments, copy number variation and rare mutation analysis is determined in
`
`a cell-free or substantially cell free sample obtained from a subject using multiplex sequencing,
`
`comprising performing over 10,000 sequencing reactions; simultaneously sequencing at least
`
`10,000 different reads; or performing data analysis on at least 10,000 different reads across the
`
`genome. The method may comprise multiplex sequencing comprising performing data analysis
`
`on at least 10,000 different reads across the genome. The method may further comprise
`
`enumerating sequenced reads that are uniquely identifiable.
`
`[0028] In some embodiments of the methods of the disclosure, normalizing and detection
`
`are performed using one or more of hidden Markov, dynamic programming, support vector
`
`machine, Bayesian network, trellis decoding, Viterbi decoding, expectation maximization,
`
`Kalman filtering, or neural network methodologies.
`
`[0029] In some embodiments the methods of the disclosure comprise monitoring disease
`
`progression, monitoring residual disease, monitoring therapy, diagnosing a condition, prognosing
`
`a condition, or selecting a therapy based on discovered variants.
`
`[0030] In some embodiments, a therapy is modified based on the most recent sample analysis.
`
`Further, the methods of the disclosure comprise inferring the genetic profile of a tumor, infection
`
`or other tissue abnormality. In some embodiments growth, remission or evolution of a tumor,
`
`infection or other tissue abnormality is monitored. In some embodiments the subject’s immune
`
`system is analyzed and monitored at single instances or over time.
`
`[0031] In some embodiments, the methods of the disclosure comprise identification of a variant
`
`that is followed up through an imaging test (e.g., CT, PET-CT, MRI, X-ray, ultrasound) for
`
`localization of the tissue abnormality suspected of causing the identified variant.
`
`[0032] In some embodiments, the methods of the disclosure comprise use of genetic data obtained
`
`from a tissue or tumor biopsy from the same patient. In some embodiments, the
`
`phylogenetics of a tumor, infection or other tissue abnormality is thereby inferred.
`
`[0033] In some embodiments, the methods of the disclosure comprise performing population-
`
`based no-calling and identification of low-confidence regions. In some embodiments, obtaining
`
`the measurement data for the sequence coverage comprises measuring sequence coverage depth at
`
`every position of the genome. In some embodiments correcting the measurement data for the
`
`sequence coverage bias comprises calculating window-averaged coverage. In some embodiments
`
`correcting the measurement data for the sequence coverage bias comprises performing
`
`adjustments to account for GC bias in the library construction and sequencing process. In some
`
`embodiments correcting the measurement data for the sequence coverage bias comprises
`
`performing adjustments based on an additional weighting factor associated with individual mappings
`
`to compensate for bias.
`
`[0034] In some embodiments, the methods of the disclosure comprise extracellular polynucleotide
`
`derived from a diseased cell origin. In some embodiments, the extracellular polynucleotide is
`
`derived from a healthy cell origin.
`
`[0035] The disclosure also provides for a system comprising a computer readable medium for
`
`performing the following steps: selecting predefined regions in a genome; enumerating number of
`
`sequence reads in the predefined regions; normalizing the number of sequence reads across the
`
`predefined regions; and determining percent of copy number variation in the predefined regions.
`
`In some embodiments, the entirety of the genome or at least 10%, 20%, 30%, 40%, 50%, 60%,
`
`70%, 80%, or 90% of the genome is analyzed. In some embodiments, computer readable medium
`
`provides data on percent cancer DNA or RNA in plasma or serum to the end user.
`
`[0036] In some embodiments, the amount of genetic variation, such as polymorphisms or causal
`
`variants is analyzed. In some embodiments, the presence or absence of genetic alterations is
`
`detected.
`
`[0037] The disclosure also provides for a method for detecting a rare mutation in a cell-free or a
`
`substantially cell free sample obtained from a subject comprising: a) sequencing extracellular
`
`polynucleotides from a bodily sample from a subject, wherein each of the extracellular
`
`polynucleotides generates a plurality of sequencing reads; b) filtering out reads that fail to meet a
`
`set threshold; c) mapping sequence reads derived from the sequencing onto a reference sequence;
`
`d) identifying a subset of mapped sequence reads that align with a variant of the reference
`
`sequence at each mappable base position; e) for each mappable base position, calculating a ratio
`
`of (a) a number of mapped sequence reads that include a variant as compared to the reference
`
`sequence, to (b) a number of total sequence reads for each mappable base position; f) normalizing
`
`the ratios or frequency of variance for each mappable base position and determining potential rare
`
`variant(s) or other genetic alteration(s); and g) comparing the resulting number for each of the
`
`regions
`
`[0038] This disclosure also provides for a method comprising: a. providing at least one set of
`
`tagged parent polynucleotides, and for each set of tagged parent polynucleotides; b. amplifying
`
`the tagged parent polynucleotides in the set to produce a corresponding set of amplified progeny
`
`polynucleotides; c. sequencing a subset (including a proper subset) of the set of amplified progeny
`
`polynucleotides, to produce a set of sequencing reads; and d. collapsing the set of sequencing
`
`reads to generate a set of consensus sequences, each consensus sequence corresponding to a
`
`unique polynucleotide among the set of tagged parent polynucleotides. In certain embodiments
`
`the method further comprises: e. analyzing the set of consensus sequences for each set of tagged
`
`parent molecules.
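The collapsing step (d) can be sketched as grouping reads that share a tag and taking a per-position majority vote. This is a simplified illustration: it assumes reads are supplied as (tag, sequence) pairs, that the tag alone identifies the parent molecule, and that a plain majority vote is an acceptable consensus rule. None of these choices is stated in the disclosure.

```python
from collections import Counter, defaultdict


def collapse(reads):
    """Group reads by tag and return tag -> consensus sequence (per-position majority vote)."""
    families = defaultdict(list)
    for tag, seq in reads:
        families[tag].append(seq)

    consensus = {}
    for tag, seqs in families.items():
        # Only positions covered by every read in the family are voted on.
        length = min(len(s) for s in seqs)
        consensus[tag] = "".join(
            Counter(s[i] for s in seqs).most_common(1)[0][0]
            for i in range(length)
        )
    return consensus


# Hypothetical reads: one read in family T1 carries a sequencing error at the third base.
reads = [
    ("T1", "ACGT"), ("T1", "ACGT"), ("T1", "ACTT"),
    ("T2", "GGCC"), ("T2", "GGCC"),
]
consensus = collapse(reads)
```

Because the erroneous base is outvoted within its family, it does not appear in the consensus, which is the error-suppression effect the collapsing step aims for.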
`
`[0039] In some embodiments each polynucleotide in a set is mappable to a reference sequence.
`
`[0040] In some embodiments the method comprises providing a plurality of sets of tagged parent
`
`polynucleotides, wherein each set is mappable to a different reference sequence.
`
`[0041] In some embodiments the method further comprises converting initial starting genetic
`
`material into the tagged parent polynucleotides.
`
`[0042] In some embodiments the initial starting genetic material comprises no more than 100 ng
`
`of polynucleotides.
`
`[0043] In some embodiments the method comprises bottlenecking the initial starting genetic
`
`material prior to converting.
`
`[0044] In some embodiments the method comprises converting the initial starting genetic material
`
`into tagged parent polynucleotides with a conversion efficiency of at least 10%, at least 20%, at
`
`least 30%, at least 40%, at least 50%, at least 60%, at least 80% or at least 90%.
`
`[0045] In some embodiments converting comprises any of blunt-end ligation, sticky end ligation,
`
`molecular inversion probes, PCR, ligation-based PCR, single strand ligation and single strand
`
`circularization.
`
`[0046] In some embodiments the initial starting genetic material is cell-free nucleic acid.
`
`[0047] In some embodiments a plurality of the reference sequences are from the same genome.
`
`[0048] In some embodiments each tagged parent polynucleotide in the set is uniquely tagged.
`
`[0049] In some embodiments the tags are non-unique.
`
`[0050] In some embodiments the generation of consensus sequences is based on information from
`
`the tag and/or at least one of sequence information at the beginning (start) region of the sequence
`
`read, the end (stop) regions of the sequence read and the length of the sequence read.
`
`[0051] In some embodiments the method comprises sequencing a subset of the set of amplified
`
`progeny polynucleotides sufficient to produce sequence reads for at least one progeny from
`
`each of at least 20%, at least 30%, at least 40%, at least 50%, at least 60%, at least 70%, at least
`
`80%, at least 90%, at least 95%, at least 98%, at least 99%, at least 99.9% or at least 99.99% of
`
`unique polynucleotides in the set of tagged parent polynucleotides.
`
`[0052] In some embodiments the at least one progeny is a plurality of progeny, e.g., at least 2, at
`
`least 5 or at least 10 progeny.
`
`[0053] In some embodiments the number of sequence reads in the set of sequence reads is greater
`
`than the number of unique tagged parent polynucleotides in the set of tagged parent
`
`polynucleotides.
`
`[0054] In some embodiments the subset of the set of amplified progeny polynucleotides
`
`sequenced is of sufficient size so that any nucleotide sequence represented in the set of tagged
`
`parent polynucleotides at a percentage that is the same as the percentage per-base sequencing
`
`error rate of the sequencing platform used, has at least a 50%, at least a 60%, at least a 70%, at
`
`least a 80%, at least a 90%, at least a 95%, at least a 98%, at least a 99%, at least a 99.9% or at
`
`least a 99.99% chance of being represented among the set of consensus sequences.
`
`[0055] In some embodiments the method comprises enriching the set of amplified progeny
`
`polynucleotides for polynucleotides mapping to one or more selected reference sequences by: (i)
`
`selective amplification of sequences from initial starting genetic material converted to tagged
`
`parent polynucleotides; (ii) selective amplification of tagged parent polynucleotides; (iii) selective
`
`sequence capture of amplified progeny polynucleotides; or (iv) selective sequence capture of
`
`initial starting genetic material.
`
`[0056] In some embodiments analyzing comprises normalizing a measure (e.g., number) taken
`
`from a set of consensus sequences against a measure taken from a set of consensus sequences
`
`from a control sample.
`
`[0057] In some embodiments analyzing comprises detecting mutations, rare mutations, single
`
`nucleotide variants, indels, copy number variations, transversions, translocations, inversion,
`
`deletions, aneuploidy, partial aneuploidy, polyploidy, chromosomal instability, chromosomal
`
`structure alterations, gene fusions, chromosome fusions, gene truncations, gene amplification,
`
`gene duplications, chromosomal lesions, DNA lesions, abnormal changes in nucleic acid chemical
`
`modifications, abnormal changes in epigenetic patterns, abnormal changes in nucleic acid
`
`methylation, infection or cancer.
`
`[0058] In some embodiments the polynucleotides comprise DNA, RNA, a combination of the two
`
`or DNA plus RNA-derived cDNA.
`
`[0059] In some embodiments a certain subset of polynucleotides is selected for or is enriched
`
`based on polynucleotide length in base-pairs from the initial set of polynucleotides or from the
`
`amplified polynucleotides.
`
`[0060] In some embodiments analysis further comprises detection and monitoring of an
`
`abnormality or disease within an individual, such as, infection and/or cancer.
`
`[0061] In some embodiments the method is performed in combination with immune repertoire
`
`profiling.
`
`[0062] In some embodiments the polynucleotides are extracted from the group consisting of blood,
`
`plasma, serum, urine, saliva, mucosal excretions, sputum, stool, and tears.
`
`[0063] In some embodiments collapsing comprises detecting and/or correcting errors, nicks or
`
`lesions present in the sense or anti-sense strand of the tagged parent polynucleotides or amplified
`
`progeny polynucleotides.
`
`[0064] This disclosure also provides for a method comprising detecting genetic variation in initial
`
`starting genetic material with a sensitivity of at least 5%, at least 1%, at least 0.5%, at least 0.1%
`
`or at least 0.05%. In some embodiments the initial starting genetic material is provided in an
`
`amount less than 100 ng of nucleic acid, the genetic variation is copy number/heterozygosity
`
`variation and detecting is performed with sub-chromosomal resolution; e.g., at least 100 megabase
`
`resolution, at least 10 megabase resolution, at least 1 megabase resolution, at least 100 kilobase
`
`resolution, at least 10 kilobase resolution or at least 1 kilobase resolution. In another embodiment
`
`the method comprises providing a plurality of sets of tagged parent polynucleotides, wherein each
`
`set is mappable to a different reference sequence. In another embodiment the reference sequence
`
`is the locus of a tumor marker, and analyzing comprises detecting the tumor marker in the set of
`
`consensus sequences. In another embodiment the tumor marker is present in the set of consensus
`
`sequences at a frequency less than the error rate introduced at the amplifying step. In another
`
`embodiment the at least one set is a plurality of sets, and the reference sequences comprise a
`
`plurality of reference sequences, each of which is the locus of a tumor marker. In another
`
`embodiment analyzing comprises detecting copy number variation of consensus sequences
`
`between at least two sets of parent polynucleotides. In another embodiment analyzing comprises
`
`detecting the presence of sequence variations compared with the reference sequences. In another
`
`embodiment analyzing comprises detecting the presence of sequence variations compared with
`
`the reference sequences and detecting copy number variation of consensus sequences between at
`
`least two sets of parent polynucleotides. In another embodiment collapsing comprises: i. grouping
`
`sequence reads sequenced from amplified progeny polynucleotides into families, each family
`
`amplified from the same tagged parent polynucleotide; and ii. determining a consensus sequence
`
`based on sequence reads in a family.
`
`[0065] This disclosure also provides for a system comprising a computer readable medium for
`
`performing the following steps: a. providing at least one set of tagged parent polynucleotides, and
`
`for each set of tagged parent polynucleotides; b. amplifying the tagged parent polynucleotides in
`
`the set to produce a corresponding set of amplified progeny polynucleotides; c. sequencing a
`
`subset (including a proper subset) of the set of amplified progeny polynucleotides, to produce a
`
`set of sequencing reads; and d. collapsing the set of sequencing reads to generate a set of
`
`consensus sequences, each consensus sequence corresponding to a unique polynucleotide among
`
`the set of tagged parent polynucleotides and, optionally, e. analyzing the set of consensus
`
`sequences for each set of tagged parent molecules.
`
`[0066] This disclosure also provides a method comprising: a. providing at least one set of tagged
`
`parent polynucleotides, and for each set of tagged parent polynucleotides; b. amplifying the
`
`tagged parent polynucleotides in the set to produce a corresponding set of amplified progeny
`
`polynucleotides; c. sequencing a subset (including a proper subset) of the set of amplified progeny
`
`polynucleotides, to produce a set of sequencing reads; d. collapsing the set of sequencing reads to
`
`generate a set of consensus sequences, each consensus sequence corresponding to a unique
`
`polynucleotide among the set of tagged parent polynucleotides; and e. filtering out from among
`
`the consensus sequences those that fail to meet a quality threshold. In one embodiment the
`
`quality threshold considers a number of sequence reads from amplified progeny polynucleotides
`
`collapsed into a consensus sequence. This disclosure also provides a system comprising a computer readable medium for
`
`performing the aforesaid method.
`
`[0067] This disclosure also provides a method comprising: a. providing at least one set of tagged
`
`parent polynucleotides, wherein each set maps to a different reference sequence in one or more
`
`genomes, and, for each set of tagged parent polynucleotides; i. amplifying the first
`
`polynucleotides to produce a set of amplified polynucleotides; ii. sequencing a subset of the set of
`
`amplified polynucleotides, to produce a set of sequencing reads; and iii. collapsing the sequence
`
`reads by: 1. grouping sequence reads sequenced from amplified progeny polynucleotides into
`
`families, each family amplified from the same tagged parent polynucleotide. In one embodiment
`
`collapsing further comprises: 2. determining a quantitative measure of sequence reads in each
`
`family. In another embodiment the method further comprises (including a): b.
`
`determining a quantitative measure of unique families; and c. based on (1) the quantitative
`
`measure of unique families and (2) the quantitative measure of sequence reads in each group,
`
`inferring a measure of unique tagged parent polynucleotides in the set. In another embodiment
`
`inferring is performed using statistical or probabilistic models. In another embodiment
`
`the at least one set is a plurality of sets. In another embodiment the method further comprises
`
`correcting for amplification or representational bias between the two sets. In another embodiment
`
`the method further comprises using a control or set of control samples to correct for amplification
`
`or representational biases between the two sets. In another embodiment the method further
`
`comprises determining copy number variation between the sets. In another embodiment the
`
`method further comprises (including a, b, c): d. determining a quantitative measure of
`
`polymorphic forms among the families; and e. based on the determined quantitative measure of
`
`polymorphic forms, inferring a quantitative measure of polymorphic forms in the number of
`
`inferred unique tagged parent polynucleotides. In another embodiment wherein polymorphic
`
`forms include but are not limited to: substitutions, insertions, deletions, inversions, microsatellite
`
`changes, transversions, translocations, fusions, methylation, hypermethylation,
`
`hydroxymethylation, acetylation, epigenetic variants, regulatory-associated variants or protein
`
`binding sites. In another embodiment wherein the sets derive from a common sample, the method
`
`further comprising: a. inferring copy number variation for the plurality of sets based on a
`
`comparison of the inferred number of tagged parent polynucleotides in each set mapping to each
`
`of a plurality of reference sequences. In another embodiment the original number of
`
`polynucleotides in each set is further inferred. This disclosure also provides a system comprising
`
`a computer readable medium for performing the aforesaid methods.
`
`[0068] This disclosure also provides a method of determining copy number variation in a sample
`
`that includes polynucleotides, the method comprising: a. providing at least two sets of first
`
`polynucleotides, wherein each set maps to a different reference sequence in a genome, and, for
`
`each set of first polynucleotides; i. amplifying the polynucleotides to produce a set of amplified
`
`polynucleotides; ii. sequencing a subset of the set of amplified polynucleotides, to produce a set of
`
`sequencing reads; iii. grouping sequence reads sequenced from amplified polynucleotides into
`
`families, each family amplified from the same first polynucleotide in the set; iv. inferring a
`
`quantitative measure of families in the set; b. determining copy number variation by comparing
`
`the quantitative measure of families in each set. This disclosure also provides a system
`
`comprising a computer readable medium for performing the aforesaid methods.
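The family-based comparison in this paragraph can be sketched as follows. The example assumes each read is described by a (reference, tag, start, end) tuple, that reads sharing all of tag, start, and end belong to one family, and that the quantitative measure is simply the number of distinct families; these are illustrative simplifications rather than the disclosed inference procedure.

```python
def family_counts(reads):
    """Count distinct families per reference sequence.

    reads: iterable of (reference_name, tag, start, end) tuples
    """
    families = {}
    for ref, tag, start, end in reads:
        families.setdefault(ref, set()).add((tag, start, end))
    return {ref: len(members) for ref, members in families.items()}


def relative_copy_number(counts, baseline):
    """Express each set's family count relative to a chosen baseline set."""
    return {ref: n / counts[baseline] for ref, n in counts.items()}


# Hypothetical reads: amplification duplicates collapse into the same family.
reads = [
    ("geneX", "t1", 10, 160), ("geneX", "t1", 10, 160),
    ("geneX", "t2", 12, 170), ("geneX", "t3", 15, 165), ("geneX", "t4", 20, 180),
    ("geneY", "t5", 30, 190), ("geneY", "t5", 30, 190), ("geneY", "t6", 35, 200),
]
counts = family_counts(reads)
relative = relative_copy_number(counts, baseline="geneY")
```

Counting families rather than raw reads keeps uneven amplification of individual molecules from inflating the comparison.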
`
`[0069] This disclosure also provides a method of inferring frequency of sequence calls in a
`
`sample of polynucleotides comprising: a. providing at least one set of first polynucleotides,
`
`wherein each set maps to a different reference sequence in one or more genomes, and, for each set
`
`of first polynucleotides; i. amplifying the first polynucleotides to produce a set of amplified
`
`polynucleotides; ii. sequencing a subset of the set of amplified polynucleotides, to produce a set of
`
`sequencing reads; iii. grouping the sequence reads into families, each family comprising sequence
`
`reads of amplified polynucleotides amplified from the same first polynucleotide; b. inferring, for
`
`each set of first polynucleotides, a call frequency for one or more bases in the set of first
`
`polynucleotides, wherein inferring comprises: i. assigning, for each family, confidence score for
`
`each of a plurality of calls, the confidence score taking into consideration a frequency of the call
`
`among members of the family; and ii. estimating a frequency of the one or more calls taking into
`
`consideration the confidence scores of the one or more calls assigned to each family. This
`
`disclosure also provides a system comprising a computer readable medium for performing the
`
`aforesaid methods.
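One simple way to picture the two inference steps is shown below. It assumes the confidence score for a call within a family is the fraction of family members supporting that call, and that the frequency estimate is the mean of those scores across families. Both are hypothetical choices made for illustration; the disclosure leaves the scoring model open.

```python
from collections import Counter


def family_confidences(family_calls):
    """Confidence per call at one position: fraction of family members making that call."""
    counts = Counter(family_calls)
    n = len(family_calls)
    return {base: c / n for base, c in counts.items()}


def estimate_call_frequency(families, base):
    """Estimate the frequency of `base` as the mean per-family confidence for that call."""
    scores = [family_confidences(calls).get(base, 0.0) for calls in families]
    return sum(scores) / len(scores)


# Hypothetical calls at one position: each inner list is one family.
families = [
    ["A", "A", "A", "A"],
    ["A", "A", "A", "T"],  # a single discordant read, likely an error
    ["T", "T", "T", "T"],  # a family consistently supporting the variant
    ["A", "A", "A", "A"],
]
freq_t = estimate_call_frequency(families, "T")
```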
`
`[0070] This disclosure also provides a method of communicating sequence information about at
`
`least one individual polynucleotide molecule comprising: a. providing at least one individual
`
`polynucleotide molecule; b. encoding sequence information in the at least one individual
`
`polynucleotide molecule to produce a signal; c. passing at least part of the signal through a
`
`channel to produce a received signal comprising nucleotide sequence information about the at
`
`least one individual polynucleotide molecule, wherein the received signal comprises noise and/or
`
`distortion; d. decoding the received signal to produce a message comprising sequence information
`
`about the at least one individual polynucleotide molecule, wherein decoding reduces noise and/or
`
`distortion in the message; and e. providing the message to a recipient. In one embodiment the
`
`noise comprises incorrect nucleotide calls. In another embodiment distortion comprises uneven
`
`amplification of the individual polynucleotide molecule compared with other individual
`
`polynucleotide molecules. In another embodiment distortion results from amplification or
`
`sequencing bias. In another embodiment the at least one individual polynucleotide molecule is a
`
`plurality of individual polynucleotide molecules, and decoding produces a message about each
`
`molecule in the plurality. In another embodiment encoding comprises amplifying the at least
`
`one individual polynucleotide molecule which has optionally been tagged, wherein the signal
`
`comprises a collection of amplified molecules. In another embodiment the channel comprises a
`
`polynucleotide sequencer and the received signal comprises sequence reads of a plurality of
`
`polynucleotides amplified from the at least one individual polynucleotide molecule. In another
`
`embodiment decoding comprises grouping sequence reads of amplified