`
`GUARDANT - EXHIBIT 2030
`Foundation Medicine, Inc. v. Guardant Health, Inc.
`IPR2019-00634
`
`
`
`Atty Docket No. 42534-704601
`
`sequence; d) quantifying/counting mapped reads in two or more predefined regions of the
`
`reference sequence; e) determining a copy number variation in one or more of the predefined
`
`regions by (i) normalizing the number of reads in the predefined regions to each other and/or the
`
`number of unique barcodes in the predefined regions to each other; and (ii) comparing the
`
`normalized numbers obtained in step (i) to normalized numbers obtained from a control sample.
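A minimal sketch of steps d) and e) above, assuming reads have already been mapped and counted per predefined region. Normalizing each region's count to the sample total is one possible reading of "normalizing ... to each other". The function names and the dictionary layout are hypothetical, not the claimed implementation.

```python
from typing import Dict


# Hypothetical helpers for illustration only.
def normalize_counts(counts: Dict[str, int]) -> Dict[str, float]:
    """Step (i), illustrative: express each region's count as a fraction of all counted reads."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no reads counted in any predefined region")
    return {region: n / total for region, n in counts.items()}


def copy_number_ratios(sample: Dict[str, int], control: Dict[str, int]) -> Dict[str, float]:
    """Step (ii), illustrative: sample-to-control ratio of normalized counts per region.

    A ratio near 1.0 suggests no change. If every cell carried the alteration, a
    single-copy gain on a diploid background would give about 1.5 and a single-copy
    loss about 0.5; a tumor-derived fraction below 100% pulls both toward 1.0.
    """
    s = normalize_counts(sample)
    c = normalize_counts(control)
    return {region: s[region] / c[region] for region in s if c.get(region, 0) > 0}


# Hypothetical counts: region r3 carries twice the reads of r1 and r2 in the sample only.
ratios = copy_number_ratios({"r1": 100, "r2": 100, "r3": 200},
                            {"r1": 100, "r2": 100, "r3": 100})
print(ratios)
```

Here r1 and r2 fall below 1.0 only because r3 absorbs a larger share of the total; normalizing against a median or against a set of stable reference regions is one way to limit that effect.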
`
[0005] The disclosure also provides for a method for detecting a rare mutation in a cell-free or substantially cell-free sample obtained from a subject, comprising: a) sequencing extracellular polynucleotides from a bodily sample from a subject, wherein each of the extracellular polynucleotides generates a plurality of sequencing reads; b) filtering out reads that fail to meet a set threshold; c) mapping sequence reads derived from the sequencing onto a reference sequence; d) identifying a subset of mapped sequence reads that align with a variant of the reference sequence at each mappable base position; e) for each mappable base position, calculating a ratio of (a) a number of mapped sequence reads that include a variant as compared to the reference sequence, to (b) a number of total sequence reads for each mappable base position; f) normalizing the ratios or frequency of variance for each mappable base position and determining potential rare variant(s) or mutation(s); and g) comparing the resulting number for each of the regions with potential rare variant(s) or mutation(s) to similarly derived numbers from a reference sample.
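The per-position ratio in the method above can be sketched as follows. This is a simplified illustration under stated assumptions: reads are modeled as hypothetical `(start, sequence, quality)` tuples with ungapped alignment, the read filter is a single minimum quality, and "normalizing" is approximated by subtracting the variant fraction observed at the same position in a reference sample.

```python
from collections import Counter
from typing import Dict, List, Tuple

# Hypothetical read model: (0-based reference start, read sequence, read quality score).
Read = Tuple[int, str, int]


def variant_fractions(reads: List[Read], reference: str, min_quality: int = 20) -> Dict[int, float]:
    """Per covered position: reads with a non-reference base / all reads covering it."""
    variant: Counter = Counter()
    total: Counter = Counter()
    for start, seq, qual in reads:
        if qual < min_quality:  # filtering step: drop reads failing the threshold
            continue
        for offset, base in enumerate(seq):  # ungapped alignment assumed
            pos = start + offset
            if pos >= len(reference):
                break
            total[pos] += 1
            if base != reference[pos]:
                variant[pos] += 1
    return {pos: variant[pos] / total[pos] for pos in sorted(total)}


def candidate_positions(sample: Dict[int, float], background: Dict[int, float],
                        min_excess: float = 0.01) -> List[int]:
    """Positions whose variant fraction exceeds the reference sample's by at least min_excess."""
    return sorted(pos for pos, f in sample.items() if f - background.get(pos, 0.0) >= min_excess)
```

The `min_excess` cutoff is a hypothetical placeholder; in practice the background would come from the reference sample processed the same way.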
`
[0006] The disclosure also provides for a method of characterizing the heterogeneity
`
`of an abnormal condition in a subject, the method comprising generating a genetic profile of
`
`extracellular polynucleotides in the subject, wherein the genetic profile comprises a plurality of
`
`data resulting from copy number variation and/or other rare mutation (e.g., genetic alteration)
`
`analyses.
`
`[0007] In some embodiments, the prevalence/concentration of each rare variant identified in the
`
`subject is reported and quantified simultaneously. In other embodiments, a confidence score,
`
`regarding the prevalence/concentrations of rare variants in the subject, is reported.
`
`[0008] In some embodiments, extracellular polynucleotides comprise DNA. In other
`
`embodiments, extracellular polynucleotides comprise RNA. Polynucleotides may be fragments or
`
`fragmented after isolation. Additionally, the disclosure provides for a method for circulating
`
`nucleic acid isolation and extraction.
`
`[0009] In some embodiments, extracellular polynucleotides are isolated from a bodily sample that
`
`may be selected from a group consisting of blood, plasma, serum, urine, saliva, mucosal
`
`excretions, sputum, stool and tears.
`
`[0010] In some embodiments, the methods of the disclosure also comprise a step of determining
`
the percent of sequences having copy number variation or other rare genetic alteration (e.g.,
`
`sequence variants) in said bodily sample.
`
`[0011] In some embodiments, the percent of sequences having copy number variation in said
`
`bodily sample is determined by calculating the percentage of predefined regions with an amount
`
`of polynucleotides above or below a predetermined threshold.
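A minimal sketch of [0011], assuming per-region normalized ratios (such as sample-to-control ratios) are already available. The 0.8 and 1.2 cutoffs are hypothetical placeholders for the "predetermined threshold", and the function name is illustrative.

```python
from typing import Dict


# Hypothetical helper; cutoffs are illustrative, not values from the disclosure.
def percent_regions_altered(ratios: Dict[str, float], low: float = 0.8, high: float = 1.2) -> float:
    """Percentage of predefined regions whose normalized ratio falls below `low` or above `high`."""
    if not ratios:
        return 0.0
    altered = sum(1 for r in ratios.values() if r < low or r > high)
    return 100.0 * altered / len(ratios)
```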
`
[0012] In some embodiments, bodily fluids are drawn from a subject suspected of having an abnormal condition, which may be selected from the group consisting of mutations, rare mutations, single nucleotide variants, indels, copy number variations, transversions, translocations, inversions, deletions, aneuploidy, partial aneuploidy, polyploidy, chromosomal instability, chromosomal structure alterations, gene fusions, chromosome fusions, gene truncations, gene amplification, gene duplications, chromosomal lesions, DNA lesions, abnormal changes in nucleic acid chemical modifications, abnormal changes in epigenetic patterns, abnormal changes in nucleic acid methylation, infection, and cancer.
`
[0013] In some embodiments, the subject may be a pregnant female, in which case the abnormal condition may be a fetal abnormality selected from the group consisting of single nucleotide variants, indels, copy number variations, transversions, translocations, inversions, deletions, aneuploidy, partial aneuploidy, polyploidy, chromosomal instability, chromosomal structure alterations, gene fusions, chromosome fusions, gene truncations, gene amplification, gene duplications, chromosomal lesions, DNA lesions, abnormal changes in nucleic acid chemical modifications, abnormal changes in epigenetic patterns, abnormal changes in nucleic acid methylation, infection, and cancer.
`
[0014] In some embodiments, the method may comprise attaching one or more barcodes to the extracellular polynucleotides or fragments thereof prior to sequencing, in which the barcodes are unique. In other embodiments, the barcodes attached to the extracellular polynucleotides or fragments thereof prior to sequencing are not unique.
`
[0015] In some embodiments, the methods of the disclosure may comprise selectively enriching regions from the subject’s genome or transcriptome prior to sequencing. In other embodiments, the methods of the disclosure comprise non-selectively enriching regions from the subject’s genome or transcriptome prior to sequencing.
`
`[0016] Further, the methods of the disclosure comprise attaching one or more barcodes to the
`
`extracellular polynucleotides or fragments thereof prior to any amplification or enrichment step.
`
[0017] In some embodiments, the barcode is a polynucleotide, which may further comprise a random sequence or a fixed or semi-random set of oligonucleotides that, in combination with the diversity of molecules sequenced from a select region, enables identification of unique molecules, and may be at least 3, 5, 10, 15, 20, 25, 30, 35, 40, 45, or 50 base pairs in length.
`
`[0018] In some embodiments, extracellular polynucleotides or fragments thereof may be
`
`amplified. In some embodiments amplification comprises global amplification or whole genome
`
`amplification.
`
`[0019] In some embodiments, sequence reads of unique identity may be detected based on
`
`sequence information at the beginning (start) and end (stop) regions of the sequence read and the
`
`length of the sequence read. In other embodiments sequence molecules of unique identity are
`
`detected based on sequence information at the beginning (start) and end (stop) regions of the
`
`sequence read, the length of the sequence read and attachment of a barcode.
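One way to picture [0019]: build an identity key from each aligned read's start, stop and length, optionally adding the attached barcode, and group reads that share a key. `AlignedRead` and the helper names are hypothetical; a real pipeline would take these fields from its alignment records.

```python
from collections import defaultdict
from typing import Dict, List, NamedTuple, Optional, Tuple


class AlignedRead(NamedTuple):  # hypothetical minimal alignment record
    chrom: str
    start: int                  # leftmost mapped coordinate
    end: int                    # rightmost mapped coordinate (exclusive)
    barcode: Optional[str] = None


def molecule_key(read: AlignedRead, use_barcode: bool) -> Tuple:
    """Identity key from start, stop and length, optionally plus the barcode.

    Length is implied by start and end for a contiguous alignment; it is kept
    explicit only to mirror the paragraph.
    """
    key = (read.chrom, read.start, read.end, read.end - read.start)
    return key + (read.barcode,) if use_barcode else key


def group_unique_molecules(reads: List[AlignedRead],
                           use_barcode: bool = True) -> Dict[Tuple, List[AlignedRead]]:
    """Group reads that appear to derive from the same original molecule."""
    groups: Dict[Tuple, List[AlignedRead]] = defaultdict(list)
    for r in reads:
        groups[molecule_key(r, use_barcode)].append(r)
    return dict(groups)
```

Adding the barcode separates molecules that happen to share identical endpoints, which becomes more likely as sequencing depth grows.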
`
`[0020] In some embodiments, amplification comprises selective amplification, non-selective
`
`amplification, suppression amplification or subtractive enrichment.
`
`[0021] In some embodiments, the methods of the disclosure comprise removing a subset of the
`
`reads from further analysis prior to quantifying or enumerating reads.
`
`[0022] In some embodiments, the method may comprise filtering out reads with an accuracy or
`
`quality score of less than a threshold, e.g., 90%, 99%, 99.9%, or 99.99% and/or mapping score
`
`less than a threshold, e.g., 90%, 99%, 99.9% or 99.99%. In other embodiments, methods of the
`
`disclosure comprise filtering reads with a quality score lower than a set threshold.
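The accuracy thresholds in [0022] map onto Phred-scaled quality scores through Q = -10 log10(1 - accuracy), so 90%, 99%, 99.9% and 99.99% correspond to Q10, Q20, Q30 and Q40; mapping-quality scores are commonly Phred-scaled the same way. The sketch below applies that conversion. Using the mean per-base Phred score as the read-level summary is an illustrative assumption, and the function names are hypothetical.

```python
import math
from typing import Sequence


def accuracy_to_phred(accuracy: float) -> float:
    """Phred score Q = -10 * log10(P(error)), where P(error) = 1 - accuracy."""
    return -10.0 * math.log10(1.0 - accuracy)


def passes_quality(base_qualities: Sequence[int], min_accuracy: float = 0.99) -> bool:
    """Keep a read if its mean per-base Phred score meets the accuracy threshold.

    Averaging Phred values is a crude summary; averaging per-base error
    probabilities instead would be stricter.
    """
    if not base_qualities:
        return False
    threshold = accuracy_to_phred(min_accuracy)
    # Small tolerance because 1 - accuracy is not exact in floating point.
    return sum(base_qualities) / len(base_qualities) >= threshold - 1e-9
```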
`
`[0023] In some embodiments, predefined regions are uniform or substantially uniform in size,
`
about 10 kb, 20 kb, 30 kb, 40 kb, 50 kb, 60 kb, 70 kb, 80 kb, 90 kb, or 100 kb in size. In some
`
`embodiments, at least 50, 100, 200, 500, 1000, 2000, 5000, 10,000, 20,000, or 50,000 regions are
`
`analyzed.
`
`[0024] In some embodiments, a genetic variant, rare mutation or copy number variation occurs in
`
`a region of the genome selected from the group consisting of gene fusions, gene duplications,
`
`gene deletions, gene translocations, microsatellite regions, gene fragments or combination thereof.
`
`In other embodiments a genetic variant, rare mutation, or copy number variation occurs in a
`
`region of the genome selected from the group consisting of genes, oncogenes, tumor suppressor
`
`genes, promoters, regulatory sequence elements, or combination thereof. In some embodiments
`
`the variant is a nucleotide variant, single base substitution, or small indel, transversion,
`
`translocation, inversion, deletion, truncation or gene truncation about 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 15
`
`or 20 nucleotides in length.
`
`[0025] In some embodiments, the method comprises correcting/normalizing/adjusting the
`
`quantity of mapped reads using the barcodes or unique properties of individual reads.
`
`[0026] In some embodiments, enumerating the reads is performed through enumeration of unique
`
`barcodes in each of the predefined regions and normalizing those numbers across at least a subset
`
`of predefined regions that were sequenced. In some embodiments, samples at succeeding time
`
`intervals from the same subject are analyzed and compared to previous sample results. The
`
`method of the disclosure may further comprise determining partial copy number variation
`
`frequency, loss of heterozygosity, gene expression analysis, epigenetic analysis and
`
hypermethylation analysis after amplifying the barcode-attached extracellular polynucleotides.
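A minimal sketch of the enumeration in [0026], assuming each read has already been assigned to a predefined region and carries a barcode; reads sharing a barcode within a region are counted once. Dividing by the median across regions is a normalization chosen here for illustration, not one the text specifies, and the helper names are hypothetical.

```python
from collections import defaultdict
from statistics import median
from typing import Dict, Iterable, Set, Tuple


# Hypothetical helpers for illustration only.
def unique_barcodes_per_region(tagged_reads: Iterable[Tuple[str, str]]) -> Dict[str, int]:
    """Count distinct barcodes per region from (region_id, barcode) pairs."""
    seen: Dict[str, Set[str]] = defaultdict(set)
    for region, barcode in tagged_reads:
        seen[region].add(barcode)  # amplified copies of one tagged molecule collapse here
    return {region: len(codes) for region, codes in seen.items()}


def normalize_to_median(counts: Dict[str, int]) -> Dict[str, float]:
    """Divide each region's unique-barcode count by the median count across regions."""
    mid = median(counts.values())
    if mid == 0:
        raise ValueError("median unique-barcode count is zero")
    return {region: n / mid for region, n in counts.items()}
```

Counting barcodes rather than raw reads makes the per-region measure less sensitive to uneven amplification, since repeated reads from one tagged molecule count once.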
`
`[0027] In some embodiments, copy number variation and rare mutation analysis is determined in
`
`a cell-free or substantially cell free sample obtained from a subject using multiplex sequencing,
`
`comprising performing over 10,000 sequencing reactions; simultaneously sequencing at least
`
`10,000 different reads; or performing data analysis on at least 10,000 different reads across the
`
`genome. The method may comprise multiplex sequencing comprising performing data analysis
`
`on at least 10,000 different reads across the genome. The method may further comprise
`
`enumerating sequenced reads that are uniquely identifiable.
`
[0028] In some embodiments, the methods of the disclosure comprise normalizing and detecting, which are performed using one or more of hidden Markov model, dynamic programming, support vector machine, Bayesian network, trellis decoding, Viterbi decoding, expectation maximization, Kalman filtering, or neural network methodologies.
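As one concrete instance of the hidden Markov model and Viterbi decoding options listed in [0028], the sketch below segments per-region log2 copy ratios into discrete states. The state means (log2 of 1/2, 2/2 and 3/2 for a diploid genome), the Gaussian emission width `sigma` and the self-transition probability `stay_prob` are illustrative assumptions, not parameters from the disclosure.

```python
import math
from typing import List, Sequence


def viterbi_copy_states(log_ratios: Sequence[float], state_means: Sequence[float],
                        sigma: float = 0.2, stay_prob: float = 0.99) -> List[int]:
    """Most likely hidden state per region under a Gaussian-emission HMM (needs >= 2 states)."""
    if not log_ratios:
        return []
    n_states = len(state_means)
    switch = (1.0 - stay_prob) / (n_states - 1)
    log_trans = [[math.log(stay_prob if i == j else switch) for j in range(n_states)]
                 for i in range(n_states)]

    def log_emit(state: int, x: float) -> float:
        # Gaussian log-density up to a constant shared by all states.
        return -((x - state_means[state]) ** 2) / (2 * sigma ** 2)

    # Uniform initial distribution: a shared constant, so it is omitted.
    scores = [log_emit(s, log_ratios[0]) for s in range(n_states)]
    backpointers: List[List[int]] = []
    for x in log_ratios[1:]:
        new_scores, pointers = [], []
        for j in range(n_states):
            best_i = max(range(n_states), key=lambda i: scores[i] + log_trans[i][j])
            pointers.append(best_i)
            new_scores.append(scores[best_i] + log_trans[best_i][j] + log_emit(j, x))
        scores = new_scores
        backpointers.append(pointers)

    # Trace back from the highest-scoring final state.
    state = max(range(n_states), key=lambda s: scores[s])
    path = [state]
    for pointers in reversed(backpointers):
        state = pointers[state]
        path.append(state)
    return path[::-1]


# States: 0 = loss, 1 = neutral, 2 = gain (hypothetical data).
print(viterbi_copy_states([0.0, 0.02, -0.01, 0.58, 0.6, 0.59, 0.57, 0.01, 0.0],
                          [-1.0, 0.0, 0.585]))
```

The transition penalty is what keeps isolated noisy regions from flipping state; lowering `stay_prob` makes the segmentation more responsive but noisier.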
`
`[0029] In some embodiments the methods of the disclosure comprise monitoring disease
`
`progression, monitoring residual disease, monitoring therapy, diagnosing a condition, prognosing
`
`a condition, or selecting a therapy based on discovered variants.
`
[0030] In some embodiments, a therapy is modified based on the most recent sample analysis. Further, the methods of the disclosure comprise inferring the genetic profile of a tumor, infection or other tissue abnormality. In some embodiments, growth, remission or evolution of a tumor, infection or other tissue abnormality is monitored. In some embodiments, the subject’s immune system is analyzed and monitored at single instances or over time.
`
`[0031] In some embodiments, the methods of the disclosure comprise identification of a variant
`
`that is followed up through an imaging test (e.g., CT, PET-CT, MRI, X-ray, ultrasound) for
`
`localization of the tissue abnormality suspected of causing the identified variant.
`
[0032] In some embodiments, the methods of the disclosure comprise use of genetic data obtained from a tissue or tumor biopsy from the same patient. In some embodiments, the phylogenetics of a tumor, infection or other tissue abnormality is inferred.
`
`[0033] In some embodiments, the methods of the disclosure comprise performing population-
`
`based no-calling and identification of low-confidence regions. In some embodiments, obtaining
`
`the measurement data for the sequence coverage comprises measuring sequence coverage depth at
`
`every position of the genome. In some embodiments correcting the measurement data for the
`
`sequence coverage bias comprises calculating window-averaged coverage. In some embodiments
`
`correcting the measurement data for the sequence coverage bias comprises performing
`
`adjustments to account for GC bias in the library construction and sequencing process. In some
`
`embodiments correcting the measurement data for the sequence coverage bias comprises
`
performing adjustments based on additional weighting factors associated with individual mappings
`
`to compensate for bias.
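The window-averaging and GC-bias steps in [0033] can be sketched as follows. Stratifying windows by GC fraction and dividing by each stratum's median coverage is a simple stand-in chosen for illustration (LOESS regression of coverage on GC content is another common approach); the stratum count and function names are hypothetical.

```python
from statistics import median
from typing import Dict, List, Sequence


# Hypothetical helpers for illustration only.
def window_average(depths: Sequence[float], window: int) -> List[float]:
    """Mean per-base coverage over consecutive non-overlapping windows (last may be shorter)."""
    return [sum(depths[i:i + window]) / len(depths[i:i + window])
            for i in range(0, len(depths), window)]


def gc_correct(coverage: Sequence[float], gc_fraction: Sequence[float],
               n_strata: int = 10) -> List[float]:
    """Divide each window's coverage by the median coverage of windows with similar GC content."""
    if len(coverage) != len(gc_fraction):
        raise ValueError("coverage and gc_fraction must have the same length")
    keys = [min(int(gc * n_strata), n_strata - 1) for gc in gc_fraction]
    strata: Dict[int, List[float]] = {}
    for key, cov in zip(keys, coverage):
        strata.setdefault(key, []).append(cov)
    medians = {key: median(values) for key, values in strata.items()}
    return [cov / medians[key] if medians[key] > 0 else 0.0
            for key, cov in zip(keys, coverage)]
```

After correction, a value near 1.0 means a window is covered as expected for its GC content, so residual departures are more attributable to copy number than to library or sequencing bias.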
`
[0034] In some embodiments, the extracellular polynucleotide analyzed by the methods of the disclosure is of diseased cell origin. In some embodiments, the extracellular polynucleotide is of healthy cell origin.
`
[0035] The disclosure also provides for a system comprising a computer readable medium for performing the following steps: selecting predefined regions in a genome; enumerating the number of sequence reads in the predefined regions; normalizing the number of sequence reads across the predefined regions; and determining the percent of copy number variation in the predefined regions. In some embodiments, the entirety of the genome or at least 10%, 20%, 30%, 40%, 50%, 60%, 70%, 80%, or 90% of the genome is analyzed. In some embodiments, the computer readable medium provides data on percent cancer DNA or RNA in plasma or serum to the end user.
`
[0036] In some embodiments, the amount of genetic variation, such as polymorphisms or causal variants, is analyzed. In some embodiments, the presence or absence of genetic alterations is detected.
`
[0037] The disclosure also provides for a method for detecting a rare mutation in a cell-free or a substantially cell-free sample obtained from a subject, comprising: a) sequencing extracellular polynucleotides from a bodily sample from a subject, wherein each of the extracellular polynucleotides generates a plurality of sequencing reads; b) filtering out reads that fail to meet a set threshold; c) mapping sequence reads derived from the sequencing onto a reference sequence; d) identifying a subset of mapped sequence reads that align with a variant of the reference sequence at each mappable base position; e) for each mappable base position, calculating a ratio of (a) a number of mapped sequence reads that include a variant as compared to the reference sequence, to (b) a number of total sequence reads for each mappable base position; f) normalizing the ratios or frequency of variance for each mappable base position and determining potential rare variant(s) or other genetic alteration(s); and g) comparing the resulting number for each of the regions
`
`[0038] This disclosure also provides for a method comprising: a. providing at least one set of
`
`tagged parent polynucleotides, and for each set of tagged parent polynucleotides; b. amplifying
`
`the tagged parent polynucleotides in the set to produce a corresponding set of amplified progeny
`
`polynucleotides; c. sequencing a subset (including a proper subset) of the set of amplified progeny
`
`polynucleotides, to produce a set of sequencing reads; and d. collapsing the set of sequencing
`
`reads to generate a set of consensus sequences, each consensus sequence corresponding to a
`
`unique polynucleotide among the set of tagged parent polynucleotides. In certain embodiments
`
the method further comprises: e. analyzing the set of consensus sequences for each set of tagged
`
`parent molecules.
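Step d. (collapsing) can be sketched as a per-position majority vote within each tag-defined family. The sketch assumes reads are `(tag, sequence)` pairs that are already aligned and equal in length, and it emits `N` when no base reaches a strict majority; those conventions and the function name are illustrative, not taken from the disclosure.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, List, Tuple


# Hypothetical helper for illustration only.
def collapse_to_consensus(reads: Iterable[Tuple[str, str]]) -> Dict[str, str]:
    """Group reads by parent tag and take a per-position majority vote within each family."""
    families: Dict[str, List[str]] = defaultdict(list)
    for tag, seq in reads:
        families[tag].append(seq)
    consensus = {}
    for tag, seqs in families.items():
        length = min(len(s) for s in seqs)  # truncate to the shortest read in the family
        bases = []
        for i in range(length):
            base, votes = Counter(s[i] for s in seqs).most_common(1)[0]
            # 'N' when no base has a strict majority within the family.
            bases.append(base if votes * 2 > len(seqs) else "N")
        consensus[tag] = "".join(bases)
    return consensus
```

An error introduced during amplification or sequencing typically appears in only a minority of a family's reads, so the vote tends to restore the parent's base.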
`
`[0039] In some embodiments each polynucleotide in a set is mappable to a reference sequence.
`
`[0040] In some embodiments the method comprises providing a plurality of sets of tagged parent
`
`polynucleotides, wherein each set is mappable to a different reference sequence.
`
`[0041] In some embodiments the method further comprises converting initial starting genetic
`
`material into the tagged parent polynucleotides.
`
`[0042] In some embodiments the initial starting genetic material comprises no more than 100 ng
`
`of polynucleotides.
`
`[0043] In some embodiments the method comprises bottlenecking the initial starting genetic
`
`material prior to converting.
`
`[0044] In some embodiments the method comprises converting the initial starting genetic material
`
`into tagged parent polynucleotides with a conversion efficiency of at least 10%, at least 20%, at
`
`least 30%, at least 40%, at least 50%, at least 60%, at least 80% or at least 90%.
`
[0045] In some embodiments converting comprises any of blunt-end ligation, sticky end ligation,
`
`molecular inversion probes, PCR, ligation-based PCR, single strand ligation and single strand
`
`circularization.
`
`[0046] In some embodiments the initial starting genetic material is cell-free nucleic acid.
`
`[0047] In some embodiments a plurality of the reference sequences are from the same genome.
`
`[0048] In some embodiments each tagged parent polynucleotide in the set is uniquely tagged.
`
`[0049] In some embodiments the tags are non-unique.
`
`[0050] In some embodiments the generation of consensus sequences is based on information from
`
`the tag and/or at least one of sequence information at the beginning (start) region of the sequence
`
`read, the end (stop) regions of the sequence read and the length of the sequence read.
`
[0051] In some embodiments the method comprises sequencing a subset of the set of amplified progeny polynucleotides sufficient to produce sequence reads for at least one progeny of each of at least 20%, at least 30%, at least 40%, at least 50%, at least 60%, at least 70%, at least 80%, at least 90%, at least 95%, at least 98%, at least 99%, at least 99.9% or at least 99.99% of unique polynucleotides in the set of tagged parent polynucleotides.
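How much sequencing [0051] implies can be estimated if, as an assumption, the number of reads drawn from each tagged parent follows a Poisson distribution with mean λ reads per parent. The expected fraction of parents with at least k sequenced progeny is then 1 - Σ_{i<k} e^{-λ} λ^i / i!, which for k = 1 reduces to 1 - e^{-λ}. The function names are hypothetical.

```python
import math


def fraction_with_at_least(k: int, reads_per_parent: float) -> float:
    """Expected fraction of parents with >= k sequenced progeny when reads ~ Poisson(lambda)."""
    lam = reads_per_parent
    below_k = sum(math.exp(-lam) * lam ** i / math.factorial(i) for i in range(k))
    return 1.0 - below_k


def reads_per_parent_for_one(target_fraction: float) -> float:
    """Mean reads per parent so that a target fraction of parents is seen at least once."""
    return -math.log(1.0 - target_fraction)


# About 4.6 reads per parent, on average, to observe ~99% of parents at least once.
print(reads_per_parent_for_one(0.99))
```

Real amplification is usually overdispersed relative to Poisson, so these figures are best read as lower bounds on the depth required.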
`
[0052] In some embodiments the at least one progeny is a plurality of progeny, e.g., at least 2, at
`
`least 5 or at least 10 progeny.
`
`[0053] In some embodiments the number of sequence reads in the set of sequence reads is greater
`
`than the number of unique tagged parent polynucleotides in the set of tagged parent
`
`polynucleotides.
`
[0054] In some embodiments the subset of the set of amplified progeny polynucleotides sequenced is of sufficient size so that any nucleotide sequence represented in the set of tagged parent polynucleotides at a percentage equal to the per-base sequencing error rate of the sequencing platform used has at least a 50%, at least a 60%, at least a 70%, at least an 80%, at least a 90%, at least a 95%, at least a 98%, at least a 99%, at least a 99.9% or at least a 99.99% chance of being represented among the set of consensus sequences.
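A rough sizing rule for [0054], under the simplifying assumptions that parent molecules are sampled independently and that each sampled parent yields an error-free consensus: a sequence present at frequency f appears among n consensus sequences with probability 1 - (1 - f)^n. The 0.1% error rate in the usage line is a hypothetical value, and the function names are illustrative.

```python
import math


def detection_probability(n_molecules: int, frequency: float) -> float:
    """P(at least one of n independently sampled parents carries a sequence at `frequency`)."""
    return 1.0 - (1.0 - frequency) ** n_molecules


def molecules_needed(frequency: float, target_probability: float) -> int:
    """Smallest n for which detection_probability(n, frequency) reaches target_probability."""
    return math.ceil(math.log(1.0 - target_probability) / math.log(1.0 - frequency))


# Hypothetical 0.1% per-base error rate and a 99% target: roughly 4,600 unique parents.
n = molecules_needed(0.001, 0.99)
print(n, detection_probability(n, 0.001))
```

This only sizes the sampling needed for the sequence to appear at all; it says nothing about distinguishing it from errors, which is the job of the consensus step.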
`
`[0055] In some embodiments the method comprises enriching the set of amplified progeny
`
`polynucleotides for polynucleotides mapping to one or more selected reference sequences by: (i)
`
`selective amplification of sequences from initial starting genetic material converted to tagged
`
`parent polynucleotides; (ii) selective amplification of tagged parent polynucleotides; (iii) selective
`
`sequence capture of amplified progeny polynucleotides; or (iv) selective sequence capture of
`
`initial starting genetic material.
`
`[0056] In some embodiments analyzing comprises normalizing a measure (e.g., number) taken
`
`from a set of consensus sequences against a measure taken from a set of consensus sequences
`
`from a control sample.
`
`[0057] In some embodiments analyzing comprises detecting mutations, rare mutations, single
`
`nucleotide variants, indels, copy number variations, transversions, translocations, inversion,
`
`deletions, aneuploidy, partial aneuploidy, polyploidy, chromosomal instability, chromosomal
`
`structure alterations, gene fusions, chromosome fusions, gene truncations, gene amplification,
`
`gene duplications, chromosomal lesions, DNA lesions, abnormal changes in nucleic acid chemical
`
`modifications, abnormal changes in epigenetic patterns, abnormal changes in nucleic acid
`
methylation, infection, or cancer.
`
`[0058] In some embodiments the polynucleotides comprise DNA, RNA, a combination of the two
`
or DNA plus RNA-derived cDNA.
`
`[0059] In some embodiments a certain subset of polynucleotides is selected for or is enriched
`
`based on polynucleotide length in base-pairs from the initial set of polynucleotides or from the
`
`amplified polynucleotides.
`
[0060] In some embodiments, analysis further comprises detection and monitoring of an abnormality or disease within an individual, such as infection and/or cancer.
`
`[0061] In some embodiments the method is performed in combination with immune repertoire
`
`profiling.
`
[0062] In some embodiments the polynucleotides are extracted from a bodily sample selected from the group consisting of blood, plasma, serum, urine, saliva, mucosal excretions, sputum, stool, and tears.
`
[0063] In some embodiments collapsing comprises detecting and/or correcting errors, nicks or
`
`lesions present in the sense or anti-sense strand of the tagged parent polynucleotides or amplified
`
`progeny polynucleotides.
`
`[0064] This disclosure also provides for a method comprising detecting genetic variation in initial
`
`starting genetic material with a sensitivity of at least 5%, at least 1%, at least 0.5%, at least 0.1%
`
`or at least 0.05%. In some embodiments the initial starting genetic material is provided in an
`
`amount less than 100 ng of nucleic acid, the genetic variation is copy number/heterozygosity
`
variation and detecting is performed with sub-chromosomal resolution; e.g., at least 100 megabase
`
`resolution, at least 10 megabase resolution, at least 1 megabase resolution, at least 100 kilobase
`
`resolution, at least 10 kilobase resolution or at least 1 kilobase resolution. In another embodiment
`
`the method comprises providing a plurality of sets of tagged parent polynucleotides, wherein each
`
`set is mappable to a different reference sequence. In another embodiment the reference sequence
`
`is the locus of a tumor marker, and analyzing comprises detecting the tumor marker in the set of
`
`consensus sequences. In another embodiment the tumor marker is present in the set of consensus
`
`sequences at a frequency less than the error rate introduced at the amplifying step. In another
`
`embodiment the at least one set is a plurality of sets, and the reference sequences comprise a
`
`plurality of reference sequences, each of which is the locus of a tumor marker. In another
`
`embodiment analyzing comprises detecting copy number variation of consensus sequences
`
`between at least two sets of parent polynucleotides. In another embodiment analyzing comprises
`
`detecting the presence of sequence variations compared with the reference sequences. In another
`
`embodiment analyzing comprises detecting the presence of sequence variations compared with
`
`the reference sequences and detecting copy number variation of consensus sequences between at
`
`least two sets of parent polynucleotides. In another embodiment collapsing comprises: i. grouping
`
sequence reads sequenced from amplified progeny polynucleotides into families, each family
`
`amplified from the same tagged parent polynucleotide; and ii. determining a consensus sequence
`
`based on sequence reads in a family.
`
`[0065] This disclosure also provides for a system comprising a computer readable medium for
`
`performing the following steps: a. providing at least one set of tagged parent polynucleotides, and
`
`for each set of tagged parent polynucleotides; b. amplifying the tagged parent polynucleotides in
`
`the set to produce a corresponding set of amplified progeny polynucleotides; c. sequencing a
`
`subset (including a proper subset) of the set of amplified progeny polynucleotides, to produce a
`
`set of sequencing reads; and d. collapsing the set of sequencing reads to generate a set of
`
`consensus sequences, each consensus sequence corresponding to a unique polynucleotide among
`
`the set of tagged parent polynucleotides and, optionally, e. analyzing the set of consensus
`
`sequences for each set of tagged parent molecules.
`
`[0066] This disclosure also provides a method comprising: a. providing at least one set of tagged
`
`parent polynucleotides, and for each set of tagged parent polynucleotides; b. amplifying the
`
tagged parent polynucleotides in the set to produce a corresponding set of amplified progeny polynucleotides; c. sequencing a subset (including a proper subset) of the set of amplified progeny polynucleotides, to produce a set of sequencing reads; d. collapsing the set of sequencing reads to generate a set of consensus sequences, each consensus sequence corresponding to a unique polynucleotide among the set of tagged parent polynucleotides; and e. filtering out from among the consensus sequences those that fail to meet a quality threshold. In one embodiment, the quality threshold considers a number of sequence reads from amplified progeny polynucleotides collapsed into a consensus sequence. This disclosure also provides a system comprising a computer readable medium for performing the aforesaid method.
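The quality threshold in [0066] that considers how many reads were collapsed into each consensus can be sketched as a minimum family size. The `Consensus` record, the function name and the default of two reads are illustrative assumptions.

```python
from typing import Dict, NamedTuple


class Consensus(NamedTuple):  # hypothetical record for illustration
    sequence: str
    family_size: int  # number of progeny reads collapsed into this consensus


def filter_by_family_size(consensus: Dict[str, Consensus],
                          min_family_size: int = 2) -> Dict[str, Consensus]:
    """Keep consensus sequences supported by at least `min_family_size` reads.

    A singleton family has no within-family redundancy, so an amplification or
    sequencing error in its only read cannot be outvoted during collapsing.
    """
    return {tag: c for tag, c in consensus.items() if c.family_size >= min_family_size}
```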
`
`[0067] This disclosure also provides a method comprising: a. providing at least one set of tagged
`
`parent polynucleotides, wherein each set maps to a different reference sequence in one or more
`
`genomes, and, for each set of tagged parent polynucleotides; i. amplifying the first
`
`polynucleotides to produce a set of amplified polynucleotides; ii. sequencing a subset of the set of
`
`amplified polynucleotides, to produce a set of sequencing reads; and iii. collapsing the sequence
`
reads by: 1. grouping sequence reads sequenced from amplified progeny polynucleotides into families, each family amplified from the same tagged parent polynucleotide. In one embodiment, collapsing further comprises: 2. determining a quantitative measure of sequence reads in each family. In another embodiment, the method further comprises (including a): b. determining a quantitative measure of unique families; and c. based on (1) the quantitative measure of unique families and (2) the quantitative measure of sequence reads in each group, inferring a measure of unique tagged parent polynucleotides in the set. In another embodiment, inferring is performed using statistical or probabilistic models. In another embodiment, the at least one set is a plurality of sets. In another embodiment the method further comprises
`
`correcting for amplification or representational bias between the two sets. In another embodiment
`
`the method further comprises using a control or set of control samples to correct for amplification
`
`or representational biases between the two sets. In another embodiment the method further
`
`comprises determining copy number variation between the sets. In another embodiment the
`
`method further comprises (including a, b, c): d. determining a quantitative measure of
`
`polymorphic forms among the families; and e. based on the determined quantitative measure of
`
`polymorphic forms, inferring a quantitative measure of polymorphic forms in the number of
`
inferred unique tagged parent polynucleotides. In another embodiment, polymorphic forms include but are not limited to: substitutions, insertions, deletions, inversions, microsatellite changes, transversions, translocations, fusions, methylation, hypermethylation, hydroxymethylation, acetylation, epigenetic variants, regulatory-associated variants or protein binding sites. In another embodiment, wherein the sets derive from a common sample, the method further comprises: a. inferring copy number variation for the plurality of sets based on a comparison of the inferred number of tagged parent polynucleotides in each set mapping to each of a plurality of reference sequences. In another embodiment, the original number of polynucleotides in each set is further inferred. This disclosure also provides a system comprising a computer readable medium for performing the aforesaid methods.
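One statistical model that fits the inference step in [0067], offered as an assumption rather than as the disclosure's own model: if reads per tagged parent follow a Poisson distribution with mean λ, observed family sizes follow a zero-truncated Poisson whose mean is λ / (1 - e^{-λ}). Solving that equation for λ from the mean observed family size, then dividing the number of observed families by 1 - e^{-λ}, estimates the number of unique parents, including those never sequenced. The function name is hypothetical.

```python
import math
from typing import Sequence


def estimate_unique_parents(family_sizes: Sequence[int]) -> float:
    """Estimate the number of unique tagged parents, including unsequenced ones.

    Assumes reads per parent ~ Poisson(lam); observed (non-empty) families then
    follow a zero-truncated Poisson with mean lam / (1 - exp(-lam)).
    """
    if not family_sizes:
        return 0.0
    n_families = len(family_sizes)
    mean_size = sum(family_sizes) / n_families
    if mean_size <= 1.0:
        # All singletons: the data are consistent with arbitrarily small lam.
        raise ValueError("mean family size must exceed 1 to estimate unseen parents")

    def truncated_mean(lam: float) -> float:
        return lam / (1.0 - math.exp(-lam))

    # truncated_mean rises from 1 (as lam -> 0) and exceeds lam, so the root lies in (0, mean_size).
    lo, hi = 0.0, mean_size
    for _ in range(200):  # bisection; 200 halvings exceeds double precision
        mid = (lo + hi) / 2
        if truncated_mean(mid) < mean_size:
            lo = mid
        else:
            hi = mid
    lam = (lo + hi) / 2
    return n_families / (1.0 - math.exp(-lam))
```

With large families nearly every parent has been seen, so the estimate approaches the observed family count; with small families the correction for unseen parents grows.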
`
`[0068] This disclosure also provides a method of determining copy number variation in a sample
`
`that includes polynucleotides, the method comprising: a. providing at least two sets of first
`
`polynucleotides, wherein each set maps to a different reference sequence in a genome, and, for
`
`each set of first polynucleotides; i. amplifying the polynucleotides to produce a set of amplified
`
`polynucleotides; ii. sequencing a subset of the set of amplified polynucleotides, to produce a set of
`
sequencing reads; iii. grouping sequence reads sequenced from amplified polynucleotides into
`
`families, each family amplified from the same first polynucleotide in the set; iv. inferring a
`
`quantitative measure of families in the set; b. determining copy number variation by comparing
`
`the quantitative measure of families in each set. This disclosure also provides a system
`
`comprising a computer readable medium for performing the aforesaid methods.
`
`[0069] This disclosure also provides a method of inferring frequency of sequence calls in a
`
`sample of polynucleotides comprising: a. providing at least one set of first polynucleotides,
`
`wherein each set maps to a different reference sequence in one or more genomes, and, for each set
`
`of first polynucleotides; i. amplifying the first polynucleotides to produce a set of amplified
`
`polynucleotides; ii. sequencing a subset of the set of amplified polynucleotides, to produce a set of
`
`sequencing reads; iii. grouping the sequence reads into families, each family comprising sequence
`
`reads of amplified polynucleotides amplified from the same first polynucleotide; b. inferring, for
`
`each set of first polynucleotides, a call frequency for one or more bases in the set of first
`
polynucleotides, wherein inferring comprises: i. assigning, for each family, a confidence score for
`
`each of a plurality of calls, the confidence score taking into consideration a frequency of the call
`
`among members of the family; and ii. estimating a frequency of the one or more calls taking into
`
`consideration the confidence scores of the one or more calls assigned to each family. This
`
`disclosure also provides a system comprising a computer readable medium for performing the
`
`aforesaid methods.
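A minimal sketch of the call-frequency inference in [0069] for a single base position. Here the confidence score for a call within a family is simply the fraction of that family's reads supporting it, and each family contributes a total weight of one to the frequency estimate; both choices and the function names are illustrative assumptions.

```python
from collections import Counter
from typing import Dict, List, Sequence


def family_call_confidences(family_calls: Sequence[str]) -> Dict[str, float]:
    """Illustrative confidence per call: fraction of the family's reads supporting it."""
    counts = Counter(family_calls)
    total = len(family_calls)
    return {base: n / total for base, n in counts.items()}


def estimate_call_frequencies(families: List[Sequence[str]]) -> Dict[str, float]:
    """Average per-family confidences so each parent molecule contributes weight 1."""
    if not families:
        return {}
    totals: Counter = Counter()
    for calls in families:
        for base, conf in family_call_confidences(calls).items():
            totals[base] += conf
    return {base: weight / len(families) for base, weight in totals.items()}
```

Weighting by family rather than by read keeps a heavily amplified molecule from dominating the estimate.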
`
`[0070] This disclosure also provides a method of communicating sequence information about at
`
`least one individual polynucleotide molecule comprising: a. providing at least one individual
`
`polynucleotide molecule; b. encoding sequence information in the at least one individual
`
polynucleotide molecule to produce a signal; c. passing at least part of the signal through a channel to produce a received signal comprising nucleotide sequence information about the at least one individual polynucleotide molecule, wherein the received signal comprises noise and/or distortion; d. decoding the received signal to produce a message comprising sequence information
`
`about the at least one individual polynucleotide molecule, wherein decoding reduces noise and/or
`
`distortion in the message; and e. providing the message to a recipient. In one embodiment the
`
`noise comprises incorrect nucleotide calls. In another embodiment distortion comprises uneven
`
`amplification of the individual polynucleotide molecule compared with other individual
`
`polynucleotide molecules. In another embodiment distortion results from amplification or
`
`sequencing bias. In another embodiment the at least one individual polynucleotide molecule is a
`
`plurality of individual polynucleotide molecules, and decoding produces a message about each
`
molecule in the plurality. In another embodiment, encoding comprises amplifying the at least one individual polynucleotide molecule, which has optionally been tagged, wherein the signal
`
`comprises a collection of amplified molecules. In another embodiment the channel comprises a
`
`polynucleotide sequencer and the received signal comprises sequence reads of a plurality of
`
`