(12) Patent Application Publication
     TALASAZ
(10) Pub. No.: US 2015/0368708 A1
(43) Pub. Date: Dec. 24, 2015

US 20150368708A1

(54) SYSTEMS AND METHODS TO DETECT RARE MUTATIONS AND COPY NUMBER VARIATION

(71) Applicant: GUARDANT HEALTH, INC., Redwood City, CA (US)

(72) Inventor: AmirAli TALASAZ, Menlo Park, CA (US)

(21) Appl. No.: 14/425,189

(22) PCT Filed: Sep. 4, 2013

(86) PCT No.: PCT/US13/58061
     § 371 (c)(1),
     (2) Date: Mar. 2, 2015

Related U.S. Application Data

(60) Provisional application No. 61/696,734, filed on Sep. 4, 2012, provisional application No. 61/704,400, filed on Sep. 21, 2012, provisional application No. 61/793,997, filed on Mar. 15, 2013, provisional application No. 61/845,987, filed on Jul. 13, 2013.

Publication Classification

(51) Int. Cl.
     C12Q 1/68 (2006.01)
     G06F 19/22 (2006.01)
(52) U.S. Cl.
     CPC ............ C12Q 1/6874 (2013.01); C12Q 1/6806 (2013.01); G06F 19/22 (2013.01)
`
(57) ABSTRACT

The present disclosure provides a system and method for the detection of rare mutations and copy number variations in cell free polynucleotides. Generally, the systems and methods comprise sample preparation, or the extraction and isolation of cell free polynucleotide sequences from a bodily fluid; subsequent sequencing of cell free polynucleotides by techniques known in the art; and application of bioinformatics tools to detect rare mutations and copy number variations as compared to a reference. The systems and methods also may contain a database or collection of different rare mutations or copy number variation profiles of different diseases, to be used as additional references in aiding detection of rare mutations, copy number variation profiling or general genetic profiling of a disease.
`
`PGDX EX. 1015
`
`Page 1 of 51
`
`
`
`
Fig. 1 (Sheet 1 of 16; flowchart 100)

102 Extract and isolate cell free polynucleotides from bodily fluid
104 Obtain sequencing data covering cell free polynucleotides
106 Map sequence reads to a reference genome and determine number of reads for each mappable position in a plurality of chromosomal regions
108 Divide each of the chromosomal regions into windows or bins and determine number of reads per window
110 Normalize the sequence reads per window and correct for bias
112 Use a stochastic or statistical algorithm to convert the number of sequence reads per window into discrete copy number states
114 Generate report identifying genomic positions with copy number variations
`
`
`
Fig. 2 (Sheet 2 of 16; flowchart 200)

202 Extract and isolate cell free polynucleotides from bodily fluid for both a subject and a control subject
204 Obtain sequencing data covering cell free polynucleotides for both subject and control
206 Map sequence reads in subject to control and determine number of reads for each mappable position in a plurality of chromosomal regions
208 Divide each of the chromosomal regions into windows or bins and determine number of reads per window
210 Normalize the sequence reads per window and correct for bias
212 Use a stochastic or statistical algorithm to convert the number of sequence reads per window into discrete copy number states
214 Generate report identifying genomic positions with copy number variations in relationship to control
`
`
`
Fig. 3 (Sheet 3 of 16; flowchart 300)

302 Extract and isolate cell free polynucleotides from bodily fluid for both a subject and a control subject
304 Obtain sequencing data covering cell free polynucleotides for both subject and control or reference
306 Map sequence reads in subject to control and determine number of reads for each mappable position
308 Calculate the frequency of variant bases as the number of reads containing the variant divided by the total reads
310 Analyze all four nucleotides for each mappable position in cell free polynucleotide
312 Use a stochastic or statistical algorithm to convert frequency of variance per each base into discrete variant states for each base position
314 Generate report identifying base variants or rare mutations with largest deviation(s) for each base position with respect to reference or control
`
`
`
Fig. 4 (Sheet 4 of 16) [Plot not recoverable from the scan. Legible label: Prostate Cancer Patient 1]
`
`
`
`
`
Fig. 4 (Sheet 5 of 16) [Drawing not recoverable from the scan. Partially legible labels: modem connected to internet; chip having array of microwells for sequencing reactions; sequencing apparatus; software; handheld device to provide sequencing information to remote user; computer system]
`
`
`
Fig. 5 (Sheet 6 of 16) [Plots not recoverable from the scan. Partially legible panel labels: Prostate Cancer Patient 2; Prostate Cancer Patient 3]
`
`
`
Fig. 6 (Sheet 7 of 16) [Plot not recoverable from the scan. Legible labels: TP53 7578552; panel or point labels A and B; axis ticks 0.10%, 1.00%, 10.00%, 100.00%; legend entries TP53, HRAS, MET]
`
`
`
Fig. 7 (Sheet 8 of 16) [Drawing not recoverable from the scan. Legible plot labels: TP53 7578411; PIK3CA 178952189; percentage-scale ticks. Panel B, partially legible labels: chip having array of microwells for sequencing reactions; modem connected to internet; sequencing apparatus; software; handheld device to provide sequencing information to remote user; computer system]
`
`
`
Fig. 8 (Sheet 9 of 16; flowchart 800)

802 Provide initial starting genetic material
804 Convert polynucleotides from initial starting genetic material into tagged parent polynucleotides
806 Amplify tagged parent polynucleotides to produce amplified progeny polynucleotides
808 Sequence a subset of amplified progeny polynucleotides to produce sequence reads
810 Collapse sequence reads into set of consensus sequences of unique tagged parent polynucleotides
812 Analyze set of consensus sequences
`
`
`
Fig. 9 (Sheet 10 of 16; flowchart 900)

902 Provide initial starting genetic material
904 Convert polynucleotides from initial starting genetic material into tagged parent polynucleotides
906 Amplify tagged parent polynucleotides to produce amplified progeny polynucleotides
908 Sequence a subset of amplified progeny polynucleotides to produce sequence reads
910 Group sequence reads into families, each family generated from a unique tagged parent polynucleotide
912 Produce representation of information in tagged parent polynucleotides and/or initial starting genetic material with reduced noise and/or distortion compared with sequence reads
`
`
`
Fig. 10 (Sheet 11 of 16; flowchart 1000)

1002 Provide initial starting genetic material
1004 Convert polynucleotides from initial starting genetic material into tagged parent polynucleotides
1006 Amplify tagged parent polynucleotides to produce amplified progeny polynucleotides
1008 Sequence a subset of amplified progeny polynucleotides to produce sequence reads
1010 Group sequence reads into families, each family generated from a unique tagged parent polynucleotide
1012 Determine quantitative measure of (e.g., count number of) families mapping to each of a plurality of reference loci; optionally quantify sequence reads in each family
1014 Infer quantities of unique tagged parent polynucleotides mapping to each locus based on quantity of families at each locus and quantity of sequence reads in each family
1016a Determine CNV based on quantity of families mapping to each reference locus
1016b Determine CNV based on quantity of inferred unique tagged parent polynucleotides mapping to each locus
`
`
`
Fig. 11 (Sheet 12 of 16; flowchart 1100)

1102 Provide initial starting genetic material
1104 Convert polynucleotides from initial starting genetic material into tagged parent polynucleotides
1106 Amplify tagged parent polynucleotides to produce amplified progeny polynucleotides
1108 Sequence a subset of amplified progeny polynucleotides to produce sequence reads
1110 Group sequence reads into families, each family generated from a unique tagged parent polynucleotide
1112 At a selected locus (nucleotide or sequence of nucleotides) assign, for each family, a confidence score for each of one or more bases or sequence of bases
1114 Infer the frequency of each of one or more bases or sequence of bases at the locus in the set of tagged parent polynucleotides based on the confidence scores among the families
`
`
`
Fig. 12 (Sheet 13 of 16; flowchart 1200; step reference numerals 1202, 1204, 1206, 1208, 1210, 1212)

Provide at least one individual polynucleotide molecule
Encode sequence information in the at least one individual polynucleotide molecule to produce a signal
Pass at least part of the signal through a channel to produce a received signal comprising nucleotide sequence information about the at least one individual polynucleotide molecule, wherein the received signal comprises noise and/or distortion
Decode the received signal to produce a message comprising sequence information about the at least one individual polynucleotide molecule, wherein decoding reduces noise and/or distortion in the message
Group sequence reads into families, each family generated from a unique tagged parent polynucleotide
Provide the message to a recipient
`
`
`
Fig. 13 (Sheet 14 of 16) [Plot not recoverable from the scan. Partially legible axis ticks in steps of 10000, up to about 70000]
`
`
`
Fig. 14 (Sheet 15 of 16) [Plot not recoverable from the scan. Partially legible axis label: Percentage of spiked LNCaP]
`
`
`
Fig. 15 (Sheet 16 of 16) [Drawing not recoverable from the scan. Reference numerals: 1501, 1510, 1515, 1520, 1525, 1530]
`
`
`
`
`SYSTEMS AND METHODS TO DETECT
`RARE MUTATIONS AND COPY NUMBER
`VARIATION
`
`CROSS-REFERENCE
`
[0001] This application claims priority to U.S. Provisional Patent Application No. 61/696,734, filed Sep. 4, 2012, U.S. Provisional Patent Application No. 61/704,400, filed Sep. 21, 2012, U.S. Provisional Patent Application No. 61/793,997, filed Mar. 15, 2013, and U.S. Provisional Patent Application No. 61/845,987, filed Jul. 13, 2013, each of which is entirely incorporated herein by reference for all purposes.
`
`BACKGROUND OF THE INVENTION
`
[0002] The detection and quantification of polynucleotides is important for molecular biology and medical applications such as diagnostics. Genetic testing is particularly useful for a number of diagnostic methods. For example, disorders that are caused by rare genetic alterations (e.g., sequence variants) or changes in epigenetic markers, such as cancer and partial or complete aneuploidy, may be detected or more accurately characterized with DNA sequence information.
[0003] Early detection and monitoring of genetic diseases, such as cancer, is often useful and needed in the successful treatment or management of the disease. One approach may include the monitoring of a sample derived from cell free nucleic acids, a population of polynucleotides that can be found in different types of bodily fluids. In some cases, disease may be characterized or detected based on detection of genetic aberrations, such as a change in copy number variation and/or sequence variation of one or more nucleic acid sequences, or the development of other certain rare genetic alterations. Cell free DNA (“cfDNA”) has been known in the art for decades, and may contain genetic aberrations associated with a particular disease. With improvements in sequencing and techniques to manipulate nucleic acids, there is a need in the art for improved methods and systems for using cell free DNA to detect and monitor disease.
`
`SUMMARY OF THE INVENTION
`
[0004] The disclosure provides for a method for detecting copy number variation comprising: a) sequencing extracellular polynucleotides from a bodily sample from a subject, wherein each of the extracellular polynucleotides is optionally attached to unique barcodes; b) filtering out reads that fail to meet a set threshold; c) mapping sequence reads obtained from step (a) to a reference sequence; d) quantifying/counting mapped reads in two or more predefined regions of the reference sequence; e) determining a copy number variation in one or more of the predefined regions by (i) normalizing the number of reads in the predefined regions to each other and/or the number of unique barcodes in the predefined regions to each other; and (ii) comparing the normalized numbers obtained in step (i) to normalized numbers obtained from a control sample.
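Steps (d) and (e) of this method can be illustrated with a minimal sketch. It is not the disclosed implementation: the function names, the fixed region size, the normalization by total read count, and the log2 ratio against the control are all assumptions chosen for illustration, and reads are reduced to a single mapped position.

```python
import math
from collections import Counter


def count_reads_per_region(read_positions, region_size):
    """Step (d): count mapped reads per fixed-size region (index = position // size)."""
    return Counter(pos // region_size for pos in read_positions)


def normalize(counts):
    """Step (e)(i): scale region counts to fractions of the sample total,
    so that samples sequenced to different depths become comparable."""
    total = sum(counts.values())
    return {region: n / total for region, n in counts.items()}


def log2_ratios(sample_positions, control_positions, region_size):
    """Step (e)(ii): compare normalized sample counts to normalized control counts.
    A log2 ratio is an assumed choice of comparison; regions absent from the
    control are skipped."""
    sample = normalize(count_reads_per_region(sample_positions, region_size))
    control = normalize(count_reads_per_region(control_positions, region_size))
    return {
        region: math.log2(sample[region] / control[region])
        for region in sorted(sample)
        if region in control
    }


# Toy data (hypothetical): region 1 is over-represented in the sample.
sample_reads = [10, 20, 110, 120, 130, 140, 210, 220]
control_reads = [10, 20, 110, 120, 210, 220]
ratios = log2_ratios(sample_reads, control_reads, region_size=100)
for region, value in ratios.items():
    print(region, round(value, 3))
```

A positive ratio suggests a relative gain in that region and a negative ratio a relative loss; deciding which deviations count as copy number variation would need a threshold or a statistical model, which this sketch leaves out.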
[0005] The disclosure also provides for a method for detecting a rare mutation in a cell-free or substantially cell free sample obtained from a subject comprising: a) sequencing extracellular polynucleotides from a bodily sample from a subject, wherein each of the extracellular polynucleotides generates a plurality of sequencing reads; b) filtering out reads that fail to meet a set threshold; c) mapping sequence reads derived from the sequencing onto a reference sequence; d) identifying a subset of mapped sequence reads that align with a variant of the reference sequence at each mappable base position; e) for each mappable base position, calculating a ratio of (a) a number of mapped sequence reads that include a variant as compared to the reference sequence, to (b) a number of total sequence reads for each mappable base position; f) normalizing the ratios or frequency of variance for each mappable base position and determining potential rare variant(s) or mutation(s); and g) comparing the resulting number for each of the regions with potential rare variant(s) or mutation(s) to similarly derived numbers from a reference sample.
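The per-position ratio described above (variant reads divided by total reads at each mappable base position) can be sketched as follows. This is an illustrative sketch only: the read representation (0-based start plus an ungapped sequence) and the function name are assumptions, and indels, base qualities, and the normalization step are ignored.

```python
from collections import Counter, defaultdict


def variant_frequencies(aligned_reads, reference):
    """Return {position: {base: fraction of covering reads}} for non-reference bases.

    aligned_reads: list of (start, sequence) pairs, 0-based and ungapped
    (an assumed, simplified alignment format).
    """
    depth = Counter()                      # total reads covering each position
    variant_counts = defaultdict(Counter)  # non-reference base counts per position
    for start, seq in aligned_reads:
        for offset, base in enumerate(seq):
            pos = start + offset
            if pos >= len(reference):
                break
            depth[pos] += 1
            if base != reference[pos]:
                variant_counts[pos][base] += 1
    return {
        pos: {base: n / depth[pos] for base, n in bases.items()}
        for pos, bases in variant_counts.items()
    }


# Toy data (hypothetical): one of four reads carries a T at position 2.
reference = "ACGTACGT"
reads = [(0, "ACGT"), (0, "ACTT"), (2, "GTAC"), (2, "GTAC")]
print(variant_frequencies(reads, reference))
```

The same ratios computed on a reference or control sample would give the comparison values used in the final step.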
[0006] Additionally, the disclosure also provides for a method of characterizing the heterogeneity of an abnormal condition in a subject, the method comprising generating a genetic profile of extracellular polynucleotides in the subject, wherein the genetic profile comprises a plurality of data resulting from copy number variation and/or other rare mutation (e.g., genetic alteration) analyses.
[0007] In some embodiments, the prevalence/concentration of each rare variant identified in the subject is reported and quantified simultaneously. In other embodiments, a confidence score, regarding the prevalence/concentrations of rare variants in the subject, is reported.
[0008] In some embodiments, extracellular polynucleotides comprise DNA. In other embodiments, extracellular polynucleotides comprise RNA. Polynucleotides may be fragments or fragmented after isolation. Additionally, the disclosure provides for a method for circulating nucleic acid isolation and extraction.
[0009] In some embodiments, extracellular polynucleotides are isolated from a bodily sample that may be selected from a group consisting of blood, plasma, serum, urine, saliva, mucosal excretions, sputum, stool and tears.
[0010] In some embodiments, the methods of the disclosure also comprise a step of determining the percent of sequences having copy number variation or other rare genetic alteration (e.g., sequence variants) in said bodily sample.
[0011] In some embodiments, the percent of sequences having copy number variation in said bodily sample is determined by calculating the percentage of predefined regions with an amount of polynucleotides above or below a predetermined threshold.
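The percentage calculation in paragraph [0011] reduces to simple arithmetic, sketched below. The thresholds, the use of normalized region values centered near 1.0, and the function name are assumptions for illustration; the disclosure does not fix particular threshold values here.

```python
def percent_regions_altered(normalized_amounts, lower, upper):
    """Percentage of predefined regions whose normalized amount falls
    below `lower` or above `upper` (both thresholds are assumed inputs)."""
    altered = [v for v in normalized_amounts if v < lower or v > upper]
    return 100.0 * len(altered) / len(normalized_amounts)


# Toy data (hypothetical): ten regions, one apparent gain and one apparent loss.
regions = [1.0, 0.98, 1.6, 0.5, 1.02, 1.01, 0.97, 1.0, 1.03, 0.99]
print(percent_regions_altered(regions, lower=0.8, upper=1.2))
```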
`
[0012] In some embodiments, bodily fluids are drawn from a subject suspected of having an abnormal condition which may be selected from the group consisting of mutations, rare mutations, single nucleotide variants, indels, copy number variations, transversions, translocations, inversion, deletions, aneuploidy, partial aneuploidy, polyploidy, chromosomal instability, chromosomal structure alterations, gene fusions, chromosome fusions, gene truncations, gene amplification, gene duplications, chromosomal lesions, DNA lesions, abnormal changes in nucleic acid chemical modifications, abnormal changes in epigenetic patterns, abnormal changes in nucleic acid methylation, infection and cancer.
[0013] In some embodiments, the subject may be a pregnant female in which the abnormal condition may be a fetal abnormality selected from the group consisting of single nucleotide variants, indels, copy number variations, transversions, translocations, inversion, deletions, aneuploidy, partial aneuploidy, polyploidy, chromosomal instability, chromosomal structure alterations, gene fusions, chromosome fusions, gene truncations, gene amplification, gene duplications, chromosomal lesions, DNA lesions, abnormal changes in nucleic acid chemical modifications, abnormal changes in epigenetic patterns, abnormal changes in nucleic acid methylation, infection and cancer.
`
[0014] In some embodiments, the method may comprise attaching one or more barcodes to the extracellular polynucleotides or fragments thereof prior to sequencing, in which the barcodes are unique. In other embodiments, barcodes attached to extracellular polynucleotides or fragments thereof prior to sequencing are not unique.
[0015] In some embodiments, the methods of the disclosure may comprise selectively enriching regions from the subject’s genome or transcriptome prior to sequencing. In other embodiments the methods of the disclosure comprise non-selectively enriching regions from the subject’s genome or transcriptome prior to sequencing.
[0016] Further, the methods of the disclosure comprise attaching one or more barcodes to the extracellular polynucleotides or fragments thereof prior to any amplification or enrichment step.
[0017] In some embodiments, the barcode is a polynucleotide, which may further comprise random sequence or a fixed or semi-random set of oligonucleotides that, in combination with the diversity of molecules sequenced from a select region, enables identification of unique molecules, and may be at least a 3, 5, 10, 15, 20, 25, 30, 35, 40, 45, or 50mer base pairs in length.
[0018] In some embodiments, extracellular polynucleotides or fragments thereof may be amplified. In some embodiments amplification comprises global amplification or whole genome amplification.
[0019] In some embodiments, sequence reads of unique identity may be detected based on sequence information at the beginning (start) and end (stop) regions of the sequence read and the length of the sequence read. In other embodiments sequence molecules of unique identity are detected based on sequence information at the beginning (start) and end (stop) regions of the sequence read, the length of the sequence read and attachment of a barcode.
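One way to picture the identification of unique molecules in paragraph [0019] is to group reads by a key built from start position, end position, length, and optionally the barcode. The sketch below assumes reads are dictionaries with hypothetical `start`, `end`, and `barcode` fields, and that coordinates alone stand in for the "sequence information" at the start and stop regions.

```python
from collections import defaultdict


def molecule_key(read, use_barcode):
    """Build an identity key from start, end, and length, plus barcode if requested.
    Length is derived from the coordinates here, which is a simplification."""
    key = (read["start"], read["end"], read["end"] - read["start"])
    return key + (read["barcode"],) if use_barcode else key


def group_unique_molecules(reads, use_barcode=True):
    """Group reads that share the same identity key; each group is treated
    as originating from one unique molecule."""
    groups = defaultdict(list)
    for read in reads:
        groups[molecule_key(read, use_barcode)].append(read["name"])
    return dict(groups)


# Toy data (hypothetical): r1 and r2 are duplicates; r3 differs only by barcode.
reads = [
    {"name": "r1", "start": 100, "end": 266, "barcode": "ACGT"},
    {"name": "r2", "start": 100, "end": 266, "barcode": "ACGT"},
    {"name": "r3", "start": 100, "end": 266, "barcode": "TTAG"},
    {"name": "r4", "start": 105, "end": 270, "barcode": "ACGT"},
]
print(len(group_unique_molecules(reads, use_barcode=True)))   # coordinates + barcode
print(len(group_unique_molecules(reads, use_barcode=False)))  # coordinates only
```

Adding the barcode to the key separates molecules that happen to share the same start and stop coordinates, which is the distinction the two embodiments draw.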
[0020] In some embodiments, amplification comprises selective amplification, non-selective amplification, suppression amplification or subtractive enrichment.
[0021] In some embodiments, the methods of the disclosure comprise removing a subset of the reads from further analysis prior to quantifying or enumerating reads.
[0022] In some embodiments, the method may comprise filtering out reads with an accuracy or quality score of less than a threshold, e.g., 90%, 99%, 99.9%, or 99.99%, and/or mapping score less than a threshold, e.g., 90%, 99%, 99.9%, or 99.99%. In other embodiments, methods of the disclosure comprise filtering reads with a quality score lower than a set threshold.
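A possible reading of the filtering step is sketched below. It assumes that the percentage thresholds refer to accuracies derived from Phred-scaled scores (accuracy = 1 - 10^(-Q/10)), which the disclosure does not state; the field names `base_qualities` and `mapq` and the use of the mean base accuracy are likewise illustrative assumptions.

```python
def phred_to_accuracy(q):
    """Convert a Phred-scaled score Q to an accuracy: 1 - 10^(-Q/10)."""
    return 1.0 - 10 ** (-q / 10.0)


def passes_filters(read, min_accuracy=0.99, min_mapping_accuracy=0.99):
    """Keep a read only if its mean base accuracy and its mapping accuracy
    both meet the thresholds (assumed interpretation of the percentages)."""
    qualities = read["base_qualities"]
    mean_accuracy = sum(phred_to_accuracy(q) for q in qualities) / len(qualities)
    mapping_accuracy = phred_to_accuracy(read["mapq"])
    return mean_accuracy >= min_accuracy and mapping_accuracy >= min_mapping_accuracy


# Toy data (hypothetical): r2 fails on base quality, r3 fails on mapping score.
reads = [
    {"name": "r1", "base_qualities": [30, 30, 30], "mapq": 60},
    {"name": "r2", "base_qualities": [10, 10, 10], "mapq": 60},
    {"name": "r3", "base_qualities": [30, 30], "mapq": 10},
]
kept = [r["name"] for r in reads if passes_filters(r)]
print(kept)
```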
`
[0023] In some embodiments, predefined regions are uniform or substantially uniform in size, about 10 kb, 20 kb, 30 kb, 40 kb, 50 kb, 60 kb, 70 kb, 80 kb, 90 kb, or 100 kb in size. In some embodiments, at least 50, 100, 200, 500, 1000, 2000, 5000, 10,000, 20,000, or 50,000 regions are analyzed.
[0024] In some embodiments, a genetic variant, rare mutation or copy number variation occurs in a region of the genome selected from the group consisting of gene fusions, gene duplications, gene deletions, gene translocations, microsatellite regions, gene fragments or combination thereof. In other embodiments a genetic variant, rare mutation, or copy number variation occurs in a region of the genome selected from the group consisting of genes, oncogenes, tumor suppressor genes, promoters, regulatory sequence elements, or combination thereof. In some embodiments the variant is a nucleotide variant, single base substitution, or small indel, transversion, translocation, inversion, deletion, truncation or gene truncation about 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 15 or 20 nucleotides in length.
[0025] In some embodiments, the method comprises correcting/normalizing/adjusting the quantity of mapped reads using the barcodes or unique properties of individual reads.
[0026] In some embodiments, enumerating the reads is performed through enumeration of unique barcodes in each of the predefined regions and normalizing those numbers across at least a subset of predefined regions that were sequenced. In some embodiments, samples at succeeding time intervals from the same subject are analyzed and compared to previous sample results. The method of the disclosure may further comprise determining partial copy number variation frequency, loss of heterozygosity, gene expression analysis, epigenetic analysis and hypermethylation analysis after amplifying the barcode-attached extracellular polynucleotides.
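The enumeration of unique barcodes per predefined region, followed by normalization across regions, might look like the sketch below. Normalizing to the median region count is an assumed choice (the disclosure does not name a specific normalization), and the `(region, barcode)` input format and function names are hypothetical.

```python
from statistics import median


def unique_barcodes_per_region(reads):
    """Count distinct barcodes observed in each region.
    reads: iterable of (region, barcode) pairs; duplicates collapse to one."""
    seen = {}
    for region, barcode in reads:
        seen.setdefault(region, set()).add(barcode)
    return {region: len(barcodes) for region, barcodes in seen.items()}


def normalize_to_median(counts):
    """Divide each region's count by the median count across regions
    (median normalization is an illustrative assumption)."""
    center = median(counts.values())
    return {region: n / center for region, n in counts.items()}


# Toy data (hypothetical): region r3 carries twice as many unique barcodes.
reads = [
    ("r1", "AAA"), ("r1", "AAA"), ("r1", "CCC"),
    ("r2", "AAA"), ("r2", "GGG"),
    ("r3", "AAA"), ("r3", "CCC"), ("r3", "GGG"), ("r3", "TTT"),
]
counts = unique_barcodes_per_region(reads)
print(normalize_to_median(counts))
```

Counting unique barcodes rather than raw reads keeps amplification duplicates (such as the repeated `("r1", "AAA")` read) from inflating a region's count.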
[0027] In some embodiments, copy number variation and rare mutation analysis is determined in a cell-free or substantially cell free sample obtained from a subject using multiplex sequencing, comprising performing over 10,000 sequencing reactions; simultaneously sequencing at least 10,000 different reads; or performing data analysis on at least 10,000 different reads across the genome. The method may comprise multiplex sequencing comprising performing data analysis on at least 10,000 different reads across the genome. The method may further comprise enumerating sequenced reads that are uniquely identifiable.
[0028] In some embodiments, the methods of the disclosure comprise normalizing and detection performed using one or more of hidden Markov, dynamic programming, support vector machine, Bayesian network, trellis decoding, Viterbi decoding, expectation maximization, Kalman filtering, or neural network methodologies.
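As one concrete instance of the methodologies listed above, the sketch below applies Viterbi decoding to a small hidden Markov model that assigns each window a discrete copy number state. Every parameter is an assumption made for illustration: three states, Gaussian emissions on log2 ratios with means of -1.0 (loss), 0.0 (neutral), and 0.58 (gain), a shared standard deviation of 0.2, a 0.9 probability of staying in the same state, and a uniform initial distribution. None of these values come from the disclosure.

```python
import math

STATES = ("loss", "neutral", "gain")
MEANS = {"loss": -1.0, "neutral": 0.0, "gain": 0.58}  # assumed log2-ratio means
SIGMA = 0.2   # assumed emission standard deviation
STAY = 0.9    # assumed probability of remaining in the same state


def log_emission(state, x):
    """Log density of a Gaussian centered on the state's mean."""
    z = (x - MEANS[state]) / SIGMA
    return -0.5 * z * z - math.log(SIGMA * math.sqrt(2 * math.pi))


def log_transition(prev, cur):
    """Log transition probability; switches share the remaining mass equally."""
    if prev == cur:
        return math.log(STAY)
    return math.log((1 - STAY) / (len(STATES) - 1))


def viterbi(log2_ratios):
    """Return the most likely state sequence for the per-window log2 ratios."""
    scores = {s: math.log(1 / len(STATES)) + log_emission(s, log2_ratios[0])
              for s in STATES}
    backpointers = []
    for x in log2_ratios[1:]:
        new_scores, pointers = {}, {}
        for cur in STATES:
            best_prev = max(STATES, key=lambda p: scores[p] + log_transition(p, cur))
            new_scores[cur] = (scores[best_prev] + log_transition(best_prev, cur)
                               + log_emission(cur, x))
            pointers[cur] = best_prev
        scores = new_scores
        backpointers.append(pointers)
    # Trace back from the best final state.
    state = max(STATES, key=lambda s: scores[s])
    path = [state]
    for pointers in reversed(backpointers):
        state = pointers[state]
        path.append(state)
    return path[::-1]


# Toy data (hypothetical): three consecutive windows with elevated ratios.
print(viterbi([0.02, -0.05, 0.6, 0.55, 0.62, 0.01, -0.03]))
```

The transition penalty is what discourages isolated single-window state changes; in practice the model parameters would be estimated from data, for example by expectation maximization.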
[0029] In some embodiments the methods of the disclosure comprise monitoring disease progression, monitoring residual disease, monitoring therapy, diagnosing a condition, prognosing a condition, or selecting a therapy based on discovered variants.
[0030] In some embodiments, a therapy is modified based on the most recent sample analysis. Further, the methods of the disclosure comprise inferring the genetic profile of a tumor, infection or other tissue abnormality. In some embodiments growth, remission or evolution of a tumor, infection or other tissue abnormality is monitored. In some embodiments the subject’s immune system is analyzed and monitored at single instances or over time.
[0031] In some embodiments, the methods of the disclosure comprise identification of a variant that is followed up through an imaging test (e.g., CT, PET-CT, MRI, X-ray, ultrasound) for localization of the tissue abnormality suspected of causing the identified variant.
`
`
[0032] In some embodiments, the methods of the disclosure comprise use of genetic data obtained from a tissue or tumor biopsy from the same patient. In some embodiments, the phylogenetics of a tumor, infection or other tissue abnormality is inferred.
[0033] In some embodiments, the methods of the disclosure comprise performing population-based no-calling and identification of low-confidence regions. In some embodiments, obtaining the measurement data for the sequence coverage comprises measuring sequence coverage depth at every position of the genome. In some embodiments correcting the measurement data for the sequence coverage bias comprises calculating window-averaged coverage. In some embodiments correcting the measurement data for the sequence coverage bias comprises performing adjustments to account for GC bias in the library construction and sequencing process. In some embodiments correcting the measurement data for the sequence coverage bias comprises performing adjustments based on additional weighting factor associated with individual mappings to compensate for bias.
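Two of the corrections named above, window-averaged coverage and a GC bias adjustment, can be sketched as follows. The GC adjustment shown here (rescaling each window by the median coverage of windows with similar GC content) is a commonly used approach that is assumed for illustration and is not stated in the disclosure; the window size, bin count, and function names are likewise hypothetical.

```python
from statistics import median


def window_averaged_coverage(depth, window):
    """Mean per-position depth in consecutive non-overlapping windows.
    A trailing partial window is dropped."""
    return [sum(depth[i:i + window]) / window
            for i in range(0, len(depth) - window + 1, window)]


def gc_corrected(coverages, gc_fractions, n_bins=10):
    """Rescale each window so that every GC bin ends up with the same
    median coverage as the overall median (assumed median-ratio correction)."""
    bins = [min(int(gc * n_bins), n_bins - 1) for gc in gc_fractions]
    by_bin = {}
    for b, c in zip(bins, coverages):
        by_bin.setdefault(b, []).append(c)
    bin_median = {b: median(values) for b, values in by_bin.items()}
    overall = median(coverages)
    return [c * overall / bin_median[b] if bin_median[b] > 0 else 0.0
            for c, b in zip(coverages, bins)]


# Toy data (hypothetical): GC-rich windows are sequenced about twice as deeply.
depth = [10, 10, 12, 12, 20, 20, 22, 22]
coverage = window_averaged_coverage(depth, window=2)
corrected = gc_corrected(coverage, gc_fractions=[0.35, 0.35, 0.65, 0.65])
print(coverage)
print([round(c, 2) for c in corrected])
```

After correction the low-GC and high-GC windows share the same median, so residual differences between windows are less likely to reflect GC content alone.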
[0034] In some embodiments, the methods of the disclosure comprise extracellular polynucleotide derived from a diseased cell origin. In some embodiments, the extracellular polynucleotide is derived from a healthy cell origin.
[0035] The disclosure also provides for a system comprising a computer readable medium for performing the following steps: selecting predefined regions in a genome; enumerating number of sequence reads in the predefined regions; normalizing the number of sequence reads across the predefined regions; and determining percent of copy number variation in the predefined regions. In some embodiments, the entirety of the genome or at least 10%, 20%, 30%, 40%, 50%, 60%, 70%, 80%, or 90% of the genome is analyzed. In some embodiments, computer readable medium provides data on percent cancer DNA or RNA in plasma or serum to the end user.
[0036] In some embodiments, the amount of genetic variation, such as polymorphisms or causal variants, is analyzed. In some embodiments, the presence or absence of genetic alterations is detected.
[0037] The disclosure also provides for a method for detecting a rare mutation in a cell-free or a substantially cell free sample obtained from a subject comprising: a) sequencing extracellular polynucleotides from a bodily sample from a subject, wherein each of the extracellular polynucleotides generates a plurality of sequencing reads; b) filtering out reads that fail to meet a set threshold; c) mapping sequence reads derived from the sequencing onto a reference sequence; d) identifying a subset of mapped sequence reads that align with a variant of the reference sequence at each mappable base position; e) for each mappable base position, calculating a ratio of (a) a number of mapped sequence reads that include a variant as compared to the reference sequence, to (b) a number of total sequence reads for each mappable base position; f) normalizing the ratios or frequency of variance for each mappable base position and determining potential rare variant(s) or other genetic alteration(s); and g) comparing the resulting number for each of the regions
[0038] This disclosure also provides for a method comprising: a. providing at least one set of tagged parent polynucleotides, and for each set of tagged parent polynucleotides; b. amplifying the tagged parent polynucleotides in the set to produce a corresponding set of amplified progeny polynucleotides; c. sequencing a subset (including a proper subset) of the set of amplified progeny polynucleotides, to produce a set of sequencing reads; and d. collapsing the set of sequencing reads to generate a set of consensus sequences, each consensus sequence corresponding to a unique polynucleotide among the set of tagged parent polynucleotides. In certain embodiments the method further comprises: e. analyzing the set of consensus sequences for each set of tagged parent molecules.
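Step (d), collapsing sequencing reads into consensus sequences, can be pictured with the sketch below. It assumes that reads are already grouped by their tag and uses a per-position majority vote, which is one simple consensus rule chosen for illustration; the disclosure does not limit collapsing to this rule, and the `(tag, sequence)` input format is hypothetical.

```python
from collections import Counter, defaultdict


def collapse_to_consensus(reads):
    """Collapse reads sharing a tag into one consensus sequence per tag.

    reads: iterable of (tag, sequence) pairs. The consensus takes the most
    common base at each position (assumed majority-vote rule) and is
    truncated to the shortest read in the family.
    """
    families = defaultdict(list)
    for tag, seq in reads:
        families[tag].append(seq)
    consensus = {}
    for tag, seqs in families.items():
        length = min(len(s) for s in seqs)
        consensus[tag] = "".join(
            Counter(s[i] for s in seqs).most_common(1)[0][0]
            for i in range(length)
        )
    return consensus


# Toy data (hypothetical): one read in family T1 carries a sequencing error.
reads = [
    ("T1", "ACGT"), ("T1", "ACGT"), ("T1", "ACTT"),
    ("T2", "GGCA"), ("T2", "GGCA"),
]
print(collapse_to_consensus(reads))
```

Because an error introduced during amplification or sequencing usually affects only some progeny of a parent molecule, the vote within a family tends to recover the parent sequence, which is the noise reduction the consensus step is meant to provide.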
`
[0039] In some embodiments each polynucleotide in a set is mappable to a reference sequence.
[0040] In some embodiments the method comprises providing a plurality of sets of tagged parent polynucleotides, wherein each set is mappable to a different reference sequence.
[0041] In some embodiments the method further comprises converting initial starting genetic material into the tagged parent polynucleotides.
[0042] In some embodiments the initial starting genetic material comprises no more than 100 ng of polynucleotides.
[0043] In some embodiments the method comprises bottlenecking the initial starting genetic material prior to converting.
[0044] In some embodiments the method comprises converting the initial starting genetic material into tagged parent polynucleotides with a conversion efficiency of at least 10%, at least 20%, at least 30%, at least 40%, at least 50%, at least 60%, at least 80% or at least 90%.
[0045] In some embodiments converting comprises any of blunt-end ligation, sticky end ligation, molecular inversion probes, PCR, ligation-based PCR, single strand ligation and single strand circularization.
[0046] In some embodiments the initial starting genetic material is cell-free nucleic acid.
[0047] In some embodiments a plurality of the reference sequences are from the same genome.
[0048] In some embodiments each tagged parent polynucleotide in the set is uniquely tagged.
[0049] In some embodiments the tags are non-unique.
[0050] In some embodiments the generation of consensus sequences is based on information from the tag and/or at least one of sequence information at the beginning (start) region of the sequence read, the end (stop) regions of the sequence read and the length of the sequence read.
`[0051]
`In some embodiments
`the metho