`US 20120046877A1
`
`(19) United States
`(12) Patent Application Publication
`HYLAND et al.
`
`(10) Pub. No.: US 2012/0046877 A1
`(43) Pub. Date: Feb. 23, 2012
`
`(54) SYSTEMS AND METHODS TO DETECT
`COPY NUMBER VARIATION
`
`(75) Inventors: Fiona HYLAND, San Mateo, CA (US); Rajesh Gottimukkala, Foster City, CA (US)
`
`(73) Assignee: LIFE TECHNOLOGIES CORPORATION, Carlsbad, CA (US)
`
`(21) Appl. No.: 13/176,471
`
`(22) Filed: Jul. 5, 2011
`
`Related U.S. Application Data
`
`(60) Provisional application No. 61/361,886, filed on Jul. 6,
`2010.
`
`Publication Classification
`
`(51) Int. Cl.
`G06F 19/00 (2011.01)
`
`(52) U.S. Cl. 702/20
`
`(57) ABSTRACT
`
`In one aspect, a system for implementing a copy number
`variation analysis method is disclosed. The system can
`include a nucleic acid sequencer and a computing device in
`communication with the nucleic acid sequencer. The nucleic
`acid sequencer can be configured to interrogate a sample to
`produce a nucleic acid sequence data file containing a
`plurality of nucleic acid sequence reads. In various embodiments,
`the computing device can be a workstation, mainframe
`computer, personal computer, mobile device, etc.
`
`The computing device can comprise a sequence mapping
`engine, a coverage normalization engine, a segmentation
`engine and a copy number variation identification engine. The
`sequence mapping engine can be configured to align the
`plurality of nucleic acid sequence reads to a reference
`sequence, wherein the aligned nucleic acid sequence reads
`merge to form a plurality of chromosomal regions. The
`coverage normalization engine can be configured to divide each
`chromosomal region into one or more non-overlapping
`window regions, determine nucleic acid sequence read coverage
`for each window region and normalize the nucleic acid
`sequence read coverage determined for each window region
`to correct for bias. The segmentation engine can be
`configured to convert the normalized nucleic acid sequence read
`coverage for each window region to discrete copy number
`states. The copy number variation identification engine can be
`configured to identify copy number variation in the
`chromosomal regions by utilizing the copy number states of each
`window region.
`
`[Representative drawing: block diagram with blocks labeled PROCESSOR, RAM, ROM, DISK STORAGE, BUS, DISPLAY, INPUT DEVICE, and CURSOR CONTROL; reference numerals 102, 104, 106, 112, 114, and 116.]
`
`Page 1
`
`FOUNDATION EXHIBIT 1061
`IPR2019-00634
`
`

`

`[Sheet 1 of 9. FIG. 1: block diagram of a computer system, with blocks for a processor, RAM, ROM, disk storage, a bus, a display, an input device, and a cursor control.]
`
`

`

`[Sheet 2 of 9. FIG. 2: schematic diagram of a system 200, with reference numerals 202, 204, 206, and 208.]
`
`

`

`[Sheet 3 of 9. FIG. 3: single sample CNV sequencing analysis pipeline. Recoverable labels: Test genome X (302), Genomic fragments, Sampling & Sequencing, Mapping (304), Template Genome, Coverage in each window (306), Data Analysis (308), and a coverage plot with horizontal axis "Genomic Position".]
`
`

`

`[Sheet 4 of 9. FIG. 4: schematic diagram of a system 400 for CNV analysis. Recoverable labels: Analytics Computing Device, Mapping Engine, Pre-processing Engine, Read Coverage Engine, Segmentation Engine, CNV Identification Engine, *.SAM / *.BAM File, *.GFF File, API, Client Device, and Data/Analysis; reference numerals 401 to 408, 410, and 412.]
`
`

`

`[Sheet 5 of 9. FIG. 5: flowchart of method 500.]
`
`BEGIN
`
`502: Receive a nucleic acid sequence data file containing a plurality of
`nucleic acid sequence reads aligned to a reference sequence, wherein the
`aligned nucleic acid sequence reads together form a plurality of
`chromosomal regions.
`
`504: Determine nucleic acid sequence read coverage (the number of
`nucleic acid sequence reads aligned to) for each base position of
`the plurality of chromosomal regions.
`
`506: Divide each of the plurality of chromosomal regions into one or
`more non-overlapping window regions, wherein each window
`region contains about the same number of mappable bases.
`
`508: Determine nucleic acid sequence read coverage for each window
`region.
`
`510: Normalize the nucleic acid sequence read coverage determined
`for each window region to correct bias.
`
`512: Use a stochastic modeling algorithm to convert the normalized
`nucleic acid sequence read coverage for each window region
`to discrete copy number states.
`
`514: Utilize the discrete copy number states of each window region
`to identify copy number variation in the chromosomal regions.
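Steps 506 and 508 of FIG. 5 can be illustrated with a minimal Python sketch. The flowchart does not say how windows are built or how coverage is summarized, so the function names, the boolean mappability mask, and the use of a mean over mappable positions are all assumptions made for illustration only.

```python
from typing import List, Sequence, Tuple


def windows_by_mappable_bases(
    mappable: Sequence[bool], bases_per_window: int
) -> List[Tuple[int, int]]:
    """Split a region into non-overlapping half-open windows [start, end).

    A window is closed as soon as it holds `bases_per_window` mappable
    positions, so windows span different genomic lengths but contain
    about the same number of mappable bases. The last window may hold fewer.
    """
    if bases_per_window <= 0:
        raise ValueError("bases_per_window must be positive")
    windows = []
    start = 0
    count = 0
    for pos, is_mappable in enumerate(mappable):
        if is_mappable:
            count += 1
            if count == bases_per_window:
                windows.append((start, pos + 1))
                start = pos + 1
                count = 0
    if start < len(mappable):
        windows.append((start, len(mappable)))
    return windows


def window_coverage(
    per_base_coverage: Sequence[float],
    mappable: Sequence[bool],
    windows: Sequence[Tuple[int, int]],
) -> List[float]:
    """Mean per-base coverage over the mappable positions of each window
    (hypothetical summary statistic; the source does not specify one)."""
    result = []
    for start, end in windows:
        depths = [per_base_coverage[i] for i in range(start, end) if mappable[i]]
        result.append(sum(depths) / len(depths) if depths else 0.0)
    return result


mask = [True, True, False, True, True, True, False, False, True]
depth = [10, 12, 99, 8, 10, 20, 0, 0, 22]
wins = windows_by_mappable_bases(mask, 2)
print(wins)                                # [(0, 2), (2, 5), (5, 9)]
print(window_coverage(depth, mask, wins))  # [11.0, 9.0, 21.0]
```

Counting only mappable positions keeps unmappable stretches (such as the depth of 99 at a masked position above) from distorting a window's coverage.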
`
`

`

`[Sheet 6 of 9. FIG. 6A: coverage schematic in which the sample (tumor) and the control sample appear to show the same coverage, labeled "NOT a Copy Number Variant". FIG. 6B: coverage schematic in which the sample (tumor) appears to show additional coverage relative to the control sample, labeled "Copy Number INCREASE (4x)".]
`
`

`

`[Sheet 7 of 9. FIG. 7: flowchart of method 700.]
`
`BEGIN
`
`702: Receive nucleic acid sequence data files generated from the
`interrogation of a test sample and a control sample, wherein each
`data file contains a plurality of nucleic acid sequence reads
`aligned to a reference sequence and the aligned reads form a
`plurality of chromosomal regions.
`
`704: Determine nucleic acid sequence read coverage (the number of
`nucleic acid sequence reads aligned to) for each base position of
`the plurality of chromosomal regions of the test sample and
`the control sample.
`
`706: Divide each of the plurality of chromosomal regions of the test
`sample and the control sample into one or more non-overlapping
`fixed-size window regions.
`
`708: Determine nucleic acid sequence read coverage for each window
`region.
`
`710: Determine nucleic acid sequence read coverage ratios for each
`window region of the test sample by dividing the read coverage of
`each window region of the test sample by the read coverage
`of a corresponding window region of the control sample.
`
`712: Normalize nucleic acid sequence read coverage ratios for each
`window region of the test sample.
`
`714: Use a stochastic modeling algorithm to convert the normalized
`nucleic acid sequence read coverage ratios for each window
`region of the test sample to discrete copy number states.
`
`716: Utilize the discrete copy number states of each window region
`of the test sample to identify copy number variation in the
`chromosomal regions of the test sample.
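Steps 706 through 712 of FIG. 7 can be sketched in Python as follows. The flowchart specifies only that test coverage is divided by control coverage and that the ratios are normalized; the log2 transform, the median centering, the pseudocount, and every function name below are assumptions chosen for illustration, not the method of the source.

```python
import math
from statistics import median
from typing import List, Sequence, Tuple


def fixed_windows(region_length: int, window_size: int) -> List[Tuple[int, int]]:
    """Non-overlapping fixed-size half-open windows; the last may be shorter."""
    if window_size <= 0:
        raise ValueError("window_size must be positive")
    return [
        (start, min(start + window_size, region_length))
        for start in range(0, region_length, window_size)
    ]


def window_sums(
    per_base_coverage: Sequence[float], windows: Sequence[Tuple[int, int]]
) -> List[float]:
    """Total per-base coverage inside each window."""
    return [sum(per_base_coverage[start:end]) for start, end in windows]


def normalized_log2_ratios(
    test: Sequence[float], control: Sequence[float], pseudocount: float = 0.5
) -> List[float]:
    """Per-window test/control ratios, centered on their median, as log2 values.

    The pseudocount (an assumption) avoids division by zero and log2(0)
    in windows where either sample has no coverage.
    """
    if len(test) != len(control):
        raise ValueError("test and control must have the same number of windows")
    ratios = [(t + pseudocount) / (c + pseudocount) for t, c in zip(test, control)]
    center = median(ratios)
    return [math.log2(r / center) for r in ratios]


print(fixed_windows(10, 4))  # [(0, 4), (4, 8), (8, 10)]
print(normalized_log2_ratios([100, 100, 200, 100], [100, 100, 100, 100], 0.0))
# [0.0, 0.0, 1.0, 0.0]
```

In this toy input the third window has twice the control coverage, which appears as a log2 ratio of 1.0 after centering.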
`
`

`

`[Sheet 8 of 9. FIG. 8A: two genomic region panels, each with a horizontal axis labeled "Chromosome 11 position (Mb)" (ticks 0, 50, 100, 150) and tumor and normal coverage tracks on the (+) and (-) strands. Recoverable annotations in the first panel: CDKN2A, CDKN2B, "0.5x", "Deletion", and "CNV Pair". Recoverable annotations in the second panel: CCND1, "3x", "9x", "CNV Single (H)", "CNV Single (T)", and "CNV Pair". Other gene labels are not legible in the extracted text.]
`
`

`

`[Sheet 9 of 9. FIG. 8B: two scatter plots, panels A and B. The horizontal axis is a copy number log ratio (label extracted as "CNV log, (TS/NB)"); the vertical axis label is not legible. Both panels carry the annotation "R2 = 0.560" in the extracted text.]
`
`A) A strong correlation (R = 0.73) is observed between changes in copy number and changes in gene
`expression for patient 8.
`
`B) The correlation is stronger (R = 0.84) if only meaningful copy number changes (i.e., those greater
`than 1.4 fold) are considered.
`
`

`

`
`SYSTEMS AND METHODS TO DETECT
`COPY NUMBER VARIATION
`
`RELATED APPLICATIONS
`
`[0001] This application claims priority pursuant to 35 U.S.C.
`§119(e) to U.S. Provisional Patent Application Ser. No.
`61/361,886, entitled "Method to Detect Copy Number
`Variation," filed on Jul. 6, 2010, the entirety of which is
`incorporated herein by reference as if set forth in full.
`
`FIELD
`
`[0002] The present disclosure generally relates to the field
`of nucleic acid sequencing including systems and methods
`for identifying genomic variants using nucleic acid
`sequencing data.
`
`INTRODUCTION
`
`[0003] Upon completion of the Human Genome Project,
`one focus of the sequencing industry has shifted to finding
`higher throughput and/or lower cost nucleic acid sequencing
`technologies, sometimes referred to as "next generation"
`sequencing (NGS) technologies. In making sequencing
`higher throughput and/or less expensive, the goal is to make
`the technology more accessible for sequencing. These goals
`can be reached through the use of sequencing platforms and
`methods that provide sample preparation for larger quantities
`of samples of significant complexity, sequencing larger
`numbers of complex samples, and/or a high volume of
`information generation and analysis in a short period of time. Various
`methods, such as, for example, sequencing by synthesis,
`sequencing by hybridization, and sequencing by ligation are
`evolving to meet these challenges.
`[0004] Research into fast and efficient nucleic acid (e.g.,
`genome, exome, etc.) sequence assembly methods is vital to
`the sequencing industry as NGS technologies can provide
`ultra-high throughput nucleic acid sequencing. As such,
`sequencing systems incorporating NGS technologies can
`produce a large number of short sequence reads in a relatively
`short amount of time. Sequence assembly methods must be able
`to assemble and/or map a large number of reads quickly and
`efficiently (i.e., minimize use of computational resources).
`For example, the sequencing of a human size genome can
`result in tens or hundreds of millions of reads that need to be
`assembled before they can be further analyzed to determine
`their biological, diagnostic and/or therapeutic relevance.
`[0005] Exemplary applications of NGS technologies
`include, but are not limited to: genomic variant (e.g., indels,
`copy number variations, single nucleotide polymorphisms,
`etc.) detection, resequencing, gene expression analysis and
`genomic profiling.
`[0006] Of particular interest are copy number variations
`(CNVs), which have been observed in mammalian germline
`DNA and in tumor genomes. CNVs are being increasingly
`implicated as contributing factors in common disease states
`(for example, mental retardation and schizophrenia) and in
`cancer progression. In humans, more total nucleotides exhibit
`variation due to alterations in copy number than due to single
`nucleotide diversity. CNV detection has historically been
`done using comparative genomic hybridization, with one
`method measuring the log2 ratio of test data intensity/control
`data intensity. Such methods have inherent limitations, so
`there is a need for more flexible CNV detection and analysis
`approaches.
`
`SUMMARY
`
`[0007] Systems, methods, software and computer-usable
`media for copy number variation determination from
`analyzing biomolecule-related sequence reads are disclosed.
`Biomolecule-related sequences can relate to proteins, peptides,
`nucleic acids, and the like, and can include structural and
`functional information such as secondary or tertiary
`structures, amino acid or nucleotide sequences, sequence motifs,
`binding properties, genetic mutations and variants, and the
`like.
`[0008] Using nucleic acids as an example, in various
`embodiments, smaller nucleic acid sequence reads (e.g., NGS
`reads) can be assembled into larger sequences using an
`anchor-extension mapping method that initially maps
`(aligns) only a contiguous portion of each read to a reference
`sequence and then extends the mapping of the read at both
`ends of the mapped contiguous portion until the entire read is
`mapped (aligned). In various embodiments, a mapping score
`can be calculated for the read alignment using a scoring
`function, score(i, j) = M + mx, where M can be the number of
`matches in the extended alignment, x can be the number of
`mismatches in the alignment, and m can be a negative penalty
`for each mismatch. In various embodiments, the negative
`penalty, m, for each mismatch is user defined. In various
`embodiments, the negative penalty, m, for each mismatch is
`automatically determined by the algorithm/script/program
`implementing the anchor-extension mapping method to
`maximize the accuracy of the read alignment.
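The scoring function score(i, j) = M + mx can be illustrated with a short Python sketch. It assumes an ungapped comparison of a read against an equally long reference segment; the function names and the example penalty of -2 are illustrative choices, not values given in this disclosure.

```python
from typing import Tuple


def count_matches(read: str, reference_segment: str) -> Tuple[int, int]:
    """Return (matches M, mismatches x) for an ungapped, same-length comparison."""
    if len(read) != len(reference_segment):
        raise ValueError("read and reference segment must have the same length")
    matches = sum(1 for r, g in zip(read, reference_segment) if r == g)
    return matches, len(read) - matches


def mapping_score(matches: int, mismatches: int, mismatch_penalty: float) -> float:
    """score = M + m * x, where m is a negative per-mismatch penalty."""
    if mismatch_penalty > 0:
        raise ValueError("mismatch_penalty is expected to be negative")
    return matches + mismatch_penalty * mismatches


M, x = count_matches("ACGTACGT", "ACGTTCGT")
print(M, x)                        # 7 1
print(mapping_score(M, x, -2.0))   # 5.0
```

A larger magnitude for the penalty makes the score drop faster as mismatches accumulate, which is the quantity the text describes as either user defined or tuned automatically.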
`[0009] In various embodiments, the nucleic acid sequence
`read data can be generated using various techniques,
`platforms or technologies, including, but not limited to: capillary
`electrophoresis, microarrays, ligation-based systems,
`polymerase-based systems, hybridization-based systems, direct
`or indirect nucleotide identification systems,
`pyrosequencing, ion- or pH-based detection systems, electronic
`signature-based systems, etc.
`[0010] In one aspect, a system for implementing a copy
`number variation analysis method is disclosed. The system
`can include a nucleic acid sequencer and a computing device
`in communication with the nucleic acid sequencer. The
`nucleic acid sequencer can be configured to interrogate a
`sample to produce a nucleic acid sequence data file
`containing a plurality of nucleic acid sequence reads. In various
`embodiments, the computing device can be a workstation,
`mainframe computer, personal computer, mobile device, etc.
`[0011] The computing device can comprise a sequence
`mapping engine, a coverage normalization engine, a
`segmentation engine and a copy number variation identification
`engine. The sequence mapping engine can be configured to
`align the plurality of nucleic acid sequence reads to a
`reference sequence, wherein the aligned nucleic acid sequence
`reads merge to form a plurality of chromosomal regions. The
`coverage normalization engine can be configured to divide
`each chromosomal region into one or more non-overlapping
`window regions, determine nucleic acid sequence read
`coverage for each window region and normalize the nucleic acid
`sequence read coverage determined for each window region
`to correct for bias.
`[0012] The segmentation engine can be configured to
`convert the normalized nucleic acid sequence read coverage for
`each window region to discrete copy number states. The copy
`number variation identification engine can be configured to
`identify copy number variation in the chromosomal regions
`by utilizing the copy number states of each window region.
`[0013] In one aspect, a computer-implemented method for
`identifying copy number variations is disclosed. A nucleic
`acid sequence data file containing a plurality of nucleic acid
`sequence reads aligned to a reference sequence is received,
`wherein the aligned nucleic acid sequence reads together
`form a plurality of chromosomal regions. Each of the
`plurality of chromosomal regions is divided into one or more
`non-overlapping window regions. The nucleic acid sequence
`read coverage for each window region is determined. The
`nucleic acid sequence read coverage determined for each
`window region is normalized to correct for bias. The
`normalized nucleic acid sequence read coverage for each window
`region is converted to discrete copy number states. Copy
`number variation is identified in the chromosomal regions.
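The last three steps of this method (normalize, convert to discrete states, identify variation) can be sketched in Python under strong simplifying assumptions. Scaling by the median window coverage stands in for bias correction, and rounding to the nearest integer stands in for the segmentation step; neither is the method of this disclosure, and the flowchart of FIG. 5 instead names a stochastic modeling algorithm for the conversion. All names, the diploid baseline of 2, and the cap of 8 copies are hypothetical.

```python
from itertools import groupby
from statistics import median
from typing import List, Sequence, Tuple


def normalize_coverage(window_coverage: Sequence[float]) -> List[float]:
    """Scale window coverage so that the median window equals 1.0."""
    baseline = median(window_coverage)
    if baseline <= 0:
        raise ValueError("median coverage must be positive")
    return [c / baseline for c in window_coverage]


def to_copy_number_states(
    normalized: Sequence[float], baseline_ploidy: int = 2, max_state: int = 8
) -> List[int]:
    """Map normalized coverage to integer copy numbers by simple rounding.

    This is a placeholder for a real segmentation model.
    """
    return [min(max_state, max(0, round(v * baseline_ploidy))) for v in normalized]


def call_cnv_segments(
    states: Sequence[int], baseline_ploidy: int = 2
) -> List[Tuple[int, int, int]]:
    """Merge adjacent windows with the same non-baseline state into
    (first_window, last_window, copy_number) segments."""
    segments = []
    index = 0
    for state, run in groupby(states):
        length = len(list(run))
        if state != baseline_ploidy:
            segments.append((index, index + length - 1, state))
        index += length
    return segments


coverage = [30, 31, 29, 60, 62, 30, 15, 14, 30]
states = to_copy_number_states(normalize_coverage(coverage))
print(states)                     # [2, 2, 2, 4, 4, 2, 1, 1, 2]
print(call_cnv_segments(states))  # [(3, 4, 4), (6, 7, 1)]
```

Rounding treats every window independently, so a single noisy window can produce a spurious call; a model that shares information between neighboring windows would be more robust.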
`[0014] These and other features are provided herein.
`
`DRAWINGS
`
`[0015] For a more complete understanding of the principles
`disclosed herein, and the advantages thereof, reference is now
`made to the following descriptions taken in conjunction with
`the accompanying drawings, in which:
`[0016] FIG. 1 is a block diagram that illustrates a computer
`system, in accordance with various embodiments.
`[0017] FIG. 2 is a schematic diagram of a system for
`reconstructing a nucleic acid sequence, in accordance with various
`embodiments.
`[0018] FIG. 3 is a diagram showing a single sample CNV
`sequencing analysis pipeline, in accordance with various
`embodiments.
`[0019] FIG. 4 is a schematic diagram of a system for CNV
`analysis, in accordance with various embodiments.
`[0020] FIG. 5 is an exemplary flowchart showing a method
`for identifying CNV using a single sample approach, in
`accordance with various embodiments.
`[0021] FIG. 6A is a depiction of a nucleic acid sequence
`that does not contain a copy number variant, in accordance
`with various embodiments.
`[0022] FIG. 6B is a depiction of a nucleic acid sequence
`containing a copy number variant, in accordance with various
`embodiments.
`[0023] FIG. 7 is an exemplary flowchart showing a method
`for identifying CNVs using a paired sample approach, in
`accordance with various embodiments.
`[0024] FIG. 8A is an illustration of examples of genomic
`regions that show strong correlations between CNVs and
`changes of gene expression, in accordance with various
`embodiments.
`[0025] FIG. 8B is an illustration of how large structural
`mutations are strongly correlated with tumor-specific
`changes in gene expression, in accordance with various
`embodiments.
`[0026] It is to be understood that the figures are not
`necessarily drawn to scale, nor are the objects in the figures
`necessarily drawn to scale in relationship to one another. The
`figures are depictions that are intended to bring clarity and
`understanding to various embodiments of apparatuses,
`systems, and methods disclosed herein. Wherever possible, the
`same reference numbers will be used throughout the
`drawings to refer to the same or like parts. Moreover, it should be
`appreciated that the drawings are not intended to limit the
`scope of the present teachings in any way.
`
`DESCRIPTION OF VARIOUS EMBODIMENTS
`
`[0027] Embodiments of systems and methods for copy
`number variation determination are described herein.
`According to the present teachings, nucleic acid sequencing
`technologies can be utilized for genome-wide interrogation
`of CNVs. In contrast to conventional approaches (e.g.,
`array-based methods, etc.), with sequencing, genomic coverage
`data is available at single base resolution which allows for
`high levels of fidelity when researchers and clinicians search
`for genomic variants such as CNVs in a genome.
`[0028] The section headings used herein are for
`organizational purposes only and are not to be construed as limiting
`the described subject matter in any way.
`[0029] In this detailed description of the various
`embodiments, for purposes of explanation, numerous specific details
`are set forth to provide a thorough understanding of the
`embodiments disclosed. One skilled in the art will appreciate,
`however, that these various embodiments may be practiced
`with or without these specific details. In other instances,
`structures and devices are shown in block diagram form.
`Furthermore, one skilled in the art can readily appreciate that
`the specific sequences in which methods are presented and
`performed are illustrative and it is contemplated that the
`sequences can be varied and still remain within the spirit and
`scope of the various embodiments disclosed herein.
`[0030] All literature and similar materials cited in this
`application, including but not limited to, patents, patent
`applications, articles, books, treatises, and internet web pages are
`expressly incorporated by reference in their entirety for any
`purpose. Unless defined otherwise, all technical and scientific
`terms used herein have the same meaning as is commonly
`understood by one of ordinary skill in the art to which the
`various embodiments described herein belong. When
`definitions of terms in incorporated references appear to differ
`from the definitions provided in the present teachings, the
`definition provided in the present teachings shall control.
`[0031] It will be appreciated that there is an implied
`"about" prior to the temperatures, concentrations, times,
`number of bases, coverage, etc. discussed in the present
`teachings, such that slight and insubstantial deviations are
`within the scope of the present teachings. In this application,
`the use of the singular includes the plural unless specifically
`stated otherwise. Also, the use of "comprise", "comprises",
`"comprising", "contain", "contains", "containing",
`"include", "includes", and "including" are not intended to be
`limiting. It is to be understood that both the foregoing general
`description and the following detailed description are
`exemplary and explanatory only and are not restrictive of the
`present teachings.
`[0032] Further, unless otherwise required by context,
`singular terms shall include pluralities and plural terms shall
`include the singular. Generally, nomenclatures utilized in
`connection with, and techniques of, cell and tissue culture,
`molecular biology, and protein and oligo- or polynucleotide
`chemistry and hybridization described herein are those well
`known and commonly used in the art. Standard techniques are
`used, for example, for nucleic acid purification and
`preparation, chemical analysis, recombinant nucleic acid, and
`oligonucleotide synthesis. Enzymatic reactions and purification
`techniques are performed according to manufacturer's
`specifications or as commonly accomplished in the art or as
`described herein. The techniques and procedures described
`herein are generally performed according to conventional
`methods well known in the art and as described in various
`general and more specific references that are cited and
`discussed throughout the instant specification. See, e.g.,
`Sambrook et al., Molecular Cloning: A Laboratory Manual (Third
`ed., Cold Spring Harbor Laboratory Press, Cold Spring
`Harbor, N.Y. 2000). The nomenclatures utilized in connection
`with, and the laboratory procedures and techniques described
`herein are those well known and commonly used in the art.
`[0033] As used herein, "a" or "an" means "at least one" or
`"one or more."
`[0034] A "system" denotes a set of components, real or
`abstract, comprising a whole where each component interacts
`with or is related to at least one other component within the
`whole.
`[0035] A "biomolecule" is any molecule that is produced
`by a biological organism, including large polymeric
`molecules such as proteins, polysaccharides, lipids, and nucleic
`acids (DNA and RNA) as well as small molecules such as
`primary metabolites, secondary metabolites, and other
`natural products.
`[0036] The phrase "next generation sequencing" or NGS
`refers to sequencing technologies having increased
`throughput as compared to traditional Sanger- and capillary
`electrophoresis-based approaches, for example with the ability to
`generate hundreds of thousands of relatively small sequence
`reads at a time. Some examples of next generation sequencing
`techniques include, but are not limited to, sequencing by
`synthesis, sequencing by ligation, and sequencing by
`hybridization. More specifically, the SOLiD Sequencing System of
`Life Technologies Corp. provides massively parallel
`sequencing with enhanced accuracy. The SOLiD System and
`associated workflows, protocols, chemistries, etc. are
`described in more detail in PCT Publication No. WO 2006/
`084132, entitled "Reagents, Methods, and Libraries for
`Bead-Based Sequencing," international filing date Feb. 1,
`2006, U.S. patent application Ser. No. 12/873,190, entitled
`"Low-Volume Sequencing System and Method of Use," filed
`on Aug. 31, 2010, and U.S. patent application Ser. No.
`12/873,132, entitled "Fast-Indexing Filter Wheel and Method
`of Use," filed on Aug. 31, 2010, the entirety of each of these
`applications being incorporated herein by reference.
`[0037] The phrase "sequencing run" refers to any step or
`portion of a sequencing experiment performed to determine
`some information relating to at least one biomolecule (e.g.,
`nucleic acid molecule).
`[0038] It is well known that DNA (deoxyribonucleic acid)
`is a chain of nucleotides consisting of 4 types of nucleotides:
`A (adenine), T (thymine), C (cytosine), and G (guanine), and
`that RNA (ribonucleic acid) is comprised of 4 types of
`nucleotides: A, U (uracil), G, and C. It is also known that certain
`pairs of nucleotides specifically bind to one another in a
`complementary fashion (called complementary base
`pairing). That is, adenine (A) pairs with thymine (T) (in the case
`of RNA, however, adenine (A) pairs with uracil (U)), and
`cytosine (C) pairs with guanine (G). When a first nucleic acid
`strand binds to a second nucleic acid strand made up of
`nucleotides that are complementary to those in the first strand,
`the two strands bind to form a double strand. As used herein,
`"nucleic acid sequencing data," "nucleic acid sequencing
`information," "nucleic acid sequence," "genomic sequence,"
`"genetic sequence," "fragment sequence," or "nucleic acid
`sequencing read" denotes any information or data that is
`indicative of the order of the nucleotide bases (e.g., adenine,
`guanine, cytosine, and thymine/uracil) in a molecule (e.g.,
`whole genome, whole transcriptome, exome,
`oligonucleotide, polynucleotide, fragment, etc.) of DNA or RNA. It
`should be understood that the present teachings contemplate
`sequence information obtained using all available varieties of
`techniques, platforms or technologies, including, but not
`limited to: capillary electrophoresis, microarrays, ligation-based
`systems, polymerase-based systems, hybridization-based
`systems, direct or indirect nucleotide identification systems,
`pyrosequencing, ion- or pH-based detection systems,
`electronic signature-based systems, etc.
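The complementary base pairing rules stated in this paragraph (A with T, or A with U in RNA, and C with G) can be written as a small Python sketch. The function names are illustrative only and do not come from this disclosure.

```python
# Translation tables for the pairing rules: A<->T (DNA) or A<->U (RNA), C<->G.
_DNA_COMPLEMENT = str.maketrans("ACGT", "TGCA")
_RNA_COMPLEMENT = str.maketrans("ACGU", "UGCA")


def complement(sequence: str, rna: bool = False) -> str:
    """Return the base-by-base complement of a DNA (default) or RNA sequence."""
    table = _RNA_COMPLEMENT if rna else _DNA_COMPLEMENT
    return sequence.upper().translate(table)


def reverse_complement(sequence: str, rna: bool = False) -> str:
    """Return the complementary strand read in the opposite direction."""
    return complement(sequence, rna)[::-1]


print(complement("ATCGA"))            # TAGCT
print(reverse_complement("ATCGA"))    # TCGAT
print(complement("AUCGA", rna=True))  # UAGCU
```

Characters outside the four expected bases are passed through unchanged by `str.translate`, so ambiguity codes such as N are left as they are.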
`[0039] The phrase "ligation cycle" refers to a step in a
`sequence-by-ligation process where a probe sequence is
`ligated to a primer or another probe sequence.
`[0040] The phrase "color call" refers to an observed dye
`color resulting from the detection of a probe sequence after a
`ligation cycle of a sequencing run.
`[0041] The phrase "color space" refers to a nucleic acid
`sequence data schema where nucleic acid sequence
`information is represented by a set of colors (e.g., color calls, color
`signals, etc.) each carrying details about the identity and/or
`positional sequence of bases that comprise the nucleic acid
`sequence. For example, the nucleic acid sequence "ATCGA"
`can be represented in color space by various combinations of
`colors that are measured as the nucleic acid sequence is
`interrogated using optical detection-based (e.g., dye-based, etc.)
`sequencing techniques such as those employed by the SOLiD
`System. That is, in various embodiments, the SOLiD System
`can employ a schema that represents a nucleic acid fragment
`sequence as an initial base followed by a sequence of
`overlapping dimers (adjacent pairs of bases). The system can
`encode each dimer with one of four colors using a coding
`scheme that results in a sequence of color calls that represent
`a nucleotide sequence.
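The schema of an initial base followed by one color per overlapping dimer can be sketched in Python. This disclosure does not give the dimer-to-color table, so the sketch assumes the commonly described two-base encoding in which the bases are numbered A=0, C=1, G=2, T=3 and the color of a dimer is the bitwise XOR of its two base numbers; the function names are hypothetical.

```python
# Assumed base numbering for the two-base (dimer) color code.
_BASE_TO_BITS = {"A": 0, "C": 1, "G": 2, "T": 3}
_BITS_TO_BASE = {bits: base for base, bits in _BASE_TO_BITS.items()}


def encode_color_space(sequence: str) -> str:
    """Encode a base-space sequence as its first base plus one color digit
    (0 to 3) for every overlapping dimer."""
    seq = sequence.upper()
    if not seq:
        raise ValueError("sequence must not be empty")
    colors = [
        str(_BASE_TO_BITS[first] ^ _BASE_TO_BITS[second])
        for first, second in zip(seq, seq[1:])
    ]
    return seq[0] + "".join(colors)


def decode_color_space(encoded: str) -> str:
    """Recover the base-space sequence by walking the colors from the
    known initial base."""
    bases = [encoded[0].upper()]
    for color in encoded[1:]:
        previous = _BASE_TO_BITS[bases[-1]]
        bases.append(_BITS_TO_BASE[previous ^ int(color)])
    return "".join(bases)


print(encode_color_space("ATCGA"))  # A3232
print(decode_color_space("A3232"))  # ATCGA
```

Because each color depends on two adjacent bases, decoding needs the initial base, and a single wrong color call changes every base decoded after it.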
`[0042] The phrase "base space" refers to a nucleic acid
`sequence data schema where nucleic acid sequence
`information is represented by the actual nucleotide base composition
`of the nucleic acid sequence. For example, the nucleic acid
`sequence "ATCGA" is represented in base space by the actual
`nucleotide base identities (e.g., A, T or U, C, G) of the nucleic
`acid sequence.
`[0043] A "polynucleotide", "nucleic acid", or
`"oligonucleotide" refers to a linear polymer of nucleosides (including
`deoxyribonucleosides, ribonucleosides, or analogs thereof)
`joined by internucleosidic linkages. Typically, a
`polynucleotide comprises at least three nucleosides. Usually
`oligonucleotides range in size from a few monomeric units, e.g.
`3-4, to several hundreds of monomeric units. Whenever a
`polynucleotide such as an oligonucleotide is represented by a
`sequence of letters, such as "ATGCCTG," it will be
`understood that the nucleotides are in 5'->3' order from left to right
`and that "A" denotes deoxyadenosine, "C" denotes
`deoxycytidine, "G" denotes deoxyguanosine, and "T" denotes
`thymidine, unless otherwise noted. The letters A, C, G, and T may
`be used to refer to the bases themselves, to nucleosides, or to
`nucleotides comprising the bases, as is standard in the art.
`[0044] The techniques of "paired-end," "pairwise," "paired
`tag," or "mate pair" sequencing are generally known in the art
`of molecular biology (Siegel A. F. et al., Genomics. 2000, 68:
`237-246; Roach J.C. et al., Genomics. 1995, 26: 345-353).
`These sequencing techniques can allow the determination of
`multiple "reads" of sequence, each from a different place on
`a single polynucleotide. Typically, the distance (i.e., insert
`region) between the two reads or other information regarding
`a relationship between the reads is known. In some situations,
`these sequencing techniques provide more information than
`does sequencing two stretches of nucleic acid sequences in a
`random fashion. With the use of appropriate software tools for
`the assembly of sequence information (e.g., Millikin S C. et
`al., Genome Res. 2003, 13: 81-90; Kent, W. J. et al., Genome
`Res. 2001, 11: 1541-8) it is possible to make use of the
`knowledge that the "paired-end," "pairwise," "paired tag" or
`"mate pair" sequences are not completely random, but are
`known to occur a known distance apart and/or to have some
`other relationship, and are therefore linked or paired in the
`genome. This information can aid in the assembly of whole
`nucleic acid sequences into a consensus sequence.
`
`Computer-Implemented System
`
`[0045] FIG. 1 is a block diagram that illustrates a computer
`system 100, upon which emb
