US 20120046877A1

(19) United States
(12) Patent Application Publication
HYLAND et al.

(10) Pub. No.: US 2012/0046877 A1
(43) Pub. Date: Feb. 23, 2012

(54) SYSTEMS AND METHODS TO DETECT COPY NUMBER VARIATION

(75) Inventors: Fiona HYLAND, San Mateo, CA (US); Rajesh Gottimukkala, Foster City, CA (US)

(73) Assignee: LIFE TECHNOLOGIES CORPORATION, Carlsbad, CA (US)

(21) Appl. No.: 13/176,471

(22) Filed: Jul. 5, 2011

Related U.S. Application Data

(60) Provisional application No. 61/361,886, filed on Jul. 6, 2010.

Publication Classification

(51) Int. Cl.
G06F 19/00 (2011.01)

(52) U.S. Cl. 702/20

(57) ABSTRACT
`
In one aspect, a system for implementing a copy number variation analysis method is disclosed. The system can include a nucleic acid sequencer and a computing device in communication with the nucleic acid sequencer. The nucleic acid sequencer can be configured to interrogate a sample to produce a nucleic acid sequence data file containing a plurality of nucleic acid sequence reads. In various embodiments, the computing device can be a workstation, mainframe computer, personal computer, mobile device, etc.

The computing device can comprise a sequence mapping engine, a coverage normalization engine, a segmentation engine and a copy number variation identification engine. The sequence mapping engine can be configured to align the plurality of nucleic acid sequence reads to a reference sequence, wherein the aligned nucleic acid sequence reads merge to form a plurality of chromosomal regions. The coverage normalization engine can be configured to divide each chromosomal region into one or more non-overlapping window regions, determine nucleic acid sequence read coverage for each window region and normalize the nucleic acid sequence read coverage determined for each window region to correct for bias. The segmentation engine can be configured to convert the normalized nucleic acid sequence read coverage for each window region to discrete copy number states. The copy number variation identification engine can be configured to identify copy number variation in the chromosomal regions by utilizing the copy number states of each window region.
`
[Representative drawing: block diagram (see FIG. 1) with a processor, RAM, ROM, disk storage, bus, display, input device and cursor control; drawing not reproduced]
`Page 1
`
`FOUNDATION EXHIBIT 1061
`IPR2019-00634
`
`
`
`Patent Application Publication
`
`Feb. 23, 2012 Sheet 1 of 9
`
US 2012/0046877 A1
`
FIG. 1 [block diagram of computer system 100; drawing not reproduced]
`
`
`
`
`
FIG. 2 [schematic diagram of system 200 for reconstructing a nucleic acid sequence, with components 202, 204, 206 and 208; drawing not reproduced]
`
`
`
`
`
FIG. 3 [single-sample CNV sequencing analysis pipeline: a test genome X (302) is broken into genomic fragments that are sampled and sequenced; the reads are mapped to a template genome (304); coverage is computed in each window (306); and data analysis (308) plots coverage against genomic position; drawing not reproduced]
`
`
`
`
FIG. 4 [schematic diagram of CNV analysis system 400. Elements shown: an analytics computing device containing a mapping engine, a pre-processing engine, a read coverage engine, a segmentation engine and a CNV identification engine; *.SAM/*.BAM and *.GFF files; a data/analysis API; and a client device; drawing not reproduced]
`
`
`
`
`
FIG. 5 (flowchart 500)

BEGIN
502: RECEIVE A NUCLEIC ACID SEQUENCE DATA FILE CONTAINING A PLURALITY OF NUCLEIC ACID SEQUENCE READS ALIGNED TO A REFERENCE SEQUENCE, WHEREIN THE ALIGNED NUCLEIC ACID SEQUENCE READS TOGETHER FORM A PLURALITY OF CHROMOSOMAL REGIONS
504: DETERMINE NUCLEIC ACID SEQUENCE READ COVERAGE (THE NUMBER OF NUCLEIC ACID SEQUENCE READS ALIGNED TO A POSITION) FOR EACH BASE POSITION OF THE PLURALITY OF CHROMOSOMAL REGIONS
506: DIVIDE EACH OF THE PLURALITY OF CHROMOSOMAL REGIONS INTO ONE OR MORE NON-OVERLAPPING WINDOW REGIONS, WHEREIN EACH WINDOW REGION CONTAINS ABOUT THE SAME NUMBER OF MAPPABLE BASES
508: DETERMINE NUCLEIC ACID SEQUENCE READ COVERAGE FOR EACH WINDOW REGION
510: NORMALIZE THE NUCLEIC ACID SEQUENCE READ COVERAGE DETERMINED FOR EACH WINDOW REGION TO CORRECT FOR BIAS
512: USE A STOCHASTIC MODELING ALGORITHM TO CONVERT THE NORMALIZED NUCLEIC ACID SEQUENCE READ COVERAGE FOR EACH WINDOW REGION TO DISCRETE COPY NUMBER STATES
514: UTILIZE THE DISCRETE COPY NUMBER STATES OF EACH WINDOW REGION TO IDENTIFY COPY NUMBER VARIATION IN THE CHROMOSOMAL REGIONS
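Step 506 asks for windows that each contain about the same number of mappable bases, so a window stretches across poorly mappable sequence instead of having a fixed length. A minimal Python sketch follows, assuming a per-base boolean mappability mask is already available; the `mappable_windows` name and its input are hypothetical, since the flowchart does not say how mappability is determined.

```python
from typing import List, Sequence, Tuple


def mappable_windows(mappable: Sequence[bool],
                     bases_per_window: int) -> List[Tuple[int, int]]:
    """Split a region into non-overlapping windows with ~equal mappable bases.

    mappable[i] is True when position i is treated as mappable (hypothetical
    input, e.g. derived from a mappability track). Returns half-open
    (start, end) coordinates; the final window may hold fewer mappable bases.
    """
    if bases_per_window <= 0:
        raise ValueError("bases_per_window must be positive")
    windows: List[Tuple[int, int]] = []
    start, count = 0, 0
    for i, ok in enumerate(mappable):
        if ok:
            count += 1
            if count == bases_per_window:
                # close the window right after the last required mappable base
                windows.append((start, i + 1))
                start, count = i + 1, 0
    if start < len(mappable):
        # leftover tail (possibly containing no mappable bases at all)
        windows.append((start, len(mappable)))
    return windows


mask = [True, True, False, False, True, True, True, False, True]
print(mappable_windows(mask, 2))  # [(0, 2), (2, 6), (6, 9)]
```

Each window in the example holds exactly two mappable bases even though the windows span 2, 4 and 3 positions; by contrast, step 706 of FIG. 7 uses fixed-size windows.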
`
`
`
`
`
FIG. 6A [reads from a Sample (Tumor) and a Control Sample aligned to the same region: NOT a Copy Number Variant]

FIG. 6B [reads from a Sample (Tumor) and a Control Sample aligned to the same region: Copy Number INCREASE (4x)]
`
`
`
`
`
FIG. 7 (flowchart 700)

BEGIN
702: RECEIVE NUCLEIC ACID SEQUENCE DATA FILES GENERATED FROM THE INTERROGATION OF A TEST SAMPLE AND A CONTROL SAMPLE, WHEREIN EACH DATA FILE CONTAINS A PLURALITY OF NUCLEIC ACID SEQUENCE READS ALIGNED TO A REFERENCE SEQUENCE AND THE ALIGNED READS FORM A PLURALITY OF CHROMOSOMAL REGIONS
704: DETERMINE NUCLEIC ACID SEQUENCE READ COVERAGE (THE NUMBER OF NUCLEIC ACID SEQUENCE READS ALIGNED TO A POSITION) FOR EACH BASE POSITION OF THE PLURALITY OF CHROMOSOMAL REGIONS OF THE TEST SAMPLE AND THE CONTROL SAMPLE
706: DIVIDE EACH OF THE PLURALITY OF CHROMOSOMAL REGIONS OF THE TEST SAMPLE AND THE CONTROL SAMPLE INTO ONE OR MORE NON-OVERLAPPING FIXED-SIZE WINDOW REGIONS
708: DETERMINE NUCLEIC ACID SEQUENCE READ COVERAGE FOR EACH WINDOW REGION
710: DETERMINE NUCLEIC ACID SEQUENCE READ COVERAGE RATIOS FOR EACH WINDOW REGION OF THE TEST SAMPLE BY DIVIDING THE READ COVERAGE OF EACH WINDOW REGION OF THE TEST SAMPLE BY THE READ COVERAGE OF THE CORRESPONDING WINDOW REGION OF THE CONTROL SAMPLE
712: NORMALIZE THE NUCLEIC ACID SEQUENCE READ COVERAGE RATIOS FOR EACH WINDOW REGION OF THE TEST SAMPLE
714: USE A STOCHASTIC MODELING ALGORITHM TO CONVERT THE NORMALIZED NUCLEIC ACID SEQUENCE READ COVERAGE RATIOS FOR EACH WINDOW REGION OF THE TEST SAMPLE TO DISCRETE COPY NUMBER STATES
716: UTILIZE THE DISCRETE COPY NUMBER STATES OF EACH WINDOW REGION OF THE TEST SAMPLE TO IDENTIFY COPY NUMBER VARIATION IN THE CHROMOSOMAL REGIONS OF THE TEST SAMPLE
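Steps 710 and 712 divide each test window's coverage by the matching control window's coverage and then normalize the ratios. A minimal sketch follows; scaling so that the median ratio is 1.0 (which cancels an overall difference in sequencing depth between the two samples) and skipping windows with little control coverage are illustrative assumptions, not details given in the flowchart, and `paired_coverage_ratios` is a hypothetical name.

```python
from statistics import median
from typing import List, Optional, Sequence


def paired_coverage_ratios(test_cov: Sequence[float],
                           control_cov: Sequence[float],
                           min_control: float = 1.0) -> List[Optional[float]]:
    """Per-window test/control coverage ratios, scaled so the median ratio is 1.0.

    Windows whose control coverage is below min_control are reported as None.
    """
    if len(test_cov) != len(control_cov):
        raise ValueError("test and control must be binned into the same windows")
    raw = [t / c if c >= min_control else None
           for t, c in zip(test_cov, control_cov)]
    valid = [r for r in raw if r is not None]
    if not valid:
        return raw
    med = median(valid)
    if med == 0:
        raise ValueError("median ratio is zero; cannot normalize")
    return [r / med if r is not None else None for r in raw]


# The test sample was sequenced about twice as deeply as the control;
# window 2 additionally carries a two-fold relative gain.
print(paired_coverage_ratios([20, 22, 40, 21, 5], [10, 11, 10, 10.5, 0]))
# [1.0, 1.0, 2.0, 1.0, None]
```

Dividing by a matched control in this way also cancels position-specific effects shared by both samples, which is one motivation for the paired design.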
`
`
`
`
FIG. 8A [Tumor and Normal coverage tracks, each with (+) and (-) tracks, plotted against chromosome position (Mb). One panel shows a deletion (0.5x) spanning CDKN2A and CDKN2B; the other shows a chromosome 11 region with copy number gains labeled 3x and 9x spanning genes including CCND1 and CTTN. Legend entries include Deletion, CNV Pair and CNV Single. Drawing not reproduced.]
`
`
`
FIG. 8B [scatter plots of copy number change, CNV log2 (TS/NB), against change in gene expression; drawing not reproduced]
A) A strong correlation (R = 0.73) is observed between changes in copy number and changes in gene expression for patient 8.
B) The correlation is stronger (R = 0.84) if only meaningful copy number changes (i.e., those greater than 1.4 fold) are considered.
`
`
`
`
US 2012/0046877 A1

Feb. 23, 2012
`
`1
`
`SYSTEMS AND METHODS TO DETECT
`COPY NUMBER VARIATION
`
`RELATED APPLICATIONS
`
[0001] This application claims priority pursuant to 35 U.S.C. §119(e) to U.S. Provisional Patent Application Ser. No. 61/361,886, entitled "Method to Detect Copy Number Variation," filed on Jul. 6, 2010, the entirety of which is incorporated herein by reference as if set forth in full.
`
`FIELD
`
[0002] The present disclosure generally relates to the field of nucleic acid sequencing including systems and methods for identifying genomic variants using nucleic acid sequencing data.
`
`INTRODUCTION
`
[0003] Upon completion of the Human Genome Project, one focus of the sequencing industry has shifted to finding higher throughput and/or lower cost nucleic acid sequencing technologies, sometimes referred to as "next generation" sequencing (NGS) technologies. In making sequencing higher throughput and/or less expensive, the goal is to make the technology more accessible. These goals can be reached through the use of sequencing platforms and methods that provide sample preparation for larger quantities of samples of significant complexity, sequencing of larger numbers of complex samples, and/or a high volume of information generation and analysis in a short period of time. Various methods, such as, for example, sequencing by synthesis, sequencing by hybridization, and sequencing by ligation, are evolving to meet these challenges.
[0004] Research into fast and efficient nucleic acid (e.g., genome, exome, etc.) sequence assembly methods is vital to the sequencing industry because NGS technologies can provide ultra-high throughput nucleic acid sequencing. As such, sequencing systems incorporating NGS technologies can produce a large number of short sequence reads in a relatively short amount of time. Sequence assembly methods must be able to assemble and/or map a large number of reads quickly and efficiently (i.e., while minimizing use of computational resources). For example, sequencing a human-size genome can result in tens or hundreds of millions of reads that need to be assembled before they can be further analyzed to determine their biological, diagnostic and/or therapeutic relevance.
[0005] Exemplary applications of NGS technologies include, but are not limited to: genomic variant (e.g., indels, copy number variations, single nucleotide polymorphisms, etc.) detection, resequencing, gene expression analysis and genomic profiling.
[0006] Of particular interest are copy number variations (CNVs), which have been observed in mammalian germline DNA and in tumor genomes. CNVs are being increasingly implicated as contributing factors in common disease states (for example, mental retardation and schizophrenia) and in cancer progression. In humans, more total nucleotides exhibit variation due to alterations in copy number than due to single nucleotide diversity. CNV detection has historically been done using comparative genomic hybridization, with one method measuring the log2 ratio of test data intensity to control data intensity, i.e., log2(test intensity / control intensity). Such methods have inherent limitations, so there is a need for more flexible CNV detection and analysis approaches.
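As a worked illustration of the log2 ratio, assume (as an idealization) that hybridization intensity is proportional to copy number in a diploid genome. A single-copy gain then gives log2(3/2), about 0.585, and a single-copy loss gives log2(1/2) = -1; real array signals deviate from this idealization. A short Python sketch (the function name is illustrative):

```python
import math


def log2_ratio(test_intensity: float, control_intensity: float) -> float:
    """log2(test / control): 0 means no change, >0 a relative gain, <0 a loss."""
    if test_intensity <= 0 or control_intensity <= 0:
        raise ValueError("intensities must be positive")
    return math.log2(test_intensity / control_intensity)


# Idealized diploid baseline: intensity proportional to copy number.
print(round(log2_ratio(3, 2), 3))  # 0.585 (one-copy gain)
print(log2_ratio(1, 2))            # -1.0 (one-copy loss)
print(log2_ratio(2, 2))            # 0.0 (no change)
```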
`
`SUMMARY
`
[0007] Systems, methods, software and computer-usable media for copy number variation determination from analyzing biomolecule-related sequence reads are disclosed. Biomolecule-related sequences can relate to proteins, peptides, nucleic acids, and the like, and can include structural and functional information such as secondary or tertiary structures, amino acid or nucleotide sequences, sequence motifs, binding properties, genetic mutations and variants, and the like.
[0008] Using nucleic acids as an example, in various embodiments, smaller nucleic acid sequence reads (e.g., NGS reads) can be assembled into larger sequences using an anchor-extension mapping method that initially maps (aligns) only a contiguous portion of each read to a reference sequence and then extends the mapping of the read at both ends of the mapped contiguous portion until the entire read is mapped (aligned). In various embodiments, a mapping score can be calculated for the read alignment using a scoring function, score(i, j) = M + m·x, where M can be the number of matches in the extended alignment, x can be the number of mismatches in the alignment, and m can be a negative penalty for each mismatch. In various embodiments, the negative penalty, m, for each mismatch is user defined. In various embodiments, the negative penalty, m, for each mismatch is automatically determined by the algorithm/script/program implementing the anchor-extension mapping method to maximize the accuracy of the read alignment.
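The scoring function can be illustrated with a toy anchor-extension mapper. This is a minimal sketch under stated assumptions, not the mapper described in this application: the anchor is an exact k-mer taken at a chosen offset in the read, extension is ungapped (no indels), the default penalty m = -2 is an arbitrary illustrative value, and the function names are hypothetical.

```python
from typing import Optional, Tuple


def score_alignment(matches: int, mismatches: int, m: float) -> float:
    """Scoring function score = M + m*x, where m is a negative mismatch penalty."""
    return matches + m * mismatches


def anchor_extend_map(read: str, ref: str, k: int, anchor_start: int = 0,
                      m: float = -2.0) -> Optional[Tuple[int, float]]:
    """Toy anchor-extension mapper (ungapped, exact-match anchor).

    1. Take the k-base anchor read[anchor_start:anchor_start + k].
    2. Find every exact occurrence of the anchor in ref.
    3. Extend over both flanks of the anchor until the whole read is covered.
    4. Score each placement with score_alignment and keep the best one.
    Returns (0-based reference start, score), or None if nothing maps.
    """
    anchor = read[anchor_start:anchor_start + k]
    if len(anchor) < k:
        raise ValueError("anchor runs past the end of the read")
    best: Optional[Tuple[int, float]] = None
    hit = ref.find(anchor)
    while hit != -1:
        start = hit - anchor_start        # where the full read would begin
        end = start + len(read)
        if start >= 0 and end <= len(ref):
            matches = sum(a == b for a, b in zip(read, ref[start:end]))
            s = score_alignment(matches, len(read) - matches, m)
            if best is None or s > best[1]:
                best = (start, s)
        hit = ref.find(anchor, hit + 1)
    return best


print(anchor_extend_map("ACGTTAGGAT", "GGGACGTTAGCATTT", k=4))  # (3, 7.0)
```

In the example the read aligns at reference position 3 with 9 matches and 1 mismatch, so score = 9 + (-2)(1) = 7.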
[0009] In various embodiments, the nucleic acid sequence read data can be generated using various techniques, platforms or technologies, including, but not limited to: capillary electrophoresis, microarrays, ligation-based systems, polymerase-based systems, hybridization-based systems, direct or indirect nucleotide identification systems, pyrosequencing, ion- or pH-based detection systems, electronic signature-based systems, etc.
[0010] In one aspect, a system for implementing a copy number variation analysis method is disclosed. The system can include a nucleic acid sequencer and a computing device in communication with the nucleic acid sequencer. The nucleic acid sequencer can be configured to interrogate a sample to produce a nucleic acid sequence data file containing a plurality of nucleic acid sequence reads. In various embodiments, the computing device can be a workstation, mainframe computer, personal computer, mobile device, etc.
[0011] The computing device can comprise a sequence mapping engine, a coverage normalization engine, a segmentation engine and a copy number variation identification engine. The sequence mapping engine can be configured to align the plurality of nucleic acid sequence reads to a reference sequence, wherein the aligned nucleic acid sequence reads merge to form a plurality of chromosomal regions. The coverage normalization engine can be configured to divide each chromosomal region into one or more non-overlapping window regions, determine nucleic acid sequence read coverage for each window region and normalize the nucleic acid sequence read coverage determined for each window region to correct for bias.
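The coverage normalization engine's three tasks (windowing, per-window coverage and normalization) can be sketched in a few lines of Python. The application does not specify the coverage statistic or the bias model, so the choices below are assumptions: coverage is the mean per-base depth within each fixed-size window, and "normalization" is division by the median window coverage, which corrects only for overall sequencing depth. The function names are hypothetical.

```python
from statistics import median
from typing import List, Sequence


def window_coverage(per_base: Sequence[int], window_size: int) -> List[float]:
    """Mean read coverage in consecutive, non-overlapping windows.

    per_base[i] is the number of reads covering base i of a chromosomal region.
    The last window is shorter if the region length is not a multiple of
    window_size.
    """
    if window_size <= 0:
        raise ValueError("window_size must be positive")
    out = []
    for i in range(0, len(per_base), window_size):
        chunk = per_base[i:i + window_size]
        out.append(sum(chunk) / len(chunk))
    return out


def normalize_by_median(window_cov: Sequence[float]) -> List[float]:
    """Scale window coverage so the median window equals 1.0.

    Only removes differences in overall sequencing depth; other biases
    (e.g. GC content) are not modeled here.
    """
    med = median(window_cov)
    if med == 0:
        raise ValueError("median window coverage is zero")
    return [c / med for c in window_cov]


per_base = [10] * 6 + [20] * 3 + [10] * 3
cov = window_coverage(per_base, 3)
print(cov)                       # [10.0, 10.0, 20.0, 10.0]
print(normalize_by_median(cov))  # [1.0, 1.0, 2.0, 1.0]
```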
[0012] The segmentation engine can be configured to convert the normalized nucleic acid sequence read coverage for
`each window region to discrete copy number states. The copy
`number variation identification engine can be configured to
`identify copy number variation in the chromosomal regions
`by utilizing the copy number states of each window region.
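Converting normalized coverage into discrete copy number states is often done with a hidden Markov model. FIG. 5 describes this step only as a stochastic modeling algorithm, so the Viterbi sketch below is one possible choice, not necessarily the applicants' method. Assumptions: a diploid baseline (state c is expected at normalized coverage c/2), Gaussian emissions with a shared standard deviation `sigma`, and a single probability `p_stay` of remaining in the same state; the default values and the function name are illustrative.

```python
import math
from typing import List, Sequence


def viterbi_copy_number(norm_cov: Sequence[float],
                        states: Sequence[int] = (1, 2, 3, 4),
                        sigma: float = 0.15,
                        p_stay: float = 0.99) -> List[int]:
    """Most probable copy-number state per window under a simple Gaussian HMM."""
    if len(states) < 2:
        raise ValueError("need at least two copy-number states")
    if not norm_cov:
        return []
    n = len(states)
    log_stay = math.log(p_stay)
    log_move = math.log((1.0 - p_stay) / (n - 1))  # uniform over other states
    log_norm = math.log(sigma * math.sqrt(2.0 * math.pi))

    def log_emit(x: float, copies: int) -> float:
        mu = copies / 2.0  # expected normalized coverage for this state
        return -0.5 * ((x - mu) / sigma) ** 2 - log_norm

    # score[j]: best log-probability of any state path ending in state j
    score = [math.log(1.0 / n) + log_emit(norm_cov[0], c) for c in states]
    backptr: List[List[int]] = []
    for x in norm_cov[1:]:
        prev_choice, new_score = [], []
        for j, c in enumerate(states):
            cand = [score[i] + (log_stay if i == j else log_move) for i in range(n)]
            i_best = max(range(n), key=lambda i: cand[i])
            prev_choice.append(i_best)
            new_score.append(cand[i_best] + log_emit(x, c))
        backptr.append(prev_choice)
        score = new_score

    # Trace the best path backwards from the highest-scoring final state.
    j = max(range(n), key=lambda i: score[i])
    path = [j]
    for prev_choice in reversed(backptr):
        j = prev_choice[j]
        path.append(j)
    path.reverse()
    return [states[j] for j in path]


cov = [1.0, 0.98, 1.02, 1.5, 1.52, 1.47, 1.49, 1.0, 0.99, 1.01]
print(viterbi_copy_number(cov))  # [2, 2, 2, 3, 3, 3, 3, 2, 2, 2]
```

The `p_stay` term penalizes state changes, so isolated noisy windows tend to stay in the surrounding state while a sustained shift (here about 1.5x, i.e. 3 copies) is called as a segment.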
[0013] In one aspect, a computer-implemented method for identifying copy number variations is disclosed. A nucleic acid sequence data file containing a plurality of nucleic acid sequence reads aligned to a reference sequence is received, wherein the aligned nucleic acid sequence reads together form a plurality of chromosomal regions. Each of the plurality of chromosomal regions is divided into one or more non-overlapping window regions. The nucleic acid sequence read coverage for each window region is determined. The nucleic acid sequence read coverage determined for each window region is normalized to correct for bias. The normalized nucleic acid sequence read coverage for each window region is converted to discrete copy number states. Copy number variation is then identified in the chromosomal regions using the copy number states of each window region.
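The final step, identifying copy number variation from the per-window states, can be illustrated by merging runs of adjacent windows that share a non-baseline state into a single call. This is a minimal sketch assuming a diploid baseline of 2 copies and an optional minimum run length; `CNVCall` and `call_cnvs` are hypothetical names, and coordinates are window indices rather than genomic positions.

```python
from itertools import groupby
from typing import List, NamedTuple, Sequence


class CNVCall(NamedTuple):
    start_window: int  # index of the first window in the run (inclusive)
    end_window: int    # index of the last window in the run (inclusive)
    copy_number: int


def call_cnvs(states: Sequence[int], baseline: int = 2,
              min_windows: int = 1) -> List[CNVCall]:
    """Merge consecutive windows with the same non-baseline state into calls."""
    calls: List[CNVCall] = []
    pos = 0
    for copies, run in groupby(states):
        length = sum(1 for _ in run)
        if copies != baseline and length >= min_windows:
            calls.append(CNVCall(pos, pos + length - 1, copies))
        pos += length
    return calls


print(call_cnvs([2, 2, 3, 3, 3, 2, 1, 2]))
# [CNVCall(start_window=2, end_window=4, copy_number=3), CNVCall(start_window=6, end_window=6, copy_number=1)]
```

Raising `min_windows` discards short runs such as the single-window loss above, trading sensitivity for fewer spurious calls.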
`[0014] These and other features are provided herein.
`
`DRAWINGS
`
`[0015] For a more complete understanding of the principles
`disclosed herein, and the advantages thereof, reference is now
`made to the following descriptions taken in conjunction with
`the accompanying drawings, in which:
[0016] FIG. 1 is a block diagram that illustrates a computer system, in accordance with various embodiments.
[0017] FIG. 2 is a schematic diagram of a system for reconstructing a nucleic acid sequence, in accordance with various embodiments.
`[0018] FIG. 3 is a diagram showing a single sample CNV
`sequencing analysis pipeline, in accordance with various
`embodiments.
`[0019] FIG. 4 is a schematic diagram of a system for CNV
`analysis, in accordance with various embodiments.
`[0020] FIG. 5 is an exemplary flowchart showing a method
`for identifying CNV using a single sample approach, in
`accordance with various embodiments.
`[0021] FIG. 6A is a depiction of a nucleic acid sequence
`that does not contain a copy number variant, in accordance
`with various embodiments.
[0022] FIG. 6B is a depiction of a nucleic acid sequence containing a copy number variant, in accordance with various embodiments.
`[0023] FIG. 7 is an exemplary flowchart showing a method
`for identifying CNVs using a paired sample approach, in
`accordance with various embodiments.
[0024] FIG. 8A is an illustration of examples of genomic regions that show strong correlations between CNVs and changes of gene expression, in accordance with various embodiments.
[0025] FIG. 8B is an illustration of how large structural mutations are strongly correlated with tumor-specific changes in gene expression, in accordance with various embodiments.
[0026] It is to be understood that the figures are not necessarily drawn to scale, nor are the objects in the figures necessarily drawn to scale in relationship to one another. The figures are depictions that are intended to bring clarity and understanding to various embodiments of apparatuses, systems, and methods disclosed herein. Wherever possible, the same reference numbers will be used throughout the drawings to refer to the same or like parts. Moreover, it should be appreciated that the drawings are not intended to limit the scope of the present teachings in any way.
`
`DESCRIPTION OF VARIOUS EMBODIMENTS
`
[0027] Embodiments of systems and methods for copy number variation determination are described herein. According to the present teachings, nucleic acid sequencing technologies can be utilized for genome-wide interrogation of CNVs. In contrast to conventional approaches (e.g., array-based methods), sequencing makes genomic coverage data available at single-base resolution, which allows high levels of fidelity when researchers and clinicians search for genomic variants such as CNVs in a genome.
[0028] The section headings used herein are for organizational purposes only and are not to be construed as limiting the described subject matter in any way.
[0029] In this detailed description of the various embodiments, for purposes of explanation, numerous specific details are set forth to provide a thorough understanding of the embodiments disclosed. One skilled in the art will appreciate, however, that these various embodiments may be practiced with or without these specific details. In other instances, structures and devices are shown in block diagram form. Furthermore, one skilled in the art can readily appreciate that the specific sequences in which methods are presented and performed are illustrative and it is contemplated that the sequences can be varied and still remain within the spirit and scope of the various embodiments disclosed herein.
[0030] All literature and similar materials cited in this application, including but not limited to, patents, patent applications, articles, books, treatises, and internet web pages are expressly incorporated by reference in their entirety for any purpose. Unless defined otherwise, all technical and scientific terms used herein have the same meaning as is commonly understood by one of ordinary skill in the art to which the various embodiments described herein belong. When definitions of terms in incorporated references appear to differ from the definitions provided in the present teachings, the definition provided in the present teachings shall control.
[0031] It will be appreciated that there is an implied "about" prior to the temperatures, concentrations, times, number of bases, coverage, etc. discussed in the present teachings, such that slight and insubstantial deviations are within the scope of the present teachings. In this application, the use of the singular includes the plural unless specifically stated otherwise. Also, the use of "comprise", "comprises", "comprising", "contain", "contains", "containing", "include", "includes", and "including" is not intended to be limiting. It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the present teachings.
[0032] Further, unless otherwise required by context, singular terms shall include pluralities and plural terms shall include the singular. Generally, nomenclatures utilized in connection with, and techniques of, cell and tissue culture, molecular biology, and protein and oligo- or polynucleotide chemistry and hybridization described herein are those well known and commonly used in the art. Standard techniques are used, for example, for nucleic acid purification and preparation, chemical analysis, recombinant nucleic acid, and oligonucleotide synthesis. Enzymatic reactions and purification techniques are performed according to manufacturer's specifications or as commonly accomplished in the art or as
described herein. The techniques and procedures described herein are generally performed according to conventional methods well known in the art and as described in various general and more specific references that are cited and discussed throughout the instant specification. See, e.g., Sambrook et al., Molecular Cloning: A Laboratory Manual (Third ed., Cold Spring Harbor Laboratory Press, Cold Spring Harbor, N.Y. 2000). The nomenclatures utilized in connection with, and the laboratory procedures and techniques described herein are those well known and commonly used in the art.
`[0033] As used herein, "a" or "an" means "at least one" or
`"one or more."
`[0034] A "system" denotes a set of components, real or
`abstract, comprising a whole where each component interacts
`with or is related to at least one other component within the
`whole.
[0035] A "biomolecule" is any molecule that is produced by a biological organism, including large polymeric molecules such as proteins, polysaccharides, lipids, and nucleic acids (DNA and RNA) as well as small molecules such as primary metabolites, secondary metabolites, and other natural products.
[0036] The phrase "next generation sequencing" or NGS refers to sequencing technologies having increased throughput as compared to traditional Sanger- and capillary electrophoresis-based approaches, for example with the ability to generate hundreds of thousands of relatively small sequence reads at a time. Some examples of next generation sequencing techniques include, but are not limited to, sequencing by synthesis, sequencing by ligation, and sequencing by hybridization. More specifically, the SOLiD Sequencing System of Life Technologies Corp. provides massively parallel sequencing with enhanced accuracy. The SOLiD System and associated workflows, protocols, chemistries, etc. are described in more detail in PCT Publication No. WO 2006/084132, entitled "Reagents, Methods, and Libraries for Bead-Based Sequencing," international filing date Feb. 1, 2006, U.S. patent application Ser. No. 12/873,190, entitled "Low-Volume Sequencing System and Method of Use," filed on Aug. 31, 2010, and U.S. patent application Ser. No. 12/873,132, entitled "Fast-Indexing Filter Wheel and Method of Use," filed on Aug. 31, 2010, the entirety of each of these applications being incorporated herein by reference.
[0037] The phrase "sequencing run" refers to any step or portion of a sequencing experiment performed to determine some information relating to at least one biomolecule (e.g., nucleic acid molecule).
[0038] It is well known that DNA (deoxyribonucleic acid) is a chain of nucleotides consisting of 4 types of nucleotides: A (adenine), T (thymine), C (cytosine), and G (guanine), and that RNA (ribonucleic acid) is composed of 4 types of nucleotides: A, U (uracil), G, and C. It is also known that certain pairs of nucleotides specifically bind to one another in a complementary fashion (called complementary base pairing). That is, adenine (A) pairs with thymine (T) (in the case of RNA, however, adenine (A) pairs with uracil (U)), and cytosine (C) pairs with guanine (G). When a first nucleic acid strand binds to a second nucleic acid strand made up of nucleotides that are complementary to those in the first strand, the two strands bind to form a double strand. As used herein, "nucleic acid sequencing data," "nucleic acid sequencing information," "nucleic acid sequence," "genomic sequence," "genetic sequence," or "fragment sequence," or "nucleic acid sequencing read" denotes any information or data that is indicative of the order of the nucleotide bases (e.g., adenine, guanine, cytosine, and thymine/uracil) in a molecule (e.g., whole genome, whole transcriptome, exome, oligonucleotide, polynucleotide, fragment, etc.) of DNA or RNA. It should be understood that the present teachings contemplate sequence information obtained using all available varieties of techniques, platforms or technologies, including, but not limited to: capillary electrophoresis, microarrays, ligation-based systems, polymerase-based systems, hybridization-based systems, direct or indirect nucleotide identification systems, pyrosequencing, ion- or pH-based detection systems, electronic signature-based systems, etc.
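Complementary base pairing is what lets software relate a read to the opposite strand of a reference. The helper below is a minimal sketch (its name is illustrative) that returns the reverse complement of a sequence written 5'->3', using A-T (or A-U for RNA) and C-G pairing; characters outside those alphabets, such as N, are passed through unchanged.

```python
DNA_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")
RNA_COMPLEMENT = str.maketrans("ACGUacgu", "UGCAugca")


def reverse_complement(seq: str, rna: bool = False) -> str:
    """Reverse complement of a 5'->3' sequence, also written 5'->3'."""
    table = RNA_COMPLEMENT if rna else DNA_COMPLEMENT
    # complement each base, then reverse so the result reads 5'->3'
    return seq.translate(table)[::-1]


print(reverse_complement("ATCGA"))            # TCGAT
print(reverse_complement("AUCGA", rna=True))  # UCGAU
```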
`[0039] The phrase "ligation cycle" refers to a step in a
`sequence-by-ligation process where a probe sequence is
`ligated to a primer or another probe sequence.
`[0040] The phrase "color call" refers to an observed dye
`color resulting from the detection of a probe sequence after a
`ligation cycle of a sequencing run.
[0041] The phrase "color space" refers to a nucleic acid sequence data schema where nucleic acid sequence information is represented by a set of colors (e.g., color calls, color signals, etc.) each carrying details about the identity and/or positional sequence of bases that comprise the nucleic acid sequence. For example, the nucleic acid sequence "ATCGA" can be represented in color space by various combinations of colors that are measured as the nucleic acid sequence is interrogated using optical detection-based (e.g., dye-based, etc.) sequencing techniques such as those employed by the SOLiD System. That is, in various embodiments, the SOLiD System can employ a schema that represents a nucleic acid fragment sequence as an initial base followed by a sequence of overlapping dimers (adjacent pairs of bases). The system can encode each dimer with one of four colors using a coding scheme that results in a sequence of color calls that represent a nucleotide sequence.
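The dimer-to-color scheme can be made concrete with the widely published SOLiD two-base encoding, which is equivalent to XOR-ing 2-bit base codes (A=0, C=1, G=2, T=3). The application does not reproduce the coding table here, so treat this mapping as an assumption; the function names are hypothetical.

```python
BASE_CODE = {"A": 0, "C": 1, "G": 2, "T": 3}
CODE_BASE = "ACGT"


def to_color_space(seq: str) -> str:
    """Initial base followed by one color (0-3) per overlapping dimer."""
    seq = seq.upper()
    if not seq:
        return ""
    colors = (str(BASE_CODE[a] ^ BASE_CODE[b]) for a, b in zip(seq, seq[1:]))
    return seq[0] + "".join(colors)


def to_base_space(cs: str) -> str:
    """Decode color space back to bases, starting from the known first base."""
    if not cs:
        return ""
    bases = [cs[0].upper()]
    for color in cs[1:]:
        # each base follows from the previous base and the dimer's color
        bases.append(CODE_BASE[BASE_CODE[bases[-1]] ^ int(color)])
    return "".join(bases)


print(to_color_space("ATCGA"))  # A3232
print(to_base_space("A3232"))   # ATCGA
```

Because each base is decoded from the previous one, a single miscalled color changes every downstream base in this naive decoding.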
[0042] The phrase "base space" refers to a nucleic acid sequence data schema where nucleic acid sequence information is represented by the actual nucleotide base composition of the nucleic acid sequence. For example, the nucleic acid sequence "ATCGA" is represented in base space by the actual nucleotide base identities (e.g., A, T or U, C, G) of the nucleic acid sequence.
[0043] A "polynucleotide", "nucleic acid", or "oligonucleotide" refers to a linear polymer of nucleosides (including deoxyribonucleosides, ribonucleosides, or analogs thereof) joined by internucleosidic linkages. Typically, a polynucleotide comprises at least three nucleosides. Usually oligonucleotides range in size from a few monomeric units, e.g., 3-4, to several hundreds of monomeric units. Whenever a polynucleotide such as an oligonucleotide is represented by a sequence of letters, such as "ATGCCTG," it will be understood that the nucleotides are in 5'->3' order from left to right and that "A" denotes deoxyadenosine, "C" denotes deoxycytidine, "G" denotes deoxyguanosine, and "T" denotes thymidine, unless otherwise noted. The letters A, C, G, and T may be used to refer to the bases themselves, to nucleosides, or to nucleotides comprising the bases, as is standard in the art.
[0044] The techniques of "paired-end," "pairwise," "paired tag," or "mate pair" sequencing are generally known in the art of molecular biology (Siegel A. F. et al., Genomics. 2000, 68: 237-246; Roach J. C. et al., Genomics. 1995, 26: 345-353). These sequencing techniques can allow the determination of multiple "reads" of sequence, each from a different place on a single polynucleotide. Typically, the distance (i.e., insert
region) between the two reads or other information regarding a relationship between the reads is known. In some situations, these sequencing techniques provide more information than does sequencing two stretches of nucleic acid sequences in a random fashion. With the use of appropriate software tools for the assembly of sequence information (e.g., Millikin S C. et al., Genome Res. 2003, 13: 81-90; Kent, W. J. et al., Genome Res. 2001, 11: 1541-8), it is possible to make use of the knowledge that the "paired-end," "pairwise," "paired tag" or "mate pair" sequences are not completely random, but are known to occur a known distance apart and/or to have some other relationship, and are therefore linked or paired in the genome. This information can aid in the assembly of whole nucleic acid sequences into a consensus sequence.
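The known-distance constraint can be expressed as a simple consistency check on a mapped read pair. The sketch below is illustrative only: `MappedRead` and `is_concordant` are hypothetical, it assumes an inward-facing forward/reverse library, and other library types may use other mate orientations.

```python
from typing import NamedTuple


class MappedRead(NamedTuple):
    chrom: str
    start: int      # 0-based leftmost aligned position
    end: int        # exclusive end position
    reverse: bool   # True if aligned to the reverse strand


def is_concordant(r1: MappedRead, r2: MappedRead,
                  min_insert: int, max_insert: int) -> bool:
    """True if the mates face each other on the same chromosome and the span
    from the leftmost start to the rightmost end lies in the expected range."""
    if r1.chrom != r2.chrom or r1.reverse == r2.reverse:
        return False
    left, right = (r1, r2) if r1.start <= r2.start else (r2, r1)
    if left.reverse:  # the leftmost mate must point rightwards (forward strand)
        return False
    insert = max(left.end, right.end) - left.start
    return min_insert <= insert <= max_insert


a = MappedRead("chr1", 1000, 1050, False)
b = MappedRead("chr1", 1250, 1300, True)
print(is_concordant(a, b, 200, 500))  # True (insert of 300 bases)
print(is_concordant(a, b, 100, 250))  # False
```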
`
`Computer-Implemented System
`
`[0045] FIG. 1 is a block diagram that illustrates a computer
`system 100, upon which emb