REVIEWS

APPLICATIONS OF NEXT-GENERATION SEQUENCING
`
`Advances in understanding
`cancer genomes through
`second-generation sequencing
`
`Matthew Meyerson, Stacey Gabriel and Gad Getz
`
`Abstract | Cancers are caused by the accumulation of genomic alterations.
`Therefore, analyses of cancer genome sequences and structures provide insights
`for understanding cancer biology, diagnosis and therapy. The application of
`second-generation DNA sequencing technologies (also known as next-generation
`sequencing) — through whole-genome, whole-exome and whole-transcriptome
`approaches — is allowing substantial advances in cancer genomics. These methods
`are facilitating an increase in the efficiency and resolution of detection of each of
`the principal types of somatic cancer genome alterations, including nucleotide
`substitutions, small insertions and deletions, copy number alterations, chromosomal
`rearrangements and microbial infections. This Review focuses on the methodological
`considerations for characterizing somatic genome alterations in cancer and the future
`prospects for these approaches.
`
`Second-generation
`sequencing
`Used in this Review to refer to
`sequencing methods that have
`emerged since 2005 that
`parallelize the sequencing
`process and produce millions
`of typically short sequence
`reads (50–400 bases) from
`amplified DNA clones.
`It is also often known as
`next-generation sequencing.
`
`Dana-Farber Cancer Institute,
`44 Binney Street, Boston,
`Massachusetts 02115, USA.
`Broad Institute, 7 Cambridge
`Center, Cambridge,
`Massachusetts 02142, USA.
Correspondence to M.M.
email: matthew_meyerson@dfci.harvard.edu
`doi:10.1038/nrg2841
`
`A major near-term medical impact of the genome
`technology revolution will be the elucidation of mecha-
`nisms of cancer pathogenesis, leading to improvements
`in the diagnosis of cancer and the selection of cancer
`treatment. Thanks to second-generation sequencing
`technologies1–5, recently it has become feasible to
`sequence the expressed genes (‘transcriptomes’)6,7,
`known exons (‘exomes’)8,9, and complete genomes10–15
`of cancer samples.
`These technological advances are important for
`advancing our understanding of malignant neoplasms
`because cancer is fundamentally a disease of the genome.
`A wide range of genomic alterations — including point
`mutations, copy number changes and rearrangements
`— can lead to the development of cancer. Most of these
`alterations are somatic, that is, they are present in cancer
`cells but not in a patient’s germ line16.
`An impetus for studies of somatic genome altera-
`tions, which are the focus of this Review, is the poten-
`tial for therapies targeted against the products of these
`alterations. For example, treatment with the inhibitors
`of the epidermal growth factor receptor kinase (EGFR),
`gefitinib and erlotinib, leads to a significant survival
`benefit in patients with lung cancer whose tumours
`carry EGFR mutations, but no benefit in patients
`whose tumours carry wild-type EGFR17–19. Therefore,
`
`comprehensive genome-based diagnosis of cancer is
`becoming increasingly crucial for therapeutic decisions.
`During the past decades, there have been major
`advances in experimental and informatic methods
`for genome characterization based on DNA and RNA
`microarrays and on capillary-based DNA sequenc-
`ing (‘first-generation sequencing’, also known as Sanger
`sequencing). These technologies provided the ability
`to analyse exonic mutations and copy number altera-
`tions and have led to the discovery of many important
`alterations in the cancer genome20.
`However, there are particular challenges for the
`detection and diagnosis of cancer genome alterations.
`For example, some genomic alterations in cancer are
`prevalent at a low frequency in clinical samples, often
`owing to substantial admixture with non-malignant
`cells. Second-generation sequencing can solve such
`problems21. Furthermore, these new sequencing meth-
`ods make it feasible to discover novel chromosomal rear-
`rangements22 and microbial infections23–25 and to resolve
`copy number alterations at very high resolution22,26.
`At the same time, the avalanche of data from second-
`generation sequencing provides a statistical and com-
`putational challenge: how to separate the ‘wheat’ of
`causative alterations from the ‘chaff’ of noise caused by
`alterations in the unstable and evolving cancer genome.
`
NATURE REVIEWS | GENETICS | VOLUME 11 | OCTOBER 2010 | 685
© 2010 Macmillan Publishers Limited. All rights reserved
`
`FOUNDATION EXHIBIT 1042
`IPR2019-00634
`
`First-generation sequencing
`(also known as Sanger
`sequencing or capillary
`sequencing). The standard
`sequencing methodology used
`to sequence the reference
`human (and other model
`organism) genomes. It uses
`radioactively or fluorescently
`labelled dideoxynucleotide
`triphosphates (ddNTPs) as
`DNA chain terminators. Various
`detection methods allow
`read-out of sequence according
`to the incorporation of each
`specific terminator (ddATP,
`ddCTP, ddGTP or ddTTP).
`
`Whole-genome
`amplification
`Various molecular techniques
`(including multiple
`displacement amplification,
`rolling circle amplification or
`degenerate oligonucleotide
`primed PCR) in which very
`small amounts (nanograms)
`of a genomic DNA sample
`can be multiplied in a largely
`unbiased fashion to produce
`suitable quantities for genomic
`analysis (micrograms).
`
`Moore’s law
`The observation made in
`1965 by Gordon Moore that
`the number of transistors per
`square inch on integrated
`circuits had doubled every
`other year since the integrated
`circuit was invented.
`
`This challenge is likely to be solved in part by system-
`atic analyses of large cancer genome data sets that
`will provide sufficient statistical power to overcome
`experimental and biological noise27,28.
`In this Review, we discuss the key challenges in can-
`cer genome sequencing, the methods that are currently
`available and their relative values for detecting differ-
`ent types of genomic alteration. We then summarize the
`main points to consider in the computational analysis
`of cancer genome sequencing data and comment on the
`future potential for using genomics in cancer diagnosis.
`Cancer genome sequencing is a rapidly moving field,
`so in this Review we aim to set out the principles and
`important methodological considerations, with a brief
`summary of important findings to date.
`
`Cancer-specific considerations
`Cancer samples and cancer genomes have general char-
`acteristics that are distinct from other tissue samples
`and from genomic sequences that are inherited through
`the germ line. These require particular consideration in
`second-generation sequencing analyses.
`
`Characteristics of cancer samples for genomic analysis.
`Cancer samples differ in their quantity, quality and purity
`from the peripheral blood samples that are used for germ-
`line genome analysis. Surgical resection specimens tend
`to be large and have been the mainstay of cancer genome
`analysis. However, diagnostic biopsies from patients with
`disseminated disease tend to contain few cells — as surgi-
`cal cure is not possible in these cases, minimizing biopsy
`size is a safety consideration. Therefore, the quantity of
`nucleic acids available may be limiting; obtaining sequence
`information from such biopsies will require decreasing the
`minimum inputs for second-generation sequencing. An
`alternative approach to sequencing from small samples is
`whole-genome amplification, but this method does not pre-
`serve genome structure and can give rise to artefactual
`nucleotide sequence alterations29.
`Nucleic acids from cancer are also often of lower
`quality than those purified from peripheral blood. One
`reason for this is technical: most cancer biopsy and
`resection specimens are formalin-fixed and paraffin-
`embedded (FFPE) to optimize the resolution of micro-
`scopic histology. Nucleic acids from FFPE specimens
`are likely to have undergone crosslinking and also may
be degraded30. Second-generation sequence analysis
of FFPE-derived nucleic acids can require special
experimental31 and computational methods to handle
an increased background mutation rate32,33. A second
reason for this difference in nucleic acid quality is
biological: cancer specimens often include substantial
fractions of necrotic or apoptotic cells that reduce the
average nucleic acid quality. Experimental methods
should therefore also be adapted to account for this. The
`many-fold coverage made possible by second-generation
`sequencing, however, can allow high-quality data to be
`produced from lower quality samples21.
`Finally, cancer nucleic acid specimens are less pure
`than specimens used to analyse the inherited genome,
`especially in terms of genomic DNA purity. The samples
`
`generally used for germline genome analysis — periph-
`eral blood mononuclear cells — are known to be hetero-
`geneous only at the rearranged immunoglobulin and T
`cell receptor loci in a subset of cells. By contrast, a cancer
`specimen contains a mixture of malignant and non-
`malignant cells and, therefore, a mixture of cancer and
`normal genomes (and transcriptomes). Furthermore,
`the cancers themselves may be highly heterogeneous
`and composed of different clones that have different
`genomes34. Cancer genome analytical models must take
`these two types of heterogeneity (cancer versus normal
`heterogeneity and within-cancer heterogeneity) into
`account in their prediction of genome alterations.
`
`Structural variability of cancer genomes. Cancer
`genomes are enormously diverse and complex. They
`vary substantially in their sequence and structure com-
`pared to normal genomes and among themselves. To
`paraphrase Leo Tolstoy’s famous first line from Anna
`Karenina: normal human genomes are all alike, but every
`cancer genome is abnormal in its own way.
`Specifically, cancer genomes vary considerably in
`their mutation frequency (degree of variation compared
`to the reference sequence), in global copy number or
`ploidy, and in genome structure. These variations have
`several implications for cancer genome analysis: the
`presence of a somatic mutation is not enough to establish
`statistical significance as it must be evaluated in terms
`of the sample-specific background mutation rate, which
`can vary at different types of nucleotides (discussed
`further below). The analysis of mutations must also be
`adjusted for the ploidy and the purity of each sample
`and the copy number at each region. For example, if 50%
`of the tumour DNA is derived from cancer cells and a
`mutation is present on 1 of 4 copies of chromosome 11,
`the frequency of that mutation will be 12.5% in the
`sample. Similar considerations apply to the detection of
`somatic rearrangements.
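
The arithmetic in the example above can be sketched in Python. This is a minimal illustration under stated assumptions, not a published method: the function names are hypothetical, and the second function (purity expressed as a fraction of cells, with two copies of the locus in normal cells) is a variant added here for comparison rather than something stated in the text.

```python
def allele_fraction_from_dna_purity(cancer_dna_fraction, mutated_copies, total_copies):
    """Expected mutant read fraction when purity is given as the fraction
    of sample DNA that derives from cancer cells (the example in the text)."""
    return cancer_dna_fraction * mutated_copies / total_copies


def allele_fraction_from_cell_purity(cancer_cell_fraction, mutated_copies,
                                     tumour_copies, normal_copies=2):
    """Variant (an assumption, not from the text): purity given as the
    fraction of cells that are cancer cells. Each cell type contributes
    DNA in proportion to its copy number at the locus."""
    mutant = cancer_cell_fraction * mutated_copies
    total = (cancer_cell_fraction * tumour_copies
             + (1 - cancer_cell_fraction) * normal_copies)
    return mutant / total


if __name__ == "__main__":
    # 50% cancer-derived DNA, mutation on 1 of 4 copies: 0.5 * 1/4
    print(allele_fraction_from_dna_purity(0.5, 1, 4))  # 0.125
    # Same numbers read as a cell fraction give a different expectation.
    print(allele_fraction_from_cell_purity(0.5, 1, 4))
```

The two definitions of purity diverge whenever the tumour copy number differs from the normal copy number, so an analysis needs to state which one it uses.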
`To identify somatic alterations in cancer, comparison
`with matched normal DNA from the same individual is
`essential. This is largely owing to our incomplete knowl-
`edge of the variations in the normal human genome; to
`date, each ‘matched normal’ cancer genome sequence
`has identified large numbers of mutations and rear-
`rangements in the germ line that had not been previously
`described11–15,35. In the future, the complete characteriza-
`tion of many thousands of normal human genomes may
`obviate this need for a matched normal sample.
`
`Experimental approaches
`Second-generation sequencing technologies are based
`on the simultaneous detection of nucleotides in arrayed
`amplified DNA products originating from single DNA
`molecules36. Specific methods include picotitre-plate
`pyrosequencing3,5, single-nucleotide fluorescent base
`extension with reversible terminators1 and ligation-
`based sequencing2,4. Thanks to advances in sequencing
`approaches that include these technologies, the number of
`bases that can be sequenced for a given cost has increased
`one millionfold since 1990, more than doubling every year,
`which is twice as fast as Moore’s law for semiconductors37.
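
As a quick consistency check on these figures, the sketch below compares annual doubling with doubling every two years. It assumes an end point of 2010 (the publication year) and takes Moore's law as a two-year doubling time, following the glossary definition; both are assumptions made for illustration.

```python
def fold_increase(years, doubling_time_years):
    """Fold change after `years` of steady exponential growth."""
    return 2 ** (years / doubling_time_years)


if __name__ == "__main__":
    years = 2010 - 1990  # assumed span: 1990 to the year of publication
    print(fold_increase(years, 1.0))  # 1048576.0, roughly one millionfold
    print(fold_increase(years, 2.0))  # 1024.0, a two-year doubling time
```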
`
`
Table 1 | Whole-genome sequencing studies of cancer

Study | Method | Cancer type | Number of samples sequenced | Aberration type | Refs
Ley et al., 2008 | Deep single-end whole-genome sequencing | AML | 1 | Point mutations, insertions, deletions | 10
Campbell et al., 2008 | Shallow paired-end whole-genome sequencing | Lung | 2 | Deletions, amplifications, tandem duplications, interchromosomal rearrangements | 22
Stephens et al., 2009 | Shallow paired-end whole-genome sequencing | Breast | 24 | Deletions, amplifications, tandem duplications, interchromosomal rearrangements, inversions | 39
Pleasance et al., 2010 | Deep paired-end whole-genome sequencing | Melanoma | 1 | Point mutations, insertions, deletions, amplifications, interchromosomal rearrangements | 12
Pleasance et al., 2010 | Deep paired-end whole-genome sequencing | Small-cell lung | 1 | Point mutations, insertions, deletions, amplifications, interchromosomal rearrangements | 13
Mardis et al., 2009 | Deep paired-end whole-genome sequencing | AML | 1 | Point mutations, insertions, deletions, amplifications, interchromosomal rearrangements | 11
Shah et al., 2009 | Deep paired-end whole-genome sequencing | Breast | 1 | Point mutations, insertions, deletions, amplifications, interchromosomal rearrangements | 15
Ding et al., 2010 | Deep paired-end whole-genome sequencing | Breast | 1 | Point mutations, insertions, deletions, amplifications, interchromosomal rearrangements, inversions | 35
Lee et al., 2010 | Deep paired-end whole-genome sequencing | Lung | 1 | Point mutations, insertions, deletions, amplifications, interchromosomal rearrangements, inversions | 14

AML, acute myelogenous leukaemia.
`
`The application of second-generation sequencing
`has allowed cancer genomics to move from focused
`approaches — such as single-gene sequencing and array
`analysis — to comprehensive genome-wide approaches.
`Second-generation sequencing can be applied to cancer
`samples in various ways. These vary by the type of input
`material (for example, DNA, RNA or chromatin), the
`proportion of the genome targeted (the whole genome,
`transcriptome or a subset of genes) and the type of vari-
`ation studied (structural change, point mutation, gene
`expression or chromosomal conformation). In this
`section, we briefly introduce the main approaches to
`second-generation sequencing of cancer and their asso-
`ciated experimental considerations. Chromatin immuno-
`precipitation followed by sequencing (ChIP–seq) is an
`important complement to cancer genomics but is not
`discussed as it has been reviewed elsewhere38. Key whole-
`genome sequencing studies to date are summarized
`in TABLE 1.
`Compared with previous sequencing methods,
`which are analogue, second-generation sequencing is
`digital: it is possible to count alleles at any nucleotide or
`reads at any alignable position in the genome. Its digital
`nature gives rise to one of the key features of second-
`generation sequencing, the ability to over-sample the
`genome or other nucleic acid compartment that is tar-
`geted10. Over-sampling provides highly accurate sequence
`information by providing enough signal to overcome
`experimental noise and also allows detection of muta-
`tions and other genome alterations in heterogeneous
`samples such as cancer tissues.
`
`Whole-genome sequencing. The first whole cancer
`genome sequence was reported in 2008, a descrip-
`tion of the nucleotide sequence of DNA from an acute
`
`myeloid leukaemia compared with DNA from normal
`skin from the same patient10. Since then, six more
`complete sequences of cancer genomes together with
`matched normal genomes have been reported11–15,35,
`and this number will grow rapidly.
`Complete sequencing of the genome of cancer tis-
`sue to high redundancy, using germline DNA sequence
`from the same patient as a comparison, has the power to
`discover the full range of genomic alterations — includ-
`ing nucleotide substitutions, structural rearrange-
`ments, and copy number alterations — using a single
`approach10–15,35. Therefore, whole-genome sequencing
`provides the most comprehensive characterization
`of the cancer genome but, as it requires the greatest
`amount of sequencing, it is the most costly. Alternative,
`lower-cost approaches include shotgun sequencing with
`incomplete coverage (for example, less than 30-fold
`coverage; see below) — which is sufficient to identify
`somatic rearrangements in the genome22,39 and copy
`number alterations22,26 — and exome and transcriptome
`sequencing, which are described below.
`The major potential of whole-genome sequencing
`for cancer is the discovery of chromosomal rearrange-
`ments. Previously, there were no systematic approaches
`to study solid tumours that have complex karyotypes.
`Therefore, until recently it was thought that chromo-
`somal translocations were rare in epithelial tumours and
`found only in haematological malignancies in which
`they could be observed with cytogenetic methods40,41.
`However, the discoveries of the transmembrane protease
`serine 2 (TMPRSS2)–ERG translocations in prostate
`carcinoma42 and the echinoderm microtubule-associated
`protein like 4 (EML4)–anaplastic lymphoma recep-
`tor tyrosine kinase (ALK) translocations in non-small
`cell lung carcinoma43 have changed that view.
`
`Chromatin
`immunoprecipitation
`A technique used to identify
`the location of DNA-binding
`proteins and epigenetic marks
`in the genome. Genomic
`sequences containing the
`protein of interest are enriched
`by binding soluble DNA
`chromatin extracts (complexes
`of DNA and protein) to an
`antibody that recognizes the
`protein or modification.
`
`Over-sampling
`Reading the same stretch of
`DNA sequence many times to
`gain a confident sequence
`read-out.
`
`Shotgun sequencing
`Sequencing randomly derived
`fragments of the whole
`genome. The order and
`orientation of the sequences
`are determined by mapping
`individual reads back to a
`reference or through assembly
`of overlapping sequences into
`larger contigs of sequence.
`
`
[Figure 1 panels, each showing a rearrangement joining Chr 1q21 and Chr 2q12: a, sequence coverage = 4, physical coverage = 4; b, sequence coverage = 2, physical coverage = 4; c, sequence coverage = 1, physical coverage = 7.]
`Figure 1 | Depth of coverage and physical coverage. To illustrate considerations
`regarding depth of coverage and physical coverage, a rearrangement between human
`chromosome 1q21 and chromosome 2q12 is shown. Sequenced DNA fragments are
`represented by coloured bars: single-end sequencing is shown in a; paired-end
`sequencing is shown in b and c, in which the bars and the dashed lines indicate the
`sequenced ends and unsequenced part, respectively. Blue bars map to chromosome 1
`and purple bars to chromosome 2. Three different scenarios (a–c) are depicted that
`vary in the length of the DNA fragments that are sequenced. In each scenario, the
`sequence and physical coverage at the rearrangement site is shown below. Sequence
`coverage represents the number of sequenced reads that cover the site; this affects
`the ability to detect point mutations. Physical coverage measures the number of
`fragments that span the site; this affects the ability to detect the rearrangement,
`based on paired reads that map to different chromosomes. In cases in which the entire
`fragment is sequenced, as in a, the sequence and physical coverage are the same.
`
`In addition to rearrangements between unique, align-
`able sequences, whole-genome sequencing may be able
`to detect other types of genomic alterations that have
`not been observable using previous methods. Among
`the most important of such events are somatic mutations
`of non-coding regions, including promoters, enhancers,
`introns and non-coding RNAs (including microRNAs),
`as well as unannotated regions. Other novel types of
`alterations in cancer may include rearrangements
`of repetitive elements, and recent studies have suggested
`that active retrotransposons in the human genome might
`contribute to cancer, so whole-genome sequencing
`would be informative in this regard44,45.

Jumping library
A method of library construction in which the genome
is divided into large fragments using a rare cutter
enzyme. Fragments are circularized and DNA sequences
are read from the ends of the fragment, without reading
the intervening sequence.

Two important issues to consider when planning
whole-genome sequencing experiments are depth of
coverage and physical coverage. Sequence depth is
measured by the amount of over-sampling: typically, to detect
nucleotide alterations with high sensitivity, the 3 billion
bases of the human genome are covered at least 30-fold
on average, requiring the generation of 90 billion bases of
sequence data per sample10–15,35. For cancer samples, this
number needs to be increased to account for the decreased
purity and often increased ploidy of each sample.
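
The depth arithmetic above, and the reason lower purity or higher ploidy calls for more depth, can be sketched as follows. This is a simplified illustration: it assumes reads are sampled independently (a binomial model), ignores sequencing error and uneven coverage, and uses a hypothetical rule that a mutation counts as detected once a minimum number of mutant reads is seen.

```python
from math import comb

GENOME_SIZE = 3_000_000_000  # approximate size used in the text


def bases_required(mean_depth, target_size=GENOME_SIZE):
    """Total sequenced bases needed for a given mean depth."""
    return mean_depth * target_size


def detection_probability(depth, allele_fraction, min_alt_reads):
    """P(at least `min_alt_reads` mutant reads) under a binomial model."""
    p_fewer = sum(
        comb(depth, k) * allele_fraction ** k * (1 - allele_fraction) ** (depth - k)
        for k in range(min_alt_reads)
    )
    return 1.0 - p_fewer


def depth_for_power(allele_fraction, min_alt_reads, power=0.95, max_depth=5000):
    """Smallest depth reaching the requested detection probability,
    or None if it is not reached by `max_depth`."""
    for depth in range(min_alt_reads, max_depth + 1):
        if detection_probability(depth, allele_fraction, min_alt_reads) >= power:
            return depth
    return None


if __name__ == "__main__":
    print(bases_required(30))  # 90000000000, i.e. 90 billion bases
    # A heterozygous variant in a pure diploid sample versus a mutation
    # expected in only 12.5% of reads (hypothetical threshold of 3 reads).
    print(depth_for_power(0.5, 3), depth_for_power(0.125, 3))
```

Under this model the depth needed for the same detection probability rises as the expected mutant allele fraction falls, which is the effect that purity and ploidy have on a cancer sample.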
`Physical coverage is important for detecting rear-
`rangements and this detection is aided by analysis of
`‘paired reads’. In standard shotgun library methods, the
`fragments of DNA are typically 200–400 bases long, and
`second-generation sequencing technologies currently
`yield 50–100 base reads from each end of a fragment
`(known as paired reads). The expected distance between
`the paired reads is used to uniquely place the reads
`on the reference genome and unexpected read pairing
`can be used to detect structural anomalies.
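
The idea of flagging unexpected read pairing can be sketched in a few lines. This is a toy classifier, not the method of any particular tool: the `ReadPair` type, the category names and the default insert size and tolerance are all hypothetical, and read orientation, which real analyses also use, is ignored here.

```python
from dataclasses import dataclass


@dataclass
class ReadPair:
    chrom1: str
    pos1: int
    chrom2: str
    pos2: int


def classify_pair(pair, expected_insert=300, tolerance=100):
    """Label a mapped read pair by how well it fits the expected spacing.

    Thresholds are illustrative; in practice they would be estimated from
    the library's observed insert-size distribution.
    """
    if pair.chrom1 != pair.chrom2:
        return "interchromosomal"
    distance = abs(pair.pos2 - pair.pos1)
    if abs(distance - expected_insert) > tolerance:
        return "unexpected_distance"
    return "concordant"


if __name__ == "__main__":
    pairs = [
        ReadPair("1", 1_000, "1", 1_300),   # spacing as expected
        ReadPair("1", 1_000, "1", 50_000),  # too far apart: possible deletion
        ReadPair("1", 1_000, "2", 500),     # ends on different chromosomes
    ]
    for p in pairs:
        print(p, classify_pair(p))
```

A single discordant pair is weak evidence, since mapping errors produce them too; support from several independent pairs at the same site is what physical coverage provides.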
`The distance between the paired reads can be
`increased to thousands of bases by the creation of jumping
`libraries, which can be constructed by generating large
`circular fragments of DNA4,13. This leads to higher physi-
`cal coverage of the genome with less sequence cover-
`age and, consequently, lower cost. For example, with
`3 kb spacing between pairs, the physical coverage of the
`genome is 10 times higher than with 300 bp inserts, so
`equivalent physical coverage can be obtained with 10
`times less sequence coverage (FIG. 1). Although powerful
`for the detection of structural rearrangements, the jump-
`ing library approach has two main limitations. First, with
`less total sequence, the coverage at any given position is
`lower, therefore the sensitivity to observe base changes
`such as point mutations is correspondingly lower.
`Second, the jumping library approach requires large
`quantities of high-quality input DNA, which may not
`be possible with all clinical cancer samples, especially
`those derived from FFPE specimens.
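
The trade-off between the two kinds of coverage can be made concrete with a short calculation. The sketch below assumes idealized, uniformly distributed fragments; the fragment count and the 100-base read length are hypothetical values chosen only to make the numbers round.

```python
def sequence_coverage(n_fragments, read_length, genome_size, reads_per_fragment=2):
    """Mean number of sequenced reads covering a base."""
    return n_fragments * reads_per_fragment * read_length / genome_size


def physical_coverage(n_fragments, insert_size, genome_size):
    """Mean number of fragments spanning a position, counting the
    unsequenced middle of each fragment."""
    return n_fragments * insert_size / genome_size


if __name__ == "__main__":
    genome = 3_000_000_000
    n = 100_000_000  # hypothetical number of sequenced fragments

    standard = physical_coverage(n, 300, genome)   # 300 bp inserts
    jumping = physical_coverage(n, 3_000, genome)  # 3 kb jumping library
    print(standard, jumping, jumping / standard)   # 10.0 100.0 10.0

    # Sequence coverage depends only on the reads, so it is the same
    # for both libraries at a fixed number of fragments.
    print(sequence_coverage(n, 100, genome))
```

For a fixed amount of sequencing, the longer insert raises physical coverage tenfold while leaving sequence coverage unchanged, which is why jumping libraries help rearrangement detection but not point-mutation detection.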
`
Exome sequencing. Targeted sequencing approaches
have the general advantage of increased sequence coverage
of regions of interest, such as coding exons of genes,
at lower cost and higher throughput compared
with random shotgun sequencing. Most large-scale
methods for targeted sequencing use a variation of a
hybrid selection approach (FIG. 2): nucleic acid ‘baits’ are
used to ‘fish’ for regions of interest in the total pool of
nucleic acids, which can be DNA46–49 or RNA50. Any subset
of the genome can be targeted, including exons,
non-coding RNAs, highly conserved regions of the genome
or other regions of interest.
`Analysis of selected sets of exons using capillary-based
`sequencing has been a powerful and effective approach
`to focus DNA sequencing efforts on the coding genes of
`greatest interest. For example, capillary sequencing
`of exons from specific gene families has led to the discov-
`ery of activating somatic mutations in various cancers,
`such as the BRAF serine–threonine kinase51, the EGFR,
`ERBB2, fibroblast growth factor receptor 2 (FGFR2),
`JAK2, and ALK receptor tyrosine kinases52–66, and the
`PIK3CA and PIK3R1 lipid kinase subunits28,67. Whole-
`exome sequencing with capillary sequencing allowed the
`analysis of all known coding genes in colorectal, breast and
`pancreatic carcinomas and glioblastoma68–71. These studies
`have led to the discovery of somatic mutations in iso-
`citrate dehydrogenase 1 (IDH1) in glioblastoma69 and of
`germline mutations in the gene encoding partner and
`
[Figure 2 schematic, two parallel tracks: tumour material and matched normal (blood) each undergo DNA isolation; the tumour DNA and normal DNA (the ‘pond’) are each combined with gene-specific oligonucleotides (the ‘baits’), followed by hybridization, elution, sequencing and alignment to the gene reference sequence. Somatic mutation ‘A’: evidence in tumour, none in normal.]
`Figure 2 | Sequence capture for cancer genomics. A schematic diagram of hybrid selection to capture specific
`regions of the genome from tumour DNA (left panel, blue) and normal DNA (right panel, red). DNA from the starting
`material (the ‘pond’) is sheared and hybridized to oligonucleotides that are specific for the regions of interest (for
`example, exons in genes from a particular pathway or the whole exome; the ‘baits’). The baits have a tag that allows
`them to be isolated (for example, by immobilization on beads). The captured DNA is eluted, prepared into sequencing
`libraries, sequenced and aligned to the bait sequences. Because this technique allows greater depth of coverage for
`the regions of interest, somatic mutations in the tumour DNA can be detected from admixed populations containing
`tumour and normal DNA-derived reads.
`
`
`localizer of BRCA2 (PALB2) in patients with pancreatic
`carcinoma72, among other important findings.
`However, second-generation sequencing is a more
`efficient and comprehensive technology for whole-
`exome sequence analysis than capillary-based sequenc-
`ing and is becoming increasingly routine8,9. Because the
`exome represents only approximately 1% of the genome,
`or about 30 Mb, vastly higher sequence coverage can be
`readily achieved using second-generation sequencing
`platforms with considerably less raw sequence and cost
`than whole-genome sequencing. For example, whereas
`90 Gb of sequence is required to obtain 30-fold aver-
`age coverage of the genome, 75-fold average coverage is
`achieved for the exome with only 3 Gb of sequence using
`the current state-of-the-art platforms for targeting73.
`However, there are inefficiencies in the targeting proc-
`ess. For example, uneven capture efficiency across
`exons can mean that not all exons are sequenced and
`some off-target hybridization can occur. These inef-
`ficiencies are likely to be ameliorated as sequencing
`and capture technology continue to improve.
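
The coverage figures quoted above can be checked with a short calculation. The sketch below is illustrative only: `usable_fraction` is a hypothetical parameter standing in for all targeting inefficiencies combined, and reading the gap between the ideal and the quoted exome coverage as such a fraction is an interpretation made here, not a figure given in the text.

```python
GENOME_SIZE = 3_000_000_000  # bases
EXOME_SIZE = 30_000_000      # about 1% of the genome


def mean_coverage(total_bases, target_size, usable_fraction=1.0):
    """Mean fold coverage of a target, given the fraction of sequenced
    bases that end up usable and on target (hypothetical parameter)."""
    return total_bases * usable_fraction / target_size


if __name__ == "__main__":
    print(mean_coverage(90_000_000_000, GENOME_SIZE))  # 30.0-fold genome
    ideal = mean_coverage(3_000_000_000, EXOME_SIZE)
    print(ideal)                                       # 100.0-fold if lossless
    # The text quotes 75-fold from 3 Gb, which would correspond to about
    # three-quarters of sequenced bases being usable on target.
    print(75 / ideal)                                  # 0.75
```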
`The higher coverage of the exome that can be affordably
`achieved for a large number of samples makes exome
`sequencing highly suitable for mutation discovery in
`cancer samples of mixed purity. In addition, the hybrid
`selection approach will be particularly powerful for
`diagnostic analysis of the cancer genome; for diagnosis,
`there may be interest in sequencing specific oncogenes74
`and/or tumour suppressor genes at very high coverage in
`samples with a low percentage of tumour cells21.
`
`Transcriptome sequencing. Second-generation sequencing
`of the transcriptome (RNA-seq) — as cDNA derived
`from mRNA, total RNA or other RNAs such as micro-
`RNAs — is a powerful approach for understanding can-
`cer. Transcriptome sequencing is a sensitive and efficient
`approach to detect intragenic fusions, including in-frame
`fusion events that lead to oncogene activation6,7,75,76.
`Transcriptome sequencing can also be used to detect
`somatic mutations but finding a matched normal sample
`for comparison is a challenge, as normal tissue is unlikely
`to express exactly the same genes as the tumour sample.
`Furthermore, mutation detection in genes expressed at
`low levels is hampered owing to lack of statistical power.
`Also, the possibilities of reverse transcriptase errors
`and RNA editing15 need to be considered. Nevertheless,
`important somatic nucleotide substitution mutations
`have been discovered by transcriptome sequencing, most
`notably recurrent mutations in the forkhead box L2 gene
`(FOXL2) in ovarian granulosa cell tumours77.
`RNA-seq also allows analysis of gene expression pro-
`files and is particularly powerful for identifying tran-
`scripts with low-level expression, which means that these
`transcripts can be included in tumour classification
`metrics78. RNA-seq may soon be competitive with oligo-
`nucleotide microarray technologies in terms of the cost
`and efficiency of gene expression analysis. Furthermore,
`transcriptome sequencing provides the advantage of not
`being limited to known genes but can also include the
`detection of novel transcripts, alternative splice forms
`and non-human transcripts.
`
`Detecting classes of genome alterations
`In contrast to previously available genome technologies,
`such as first-generation sequencing and array-based
`methods, second-generation sequencing methods can
`provide a comprehensive picture of the cancer genome
`by detecting each of the major alterations in the cancer
`genome (FIG. 3). Here we describe the analysis of each
`type of alteration briefly.
`
`Somatic nucleotide substitutions and small insertion
`and deletion mutations. Nucleotide substitution muta-
`tions are the most common known somatic genomic
`alteration in cancer, occurring typically at the rate of
`about one somatic nucleotide substitution per million
`nucleotides12,13,15,28,79; insertion and deletion mutations
`are approximately tenfold less common in most can-
`cer specimens. However, the rate of mutations varies
`greatly between cancer specimens. For example, ultra-
`violet radiation-induced melanomas have on the order
`of ten mutations per million bases12 and hypermutated
`tumours with defects in DNA repair genes can reach
`rates of tens of mutations per million bases28,79. By con-
`trast, haematopoietic malignancies can have less than
`one mutation per million bases10,11. Therefore, statistical
`analyses to assess mutation significance must take these
`sample-to-sample variations into account.
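
One simple way to see why the per-sample rate matters is to compute how many mutations a region would collect by chance. The sketch below is a deliberately crude model and not the statistical approach of any cited study: it assumes a uniform rate within each sample and a Poisson distribution of chance hits, and it ignores nucleotide context; the gene length and cohort sizes are hypothetical.

```python
from math import exp, factorial


def expected_mutations(rates_per_mb, region_length_bp):
    """Expected chance mutations in a region, summed over samples,
    given each sample's background rate in mutations per million bases."""
    return sum(rate * region_length_bp / 1_000_000 for rate in rates_per_mb)


def poisson_tail(observed, expected):
    """P(X >= observed) for a Poisson count with the given mean."""
    below = sum(exp(-expected) * expected ** k / factorial(k)
                for k in range(observed))
    return max(0.0, 1.0 - below)


if __name__ == "__main__":
    # One genome at about one mutation per million bases.
    print(expected_mutations([1.0], 3_000_000_000))  # 3000.0

    # Five mutations in a 1,500-base gene across 100 samples, evaluated
    # against a low and a tenfold higher background rate.
    low = expected_mutations([1.0] * 100, 1_500)
    high = expected_mutations([10.0] * 100, 1_500)
    print(poisson_tail(5, low), poisson_tail(5, high))
```

The same observed count is far less surprising against the higher background, so a gene that looks significant in one tumour type may not be in another.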
`Various computational methods have been devel-
`oped to determine the presence of somatic mutations
`using second-generation sequence data80. The detection
`of somatic mutations in cancer requires mutation call-
`ing in both the tumour DNA and the matched normal
`DNA, coupled with comparison to a reference genome
`and an assessment of the statistical significance of the
`number of counts of the mutation in the cancer sequence
`and its absence in the matched normal sequence. False-
`positive genome alteration calls are of two types: inac-
`curate detection of an event in the tumour, when the
`tumour and normal are both wild-type; and detection of
`a germline event in the tumour but failure to detect it in
`the normal. Different sources of noise contribute to the
`two types of false positives. The first type of error can
`be due to machine-sequencing errors, incorrect local
`alignment of individual reads and discordant alignment
`of pairs. Stochastic errors such as machine errors can
`be eliminated by high-level over-sampling of tumour
`and normal DNA sequence with sufficiently stringent
`statistical thresholds for mutation calling. The second
`type of false-positive mutation calls are caused by fail-
`ures to detect the germline alleles that differ from the
`reference sequence in the normal sample, mostly owing
`to insufficient coverage.
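
The logic of comparing tumour and matched normal, and the two kinds of false positive described above, can be sketched as a toy decision rule. This is an assumption-laden illustration rather than any published caller: the `SiteCounts` type, the labels and every threshold are hypothetical, and real methods use statistical models of error rather than fixed cut-offs.

```python
from dataclasses import dataclass


@dataclass
class SiteCounts:
    ref_reads: int  # reads matching the reference allele
    alt_reads: int  # reads carrying the candidate mutation

    @property
    def depth(self):
        return self.ref_reads + self.alt_reads


def call_site(tumour, normal, min_tumour_alt=4, min_normal_depth=10,
              max_normal_alt=0):
    """Classify one site from tumour and matched-normal read counts."""
    # Guard against the first error type: too few mutant reads in the
    # tumour could be machine or alignment noise.
    if tumour.alt_reads < min_tumour_alt:
        return "no_call"
    # Mutant reads in the normal suggest a germline variant or artefact.
    if normal.alt_reads > max_normal_alt:
        return "germline_or_artefact"
    # Guard against the second error type: with shallow normal coverage,
    # a germline allele could have been missed by chance.
    if normal.depth < min_normal_depth:
        return "uncertain_low_normal_coverage"
    return "somatic_candidate"


if __name__ == "__main__":
    print(call_site(SiteCounts(40, 10), SiteCounts(35, 0)))  # somatic_candidate
    print(call_site(SiteCounts(40, 10), SiteCounts(3, 0)))   # uncertain_low_normal_coverage
    print(call_site(SiteCounts(40, 10), SiteCounts(20, 18))) # germline_or_artefact
    print(call_site(SiteCounts(48, 2), SiteCounts(35, 0)))   # no_call
```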
`In general, the most common cause of false-negative
`mutation calls is insufficient coverage of the cancer
`DNA. As discussed above, increased over-sampling may
`be required to overcome sample admixture, tumour het-
`erogeneity and variations in ploidy (genome-wide and
`local).
`The identification of candidate mutations associated
`with cancer then leads to two questions: is the specific
`mutation or the set of alterations in a particular gene
`statistically significant across all samples, and is the
`
`
[Figure 3 panel labels: reference sequence (Chr 1, Chr 5, non-human sequence); point mutation; indel; copy number alterations (homozygous deletion, hemizygous deletion, gain); translocation breakpoint; pathogen.]
`Figure 3 | Types of genome alterations that can be detected by seco
