`
`ARTICLES
`
`Vol 456 | 6 November 2008 | doi:10.1038/nature07485
`
`DNA sequencing of a cytogenetically
`normal acute myeloid leukaemia genome
`
`Timothy J. Ley1,2,3,4*, Elaine R. Mardis2,3*, Li Ding2,3, Bob Fulton3, Michael D. McLellan3, Ken Chen3, David Dooling3,
`Brian H. Dunford-Shore3, Sean McGrath3, Matthew Hickenbotham3, Lisa Cook3, Rachel Abbott3, David E. Larson3,
`Dan C. Koboldt3, Craig Pohl3, Scott Smith3, Amy Hawkins3, Scott Abbott3, Devin Locke3, LaDeana W. Hillier3,8,
`Tracie Miner3, Lucinda Fulton3, Vincent Magrini2,3, Todd Wylie3, Jarret Glasscock3, Joshua Conyers3,
`Nathan Sander3, Xiaoqi Shi3, John R. Osborne3, Patrick Minx3, David Gordon8, Asif Chinwalla3, Yu Zhao1,
`Rhonda E. Ries1, Jacqueline E. Payton5, Peter Westervelt1,4, Michael H. Tomasson1,4, Mark Watson3,4,5, Jack Baty6,
`Jennifer Ivanovich4,7, Sharon Heath1,4, William D. Shannon1,4, Rakesh Nagarajan4,5, Matthew J. Walter1,4,
`Daniel C. Link1,4, Timothy A. Graubert1,4, John F. DiPersio1,4 & Richard K. Wilson2,3,4
`
`Acute myeloid leukaemia is a highly malignant haematopoietic tumour that affects about 13,000 adults in the United States
`each year. The treatment of this disease has changed little in the past two decades, because most of the genetic events that
initiate the disease remain undiscovered. Whole-genome sequencing is now possible at a cost and within a timeframe that make
it practical for the unbiased discovery of tumour-specific somatic mutations that alter protein-coding genes. Here
`we present the results obtained from sequencing a typical acute myeloid leukaemia genome, and its matched normal
`counterpart obtained from the same patient’s skin. We discovered ten genes with acquired mutations; two were previously
`described mutations that are thought to contribute to tumour progression, and eight were new mutations present in virtually
`all tumour cells at presentation and relapse, the function of which is not yet known. Our study establishes whole-genome
`sequencing as an unbiased method for discovering cancer-initiating mutations in previously unidentified genes that may
`respond to targeted therapies.
`
`We used massively parallel sequencing technology to sequence the
`genomic DNA of tumour and normal skin cells obtained from a patient
`with a typical presentation of French–American–British (FAB) subtype
`M1 acute myeloid leukaemia (AML) with normal cytogenetics. For the
`tumour genome, 32.7-fold ‘haploid’ coverage (98 billion bases) was
`obtained, and 13.9-fold coverage (41.8 billion bases) was obtained
`for the normal skin sample. Of the 2,647,695 well-supported single
`nucleotide variants (SNVs) found in the tumour genome, 2,584,418
`(97.6%) were also detected in the patient’s skin genome, limiting the
`number of variants that required further study. For the purposes of this
`initial study, we restricted our downstream analysis to the coding
`sequences of annotated genes: we found only eight heterozygous,
`non-synonymous somatic SNVs in the entire genome. All were new,
`including mutations in protocadherin/cadherin family members
`(CDH24 and PCLKC (also known as PCDH24)), G-protein-coupled
`receptors (GPR123 and EBI2 (also known as GPR183)), a protein
`phosphatase (PTPRT), a potential guanine nucleotide exchange factor
`(KNDC1), a peptide/drug transporter (SLC15A1) and a glutamate
`receptor gene (GRINL1B). We also detected previously described,
`recurrent somatic insertions in the FLT3 and NPM1 genes. On the
`basis of deep readcount data, we determined that all of these mutations
`(except FLT3) were present in nearly all tumour cells at presentation
`and again at relapse 11 months later, suggesting that the patient had a
`single dominant clone containing all of the mutations. These results
`demonstrate the power of whole-genome sequencing to discover new
`cancer-associated mutations.
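The fold-coverage and shared-variant figures quoted above can be checked with simple arithmetic. The sketch below assumes a haploid genome size of about 3.0 billion bases, which the text does not state; the base counts are the exact values reported in Table 1, and the function names are illustrative only.

```python
# Assumption: a haploid human genome of roughly 3.0 billion bases.
# The article does not state the genome size it used for this calculation.
HAPLOID_GENOME_BASES = 3.0e9


def fold_coverage(bases_sequenced: float, genome_bases: float = HAPLOID_GENOME_BASES) -> float:
    """Average number of times each haploid genome position is covered."""
    return bases_sequenced / genome_bases


def percent(part: int, whole: int) -> float:
    """Express part as a percentage of whole."""
    return 100.0 * part / whole


tumour_cov = fold_coverage(98_184_511_523)  # tumour bases passing quality filter
skin_cov = fold_coverage(41_783_794_834)    # skin bases passing quality filter
shared = percent(2_584_418, 2_647_695)      # well-supported tumour SNVs also seen in skin

print(f"tumour {tumour_cov:.1f}-fold, skin {skin_cov:.1f}-fold, shared SNVs {shared:.1f}%")
# prints: tumour 32.7-fold, skin 13.9-fold, shared SNVs 97.6%
```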
`
`AML refers to a group of clonal haematopoietic malignancies that
`predominantly affect middle-aged and elderly adults. An estimated
`13,000 people will develop AML in the United States in 2008, and
`8,800 will die from it1. Although the life expectancy from this disease
`has increased slowly over the past decade, the improvement is pre-
`dominantly because of improvements in supportive care—not in the
`drugs or approaches used to treat patients.
`For most patients with a ‘sporadic’ presentation of AML, it is not yet
`clear whether inherited susceptibility alleles have a role in the patho-
`genesis2. Furthermore, the nature of the initiating or progression
`mutations is for the most part unknown3. Recent attempts to identify
`additional progression mutations by extensively re-sequencing tyro-
`sine kinase genes yielded very few previously unidentified mutations,
`and most were not recurrent4,5. Expression profiling studies have
`yielded signatures that correlate with specific cytogenetic subtypes of
`AML, but have not yet suggested new initiating mutations6–8. Recent
`studies using array-based comparative genomic hybridization and/or
`single nucleotide polymorphism (SNP) arrays, although identifying
important gene mutations in acute lymphoblastic leukaemia9,10, have
`revealed very few recurrent submicroscopic somatic copy number
`variants in AML (M.J.W., manuscript in preparation, and refs 11–
`13). Together, these studies suggest that we have not yet discovered
`most of the relevant mutations that contribute to the pathogenesis of
`AML. We therefore believe that unbiased whole-genome sequencing
`will be required to identify most of these mutations. Until recently, this
`approach has not been feasible because of the high cost of conventional
`
`1Department of Medicine, 2Department of Genetics, 3The Genome Center at Washington University, 4Siteman Cancer Center, 5Department of Pathology and Immunology, 6Division
`of Biostatistics, and 7Department of Surgery, Washington University School of Medicine, St. Louis, Missouri 63108, USA. 8Department of Genome Sciences, University of Washington,
`Seattle, Washington 98195, USA.
`*These authors contributed equally to this work.
`
`
` ©2008 Macmillan Publishers Limited. All rights reserved
`
`Foresight EX1012
`Foresight v Personalis
`IPR2024-00170
`
`
`
`NATURE | Vol 456 | 6 November 2008
`
`capillary-based approaches and the large numbers of primary tumour
`cells required to yield the necessary genomic DNA. ‘Next-generation’
`sequencing approaches, however, have changed this landscape.
`Our group has pioneered the use of whole-genome re-sequencing
`and variant discovery approaches using the Illumina/Solexa techno-
`logy with the genome of the nematode worm Caenorhabditis elegans as
`a proof-of-principle14. This approach has distinct advantages in
`reduced cost, a markedly increased data production rate, and a low
`input requirement of DNA for library construction. In the present
`study, we used a similar approach to sequence the tumour genome
`of a single AML patient and the matched normal genome (derived
`from a skin biopsy) of the same patient. After alignment to the human
`reference genome, sequence variants were discovered in the tumour
`genome and compared to the patient’s normal sequence, to the dbSNP
database, and to variants recently reported for two other human
genomes15,16, revealing new single nucleotide and small insertion/deletion
`(indel) variants genome-wide. Somatic mutations were detected in
`genes not previously implicated in AML pathogenesis, demonstrating
`the need for unbiased whole-genome approaches to discover all muta-
`tions associated with cancer pathogenesis.
`
`Rationale for using the FAB M1 AML subtype for sequencing
`Of the eight FAB subtypes of AML, M1 AML is one of the most
common (~20% of all cases). No specific cytogenetic abnormalities
`or somatic initiating mutations have been identified for this subtype;
`in fact, about half of the patients with de novo M1 AML have normal
`cytogenetics17–19. The frequency of well-described progression muta-
`tions (for example, activating alleles of FLT3, KIT and RAS) is similar
`to that of other common FAB subtypes5. We therefore decided to
`sequence the genome of tumour cells derived from a patient with M1
`AML, because so little is known about the molecular pathogenesis of
`this common subtype. The criteria used to select the sample are out-
`lined in Supplementary Information.
`
`Case presentation of UPN 933124
`The case presentation is described in detail in the Supplementary
`Information. In brief, a previously healthy woman in her mid-50s
`presented suddenly with fatigue and easy bruisability, and was found
`to have a peripheral white blood cell count of 105,000 cells per micro-
`litre, with 85% myeloblasts. A bone marrow examination revealed
`100% myeloblasts with morphological features and cell surface mar-
`kers consistent with FAB M1 AML (Supplementary Fig. 1).
`Cytogenetic analysis of tumour cells revealed a normal 46,XX karyo-
`type. Although the patient experienced a complete remission with
`conventional therapies, she relapsed at 11 months and expired
`24 months after her initial diagnosis was made. At relapse, the bone
marrow had 78% myeloblasts, and contained a new clonal cytogenetic
abnormality, t(10;12)(p12;p13). Informed consent for whole-
`genome sequencing was subsequently obtained from her next of kin.
`
`A typical M1 AML diploid genome and expression profile
`The tumour sample from patient 933124 contained no somatic copy
number changes at a resolution of ~5 kb (further confirmed on the
`NimbleGen 2.1M array platform, data not shown), and no evidence
`of copy number neutral loss-of-heterozygosity (LOH), indicating
`that the genome was essentially diploid at this level of resolution
`(see Supplementary Fig. 2). Further analysis of the 933124-derived
`tumour and skin samples showed 26 inherited copy number variants
`(that is, detected in both the tumour and skin samples). All but two of
`these had been previously reported in the Database of Genomic
`Variants (see Supplementary Table 1). All of the copy number var-
`iants detected in this genome were found in at least one other AML
`patient (89 other cases, mostly Caucasian, have been queried using
`the same SNP array platform), and all but one were found in at least
`one of the 160 Caucasian HapMap and Coriell samples that were
`studied on the same array platform (Supplementary Table 1).
`
`To determine whether the tumour cells of 933124 were typical of
`M1 AML, we compared the expression signatures of 111 de novo AML
`cases using unsupervised clustering (Ward’s method, see Supple-
`mentary Information). The expression profile of patient 933124
`clustered with multiple other M1 (and M2) AML cases with normal
`cytogenetics, suggesting that the genetic events underlying the patho-
`genesis of this case are similar to those of other cases exhibiting normal
`cytogenetics (Supplementary Fig. 3).
`
`Coverage depth of the tumour and skin genomes
`Because most of the acquired mutations in cancer genomes have been
`shown to be heterozygous, the complete sequencing of a cancer gen-
`ome requires the detection of both alleles at most positions in the
`genome20. We therefore designed sequence coverage metrics to define
`the point at which 90% diploid coverage had been reached. To min-
`imize errors associated with any single platform or measurement,
`diploid coverage for this genome was assessed using a set of high-
`quality SNPs derived from two different SNP array platforms,
`Affymetrix 6.0 and Illumina Infinium 550K. For a SNP to be included
`in the high-quality set, the following criteria had to be satisfied: (1)
`identical genotypes were called from both assays at the same genomic
`positions, and (2) the resulting genotype was heterozygous. For the
`933124 tumour genome, 46,494 heterozygous SNPs passed the above
`criteria and were defined as high-quality SNPs. For the skin samples,
`46,572 high-quality SNPs were defined.
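The two selection criteria can be sketched as a small filter over genotype calls. This is an illustration only: the dictionary layout, position keys and function names are assumptions, not the authors' pipeline.

```python
from typing import Dict, Tuple

Genotype = Tuple[str, str]  # pair of alleles at one position, e.g. ("A", "G")


def is_heterozygous(genotype: Genotype) -> bool:
    """A genotype is heterozygous when its two alleles differ."""
    return genotype[0] != genotype[1]


def high_quality_snps(
    array_a: Dict[str, Genotype], array_b: Dict[str, Genotype]
) -> Dict[str, Genotype]:
    """Keep positions where both arrays give the same heterozygous genotype."""
    selected = {}
    for position, call_a in array_a.items():
        call_b = array_b.get(position)
        if call_b is None:
            continue  # position not assayed on the second platform
        # Compare as sorted pairs so that ("A", "G") matches ("G", "A").
        if sorted(call_a) == sorted(call_b) and is_heterozygous(call_a):
            selected[position] = tuple(sorted(call_a))
    return selected


# Toy genotype calls (hypothetical positions, not data from the study).
affymetrix = {
    "chr1:1000": ("A", "G"),  # concordant and heterozygous: kept
    "chr1:2000": ("C", "C"),  # concordant but homozygous: dropped
    "chr2:500": ("T", "G"),   # discordant between platforms: dropped
    "chr3:42": ("A", "C"),    # missing from the second platform: dropped
}
illumina = {
    "chr1:1000": ("G", "A"),
    "chr1:2000": ("C", "C"),
    "chr2:500": ("T", "T"),
}

hq = high_quality_snps(affymetrix, illumina)
print(hq)  # {'chr1:1000': ('A', 'G')}
```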
`We performed 98 full runs on the Illumina Genome Analyser to
`achieve the targeted level of 90% diploid coverage as determined by
`coverage of the high-quality SNP set. Maq21 was used to perform
`alignment, determine consensus, and identify SNVs within the 98
`billion bases generated from the tumour genome (see Table 1). Maq
predicted a total of 3.81 million SNVs (Maq SNP quality ≥ 15) in the
`tumour genome, including matching heterozygous genotypes for
`91.2% of the 46,494 high-quality SNPs. When we lowered the Maq
`SNP quality cutoff to 0, 94.06% high-quality SNPs were predicted.
`Further investigation of Maq alignments revealed coverage for both
`alleles at a further 5.38% of the high-quality SNPs, but Maq did not
`predict a SNP or matching heterozygous genotype owing to insuf-
`ficient depth or quality of coverage. Extra analysis revealed coverage
`at 46,484 of 46,494 high-quality SNPs for at least one allele (that is,
`99.98% haploid coverage for the tumour genome).
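As a rough illustration of the two coverage measures, the sketch below counts a high-quality SNP as haploid-covered when at least one allele is seen in the reads and as diploid-covered when both alleles are seen. The one-read minimum is an assumption made for the example; the article's exact thresholds are given in its Supplementary Information.

```python
from typing import Iterable, Tuple


def snp_coverage(allele_reads: Iterable[Tuple[int, int]], min_reads: int = 1) -> Tuple[float, float]:
    """Return (haploid %, diploid %) over a set of heterozygous SNPs.

    allele_reads holds (reference-allele reads, variant-allele reads) per SNP.
    min_reads is a hypothetical detection threshold.
    """
    total = haploid = diploid = 0
    for ref_reads, var_reads in allele_reads:
        total += 1
        if ref_reads >= min_reads or var_reads >= min_reads:
            haploid += 1  # at least one allele detected
        if ref_reads >= min_reads and var_reads >= min_reads:
            diploid += 1  # both alleles detected
    if total == 0:
        raise ValueError("no SNPs supplied")
    return 100.0 * haploid / total, 100.0 * diploid / total


# Toy read counts for four SNPs (not data from the study).
toy_haploid, toy_diploid = snp_coverage([(5, 4), (3, 0), (0, 0), (2, 6)])
print(toy_haploid, toy_diploid)  # 75.0 50.0

# The same ratios computed from counts reported in the article.
tumour_diploid = 100.0 * 42_415 / 46_494  # HQ SNPs with both alleles detected (Table 1)
tumour_haploid = 100.0 * 46_484 / 46_494  # HQ SNPs with at least one allele covered
print(f"{tumour_diploid:.1f}% diploid, {tumour_haploid:.2f}% haploid")
# prints: 91.2% diploid, 99.98% haploid
```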
`We sequenced the genome of normal skin cells from the same
`patient to enable the identification of inherited sequence variants
`in the tumour genome. Our targeted diploid coverage goal for the
`skin-derived genome was 80%. We achieved this goal with only 34
`Solexa runs (41.8 billion bases), using improved reagents and longer
`read lengths to attain 82.6% diploid and 84.2% haploid coverage
`(Table 1).
`To begin evaluating the quantity and quality of the detected
`sequence variants in the tumour and skin genomes, we compared
`the overlap and uniqueness of this genome’s variants with respect to
`the James D. Watson and J. Craig Venter genomes, and to dbSNP
`(v127; Fig. 1). Of the 3.68 million single nucleotide variants (SNVs;
Maq SNP quality ≥15, excluding SNVs found on chromosome X)
`predicted by Maq in the tumour genome, 2.36 million were present in
`dbSNP, 2.36 million were detected in the skin genome (Fig. 1a),
`1.50 million were detected in the Venter genome, and 1.58 million
`were found in the Watson genome (Fig. 1b). Ultimately, 1.70 million
`SNVs were unique to the 933124 tumour genome. On filtering the
`933124 SNVs at different Maq quality values to determine the
`stability of results, we observed that the proportion of 933124
`SNVs that also are in dbSNP increases from 63.9% to 69.48% when
`the Maq quality threshold score increases from 15 to 30, as expected.
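The stability check described here amounts to recomputing the fraction of known variants at each quality cutoff. A minimal sketch, using invented positions and quality values purely for illustration:

```python
from typing import Iterable, Set, Tuple


def known_fraction(snvs: Iterable[Tuple[str, int]], known: Set[str], min_quality: int) -> float:
    """Fraction of SNVs at or above min_quality that appear in a known-variant set."""
    passing = [position for position, quality in snvs if quality >= min_quality]
    if not passing:
        return 0.0
    return sum(position in known for position in passing) / len(passing)


# Toy SNV calls as (position, quality) pairs; values are invented.
calls = [
    ("chr1:10", 40),
    ("chr1:20", 35),
    ("chr1:30", 18),
    ("chr1:40", 16),
    ("chr2:5", 50),
    ("chr2:9", 20),
]
dbsnp_like = {"chr1:10", "chr1:20", "chr1:40", "chr2:5"}  # stand-in for dbSNP

at_15 = known_fraction(calls, dbsnp_like, 15)
at_30 = known_fraction(calls, dbsnp_like, 30)
print(f"threshold 15: {at_15:.2f}, threshold 30: {at_30:.2f}")
# prints: threshold 15: 0.67, threshold 30: 1.00
```

A rising known fraction at stricter cutoffs is the pattern one expects if low-quality calls are enriched for false positives, which is the behaviour the text reports.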
`
`Refining the detection of potential somatic mutations
`Because the number of sequence variants initially detected by Maq
`was high, we developed improved filtering tools to effectively sepa-
`rate true variants from false positives. To this end, we generated an
`
`
Table 1 | Tumour and skin genome coverage from patient 933124

                                                 Tumour                Skin
Libraries                                        4                     3
Runs                                             98                    34
Reads obtained                                   5,858,992,064         2,122,836,148
Reads passing quality filter                     3,025,923,365         1,228,177,690
Bases passing quality filter                     98,184,511,523        41,783,794,834
Reads aligned by Maq                             2,729,957,053         1,080,576,680
Reads unaligned by Maq                           295,966,312           138,276,594

SNVs detected with respect to hg18 (no Y)        3,811,115             2,918,446
SNVs (chr 1–22) detected with respect to hg18    3,681,968 (100.0%)    2,830,292 (100.0%)
SNVs also present in dbSNP                       2,368,458 (64.3%)     2,161,695 (76.4%)
SNVs also present in Venter genome               1,499,010 (40.7%)     1,383,431 (48.9%)
SNVs also present in Watson genome               1,573,435 (42.7%)     1,456,822 (51.5%)
SNVs not in dbSNP/Venter/Watson                  1,223,830 (33.2%)     591,131 (20.9%)
SNVs not in dbSNP/Venter/Watson/skin             925,200 (25.1%)       –

HQ SNPs                                          46,494 (100.0%)       46,572 (100.0%)
HQ SNPs where reference allele is detected       42,419 (91.2%)        38,454 (82.6%)
HQ SNPs where variant allele is detected         43,164 (92.9%)        39,220 (84.2%)
HQ SNPs where both alleles are detected          42,415 (91.2%)        38,454 (82.6%)
`
`Assessments are shown of the haploid and diploid coverage of the tumour and skin genomes from AML patient 933124.
`Chr, chromosome; hg18, human genome version 18; HQ, high quality.
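A few run-level quantities can be derived from the counts in Table 1. In the sketch below, 'mean read length' is taken to be bases passing the quality filter divided by reads passing the filter; that interpretation, and the class and method names, are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class RunSummary:
    reads_obtained: int
    reads_passing: int
    bases_passing: int
    reads_aligned: int

    def pass_rate(self) -> float:
        """Fraction of reads that passed the quality filter."""
        return self.reads_passing / self.reads_obtained

    def mean_read_length(self) -> float:
        """Average bases per filtered read (assumed interpretation)."""
        return self.bases_passing / self.reads_passing

    def aligned_rate(self) -> float:
        """Fraction of filtered reads that Maq aligned."""
        return self.reads_aligned / self.reads_passing


# Counts copied from Table 1.
tumour = RunSummary(5_858_992_064, 3_025_923_365, 98_184_511_523, 2_729_957_053)
skin = RunSummary(2_122_836_148, 1_228_177_690, 41_783_794_834, 1_080_576_680)

for name, run in (("tumour", tumour), ("skin", skin)):
    print(
        f"{name}: pass rate {run.pass_rate():.1%}, "
        f"mean read length {run.mean_read_length():.1f} bp, "
        f"aligned {run.aligned_rate():.1%}"
    )
```

Under this reading, the skin reads come out slightly longer on average than the tumour reads, in line with the statement that the skin genome was sequenced with longer read lengths.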
`
`experimental data set by re-sequencing Maq-predicted SNVs, ran-
`domly selecting a training subset and a test data set, whose annota-
`tions and features were submitted to Decision Tree C4.5 (ref. 22).
`
[Figure 1: two Venn diagrams. Panel a circles: 933124, Venter, Watson. Panel b circles: Skin, Tumour, dbSNP.]

Figure 1 | Overlap of SNPs detected in 933124 and other genomes. a, Venn
diagram of the overlap between SNPs detected in the 933124 tumour
genome and the genomes of J. D. Watson and J. C. Venter. b, Venn diagram
of the overlap among the 933124 tumour genome, the skin genome and
dbSNP (ver. 127). SNVs were defined with a Maq SNP quality ≥15.
`
`
`This approach identified parameters that separated true variants
`from false positives, revealing that SNV-supporting read counts
`(unique on the basis of read start position and base position in
`supporting reads), base quality and Maq quality scores are chief
`determinants for identifying false positives. Implementing rules
`obtained from the Decision Tree analysis resulted in 91.9% sensitivity
`and 83.5% specificity for validated SNVs.
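To make the two performance measures concrete, the sketch below applies a simple threshold rule to a handful of calls and scores it against validation outcomes. The thresholds and toy calls are hypothetical; the actual rules came from the C4.5 analysis and are not reproduced in the text.

```python
from typing import Iterable, NamedTuple, Tuple


class Call(NamedTuple):
    supporting_reads: int  # unique reads supporting the variant
    base_quality: int      # quality of the variant base
    maq_quality: int       # Maq SNP quality score
    validated: bool        # True if re-sequencing confirmed the variant


def passes_filter(call: Call, min_reads: int = 3, min_base_q: int = 20, min_maq_q: int = 30) -> bool:
    """Hypothetical rule: every feature must reach its threshold."""
    return (
        call.supporting_reads >= min_reads
        and call.base_quality >= min_base_q
        and call.maq_quality >= min_maq_q
    )


def sensitivity_specificity(calls: Iterable[Call]) -> Tuple[float, float]:
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP)."""
    tp = fn = tn = fp = 0
    for call in calls:
        kept = passes_filter(call)
        if call.validated:
            tp += kept
            fn += not kept
        else:
            fp += kept
            tn += not kept
    return tp / (tp + fn), tn / (tn + fp)


toy_calls = [
    Call(5, 30, 40, True),   # kept, true variant: true positive
    Call(4, 25, 35, True),   # kept, true variant: true positive
    Call(2, 30, 40, True),   # too few reads, true variant: false negative
    Call(1, 10, 15, False),  # rejected, not a variant: true negative
    Call(6, 28, 20, False),  # low Maq quality, not a variant: true negative
    Call(3, 22, 31, False),  # kept, not a variant: false positive
]

sensitivity, specificity = sensitivity_specificity(toy_calls)
print(f"sensitivity {sensitivity:.3f}, specificity {specificity:.3f}")
# prints: sensitivity 0.667, specificity 0.667
```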
`
`Identification of somatic mutations in coding sequences
`The patient had 3,813,205 sequence variants in her tumour genome,
as defined by Maq scores of >15 (Table 1). Of these, 2,647,695 were
`supported by the Decision Tree analysis in the tumour genome, of
`which 2,584,418 (97.6%) were also detected in the skin genome
`(Fig. 2). The detailed algorithm for selecting putative somatic var-
`iants is described in Supplementary Information. Most of the 63,277
`tumour-specific variants we detected were either present in dbSNP or
`were previously described in the Watson or Venter genomes
`(31,645), or occurred in non-genic regions (20,440). A total of
`11,192 variants were located within the boundaries of annotated
`
[Figure 2: flow chart of successive filters; indentation shows how each set is subdivided.]

3,813,205 tumour SNVs (Maq15)
2,647,695 well supported SNVs (decision tree)
  2,584,418 present in skin (SNPs)
  63,277 tumour-specific SNVs
    31,645 in dbSNP/Watson/Venter
    31,632 new SNVs
      20,440 in non-genic regions
      11,192 SNVs in genic regions
        10,735 intronic
        216 in UTR
        241 SNVs in coding sequence
          60 synonymous
          181 SNVs predicted to alter gene function (non-synonymous and splice junctions)
            152 validated as wild type (false positives)
            14 validated as germline SNVs (SNPs)
            8 validated as somatic SNVs (acquired mutations)
            7 unable to be validated (technical failures)
`
`Figure 2 | Filters used to identify somatic point mutations in the tumour
`genome. See text for details. UTR, untranslated regions.
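The counts in Figure 2 are internally consistent, which can be confirmed by subtracting each filtered set in turn. The sketch below only re-does that arithmetic with the numbers shown in the figure; the labels are shortened for readability.

```python
well_supported = 2_647_695  # SNVs supported by the Decision Tree analysis

# Each step removes a set of variants from further consideration.
removed_at_each_step = [
    ("present in skin (inherited)", 2_584_418),
    ("in dbSNP/Watson/Venter", 31_645),
    ("in non-genic regions", 20_440),
    ("in untranslated regions", 216),
    ("intronic", 10_735),
    ("synonymous", 60),
]

remaining = well_supported
for label, removed in removed_at_each_step:
    remaining -= removed
    print(f"after removing {label}: {remaining:,}")

# Outcomes of PCR-based validation for the remaining candidates.
validation = {
    "wild type (false positive)": 152,
    "germline SNP": 14,
    "somatic mutation": 8,
    "could not be validated": 7,
}

print(remaining, sum(validation.values()))  # 181 181
```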
`
`
[Figure 3: bar chart. Vertical axis: Variant (%), 0 to 100. Horizontal axis: SLC15A1, GRINL1B, GPR123, KNDC1, CDH24, PTPRT, PCLKC, EBI2, FLT3, NPM1, BRCA2, TP53. Bars for each gene: primary tumour, relapse tumour, skin. Asterisks mark the somatic mutations.]
`
`Figure 3 | Summary of Roche/454 FLX readcount data obtained for ten
`somatic mutations and two validated SNPs in the primary tumour, relapse
`tumour and skin specimens. The readcount data for the variant alleles in the
`primary tumour sample and relapse tumour sample are statistically different
from that of the skin sample for all mutations (P < 0.000001 for all
`mutations, Fisher’s exact test, denoted by a single asterisk in all cases). Note
`that the normal skin sample was contaminated with leukaemic cells
`containing the somatic mutations. The patient’s white blood cell count was
`105,000 (85% blasts) when the skin punch biopsy was obtained.
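The variant percentages plotted in Figure 3 are ratios of variant reads to total reads, and the relapse values can be adjusted for the proportion of blasts in the sample. The sketch below uses the CDH24 readcounts from Table 3; the function names are illustrative, and dividing by the blast fraction is the same correction the text describes as multiplying by 1.28 (1/0.78).

```python
def variant_fraction(variant_reads: int, reference_reads: int) -> float:
    """Fraction of reads at a locus that carry the variant allele."""
    return variant_reads / (variant_reads + reference_reads)


def blast_corrected(fraction: float, blast_fraction: float) -> float:
    """Scale a variant fraction to what it would be in a sample of pure blasts."""
    return fraction / blast_fraction


# CDH24 readcounts from Table 3.
primary = variant_fraction(5672, 4890)  # primary tumour, 100% blasts
relapse = variant_fraction(3108, 4599)  # relapse tumour, 78% blasts
relapse_adj = blast_corrected(relapse, 0.78)

print(f"primary {primary:.2%}, relapse {relapse:.2%}, relapse corrected {relapse_adj:.1%}")
# prints: primary 53.70%, relapse 40.33%, relapse corrected 51.7%
```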
`
genes; 216 of these variants were in untranslated regions, and 10,735
were in introns (but not involving splice junctions) and were not
explored further in our analysis. Of the coding sequence variants, 60
were synonymous, and not further evaluated. The remaining 181
variants were either non-synonymous, or were predicted to alter
splice site function. By sequencing polymerase chain reaction
(PCR)-generated amplicons from the tumour and skin samples
(and also from the relapse tumour sample obtained 11 months after
the original presentation), we determined that 152 of these variants
were false positive (that is, wild type) calls, 14 were inherited SNPs,
and eight were somatic mutations in both the original tumour and
the relapse sample (Table 2). Seven variants could not be validated,
either because the regions involved were repetitive, or because all
attempts to obtain PCR amplicons failed. All of the PCR-amplified
exons from the eight genes containing validated somatic mutations
were sequenced in 187 further cases of AML using samples from our
discovery and validation sets23; no further somatic mutations were
detected in these genes (data not shown). A description of how we
estimated the false negative (12.45%) and false positive (0.06%) rates
for SNVs over the entire genome is presented in Supplementary
Information. Using these estimates, we can predict that very few
somatic, non-synonymous variants were missed by our analysis of
this deeply covered genome.

Defining mutation frequencies in the tumour sample
To better define the percentage of tumour cells that contained each
of the discovered somatic mutations, we amplified each
mutation-containing locus from non-amplified genomic DNA derived from
the de novo and relapse tumour samples, and from the skin biopsy
obtained at presentation. The resulting amplicons were sequenced
using the Roche/454 FLX platform, and the frequencies of reads
containing the reference and variant alleles were defined (Fig. 3 and
Table 3). Control amplicons containing a known heterozygous
SNP in BRCA2 (encoding N372H) and a homozygous SNP in
TP53 (encoding P72R) were analysed similarly. The BRCA2 SNP
yielded ~50% variant frequencies in the tumour and skin samples,
whereas nearly 100% of the TP53 alleles were variant in all three
samples, as expected. Remarkably, all eight somatic SNVs were
detected at ~50% frequencies in the primary tumour sample
(100% blasts), and at ~40% frequencies in the relapse sample
(78% blasts; if the variant frequencies are corrected for blast
counts, that is, multiplied by 1.28, the frequencies at relapse also
were ~50%). The NPMc (cytoplasmic nucleophosmin) mutation
was also detected at a frequency of ~50%, but the FLT3 internal
tandem duplication (ITD) allele was only detected in 35.1% of the
454 reads at diagnosis and 31.3% at relapse, suggesting that the
mutation was not present in all tumour cells at diagnosis or relapse.
Notably, the variant alleles also were detected at frequencies of
~5–13% in the skin sample. In retrospect, it is clear that the skin
sample contained contaminating leukaemic cells, because the
patient’s white blood cell count at presentation was 105,000 per
microlitre, with 85% blasts. This information was used to inform
the Decision Tree analysis described above: we allowed high-quality
tumour variants to move forward in the discovery pipeline if they
were detected at a low frequency (two or fewer reads) in the skin
sample, as defined by a binomial test.

Detecting insertions and deletions (indels)
To discover small indels (<6 bp) from sequence reads (32–35 bp
long), we started with a set of 236 million reads that were not
confidently aligned by Maq to the reference genome. We applied
Cross_Match and BLAT to identify gapped alignments that are unique
in the genome. To detect indels longer than 6 bp, we developed a ‘split
reads’ algorithm (see Supplementary Information) that aligns
subsegments of reads independently to the genome, and computes a
mapping quality for the derived gapped alignment on the basis of
the number of hits and the quality of the bases. These efforts resulted
in the identification of 726 putative small indels (1 to 30 bp in size)
that occur in coding exons, 393 of which (54.2%) were found in
dbSNP. After manual review, we selected a set of 28 putative somatic
coding indels for validation using PCR-based dye terminator
sequencing. Of these putative indels, 22 were validated but were found
present in both tumour and skin (15 of these were in dbSNP), two were
false positive calls, two had no coverage, and two were previously
validated somatic insertions in NPM1 (4 bp) and FLT3 (30 bp).

Discussion
Here we describe the sequencing and analysis of a primary human
cancer genome using next-generation sequencing technology. Our
`
Table 2 | Non-synonymous somatic mutations detected in the AML sample

Gene      Consequence  Type      Solexa tumour reads  Solexa skin reads  Conservation score  Mutations in other
                                 (WT:variant)         (WT:variant)       of mutant base      AML cases*
CDH24     Y590X        Nonsense  9:9                  16:0               0.998               0/187
SLC15A1   W77X         Nonsense  15:12                19:0               1.000               0/187
KNDC1     L799F        Missense  7:8                  20:0               NA                  0/187
PTPRT     P1235L       Missense  9:13                 16:0               1.000               0/187
GRINL1B   R176H        Missense  15:10                14:0               NA                  0/187
GPR123    T38I         Missense  11:11                13:0               NA                  0/187
EBI2      A338V        Missense  7:12                 18:2               1.000               0/187
PCLKC     P1004L       Missense  19:9                 15:1               0.98                0/187
FLT3      ITD          Indel     18:12                8:0                NA                  51/185
NPM1      CATG ins     Indel     36:6                 33:0               NA                  43/180

Ins, insertion; WT, wild type.
* Patient cohort defined in ref. 23.
`
`
Table 3 | 454 Readcount data for somatic mutations and known SNPs

                        Primary AML (100% blasts)     Skin                          Relapse (78% blasts)
Gene      Consequence   Variant  Ref    Variant (%)   Variant  Ref    Variant (%)   Variant  Ref    Variant (%)
CDH24     Y590X         5672     4890   53.70         564      10358  5.16          3108     4599   40.33
SLC15A1   W77X          3817     4962   43.48         875      10773  7.51          4714     7173   39.66
KNDC1     L799F         4640     4848   48.90         770      8972   7.90          3883     6342   37.98
PTPRT     P1235L        998      1058   48.54         126      1489   7.80          350      493    41.52
GRINL1B   R176H         2211     2674   45.26         318      4461   6.65          1447     2070   41.14
GPR123    T38I          4618     4569   50.27         850      9751   8.02          3660     6057   37.67
EBI2      A338V         12750    15453  45.21         458      10088  4.34          2646     3627   42.18
PCLKC     P1004L        992      855    53.71         341      3153   9.76          705      773    47.70
FLT3      ITD           4220     7810   35.08         3475     23159  13.05         3870     8495   31.30
NPM1      CATG ins      1550     1974   43.98         143      2390   5.65          2303     3910   37.07
BRCA2     N372H         778      752    50.85         763      876    46.55         285      303    48.47
TP53      P72R          8989     1      99.99         8161     0      100.00        7914     6      99.92
`
The differences between variant frequencies in primary or relapse tumour samples and skin were highly significant for all somatic mutations (P < 0.000001, Fisher’s exact test, one tailed). The
`BRCA2 variant is a known heterozygous SNP in this genome, and the TP53 variant is a known homozygous SNP.
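A one-tailed Fisher's exact test on readcounts of this size can be sketched with the standard library by summing hypergeometric probabilities in log space, which avoids underflow. The implementation below is an illustration under that approach, not the authors' code, and it returns log10 of the P value because the values for Table 3 are far too small to print as ordinary floats.

```python
import math


def log_comb(n: int, k: int) -> float:
    """Natural log of the binomial coefficient C(n, k)."""
    return math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)


def fisher_one_tailed_log10(a: int, b: int, c: int, d: int) -> float:
    """log10 of P(X >= a) for the 2x2 table [[a, b], [c, d]].

    Rows are samples; columns are (variant reads, reference reads).
    The tail tested is enrichment of variant reads in the first row.
    """
    row1, col1, n = a + b, a + c, a + b + c + d
    log_denominator = log_comb(n, row1)
    log_terms = [
        log_comb(col1, k) + log_comb(n - col1, row1 - k) - log_denominator
        for k in range(a, min(row1, col1) + 1)
    ]
    peak = max(log_terms)  # log-sum-exp keeps the sum numerically stable
    log_p = peak + math.log(sum(math.exp(t - peak) for t in log_terms))
    return log_p / math.log(10)


# Small table with a value that can be checked by hand: (16 + 1) / 70.
small = 10 ** fisher_one_tailed_log10(3, 1, 1, 3)
print(f"{small:.4f}")  # 0.2429

# CDH24 from Table 3: primary tumour (5672 variant, 4890 ref) vs skin (564, 10358).
cdh24_log10_p = fisher_one_tailed_log10(5672, 4890, 564, 10358)
print(cdh24_log10_p < -6)  # True
```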
`
`patient’s tumour genome was essentially diploid, and contained ten
`non-synonymous somatic mutations that may be relevant for her
`disease. These mutations affect genes participating in several well-
`described pathways that are known to contribute to cancer patho-
`genesis, but most of these genes would not have been candidates for
`directed re-sequencing on the basis of our current understanding of
`cancer. Hence, these results justify the use of next-generation whole-
`genome sequencing approaches to reveal somatic mutations in can-
`cer genomes.
`As we demonstrated in our re-sequencing of the genome of the C.
`elegans N2 Bristol strain14, and again in this study, massively parallel
`short-read sequencing provides an effective method for examining
`single nucleotide and short indel variants by comparison of the aligned
`reads to a reference genome sequence. By sequencing our patient’s
tumour genome to a depth of >30-fold coverage, and gauging our
`ability to detect known heterozygous positions across the genome,
`we have produced a sufficient depth and breadth of sequence coverage
`to comprehensively discover somatic genome variants. A slightly lower
`coverage of the normal genome from this individual helped to identify
`nearly 98% of potential variants as being inherited, a critical filter that
`allowed us to more readily identify the true somatic mutations in this
`tumour. Our results strongly support the notion that hypothesis-
`driven (for example, candidate gene-based) examination of tumour
`genomes by PCR-directed or capture-based methods is inherently
`limited, and will miss key mutations. A further and important consid-
`eration is the demand for large amounts of genomic DNA by these
`techniques; this is a serious limitation when precious clinical samples
are being studied. The Illumina/Solexa technology requires only ~1 μg
`of DNA per library, enabling the study of primary tumour DNA rather
`than requiring the use of tumour cell lines, which may contain genetic
`changes and adaptations required for immortalization and mainten-
`ance in tissue culture conditions.
`A total of ten non-synonymous somatic mutations were identified
`in this patient’s tumour genome. Two are well-known AML-associated
`mutations, including an internal tandem duplication of the FLT3
`receptor tyrosine kinase gene, which constitutively activates kinase
`signalling, and portends a poor prognosis5,24,25, and a four-base inser-
`tion in exon 12 of the NPM1 gene (NPMc)26–28. Both of these mutations
`are common (25–30%) in AML tumours, and are thought to contri-
`bute to progression of the disease rather than to cause it directly29.
`Notably, the frequency of the mutant FLT3 allele in the primary and
`relapse tumour samples (35.08% and 31.30%, respectively) was
significantly less than that of the other nine mutations (P < 0.000001
`for both the primary and relapse samples). These data suggest that the
`FLT3 ITD may not have been present in all tumour cells, and further,
`that it may have been the last mutation acquired.
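One way to see why a lower allele frequency points to a subclonal mutation: in a diploid genome, a heterozygous mutation present in every tumour cell should appear in about half of the reads. The sketch below turns a variant allele frequency into an approximate fraction of tumour cells under that simple model. The model (one mutant copy out of two, no copy number change, no amplification bias) is an assumption made for illustration and is not an estimate reported in the article.

```python
def tumour_cell_fraction(
    variant_fraction: float,
    blast_fraction: float = 1.0,
    mutant_copies: int = 1,
    total_copies: int = 2,
) -> float:
    """Approximate fraction of tumour cells carrying a mutation.

    Assumes every tumour cell has total_copies of the locus, of which
    mutant_copies are mutated in cells that carry the mutation.
    The result is capped at 1.0 because sampling noise can push it higher.
    """
    estimate = variant_fraction * total_copies / (mutant_copies * blast_fraction)
    return min(estimate, 1.0)


# Primary tumour (100% blasts), variant fractions from Table 3.
flt3 = tumour_cell_fraction(0.3508)
cdh24 = tumour_cell_fraction(0.5370)
print(f"FLT3 ITD: about {flt3:.0%} of tumour cells; CDH24: about {cdh24:.0%}")
# prints: FLT3 ITD: about 70% of tumour cells; CDH24: about 100%
```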
`The other eight somatic mutations that we detected are all single
`base changes, and none has previously been detected in an AML
`genome. Four of the genes affected, however, are in gene families
`that are strongly associated with cancer pathogenesis (including
`
`
`PTPRT, CDH24, PCLKC and SLC15A1). The other four somatic
`mutations occurred in genes not previously implicated in cancer
`pathogenesis, but whose potential functions in metabolic pathways
`suggest mechanisms by which they could act to promote cancer
`(including KNDC1, GPR123, EBI2 and GRINL1B). We speculate
`about the roles of these mutations for the pathogenesis of this
`patient’s disease in Supplementary Information.
`The importance of the eight newly defined somatic mutations for
`AML pathogenesis is not yet known, and will require functional
`validation studies in tissue culture cells and mouse models to assess
`their relevance. Even though we could not detect recurrent mutations
`in the limited AML sample set that we surveyed, several lines of
`evidence suggest that these mutations may not be random, ‘passen-
`ger’ mutations. First, somatic mutations in this genome are extremely
`rare. The rarity of somatic variants, and the normal diploid structure
`of the tumour genome, argues strongly against genetic instability or
`DNA repair defects in this tumour. Conceptually, this result is further
`supported by the very small number of somatic mutations discovered
`in the expressed tyrosine kinases of AML samples4,5; genetic insta-
`bility does not seem to be a general feature of AML genomes.
`Second, on the basis of the equivalent frequencies of the variant
`and wild-type alleles for the mutations in the tumour genome (except
`for FLT3 ITD), it is highly probable that all the mutations are het-
`erozygous, and are present in virtually all of the tumour cells (Fig. 3).
`The latter suggests that these mutations may have all been selected for
`and retained because they are important for disease pathogenesis in
`this patient. Alternatively, all may have occurred simultaneously in
`the same leukaemia-initiating cell, but only a subset of the mutations
`(or an as-yet undetected mutation) is truly important for pathoge-
`nesis (that is, disease ‘drivers’ versus passengers). Although we sug-
`gest that the latter hypothesis is very unlikely on the basis of our
`current understanding of tumour progression, many more AML
`genomes will need to be sequenced to resolve this issue.
`Third, the same mutations were detected in tumour cells in the
`relapse sample at approximately the same frequencies as in the prim-
`ary sample. All of these mutations were therefore present in the
`resistant tumour cells that contributed to the patient’s relapse, fur-
`ther suggesting that a single clone contains all ten mutations. Fourth,
`seven of the ten genes containing somatic mutations were detectably
`expressed in the tumour sample. FLT3 and NPM1 messenger RNAs
`were highly expressed in this tumour sample, as they are in virtually
`all AML samples. We detected mRNA from the CDH24, SLC15A1
and EBI2 genes on the Affymetrix expression array, whereas expression
of GRINL1B and PCLKC was detected by PCR with reverse
`transcription (RT–PCR; data not shown). Expression of KNDC1,
`PTPRT and GPR123 was not detected by either approach, but we
`cannot rule out expression of these genes in a small subset of tumour
`cells (for example, leukaemia-initiating cells). Furthermore, for the
`five point mutatio



