(12) United States Patent
     Wang et al.

(10) Patent No.: US 7,704,687 B2
(45) Date of Patent: Apr. 27, 2010

(54) DIGITAL KARYOTYPING

(75) Inventors: Tian-Li Wang, Baltimore, MD (US);
     Victor Velculescu, Dayton, MD (US);
     Kenneth Kinzler, Bel Air, MD (US);
     Bert Vogelstein, Baltimore, MD (US)

(73) Assignee: The Johns Hopkins University, Baltimore, MD (US)

(*) Notice: Subject to any disclaimer, the term of this patent is extended or adjusted under 35 U.S.C. 154(b) by 1063 days.

(21) Appl. No.: 10/705,874

(22) Filed: Nov. 13, 2003

(65) Prior Publication Data
     US 2004/0096892 A1    May 20, 2004

Related U.S. Application Data
(60) Provisional application No. 60/426,406, filed on Nov. 15, 2002.

(51) Int. Cl.
     C12Q 1/68    (2006.01)
     G06F 19/00   (2006.01)
     [illegible]
     C12P 19/34   (2006.01)
(52) U.S. Cl. 435/6; 702/19; 702/20; 435/91.2
(58) Field of Classification Search: None
     See application file for complete search history.

(56) References Cited

U.S. PATENT DOCUMENTS

5,200,336 A *        4/1993   Kong et al.         435/199
5,391,480 A *        2/1995   Davis et al.
5,663,048 A *        9/1997   Winkfein et al.     435/6
5,695,937 A         12/1997   Kinzler et al.
5,981,190 A         11/1999   Israel
6,498,013 B1        12/2002   Velculescu et al.
2002/0048767 A1 *    4/2002   Bensimon et al.     435/6
2002/0147549 A1 *   10/2002   Yoshida et al.      702/20
2003/0124584 A1 *    7/2003   Mohammed            435/6
2003/0186251 A1     10/2003   Dunn et al.
2004/0219580 A1     11/2004   Dunn et al.

FOREIGN PATENT DOCUMENTS

WO 0202805 A2 *      1/2002

OTHER PUBLICATIONS

New England Biolabs Technical online reference (www.neb.com/nebecomm/tech_reference/restriction_enzymes/overview.asp) pp. 1-4, 2007.*
Dunn, J. et al., "Genomic Signature Tags (GSTs): A System for Profiling Genomic DNA," Genome Research, (2002) 12:1756-1765.
Wang, T. et al., "Digital karyotyping," PNAS, (Dec. 10, 2002), vol. 99, No. 25, pp. 16156-16161.
Kallioniemi, A. et al., "Comparative Genomic Hybridization for Molecular Cytogenetic Analysis of Solid Tumors," Science, vol. 258 (Oct. 30, 1992), pp. 818-821.

* cited by examiner

Primary Examiner: Nancy Vogel
Assistant Examiner: Catherine Hibbert
(74) Attorney, Agent, or Firm: Banner & Witcoff, Ltd.
`
(57) ABSTRACT

Alterations in the genetic content of a cell underlie many human diseases, including cancers. A method called Digital Karyotyping provides quantitative analysis of DNA copy number at high resolution. This approach involves the isolation and enumeration of short sequence tags from specific genomic loci. Analysis of human cancer cells using this method identified gross chromosomal changes as well as amplifications and deletions, including regions not previously known to be altered. Foreign DNA sequences not present in the normal human genome could also be readily identified. Digital Karyotyping provides a broadly applicable means for systematic detection of DNA copy number changes on a genomic scale.
`
`27 Claims, 5 Drawing Sheets
`
`PGDX EX. 1002
`
`Page 1 of 21
`
`
`
`
U.S. Patent    Apr. 27, 2010    Sheet 1 of 5    US 7,704,687 B2

[FIGURE 1: schematic, image not reproduced. Text recovered from the drawing:]
Step 1. Isolate genomic DNA. Cleave with mapping enzyme (SacI).
Step 2. Ligate to biotinylated linkers.
Step 3. Cleave with fragmenting enzyme (NlaIII). Isolate with streptavidin magnetic beads.
Step 4. Ligate to linkers containing tagging enzyme site (MmeI).
Step 5. Release genomic tags using tagging enzyme (MmeI).
Step 6. Ligate to form ditags, PCR amplify, concatenate, and sequence.
Step 7. Map tags to chromosome, evaluate tag density.
Plot axes: Density versus Chromosome Position.
`
`
`
[FIGURE 2 (Sheet 2 of 5): image not reproduced. Panel labels: Karyotype; Digital Karyotype. Y-axis: copies per haploid genome. X-axis: position along chromosome (Mb).]
`
`
`
[FIGURE 3 (Sheet 3 of 5): image not reproduced. Y-axis of both graphs: copies per haploid genome.]
`
`
`
[FIGURE 4 (Sheet 4 of 5): image not reproduced. Y-axis: observed tags. Categories: EBV genome; other viral sequences; bacterial sequences.]
`
`
`
[FIGURE 5 (Sheet 5 of 5): image not reproduced. Y-axis: copies per haploid genome.]
`
`
`
`
`
`
DIGITAL KARYOTYPING
`
`This application claims the benefit of provisional applica-
`tion Ser. No. 60/426,406 filed Nov. 15, 2002, the contents of
`which are expressly incorporated herein.
`The work underlying this invention was supported in part
by the U.S. government. Thus the U.S. government retains
`certain rights in the invention according to the provisions of
`grant nos. CA 43460, CA 57345, CA 62924 of the National
`Institutes of Health.
`
`A portion of the disclosure of this patent document con-
`tains material which is subject to copyright protection. The
`copyright owner has no objection to the facsimile reproduc-
`tion by anyone of the patent document or the patent disclo-
`sure, as it appears in the Patent and Trademark Office patent
`file or records, but otherwise reserves all copyright rights
`whatsoever.
`
`FIELD OF THE INVENTION
`
`The invention relates to the field of genetics. In particular,
`it relates to the determination of karyotypes of genomes of
`individuals.
`
`BACKGROUND OF THE INVENTION
`
Somatic and hereditary variations in gene copy number can lead to profound abnormalities at the cellular and organismal levels. In human cancer, chromosomal changes, including deletion of tumor suppressor genes and amplification of oncogenes, are hallmarks of neoplasia (1). Single copy changes in specific chromosomes or smaller regions can result in a number of developmental disorders, including Down, Prader-Willi, Angelman, and cri du chat syndromes (2). Current methods for analysis of cellular genetic content include comparative genomic hybridization (CGH) (3), representational difference analysis (4), spectral karyotyping/M-FISH (5, 6), microarrays (7-10), and traditional cytogenetics. Such techniques have aided in the identification of genetic aberrations in human malignancies and other diseases (11-14). However, methods employing metaphase chromosomes have a limited mapping resolution (~20 Mb) (15) and therefore cannot be used to detect smaller alterations. Recent implementations of comparative genomic hybridization on microarrays containing genomic or transcript DNA sequences provide improved resolution, but are currently limited by the number of sequences that can be assessed (16) or by the difficulty of detecting certain alterations (9). There is a continuing need in the art for methods of analyzing and comparing genomes.
`
`BRIEF SUMMARY OF THE INVENTION
`
`
In a first embodiment a method is provided for karyotyping a genome of a test eukaryotic cell. A population of sequence tags is generated from defined portions of the genome of the test eukaryotic cell. The portions are defined by one or two restriction endonuclease recognition sites. The sequence tags in the population are enumerated to determine the number of individual sequence tags present in the population. The number of a plurality of sequence tags in the population is compared to the number of the plurality of sequence tags determined for a genome of a reference cell. The plurality of sequence tags are within a window of sequence tags which are calculated to be contiguous in the genome of the species of the eukaryotic cell. A difference in the number of the plurality of sequence tags within the window present in the population from the number determined for a reference eukaryotic cell indicates a karyotypic difference between the test eukaryotic cell and the reference eukaryotic cell.

According to a second embodiment of the invention, a dimer is provided. The dimer comprises two distinct sequence tags from defined portions of the genome of a eukaryotic cell. The portions are defined by one or two restriction endonuclease recognition sites. Each of said sequence tags consists of a fixed number of nucleotides of one of said defined portions of the genome. The fixed number of nucleotides extends from one of said restriction endonuclease recognition sites.

According to a third embodiment of the invention, a concatamer of dimers is provided. The dimers comprise two distinct sequence tags from defined portions of the genome of a eukaryotic cell. The portions are defined by one or two restriction endonuclease recognition sites. Each of said sequence tags consists of a fixed number of nucleotides of one of said defined portions of the genome. The fixed number of nucleotides extends from one of the restriction endonuclease recognition sites.

According to a fourth embodiment of the invention a method of karyotyping a genome of a test eukaryotic cell is provided. A population of sequence tags is generated from defined portions of the genome of the test eukaryotic cell. The portions are defined by one or two restriction endonuclease recognition sites. The sequence tags in the population are enumerated to determine the number of individual sequence tags present in the population. The number of a plurality of sequence tags in the population is compared to the number of said plurality of sequence tags calculated to be present in the genome of the species of the eukaryotic cell. The plurality of sequence tags are within a window of sequence tags which are calculated to be contiguous in the genome of the species of the eukaryotic cell. A difference in the number of the plurality of sequence tags within the window present in the population from the number calculated to be present in the genome of the eukaryotic cell indicates a karyotypic abnormality.
`
`BRIEF DESCRIPTION OF THE DRAWINGS
`
`FIG. 1. Schematic of Digital Karyotyping approach. Col-
`ored boxes represent genomic tags. Small ovals represent
`linkers. Large blue ovals represent streptavidin-coated mag-
`netic beads. This figure is described in more detail below.
FIG. 2. Low resolution tag density maps reveal many subchromosomal changes. The top graph corresponds to the Digital Karyotype, while the lower graph represents CGH analysis. An ideogram of each normal chromosome is present under each set of graphs. For all graphs, values on the Y-axis indicate genome copies per haploid genome, and values on the X-axis represent position along the chromosome (Mb for Digital Karyotype, and chromosome bands for CGH). Digital Karyotype values represent exponentially smoothed ratios of DiFi tag densities, using a sliding window of 1000 virtual tags normalized to the NLB genome. Chromosomal areas lacking Digital Karyotype values correspond to unsequenced portions of the genome, including heterochromatic regions. Note that using a window of 1000 virtual tags does not permit accurate identification of alterations less than ~4 Mb, such as amplifications and homozygous deletions, and smaller windows need to be employed to accurately identify these lesions (see FIG. 3 for example).
FIGS. 3A and 3B. High resolution tag density maps identify amplifications and deletions. (FIG. 3A) Amplification on chromosome 7. The top panel represents the bitmap viewer with the region containing the alteration encircled. The bitmap viewer is comprised of ~39,000 pixels representing tag density values at the chromosomal position of each virtual tag on chromosome 7, determined from sliding windows of 50 virtual tags. Yellow pixels indicate tag densities corresponding to copy numbers <110 while black pixels correspond to copy numbers ≥110. The middle panel represents an enlarged view of the region of alteration. The lower panel indicates a graphical representation of the amplified region with values on the Y-axis indicating genome copies per haploid genome and values on the X-axis representing position along the chromosome in Mb. (FIG. 3B) Homozygous deletion on chromosome 5. Top, middle and lower panels are similar to those for FIG. 3A except that the bitmap viewer for chromosome 5 contains ~43,000 pixels, tag density values were calculated in sliding windows of 150 virtual tags, and yellow pixels indicate copy numbers >0.1 while black pixels indicate copy numbers ≤0.1. The bottom panel represents detailed analysis of the region containing the homozygous deletion in DiFi and Co52. For each sample, white dots indicate markers that were retained, while black dots indicate markers that were homozygously deleted. PCR primers for each marker are listed in Table 4.
`
`FIG. 4. Identification of EBV DNA in NLB cells. NLB,
`genomic tags derived from NLB cells after removal of tags
`matching human genome sequences or tags matching DiFi
`cells. DiFi, genomic tags derived from DiFi cells after
`removal of tags matching human genome sequences or tags
`matching NLB cells. The number of observed tags matching
`EBV, other viral, or bacterial sequences is indicated on the
`vertical axis.
`
`FIG. 5. Low resolution tag density maps of the DiFi tumor
`genome. For each chromosome, the top graph corresponds to
`the Digital Karyotype while the lower graph represents CGH
`analysis. An ideogram of each chromosome is depicted under
`each set of graphs. For all graphs, values on the Y-axis indi-
`cate genome copies per haploid genome, and values on the
`X-axis represent position along chromosome (Mb for Digital
`Karyotypes, and chromosome bands for CGH). Digital
`Karyotype values represent exponentially smoothed ratios of
`DiFi tag densities, using a sliding window of 1000 virtual tags
`normalized to the NLB genome. Chromosomal areas lacking
`Digital Karyotype values correspond to unsequenced por-
`tions of the genome, including heterochromatic regions.
`
`DETAILED DESCRIPTION OF THE INVENTION
`
It is a discovery of the present inventors that the genome of an organism can be sampled in groups of small pieces to determine karyotypic properties of an organism using a systematic and quantitative method. Changes in copy number of portions of the genome can be determined on a genomic scale. Such changes include gain or loss of whole chromosomes or chromosome arms, amplifications and deletions of regions of the genome, as well as insertions of foreign DNA. Rearrangements, such as translocations and inversions, would typically not be detected by the method.
`Our data demonstrate that the method, called Digital
`Karyotyping, can accurately identify regions whose copy
`number is abnormal, even in complex genomes such as that of
`the human. Whole chromosome changes, gains or losses of
`chromosomal arms, and interstitial amplifications or dele-
`tions can be detected. Moreover, the method permits the
`identification of specific amplifications and deletions that had
`not been previously described by comparative genomic
`hybridization (CGH) or other methods in any human cancer.
These analyses suggest that a potentially large number of undiscovered copy number alterations exist in cancer genomes and that many of these could be detected through Digital Karyotyping.
`Like all genome-wide analyses, Digital Karyotyping has
`limitations. First, the ability to measure tag densities over
`entire chromosomes depends on the accuracy and complete-
`ness of the genome sequence. Fortunately, over 94% of the
`human genome is available in draft form, and 95% of the
`sequence is expected to be in a finished state by 2003. Second,
`a small number of areas of the genome are expected to have a
`lower density of mapping enzyme restriction sites and be
`incompletely evaluated by our approach. We estimate that
`less than 5% of the genome would be incompletely analyzed
`using the parameters employed in the current study. More-
`over, this problem could be overcome through the use of
`different mapping and fragmenting enzymes. Finally, Digital
`Karyotyping cannot generally detect very small regions, on
`the order of several thousand base pairs or less, that are
`amplified or deleted.
Nevertheless, it is clear from our analyses that Digital
`Karyotyping provides a heretofore unavailable picture of the
`DNA landscape of a cell. The approach should be immedi-
`ately applicable to the analysis of human cancers, wherein
`identification of homozygous deletions and amplifications
`has historically revealed genes important in tumor initiation
`and progression. In addition, one can envisage a variety of
`other applications for this technique. First, the approach
`could be used to identify previously undiscovered alterations
`in hereditary disorders. A potentially large number of such
`diseases are thought to be due to deletions or duplications too
`small to be detected by conventional approaches. These may
`be detectable with Digital Karyotyping even in the absence of
`any linkage or other positional information. Second, use of
`mapping enzymes that are sensitive to DNA methylation (e.g.
`NotI) could be employed to catalog genome-wide methyla-
`tion changes in cancer or diseases thought to be affected by
`genomic imprinting. Third, the approach could be as easily
`applied to the genomes of other organisms to search for
`genetic alterations responsible for specific phenotypes, or to
`identify evolutionary differences between related species.
`Moreover, as the genome sequences of increasing numbers of
`microorganisms and viruses become available, the approach
`can be used to identify the presence of pathogenic DNA in
`infectious or neoplastic states.
Populations of sequence tags are generated from defined portions of the genome. The portions are defined by one or two restriction endonuclease recognition sites. Preferably the recognition sites are located in a fixed position within the defined portions of the genome. In one embodiment three different restriction endonucleases are used to generate sequence tags. In this embodiment, the restriction endonucleases used to generate the tags can be termed mapping (first), fragmenting (second), and tagging (third) restriction endonucleases. The defined portions extend from the fragmenting (second) restriction endonuclease site to the closest mapping (first) restriction endonuclease site. The sequence tags derived from these defined portions are generated by cleavage with a tagging enzyme. The closest nucleotides adjacent to the fragmenting (second) restriction endonuclease site comprise the sequence tags. The number of nucleotides is typically a fixed number (defined here to include a range of numbers) which is a function of the properties of the tagging (third) restriction endonuclease. For example, using MmeI the fixed number is 20, 21 or 22. Other Type IIS restriction endonucleases cleave at different distances from their recognition sequences. Other Type IIS restriction endonucleases which can be used include BbvI, BbvII, BinI, FokI, HgaI, HphI, MboII, MnlI, SfaNI, TaqII, Tth111II, BsmFI, and FokI. See Szybalski, W., Gene, 40:169, 1985. Other similar enzymes will be known to those of skill in the art (see, Current Protocols in Molecular Biology, supra). Restriction endonucleases with desirable properties can be artificially evolved, i.e., subjected to selection and screening, to obtain an enzyme which is useful as a tagging enzyme. Desirable enzymes cleave at least 18-21 nucleotides distant from their recognition sites. Artificial restriction endonucleases can also be used. Such endonucleases are made by protein engineering. For example, the endonuclease FokI has been engineered by insertions so that it cleaves one nucleotide further away from its recognition site on both strands of the DNA substrates. See Li and Chandrasegaran, Proc. Nat. Acad. Sciences USA 90:2764-8, 1993. Such techniques can be applied to generate restriction endonucleases with desirable recognition sequences and desirable distances from recognition site to cleavage site.
In an alternative embodiment a single restriction endonuclease can define a defined portion of the genome. A fixed number of nucleotides on one or both sides of the restriction endonuclease recognition site then forms the sequence tags. For example, the restriction endonuclease BcgI can be used to provide a 36 bp fragment. The 12 bp recognition site (having 6 degenerate positions) lies in the middle of a fragment; 12 bp flank the site on either side. Other similar enzymes which can be used in this embodiment include BplI and BsaXI. Preferably the enzyme used releases a fragment having a sum of at least 18 or 20 nucleotides flanking its recognition sequence.
`Enumeration of sequence tags generated is performed by
`determining the identity of the sequence tags and recording
`the number of occurrences of each such tag or of genomically
`clustered tags. Preferably the determination of identity of the
`tags is done by automated nucleotide sequence determination
`and the recording is done by computer. Other methods for
`identifying and recording tags can be used, as is convenient
and efficient to the practitioner. According to one embodiment of the invention sequence tags are ligated together to form a concatenate and the concatenates are cloned and sequenced. In a preferred embodiment the sequence tags are
`dimerized prior to formation of the concatenate. The
`sequence tags can be amplified as single tags or as dimers
`prior to concatenation.
A feature of the data analysis which enables the efficient practice of the method is the use of windows. These are groups of sequence tags which are genomically clustered. Virtual tags can be extracted from the genomic data for the species being tested. The virtual tags are associated with locations in the genome. Groups of adjacent virtual tags which are clustered in the genome are used to form a window for analysis of actual experimental tags. The term adjacent or contiguous as used herein to describe tags does not imply that the nucleotides of one tag are contiguous with the nucleotides of another tag, but rather that the tags are clustered in the same areas of the genome. Because of the way that sequence tags are generated, they only sample the genome; they do not saturate the genome. Thus, for example, a window can comprise sequence tags that map within about 40 kb, about 200 kb, about 600 kb, or about 4 Mb. Typically such windows comprise from 10 to 1000 sequence tags. Use of windows such as these permits the genome to be sampled rather than comprehensively analyzed. Thus, far less than 100% of the sequence tags must be counted to obtain useful information. In fact, less than 50%, less than 33%, less than 25%, less than 20%, even less than 15% of the sequence tags calculated to be present in the genome of the eukaryotic cell need be enumerated to obtain useful data. The karyotypic analysis can be used inter alia to compare a cancer cell to a normal cell, thereby identifying regions of genomic change involved in cancer. The karyotypic analysis can be used to identify genes involved in hereditary disorders. The karyotypic analysis can be used to identify genetic material in a eukaryotic cell derived from an infectious agent.
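The window idea described above can be illustrated with a short Python sketch. It is only an illustration under stated assumptions: the function name `window_ratios`, the list-based input layout (one observed count per virtual tag, in chromosomal order), and the rescaling by total library size are hypothetical choices made for this example and are not taken from the patent.

```python
def window_ratios(test_counts, ref_counts, window):
    """Ratio of test to reference tag counts in sliding windows.

    test_counts and ref_counts hold the number of observed tags for each
    virtual tag, listed in chromosomal order (assumed input layout).
    """
    if len(test_counts) != len(ref_counts):
        raise ValueError("count lists must cover the same virtual tags")
    if window < 1 or sum(test_counts) == 0:
        raise ValueError("need a positive window and at least one test tag")
    # Assumed normalization: rescale so both libraries have the same size.
    scale = sum(ref_counts) / sum(test_counts)
    ratios = []
    for start in range(len(test_counts) - window + 1):
        t = sum(test_counts[start:start + window])
        r = sum(ref_counts[start:start + window])
        # A window with no reference tags carries no information.
        ratios.append(t * scale / r if r else None)
    return ratios


# Toy data: a local gain in the middle of six virtual tags.
print(window_ratios([1, 1, 4, 4, 1, 1], [2] * 6, window=2))
# prints [0.5, 1.25, 2.0, 1.25, 0.5]
```

Smaller windows localize an alteration more sharply but rest on fewer tags per window, which is the trade-off the window sizes in the text reflect.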
Changes in amount of particular regions of the genome can identify aneuploidy if (a) sequence tags of one or more autosomes are determined to be present in the test eukaryotic cell relative to the reference eukaryotic cell at a ratio of 3 or greater or less than 1.5; or (b) sequence tags of one or more sex chromosomes in a male are determined to be present in the test eukaryotic cell relative to the reference eukaryotic cell at a ratio of 1.5 or greater or less than 0.7; or (c) sequence tags of X chromosomes in a female are determined to be present in the test eukaryotic cell relative to the reference eukaryotic cell at a ratio of 3 or greater or less than 1.5, or relative to a reference female eukaryotic cell at a ratio of 1.5 or greater or less than 0.7. Similarly, such changes can be measured with reference to nucleotide sequence data for the genome of a particular species.
Preferably the method of the present invention employs the
`formation of dimers of sequence tags. Such dimers permit the
`elimination of certain types of bias, for example that which
`might be introduced during amplification of tags. Typically
`dimers which do not comprise two distinct tags are excluded
`from analysis. Two sequence tags which form a dimer are
`desirably joined end-to-end at the ends distal to the second
`restriction endonuclease (fragmenting) site. Such distal ends
`are typically formed by the action of the tagging enzyme, i.e.,
`the third restriction endonuclease. Preferably the distal ends
`are sticky ends. All or part of the oligonucleotide linkers can
`remain as part of the dimers and can remain as part of the
`concatenates of dimers as well. However, the linkers are
`preferably cleaved prior to the concatenation.
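How dimers can be used during enumeration may be sketched as follows. This is a minimal illustration, not the inventors' software. It assumes that linkers have already been trimmed, that the two tags are joined tail to tail so that the second tag is read as a reverse complement, and that a dimer seen more than once is treated as an amplification duplicate and counted a single time (an assumption borrowed from common SAGE practice). The names `reverse_complement` and `count_tags_from_dimers` are hypothetical.

```python
from collections import Counter

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of an upper-case DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def count_tags_from_dimers(dimers, tag_len=21):
    """Count sequence tags, using dimers to filter likely artifacts."""
    seen = set()
    counts = Counter()
    for dimer in dimers:
        if len(dimer) < 2 * tag_len:
            continue  # too short to hold two full tags
        left = dimer[:tag_len]
        right = reverse_complement(dimer[-tag_len:])
        if left == right:
            continue  # dimer does not comprise two distinct tags
        pair = frozenset((left, right))
        if pair in seen:
            continue  # assumed amplification duplicate: count once
        seen.add(pair)
        counts[left] += 1
        counts[right] += 1
    return counts


# Toy example with 4-base "tags" to keep the strings readable.
toy = ["AAAACCCC", "AAAACCCC", "AAAATTTT", "ACGTAAAC"]
print(dict(count_tags_from_dimers(toy, tag_len=4)))
```

In the toy input the second dimer repeats the first and the third pairs a tag with itself, so only the first and fourth dimers contribute counts.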
`
`EXAMPLES
`
`Example 1
`
`Principles of Digital Karyotyping
`
These concepts are practically incorporated into Digital Karyotyping of human DNA as described in FIG. 1. Genomic DNA is cleaved with a restriction endonuclease (mapping enzyme) that is predicted to cleave genomic DNA into several hundred thousand pieces, each on average <10 kb in size (Step 1). A variety of different endonucleases can be used for this purpose, depending on the resolution desired. In the current study, we have used SacI, with a 6-bp recognition sequence predicted to preferentially cleave near or within transcribed genes. Biotinylated linkers are ligated to the DNA molecules (Step 2), which are then digested with a second endonuclease (fragmenting enzyme) that recognizes 4-bp sequences (Step 3). As there are on average 16 fragmenting enzyme sites between every two mapping enzyme sites (4^6/4^4), the majority of DNA molecules in the template are expected to be cleaved by both enzymes and thereby be available for subsequent steps. DNA fragments containing biotinylated linkers are separated from the remaining fragments using streptavidin-coated magnetic beads (Step 3). New linkers, containing a 5-bp site recognized by MmeI, a type IIS restriction endonuclease (18), are ligated to the captured DNA (Step 4). The captured fragments are cleaved by MmeI, releasing 21 bp tags (Step 5). Each tag is thus derived from the sequence adjacent to the fragmenting enzyme site that is closest to the nearest mapping enzyme site. Isolated tags are self-ligated to form ditags, PCR amplified en masse, concatenated, cloned, and sequenced (Step 6). As described for SAGE (17), formation of ditags provides a robust method to eliminate potential PCR induced bias during the procedure. Current automated sequencing technologies identify up to 30 tags per concatamer clone, allowing for analysis of ~100,000 tags per day using a single 384 capillary sequencing apparatus. Finally, tags are computationally extracted from sequence data, matched to precise chromosomal locations, and tag densities are evaluated over moving windows to detect abnormalities in DNA sequence content (Step 7).
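The tag definition above ("the sequence adjacent to the fragmenting enzyme site that is closest to the nearest mapping enzyme site") can be sketched in silico. The following Python is a hedged illustration only: the function names, the choice to report tags as top-strand sequence with a start coordinate, and the rule that a fragmenting site must lie between neighboring mapping sites are assumptions made for this example. The recognition sequences GAGCTC (SacI) and CATG (NlaIII) are both palindromic, so scanning one strand finds every site. The arithmetic in the text follows from a 6-bp site occurring on average every 4^6 = 4096 bp and a 4-bp site every 4^4 = 256 bp.

```python
import re

MAPPING_SITE = "GAGCTC"    # SacI recognition sequence
FRAGMENTING_SITE = "CATG"  # NlaIII recognition sequence


def site_starts(seq, site):
    """Start index of every occurrence of site in seq."""
    return [m.start() for m in re.finditer("(?=" + site + ")", seq)]


def virtual_tags(seq, tag_len=21):
    """Return (start, tag) pairs for tags flanking each mapping site.

    For every mapping site, the nearest fragmenting site on each side
    (not beyond the neighboring mapping site) is located, and the tag_len
    bases next to it, on the side facing the mapping site, are reported.
    """
    frag_len = len(FRAGMENTING_SITE)
    frag_starts = site_starts(seq, FRAGMENTING_SITE)
    map_starts = site_starts(seq, MAPPING_SITE)
    tags = []
    for i, m in enumerate(map_starts):
        m_end = m + len(MAPPING_SITE)
        prev_end = map_starts[i - 1] + len(MAPPING_SITE) if i > 0 else 0
        next_start = map_starts[i + 1] if i + 1 < len(map_starts) else len(seq)
        upstream = [f for f in frag_starts if prev_end <= f and f + frag_len <= m]
        downstream = [f for f in frag_starts if m_end <= f and f + frag_len <= next_start]
        if upstream:
            start = upstream[-1] + frag_len
            tag = seq[start:start + tag_len]
            if len(tag) == tag_len:
                tags.append((start, tag))
        if downstream:
            end = downstream[0]
            if end >= tag_len:
                tags.append((end - tag_len, seq[end - tag_len:end]))
    return tags


# Average number of 4-bp sites per interval between 6-bp sites.
print(4**6 // 4**4)  # prints 16

toy = "AACATG" + "TTGCA" + "GAGCTC" + "ACGTA" + "CATG" + "GG"
print(virtual_tags(toy, tag_len=5))  # prints [(6, 'TTGCA'), (17, 'ACGTA')]
```

The toy sequence uses 5-base tags so the result can be checked by eye; a real analysis would use the tag length released by the tagging enzyme.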
The sensitivity and specificity of Digital Karyotyping in detecting genome-wide changes was expected to depend on several factors. First, the combination of mapping and fragmenting enzymes determines the minimum size of the alterations that can be identified. For example, use of SacI and NlaIII as mapping and fragmenting enzymes, respectively, was predicted to result in a total of 730,862 virtual tags (defined as all possible tags that could theoretically be obtained from the human genome). These virtual tags were spaced at an average of 3,864 bp, with 95% separated by 4 bp to 46 kb. Practically, this resolution is limited by the number of tags actually sampled in a given experiment and the type of alteration present (Table 1). Monte Carlo simulations confirmed the intuitive concept that fewer tags are needed to detect high copy number amplifications than homozygous deletions or low copy number changes in similar sized regions (Table 1). Such simulations were used to predict the size of alterations that could be reliably detected given a fixed number of experimentally sampled tags. For example, analysis of 100,000 tags would be expected to reliably detect a 10-fold amplification ≥100 kb, homozygous deletions ≥600 kb, or a single gain or loss of regions ≥4 Mb in size in a diploid genome (Table 1).
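A back-of-the-envelope calculation helps explain why deletions need larger windows than amplifications. The sketch below is not the Monte Carlo simulation referred to in the text. It assumes, purely for illustration, that every virtual tag is equally likely to be sampled and that the tag count in a window is Poisson distributed. The function names are hypothetical; the figure of 730,862 virtual tags comes from the text.

```python
import math

VIRTUAL_TAGS = 730_862  # predicted SacI/NlaIII virtual tags (from the text)


def expected_tags(sampled_tags, window, copy_number, normal_copy_number=2):
    """Expected observed tags in a window of `window` virtual tags."""
    per_window = sampled_tags * window / VIRTUAL_TAGS
    return per_window * copy_number / normal_copy_number


def prob_zero(mean):
    """Poisson probability of observing no tags when `mean` are expected."""
    return math.exp(-mean)


# 100,000 sampled tags, window of 30 virtual tags (about 100 kb).
normal = expected_tags(100_000, 30, copy_number=2)
amplified = expected_tags(100_000, 30, copy_number=10)
print(round(normal, 2), round(amplified, 2))

# Chance that a normal 30-tag window is empty by sampling alone,
# compared with a normal 150-tag window (about 600 kb).
print(prob_zero(normal))
print(prob_zero(expected_tags(100_000, 150, copy_number=2)))
```

Under these assumptions a normal 30-tag window is expected to hold only about 4 tags, so an empty window arises by chance in more than 1% of windows and cannot be distinguished from a homozygous deletion, whereas an amplified window stands out clearly. With 150-tag windows an empty normal window becomes extremely unlikely, which is in line with the qualitative pattern reported in Table 1.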
`
TABLE 1

Theoretical detection of copy number alterations using Digital Karyotyping*

Size of alteration+          Amplification           Homozygous deletion     Heterozygous loss       Subchromosomal gain
                             Copy number = 10        Copy number = 0         Copy number = 1         Copy number = 3
                             Filtered tags analyzed:
# bp        # virtual tags   100,000     1,000,000   100,000     1,000,000   100,000     1,000,000   100,000     1,000,000
100,000     30               100%        100%        0.06%       100%        0.008%      0.02%       0.006%      0.08%
200,000     50               100%        100%        1%          100%        0.01%       3%          0.01%       0.7%
600,000     150              100%        100%        96%         100%        0.07%       100%        0.05%       100%
2,000,000   500              100%        100%        100%        100%        11%         100%        3%          100%
4,000,000   1000             100%        100%        100%        100%        99%         100%        97%         100%

*Copy number alteration refers to the gain or loss of chromosomal regions in the context of the normal diploid genome, where the normal copy number is 2. The limiting feature of these analyses was not sensitivity for detecting the alteration, as this was high in every case shown (>99% for amplifications and homozygous deletions and >92% for heterozygous losses or subchromosomal gains). What was of more concern was the positive predictive value (PPV), that is, the probability that a detected mutation represents a real mutation. PPVs were calculated from 100 simulated genomes, using 100,000 or 1,000,000 filtered tags, and are shown in the table as percents.
+Size of alteration refers to the approximate size of the genomic alteration assuming an average of 3864 bp between virtual tags.

Example 2

Analysis of Whole Chromosomes

We characterized 210,245 genomic tags from lymphoblastoid cells of a normal individual (NLB) and 171,795 genomic tags from the colorectal cancer cell line (DiFi) using the mapping and fragmenting enzymes described above. After filtering to remove tags that were within repeated sequences or were not present in the human genome (see Materials and Methods), we recovered a total of 111,245 and 107,515 filtered tags from the NLB and DiFi libraries, respectively. Tags were ordered along each chromosome, and average chromosomal tag densities, defined as the number of detected tags divided by the number of virtual tags present in a given chromosome, were evaluated (Table 2). Analysis of the NLB data showed that the average tag densities for the autosomal chromosomes were similar, ~0.16 ± 0.04. The small variations in tag densities were likely due to incomplete filtering of tags matching repeated sequences that were not currently represented in the genome databases. The X and Y chromosomes had average densities about half this level, 0.073 and 0.068, respectively, consistent with the normal male karyotype of these cells. Analysis of the DiFi data revealed a much wider variation in tag densities, ranging from 0.089 to 0.27 for autosomal chromosomes. In agreement with the origin of these tumor cells from a female patient (20), the tag density of the Y chromosome was 0.00. Estimates of chromosome number using observed tag densities normalized to densities from lymphoblastoid cells suggested a highly aneuploid genetic content, with ≤1.5 copies of chromosomes 1, 4, 5, 8, 17, 21 and 22, and ≥3 copies of chromosomes 7, 13 and 20 per diploid genome. These observations were consistent with CGH analyses (see below) and the previously reported karyotype of DiFi cells (20).
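The chromosome-level estimate used in Example 2 (tag density in the test sample normalized to the density in the reference sample) can be written as a small calculation. The sketch below is illustrative only: the function names are hypothetical, the reference is assumed to carry two copies of each autosome, and the small difference in library size between the two samples is ignored. The numeric inputs are the densities quoted in Example 2.

```python
def tag_density(observed_tags, virtual_tags):
    """Detected tags divided by virtual tags for one chromosome."""
    return observed_tags / virtual_tags


def copies_per_diploid_genome(test_density, ref_density, ref_copies=2):
    """Copy number estimate from a test density normalized to a reference."""
    return ref_copies * test_density / ref_density


NLB_AUTOSOME_DENSITY = 0.16  # approximate autosomal average for NLB

# Lowest and highest autosomal densities reported for DiFi.
print(copies_per_diploid_genome(0.089, NLB_AUTOSOME_DENSITY))
print(copies_per_diploid_genome(0.27, NLB_AUTOSOME_DENSITY))

# X chromosome of the male NLB sample against its own autosomal average.
print(copies_per_diploid_genome(0.073, NLB_AUTOSOME_DENSITY))
```

With these inputs the extremes come out at roughly 1.1 and 3.4 copies per diploid genome, and the NLB X chromosome at roughly 0.9 copies, which agrees with the ranges and the male karyotype described in the text.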
`
`contrast, the DiFi tag density map (normalized to the NLB
`data) revealed widespread changes, including apparent losses
in large regions of 5q, 8p and 10q, and gains of 2p, 7q, 9p, 12q,
13q, and 19q (FIG. 2 and FIG. 5). These changes included
regions of known tumor suppressor genes (21)