(12) United States Patent
Wang et al.

(10) Patent No.: US 7,704,687 B2
(45) Date of Patent: Apr. 27, 2010
`
`(54) DIGITAL KARYOTYPING
`
(75) Inventors: Tian-Li Wang, Baltimore, MD (US);
Victor Velculescu, Dayton, MD (US);
Kenneth Kinzler, Bel Air, MD (US);
Bert Vogelstein, Baltimore, MD (US)
`
`(73) Assignee: The Johns Hopkins University,
`Baltimore, MD (US)
`
(*) Notice: Subject to any disclaimer, the term of this patent is extended or adjusted under 35 U.S.C. 154(b) by 1063 days.
`
(21) Appl. No.: 10/705,874

(22) Filed: Nov. 13, 2003

(65) Prior Publication Data
US 2004/0096892 A1    May 20, 2004

Related U.S. Application Data
(60) Provisional application No. 60/426,406, filed on Nov. 15, 2002.
(51) Int. Cl.
C12Q 1/68 (2006.01)
G06F 19/00 (2006.01)
elon (2006.01)
C12P 19/34 (2006.01)
(52) U.S. Cl. ........ 435/6; 702/19; 702/20; 435/91.2
(58) Field of Classification Search ........ None
See application file for complete search history.
(56) References Cited

U.S. PATENT DOCUMENTS

5,200,336 A *       4/1993   Kong et al. ............ 435/199
5,391,480 A *       2/1995   Davis et al.
5,663,048 A *       9/1997   Winkfein et al. ........ 435/6
5,695,937 A        12/1997   Kinzler et al.
5,981,190 A        11/1999   Israel
6,498,013 B1       12/2002   Velculescu et al.
2002/0048767 A1 *   4/2002   Bensimon et al. ........ 435/6
2002/0147549 A1 *  10/2002   Yoshida et al. ......... 702/20
2003/0124584 A1 *   7/2003   Mohammed ............... 435/6
2003/0186251 A1    10/2003   Dunn et al.
2004/0219580 A1    11/2004   Dunn et al.

FOREIGN PATENT DOCUMENTS

WO   WO 0202805 A2 *   1/2002
`OTHER PUBLICATIONS
`
New England Biolabs Technical online reference (www.neb.com/nebecomm/tech_reference/restriction_enzymes/overview.asp) pp. 1-4, 2007.*
Dunn, J. et al., “Genomic Signature Tags (GSTs): A System for Profiling Genomic DNA,” Genome Research, (2002) 12:1756-1765.
Wang, T. et al., “Digital karyotyping,” PNAS, (Dec. 10, 2002), vol. 99, No. 25, pp. 16156-16161.
`Kallioniemi, A. et al., “Comparative Genomic Hybridization for
`Molecular Cytogenetic Analysis of Solid Tumors” Science, vol. 258
`(Oct. 30, 1992), pp. 818-821.
`* cited by examiner
`Primary Examiner—Nancy Vogel
`Assistant Examiner—Catherine Hibbert
`(74) Attorney, Agent, or Firm—Banner & Witcoff, Ltd.
(57) ABSTRACT

Alterations in the genetic content of a cell underlie many human diseases, including cancers. A method called Digital Karyotyping provides quantitative analysis of DNA copy number at high resolution. This approach involves the isolation and enumeration of short sequence tags from specific genomic loci. Analysis of human cancer cells using this method identified gross chromosomal changes as well as amplifications and deletions, including regions not previously known to be altered. Foreign DNA sequences not present in the normal human genome could also be readily identified. Digital Karyotyping provides a broadly applicable means for systematic detection of DNA copy number changes on a genomic scale.
`
`27 Claims, 5 Drawing Sheets
`
`PGDX EX. 1002
`Page 1 of 21
`
U.S. Patent    Apr. 27, 2010    Sheet 1 of 5    US 7,704,687 B2

[FIGURE 1: schematic of the Digital Karyotyping procedure, ending in a plot of tag density against chromosome position with an amplification and a deletion marked.
Step 1. Isolate genomic DNA; cleave with mapping enzyme (SacI)
Step 2. Ligate biotinylated linkers
Step 3. Cleave with fragmenting enzyme (NlaIII); isolate with streptavidin magnetic beads
Step 4. Ligate to linkers containing tagging enzyme site (MmeI)
Step 5. Release genomic tags using tagging enzyme (MmeI)
Step 6. Ligate to form ditags, PCR amplify, concatenate, and sequence
Step 7. Map tags to chromosome, evaluate tag density]
`
[FIGURE 2 (Sheet 2 of 5): panels labeled Karyotype, Digital Karyotype, and CGH. Y-axis: copies per haploid genome; X-axis: position along chromosome (0-100 Mb).]
`
`
[FIGURE 3 (Sheet 3 of 5): tag density panels. Y-axis: copies per haploid genome.]
`
`
[FIGURE 4 (Sheet 4 of 5): Y-axis: observed tags; categories: EBV genome, other viral sequences, bacterial sequences; DiFi series labeled.]
`
`
[FIGURE 5 (Sheet 5 of 5): per-chromosome panels labeled Karyotype and Digital. Y-axis: copies per haploid genome; X-axis: position along chromosome (Mb).]

`DIGITAL KARYOTYPING
`
This application claims the benefit of provisional application Ser. No. 60/426,406 filed Nov. 15, 2002, the contents of which are expressly incorporated herein.
The work underlying this invention was supported in part by the U.S. government. Thus the U.S. government retains certain rights in the invention according to the provisions of grant nos. CA 43460, CA 57345, CA 62924 of the National Institutes of Health.

A portion of the disclosure of this patent document contains material which is subject to copyright protection. The copyright owner has no objection to the facsimile reproduction by anyone of the patent document or the patent disclosure, as it appears in the Patent and Trademark Office patent file or records, but otherwise reserves all copyright rights whatsoever.
`
`FIELD OF THE INVENTION
`
The invention relates to the field of genetics. In particular, it relates to the determination of karyotypes of genomes of individuals.
`
`BACKGROUND OF THE INVENTION
`
Somatic and hereditary variations in gene copy number can lead to profound abnormalities at the cellular and organismal levels. In human cancer, chromosomal changes, including deletion of tumor suppressor genes and amplification of oncogenes, are hallmarks of neoplasia (1). Single copy changes in specific chromosomes or smaller regions can result in a number of developmental disorders, including Down, Prader Willi, Angelman, and cri du chat syndromes (2). Current methods for analysis of cellular genetic content include comparative genomic hybridization (CGH) (3), representational difference analysis (4), spectral karyotyping/M-FISH (5, 6), microarrays (7-10), and traditional cytogenetics. Such techniques have aided in the identification of genetic aberrations in human malignancies and other diseases (11-14). However, methods employing metaphase chromosomes have a limited mapping resolution (~20 Mb) (15) and therefore cannot be used to detect smaller alterations. Recent implementation of comparative genomic hybridization to microarrays containing genomic or transcript DNA sequences provides improved resolution, but is currently limited by the number of sequences that can be assessed (16) or by the difficulty of detecting certain alterations (9). There is a continuing need in the art for methods of analyzing and comparing genomes.
`
`BRIEF SUMMARY OF THE INVENTION
`
In a first embodiment a method is provided for karyotyping a genome of a test eukaryotic cell. A population of sequence tags is generated from defined portions of the genome of the test eukaryotic cell. The portions are defined by one or two restriction endonuclease recognition sites. The sequence tags in the population are enumerated to determine the number of individual sequence tags present in the population. The number of a plurality of sequence tags in the population is compared to the number of the plurality of sequence tags determined for a genome of a reference cell. The plurality of sequence tags are within a window of sequence tags which are calculated to be contiguous in the genome of the species of the eukaryotic cell. A difference in the number of the plurality of sequence tags within the window present in the population from the number determined for a reference eukaryotic cell indicates a karyotypic difference between the test eukaryotic cell and the reference eukaryotic cell.
According to a second embodiment of the invention, a dimer is provided. The dimer comprises two distinct sequence tags from defined portions of the genome of a eukaryotic cell. The portions are defined by one or two restriction endonuclease recognition sites. Each of said sequence tags consists of a fixed number of nucleotides of one of said defined portions of the genome. The fixed number of nucleotides extend from one of said restriction endonuclease recognition sites.
According to a third embodiment of the invention, a concatamer of dimers is provided. The dimers comprise two distinct sequence tags from defined portions of the genome of a eukaryotic cell. The portions are defined by one or two restriction endonuclease recognition sites. Each of said sequence tags consists of a fixed number of nucleotides of one of said defined portions of the genome. The fixed number of nucleotides extend from one of the restriction endonuclease recognition sites.
According to a fourth embodiment of the invention a method of karyotyping a genome of a test eukaryotic cell is provided. A population of sequence tags is generated from defined portions of the genome of the test eukaryotic cell. The portions are defined by one or two restriction endonuclease recognition sites. The sequence tags in the population are enumerated to determine the number of individual sequence tags present in the population. The number of a plurality of sequence tags in the population is compared to the number of said plurality of sequence tags calculated to be present in the genome of the species of the eukaryotic cell. The plurality of sequence tags are within a window of sequence tags which are calculated to be contiguous in the genome of the species of the eukaryotic cell. A difference in the number of the plurality of sequence tags within the window present in the population from the number calculated to be present in the genome of the eukaryotic cell indicates a karyotypic abnormality.
`
`BRIEF DESCRIPTION OF THE DRAWINGS
`
FIG. 1. Schematic of Digital Karyotyping approach. Colored boxes represent genomic tags. Small ovals represent linkers. Large blue ovals represent streptavidin-coated magnetic beads. This figure is described in more detail below.
FIG. 2. Low resolution tag density maps reveal many subchromosomal changes. The top graph corresponds to the Digital Karyotype, while the lower graph represents CGH analysis. An ideogram of each normal chromosome is present under each set of graphs. For all graphs, values on the Y-axis indicate genome copies per haploid genome, and values on the X-axis represent position along chromosome (Mb for Digital Karyotype, and chromosome bands for CGH). Digital Karyotype values represent exponentially smoothed ratios of DiFi tag densities, using a sliding window of 1000 virtual tags normalized to the NLB genome. Chromosomal areas lacking Digital Karyotype values correspond to unsequenced portions of the genome, including heterochromatic regions. Note that using a window of 1000 virtual tags does not permit accurate identification of alterations less than ~4 Mb, such as amplifications and homozygous deletions, and smaller windows need to be employed to accurately identify these lesions (see FIG. 3 for example).
FIGS. 3A and 3B. High resolution tag density maps identify amplifications and deletions. (FIG. 3A) Amplification on chromosome 7. Top panel represents bitmap viewer with the region containing the alteration encircled. The bitmap viewer is comprised of ~39,000 pixels representing tag density values at the chromosomal position of each virtual tag on chromosome 7, determined from sliding windows of 50 virtual tags. Yellow pixels indicate tag densities corresponding to copy numbers <110 while black pixels correspond to copy number ≥110. Middle panel represents an enlarged view of the region of alteration. The lower panel indicates a graphical representation of the amplified region with values on the Y-axis indicating genome copies per haploid genome and values on the X-axis representing position along the chromosome in Mb. (FIG. 3B) Homozygous deletion on chromosome 5. Top, middle and lower panels are similar to those for (FIG. 3A) except that the bitmap viewer for chromosome 5 contains ~43,000 pixels, tag density values were calculated in sliding windows of 150 virtual tags, and yellow pixels indicate copy numbers >0.1 while black pixels indicate copy numbers ≤0.1. Bottom panel represents detailed analysis of the region containing the homozygous deletion in DiFi and Co52. For each sample, white dots indicate markers that were retained, while black dots indicate markers that were homozygously deleted. PCR primers for each marker are listed in Table 4.
FIG. 4. Identification of EBV DNA in NLB cells. NLB, genomic tags derived from NLB cells after removal of tags matching human genome sequences or tags matching DiFi cells. DiFi, genomic tags derived from DiFi cells after removal of tags matching human genome sequences or tags matching NLB cells. The number of observed tags matching EBV, other viral, or bacterial sequences is indicated on the vertical axis.
FIG. 5. Low resolution tag density maps of the DiFi tumor genome. For each chromosome, the top graph corresponds to the Digital Karyotype while the lower graph represents CGH analysis. An ideogram of each chromosome is depicted under each set of graphs. For all graphs, values on the Y-axis indicate genome copies per haploid genome, and values on the X-axis represent position along chromosome (Mb for Digital Karyotypes, and chromosome bands for CGH). Digital Karyotype values represent exponentially smoothed ratios of DiFi tag densities, using a sliding window of 1000 virtual tags normalized to the NLB genome. Chromosomal areas lacking Digital Karyotype values correspond to unsequenced portions of the genome, including heterochromatic regions.

DETAILED DESCRIPTION OF THE INVENTION

It is a discovery of the present inventors that the genome of an organism can be sampled in groups of small pieces to determine karyotypic properties of an organism using a systematic and quantitative method. Changes in copy number of portions of the genome can be determined on a genomic scale. Such changes include gain or loss of whole chromosomes or chromosome arms, amplifications and deletions of regions of the genome, as well as insertions of foreign DNA. Rearrangements, such as translocations and inversions, would typically not be detected by the method.
Our data demonstrate that the method, called Digital Karyotyping, can accurately identify regions whose copy number is abnormal, even in complex genomes such as that of the human. Whole chromosome changes, gains or losses of chromosomal arms, and interstitial amplifications or deletions can be detected. Moreover, the method permits the identification of specific amplifications and deletions that had not been previously described by comparative genomic hybridization (CGH) or other methods in any human cancer. These analyses suggest that a potentially large number of undiscovered copy number alterations exist in cancer genomes and that many of these could be detected through Digital Karyotyping.
Like all genome-wide analyses, Digital Karyotyping has limitations. First, the ability to measure tag densities over entire chromosomes depends on the accuracy and completeness of the genome sequence. Fortunately, over 94% of the human genome is available in draft form, and 95% of the sequence is expected to be in a finished state by 2003. Second, a small number of areas of the genome are expected to have a lower density of mapping enzyme restriction sites and be incompletely evaluated by our approach. We estimate that less than 5% of the genome would be incompletely analyzed using the parameters employed in the current study. Moreover, this problem could be overcome through the use of different mapping and fragmenting enzymes. Finally, Digital Karyotyping cannot generally detect very small regions, on the order of several thousand base pairs or less, that are amplified or deleted.
Nevertheless, it is clear from our analyses that Digital Karyotyping provides a heretofore unavailable picture of the DNA landscape of a cell. The approach should be immediately applicable to the analysis of human cancers, wherein identification of homozygous deletions and amplifications has historically revealed genes important in tumor initiation and progression. In addition, one can envisage a variety of other applications for this technique. First, the approach could be used to identify previously undiscovered alterations in hereditary disorders. A potentially large number of such diseases are thought to be due to deletions or duplications too small to be detected by conventional approaches. These may be detectable with Digital Karyotyping even in the absence of any linkage or other positional information. Second, use of mapping enzymes that are sensitive to DNA methylation (e.g. NotI) could be employed to catalog genome-wide methylation changes in cancer or diseases thought to be affected by genomic imprinting. Third, the approach could be as easily applied to the genomes of other organisms to search for genetic alterations responsible for specific phenotypes, or to identify evolutionary differences between related species. Moreover, as the genome sequences of increasing numbers of microorganisms and viruses become available, the approach can be used to identify the presence of pathogenic DNA in infectious or neoplastic states.
Populations of sequence tags are generated from defined portions of the genome. The portions are defined by one or two restriction endonuclease recognition sites. Preferably the recognition sites are located in a fixed position within the defined portions of the genome. In one embodiment three different restriction endonucleases are used to generate sequence tags. In this embodiment, the restriction endonucleases used to generate the tags can be termed mapping (first), fragmenting (second), and tagging restriction endonuclease. The defined portions extend from the fragmenting (second) restriction endonuclease site to the closest mapping (first) restriction endonuclease site. The sequence tags derived from these defined portions are generated by cleavage with a tagging enzyme. The closest nucleotides adjacent to the fragmenting (second) restriction endonuclease comprise the sequence tags. The number of nucleotides is typically a fixed number (defined here to include a range of numbers) which is a function of the properties of the tagging (third) restriction endonuclease. For example, using MmeI the fixed number is 20, 21 or 22. Other Type IIS restriction endonucleases cleave at different distances from their recognition sequences. Other Type IIS restriction endonucleases which can be used include BbvI, BbvII, BinI, FokI, HgaI, HphI,
MboII, MnlI, SfaNI, TaqII, Tth111II, BsmFI, and FokI. See Szybalski, W., Gene, 40:169, 1985. Other similar enzymes will be known to those of skill in the art (see, Current Protocols in Molecular Biology, supra). Restriction endonucleases with desirable properties can be artificially evolved, i.e., subjected to selection and screening, to obtain an enzyme which is useful as a tagging enzyme. Desirable enzymes cleave at least 18-21 nucleotides distant from their recognition sites. Artificial restriction endonucleases can also be used. Such endonucleases are made by protein engineering. For example, the endonuclease FokI has been engineered by insertions so that it cleaves one nucleotide further away from its recognition site on both strands of the DNA substrates. See Li and Chandrasegaran, Proc. Nat. Acad. Sciences USA 90:2764-8, 1993. Such techniques can be applied to generate restriction endonucleases with desirable recognition sequences and desirable distances from recognition site to cleavage site.
In an alternative embodiment a single restriction endonuclease can define a defined portion of the genome. A fixed number of nucleotides on one or both sides of the restriction endonuclease recognition site then forms the sequence tags. For example, the restriction endonuclease BcgI can be used to provide a 36 bp fragment. The 12 bp recognition site (having 6 degenerate positions) lies in the middle of a fragment; 12 bp flank the site on either side. Other similar enzymes which can be used in this embodiment include BplI and BsaXI. Preferably the enzyme used releases a fragment having a sum of at least 18 or 20 nucleotides flanking its recognition sequence.
Enumeration of sequence tags generated is performed by determining the identity of the sequence tags and recording the number of occurrences of each such tag or of genomically clustered tags. Preferably the determination of identity of the tags is done by automated nucleotide sequence determination and the recording is done by computer. Other methods for identifying and recording tags can be used, as is convenient and efficient to the practitioner. According to one embodiment of the invention sequence tags are ligated together to form a concatenate and the concatenates are cloned and sequenced. In a preferred embodiment the sequence tags are dimerized prior to formation of the concatenate. The sequence tags can be amplified as single tags or as dimers prior to concatenation.
A feature of the data analysis which enables the efficient practice of the method is the use of windows. These are groups of sequence tags which are genomically clustered. Virtual tags can be extracted from the genomic data for the species being tested. The virtual tags are associated with locations in the genome. Groups of adjacent virtual tags which are clustered in the genome are used to form a window for analysis of actual experimental tags. The term adjacent or contiguous as used herein to describe tags does not imply that the nucleotides of one tag are contiguous with the nucleotides of another tag, but rather that the tags are clustered in the same areas of the genome. Because of the way that sequence tags are generated, they only sample the genome; they do not saturate the genome. Thus, for example, a window can comprise sequence tags that map within about 40 kb, about 200 kb, about 600 kb, or about 4 Mb. Typically such windows comprise from 10 to 1000 sequence tags. Use of windows such as these permits the genome to be sampled rather than comprehensively analyzed. Thus, far less than 100% of the sequence tags must be counted to obtain useful information. In fact, less than 50%, less than 33%, less than 25%, less than 20%, even less than 15% of the sequence tags calculated to be present in the genome of the eukaryotic cell need be enumerated to obtain useful data. The karyotypic analysis can be used inter alia to compare a cancer cell to a normal cell, thereby identifying regions of genomic change involved in cancer. The karyotypic analysis can be used to identify genes involved in hereditary disorders. The karyotypic analysis can be used to identify genetic material in a eukaryotic cell derived from an infectious agent.
Changes in amount of particular regions of the genome can identify aneuploidy if (a) sequence tags of one or more autosomes are determined to be present in the test eukaryotic cell relative to the reference eukaryotic cell at a ratio of 3 or greater or less than 1.5; or (b) sequence tags of one or more sex chromosomes in a male are determined to be present in the test eukaryotic cell relative to the reference eukaryotic cell at a ratio of 1.5 or greater or less than 0.7; or (c) sequence tags of X chromosomes in a female are determined to be present in the test eukaryotic cell relative to the reference eukaryotic cell at a ratio of 3 or greater or less than 1.5, or relative to a reference female eukaryotic cell at a ratio of 1.5 or greater or less than 0.7. Similarly, such changes can be measured with reference to nucleotide sequence data for the genome of a particular species.
Preferably the method of the present invention employs the formation of dimers of sequence tags. Such dimers permit the elimination of certain types of bias, for example that which might be introduced during amplification of tags. Typically dimers which do not comprise two distinct tags are excluded from analysis. Two sequence tags which form a dimer are desirably joined end-to-end at the ends distal to the second restriction endonuclease (fragmenting) site. Such distal ends are typically formed by the action of the tagging enzyme, i.e., the third restriction endonuclease. Preferably the distal ends are sticky ends. All or part of the oligonucleotide linkers can remain as part of the dimers and can remain as part of the concatenates of dimers as well. However, the linkers are preferably cleaved prior to the concatenation.
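As a small illustration of enumeration with the dimer rule just described, the sketch below counts tags from a list of dimers and skips any dimer whose two tags are not distinct. The function name `count_tags` and the representation of a dimer as a pair of strings are assumptions for this example; parsing dimers out of concatenate sequence reads is not shown.

```python
from collections import Counter


def count_tags(ditags):
    """Count tags from (tag_a, tag_b) pairs, skipping non-distinct dimers."""
    counts = Counter()
    for tag_a, tag_b in ditags:
        if tag_a == tag_b:
            # Dimers that do not comprise two distinct tags are excluded.
            continue
        counts[tag_a] += 1
        counts[tag_b] += 1
    return counts
```

The resulting counts per tag are the input that a later mapping step would assign to chromosomal positions.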
`
`EXAMPLES
`
`Example 1
`
`Principles of Digital Karyotyping
`
These concepts are practically incorporated into Digital Karyotyping of human DNA as described in FIG. 1. Genomic DNA is cleaved with a restriction endonuclease (mapping enzyme) that is predicted to cleave genomic DNA into several hundred thousand pieces, each on average <10 kb in size (Step 1). A variety of different endonucleases can be used for this purpose, depending on the resolution desired. In the current study, we have used SacI, with a 6-bp recognition sequence predicted to preferentially cleave near or within transcribed genes. Biotinylated linkers are ligated to the DNA molecules (Step 2) and then digested with a second endonuclease (fragmenting enzyme) that recognizes 4-bp sequences (Step 3). As there are on average 16 fragmenting enzyme sites between every two mapping enzyme sites (4^6/4^4), the majority of DNA molecules in the template are expected to be cleaved by both enzymes and thereby be available for subsequent steps. DNA fragments containing biotinylated linkers are separated from the remaining fragments using streptavidin-coated magnetic beads (Step 3). New linkers, containing a 5-bp site recognized by MmeI, a type IIS restriction endonuclease (18), are ligated to the captured DNA (Step 4). The captured fragments are cleaved by MmeI, releasing 21 bp tags (Step 5). Each tag is thus derived from the sequence adjacent to the fragmenting enzyme site that is closest to the nearest mapping enzyme site. Isolated tags are self-ligated to form ditags, PCR amplified en masse, concatenated, cloned, and sequenced (Step 6). As described for SAGE (17), formation of ditags provides a robust method to eliminate potential PCR induced bias during the procedure. Current automated sequencing technologies identify up to 30 tags per concatamer clone, allowing for analysis of ~100,000 tags per day using a single 384 capillary sequencing apparatus. Finally, tags are computationally extracted from sequence data, matched to precise chromosomal locations, and tag densities are evaluated over moving windows to detect abnormalities in DNA sequence content (Step 7).
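The rule that each tag lies next to the fragmenting enzyme site closest to a mapping enzyme site can be sketched in silico. The following Python is a simplified illustration, not the procedure used in the study: it scans one strand only, ignores reverse complements and repeats, and assumes a tag length of 17 bases (the 21 bp tag minus the 4-bp fragmenting site, which is an assumption here). The function name `virtual_tags` and its parameters are hypothetical.

```python
import re


def virtual_tags(seq, mapping="GAGCTC", fragmenting="CATG", tag_len=17):
    """Return sorted (start, tag) pairs for one strand of seq.

    mapping is the SacI site and fragmenting the NlaIII site; tag_len
    is an assumed length, not a value stated in the text.
    """
    frag_sites = [m.start() for m in re.finditer(fragmenting, seq)]
    frag_len = len(fragmenting)
    tags = set()
    for m in re.finditer(mapping, seq):
        upstream = [p for p in frag_sites if p + frag_len <= m.start()]
        downstream = [p for p in frag_sites if p >= m.end()]
        if upstream:
            # Tag runs from the nearest upstream site toward the mapping site.
            start = upstream[-1] + frag_len
            tags.add((start, seq[start:start + tag_len]))
        if downstream:
            # Tag ends at the nearest downstream site, on the mapping side.
            start = downstream[0] - tag_len
            if start >= 0:
                tags.add((start, seq[start:downstream[0]]))
    return sorted(t for t in tags if len(t[1]) == tag_len)
```

Run over a reference genome, a routine of this kind would yield the catalog of virtual tags and positions against which observed tags are matched.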
The sensitivity and specificity of Digital Karyotyping in detecting genome-wide changes was expected to depend on several factors. First, the combination of mapping and fragmenting enzymes determines the minimum size of the alterations that can be identified. For example, use of SacI and NlaIII as mapping and fragmenting enzymes, respectively, was predicted to result in a total of 730,862 virtual tags (defined as all possible tags that could theoretically be obtained from the human genome). These virtual tags were spaced at an average of 3,864 bp, with 95% separated by 4 bp to 46 kb. Practically, this resolution is limited by the number of tags actually sampled in a given experiment and the type of alteration present (Table 1). Monte Carlo simulations confirmed the intuitive concept that fewer tags are needed to detect high copy number amplifications than homozygous deletions or low copy number changes in similar sized regions (Table 1). Such simulations were used to predict the size of alterations that could be reliably detected given a fixed number of experimentally sampled tags. For example, analysis of 100,000 tags would be expected to reliably detect a 10-fold amplification ≥100 kb, homozygous deletions ≥600 kb, or a single gain or loss of regions ≥4 Mb in size in a diploid genome (Table 1).
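The intuition behind these detection limits can be seen from expected tag counts. The sketch below is not the Monte Carlo simulation referred to in the text; it is a back-of-the-envelope calculation that assumes tags are sampled uniformly across the 730,862 virtual tags and that the altered region is too small to shift the genome-wide total. The function name `expected_window_tags` is hypothetical.

```python
def expected_window_tags(sampled_tags, window_virtual_tags, copy_number,
                         total_virtual_tags=730_862, normal_copies=2):
    """Expected observed tags in a window under uniform sampling (assumed)."""
    per_virtual_tag = sampled_tags / total_virtual_tags
    return per_virtual_tag * window_virtual_tags * copy_number / normal_copies
```

With 100,000 sampled tags, a 30-virtual-tag window (about 100 kb) is expected to hold roughly 4 tags at normal copy number and roughly 20 at copy number 10, a large relative difference. A single-copy gain in the same window raises the expectation only to about 6, which is why much larger windows are needed for low copy number changes.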
`
TABLE 1

Theoretical detection of copy number alterations using Digital Karyotyping*

                                Amplification         Homozygous deletion   Heterozygous loss     Subchromosomal gain
Size of alteration**            Copy number = 10      Copy number = 0       Copy number = 1       Copy number = 3
# bp        # virtual tags      100,000   1,000,000   100,000   1,000,000   100,000   1,000,000   100,000   1,000,000
100,000     30                  100%      100%        0.06%     100%        0.008%    0.02%       0.006%    0.08%
200,000     50                  100%      100%        1%        100%        0.01%     3%          0.01%     0.7%
600,000     150                 100%      100%        96%       100%        0.07%     100%        0.05%     100%
2,000,000   500                 100%      100%        100%      100%        11%       100%        3%        100%
4,000,000   1000                100%      100%        100%      100%        99%       100%        97%       100%

*Copy number alteration refers to the gain or loss of chromosomal regions in the context of the normal diploid genome, where the normal copy number is 2. The limiting feature of these analyses was not sensitivity for detecting the alteration, as this was high in every case shown (>99% for amplifications and homozygous deletions and >92% for heterozygous losses or subchromosomal gains). What was of more concern was the positive predictive value (PPV), that is, the probability that a detected mutation represents a real mutation. PPVs were calculated from 100 simulated genomes, using 100,000 or 1,000,000 filtered tags, and shown in the table as percents.
**Size of alteration refers to the approximate size of the genomic alteration assuming an average of 3864 bp between virtual tags.

Example 2

Analysis of Whole Chromosomes

We characterized 210,245 genomic tags from lymphoblastoid cells of a normal individual (NLB) and 171,795 genomic tags from the colorectal cancer cell line (DiFi) using the mapping and fragmenting enzymes described above. After filtering to remove tags that were within repeated sequences or were not present in the human genome (see Materials and Methods), we recovered a total of 111,245 and 107,515 filtered tags from the NLB and DiFi libraries, respectively. Tags were ordered along each chromosome, and average chromosomal tag densities, defined as the number of detected tags divided by the number of virtual tags present in a given chromosome, were evaluated (Table 2). Analysis of the NLB data showed that the average tag densities for each autosomal chromosome was similar, ~0.16+/-0.04. The small variations in tag densities were likely due to incomplete filtering of tags matching repeated sequences that were not currently represented in the genome databases. The X and Y chromosomes had average densities about half this level, 0.073 and 0.068, respectively, consistent with the normal male karyotype of these cells. Analysis of the DiFi data revealed a much wider variation in tag densities, ranging from 0.089 to 0.27 for autosomal chromosomes. In agreement with the origin of these tumor cells from a female patient (20), the tag density of the Y chromosome was 0.00. Estimates of chromosome number using observed tag densities normalized to densities from lymphoblastoid cells suggested a highly aneuploid genetic content, with ≤1.5 copies of chromosome 1, 4, 5, 8, 17, 21 and 22, and ≥3 copies of chromosome 7, 13 and 20 per diploid genome. These observations were consistent with CGH analyses (see below) and the previously reported karyotype of DiFi cells (20).
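The chromosome-level estimate described in Example 2 (tag density as detected tags divided by virtual tags, normalized to the lymphoblastoid reference) can be sketched as follows. This is an illustrative calculation only: the function name `chromosome_copies` is hypothetical, the reference chromosome is assumed to be present at two copies, and no correction for differing library sizes is applied.

```python
def chromosome_copies(test_obs, ref_obs, virtual, ref_copies=2):
    """Estimate copies of a chromosome per diploid genome (assumed model)."""
    if virtual <= 0 or ref_obs <= 0:
        raise ValueError("virtual and reference tag counts must be positive")
    test_density = test_obs / virtual   # detected tags / virtual tags
    ref_density = ref_obs / virtual
    return ref_copies * test_density / ref_density
```

For instance, a test density of 0.27 against a reference density of 0.16 gives an estimate of about 3.4 copies, and 0.089 against 0.16 gives about 1.1, which brackets the range of autosomal densities reported for DiFi.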
`
`
`TABLE 2
`
Chromosome number analysis
`
`Chro-
`
`NLB
`
`Chromo-
`
`
`
`Example 3
`
`Analysis of Chromosomal Arms
`
`
`Example 4
`
`i:
`Dif
`
`
`
contrast, the DiFi tag density map (normalized to the NLB data) revealed widespread changes, including apparent losses in large regions of 5q, 8p and 10q, and gains of 2p, 7q, 9p, 12q, 13q, and 19q (FIG. 2 and FIG. 5). These changes included regions of known tumor suppressor genes (21) and other areas commonly altered in colorectal cancer (11, 12, 22). These alterations were confirmed by chromosomal CGH analyses, which revealed aberrations that were largely consistent with
`mo- Tag=Observed—TagVirtual Observed some Digital Karyotype analyses in both