J. Mol. Biol. (1991) 222, 233-249

Sequence and Comparative Analysis of the Rabbit α-Like Globin Gene Cluster Reveals a Rapid Mode of Evolution in a G + C-rich Region of Mammalian Genomes

Ross Hardison1†, Dan Krane1‡, David Vandenbergh1§, Jan-Fang Cheng1‖, James Mansberger1, John Taddie1, Scott Schwartz2, Xiaoqiu Huang2¶ and Webb Miller2

1 Department of Molecular and Cell Biology
2 Department of Computer Science and Institute of Molecular Evolutionary Genetics
The Pennsylvania State University
University Park, PA 16802, U.S.A.

(Received 8 April 1991; accepted 24 July 1991)
`
`
`
A sequence of 10,621 base-pairs from the α-like globin gene cluster of rabbit has been determined. It includes the sequence of gene ζ1 (a pseudogene for the rabbit embryonic ζ-globin), the functional rabbit α-globin gene, and the θ1 pseudogene, along with the sequences of eight C repeats (short interspersed repeats in rabbit) and a J sequence implicated in recombination. The region is quite G+C-rich (62%) and contains two CpG islands. As expected for a very G+C-rich region, it has an abundance of open reading frames, but few of the long open reading frames are associated with the coding regions of genes. Alignments between the sequences of the rabbit and human α-like globin gene clusters reveal matches primarily in the immediate vicinity of genes and CpG islands, while the intergenic regions of these gene clusters have many fewer matches than are seen between the β-like globin gene clusters of these two species. Furthermore, the non-coding sequences in this portion of the rabbit α-like globin gene cluster are shorter than in human, indicating a strong tendency either for sequence contraction in the rabbit gene cluster or for expansion in the human gene cluster. Thus, the intergenic regions of the α-like globin gene clusters have evolved in a relatively fast mode since the mammalian radiation, but not exclusively by nucleotide substitution. Despite this rapid mode of evolution, some strong matches are found 5' to the start sites of the human and rabbit α genes, perhaps indicating conservation of a regulatory element. The rabbit J sequence is over 1000 base-pairs long; it contains a C repeat at its 5' end and an internal region of homology to the 3'-untranslated region of the α-globin gene. Part of the rabbit J sequence matches with sequences within the X homology block in human. Both of these regions have been implicated as hot-spots for recombination, hence the matching sequences are good candidates for such a function. All the interspersed repeats within both gene clusters are retroposon SINEs that appear to have inserted independently in the rabbit and human lineages.
`
Keywords: α-globin gene cluster; CpG islands; G+C-rich isochores; DNA sequence alignments; evolutionary rates; recombination sequences; short interspersed repeats
`
† Author to whom correspondence should be addressed.
‡ Present address: Department of Genetics, Washington University School of Medicine, 4566 Scott Avenue, St Louis, MO 63110-1095, U.S.A.
§ Present address: National Institute on Drug Abuse/Addiction Research Center, P.O. Box 5180, Baltimore, MD 21224, U.S.A.
‖ Present address: Human Genome Center, Donner Laboratory, Lawrence Berkeley Laboratories, Berkeley, CA 94720, U.S.A.
¶ Present address: Department of Computer Science, Michigan Technological University, Houghton, MI 49931, U.S.A.

0022-2836/91/220233-17 $03.00/0

© 1991 Academic Press Limited
`
`SKI Exhibit 2027 - Page 1 of 17
`
`

`

R. Hardison et al.
`
1. Introduction

Much work has been devoted to understanding the mechanisms involved in the co-ordinate temporal and tissue-specific regulation of α and β-globin genes (for a review, see Collins & Weissman, 1984; Evans et al., 1990; Orkin, 1990). Given that equal amounts of α-like and β-like globin must be synthesized to produce the hemoglobin tetramer, α2β2, one might anticipate that this co-ordinate expression would be achieved by utilizing identical regulatory schemes. However, that is certainly not the case. The human α-globin gene is expressed permissively in a variety of cell lines after transfection (Mellon et al., 1981), whereas the β-globin gene requires either a viral enhancer in cis or erythroid induction to be expressed after transfection (Banerji et al., 1981; Humphries et al., 1982; Wright et al., 1984; Charnay et al., 1984). This permissive expression of α-globin genes in non-erythroid cells is observed for genes from both human and rabbit, but not mouse (Cheng et al., 1986; Whitelaw et al., 1989). A dominant control element for the β-like globin gene cluster is the locus control region (LCR†) located 5 to 20 kb 5' to the ε-globin gene (Grosveld et al., 1987; Forrester et al., 1987). A strong positive regulatory region has also been found 40 kb 5' to the human ζ-globin gene (Higgs et al., 1990); although the α-globin LCR shares some properties with that of the β-globin gene cluster, it is not clear that the regulatory features of these two LCRs are identical. Also, mammalian adult β-globin genes reach full induction at later stages of development than adult α-globin genes (Rohrbaugh & Hardison, 1983; Peschle et al., 1985). Some of the critical sequences that account for these differences are within or 3' to these genes (Charnay et al., 1984; Wright et al., 1984).

† Abbreviations used: LCR, locus control region; kb, 10^3 bases or base-pairs; bp, base-pair(s).

These differences in regulation may be related to the substantially different genomic contexts observed for the α and β-like globin gene clusters. After the α-like and β-like globin gene clusters moved to different chromosomes in the progenitor to birds and mammals (for a review, see Collins & Weissman, 1984; Hardison, 1991), they evolved into very different segments of the genome in some mammalian lineages. For example, the G+C-rich α-like gene cluster in humans contains several CpG-rich islands (Fischel-Ghodsian et al., 1987) that are never methylated (Bird et al., 1987), and this gene cluster is found in the most dense (most G+C-rich) isochore in both human and rabbit genomes (Bernardi et al., 1985). Isochores are very long (probably thousands of kb) segments of homogeneous base composition that may correspond to the Giemsa light and dark bands seen in metaphase chromosomes (Bernardi, 1989). Unlike most tissue-specific genes, the α-like globin gene clusters are replicated early in S phase in both erythroid and non-erythroid cells (Calza et al., 1984; Goldman et al., 1984). These characteristics contrast with the A+T-rich and CpG-deficient β-like gene cluster, which has been shown to be methylated in non-erythroid tissues in human (van der Ploeg & Flavell, 1980) and rabbit (Shen & Maniatis, 1980). The β-like globin gene clusters, like the bulk of mammalian genomic DNA, are present in low-density isochores (Bernardi et al., 1985), and like most tissue-specific genes, they are replicated late in S phase in non-erythroid cells but early in erythroid cells (Dhar et al., 1988; Epner et al., 1988).

The correlation between striking differences in genomic context and in types of regulation suggests that a detailed comparison of both these gene clusters between mammalian species would be productive in generating insights into their regulation. Indeed, analyses of the extensively sequenced human and rabbit β-like globin gene clusters (73·4 and 44·6 kb, respectively) reveal long stretches of sequence similarity that extend through and even between each of the β-like globin genes (Margot et al., 1989). The sequence similarity between the mouse and human β-like globin gene clusters is less extensive (Shehee et al., 1989), as is expected given the more rapid rate of evolution in rodents (Wu & Li, 1985). One notable segment of extragenic sequence, located about 6 kb 5' to the human ε-globin gene, has been highly conserved between human, rabbit and mouse (Hardison, 1991), and it has been shown to be part of the LCR of the β-like globin gene cluster (Forrester et al., 1987; Grosveld et al., 1987). A similar comparison is carried out in this paper for the α-like globin gene clusters of rabbit and human.

The rabbit α-like globin gene cluster is located in a Giemsa-light band at the terminus of the long arm of chromosome 6 (Xu & Hardison, 1991). The minimal gene cluster includes one adult α-globin gene, five homologs to the embryonic ζ-globin gene, and two θ-globin pseudogenes, arranged in the order 5'-ζ0-ζ1-α-θ1-ζ2-ζ3-θ2-ζ4-3' within a 38 kb DNA segment (Cheng et al., 1986, 1987, 1988). This gene cluster probably evolved by a duplication of a large DNA block containing the ζ-ζ-α-θ gene set, followed by deletion of the α-globin gene in the 3' duplicated ζ2-ζ3-θ2 gene set (Cheng et al., 1987). The rabbit α-like globin gene cluster is highly polymorphic both for the number of duplicated gene sets as well as for restriction fragment lengths around ζ0 and ζ1 (Cheng & Hardison, 1988). A similar sequence is found at the breakpoints proposed for the recombinations involved in duplications of ζ, block duplications of ζ-ζ-α-θ, and deletion of α; this common junction sequence is called a J sequence (Cheng et al., 1987). Part of the J sequence is very similar to the 3'-untranslated sequence of the α-globin gene, and this homology is likely to have been involved in the recombination that deleted the α-globin gene from the ζ2-ζ3-θ2 gene set. The deletions that occurred frequently during the propagation of λ clones carrying rabbit genomic DNA containing this gene cluster also mapped close to the J sequences
`
`
`

`

`
`
`Rapid Evolution in G + C-rich Regions
`
`
`
`
(Cheng et al., 1987), arguing that these sequences could constitute a hot-spot for recombination. This gene cluster contains at least one active gene, the adult α-globin gene (Cheng et al., 1986), and the gene ζ0 is the most likely candidate to encode the embryonic ζ-globin found in rabbit. The remaining ζ-globin genes appear to be pseudogenes (Cheng et al., 1988).

The α-like globin gene cluster in human is located very close to the telomere of the short arm of chromosome 16 in a segment of very G+C-rich DNA that continues as far as 2000 kb (Harris et al., 1990). The cluster contains a functional ζ2 gene encoding an embryonic ζ-globin polypeptide, a non-functional ζ1 gene that is only slightly divergent from ζ2, a highly divergent ψα2 pseudogene, a moderately divergent pseudogene ψα1 that has lost its CpG island, duplicated functional adult α-globin genes α2 and α1, and a θ1 gene that produces transcripts, but for which no polypeptide product has been identified (for a review, see Higgs et al., 1989). The genes are arranged in the order 5'-ζ2-ζ1-ψα2-ψα1-α2-α1-θ1-3'. The α-globin genes were duplicated in the stem simians (Sawada & Schmid, 1986), and the 5' α gene found in several other mammals (Schon et al., 1982) is orthologous to the human ψα1 gene, based on sequence similarities in the 5' flank (Hardison & Gelinas, 1986; Sawada & Schmid, 1986). The duplication of α genes in higher primates has left a long homology block of about 4 kb that is divided into three regions called X, Y and Z. A sequence that confers an enhanced rate of recombination in COS cells has been mapped to the first 300 bp of the X region (Hu & Shen, 1987). Almost 20 kb of continuous sequence has been determined from the human α-like globin gene cluster (see Materials and Methods), encompassing the region from ζ1 through θ1. This sequence, along with that reported for rabbit in this paper, allows a comprehensive comparison of a major portion of the gene clusters in rabbit and human, including the three major members of the α-like globin gene cluster, ζ, α and θ. Parallels and differences are discussed for these interspecies comparisons of α-like and β-like globin gene clusters.

2. Materials and Methods

(a) Determination of DNA sequence

Clones of rabbit DNA containing the α-like globin gene cluster were isolated from a library of rabbit genomic DNA (Maniatis et al., 1978; Cheng et al., 1986, 1987, 1988). The recombinant phage λRαG1 containing the genes ζ1, α and θ1 was used to generate restriction fragments that were subcloned into plasmids pBR322, pUC or pBluescript, and into the phage M13. Most of the DNA sequence was determined by the dideoxynucleotide chain termination method (Sanger et al., 1977) and was confirmed in some regions with the base-specific chemical degradation method (Maxam & Gilbert, 1980). In some cases, directed deletions for rapid sequence determination were constructed by using exonuclease III and mung bean nuclease (Henikoff, 1984). The strategies employed in determining the sequence are shown in Fig. 1. Much of the sequence was determined from both strands. The sequence was not determined through 5 restriction sites that were the ends of fragments used to construct subclones. These sites are internal to either C repeats or J sequences. In 3 cases, BglII*, PstI* and BamHI* in C47 (Fig. 1), the sequences around the restriction site match with similar sequences repeated elsewhere in the gene cluster or genome, hence it is unlikely that any sequence is missing. The short segment from 4278 to 4283 was highly compressed on the gels and hence this sequence could not be determined unambiguously. The new sequences were combined with previously determined sequences (Cheng et al., 1986, 1987, 1988; Krane et al., 1991) to generate a composite sequence extending from the 5' flank of ζ1 to the 3' flank of θ1. The ζ1-α-θ1 sequence is available as GenBank accession number M35026, and the sequence of Jθ1 is available as EMBL accession number X60985.

A composite file of sequences from the human α-like globin gene cluster was assembled from data in the following sources: the 5' flank of gene ζ1 (Willard et al., 1985), gene ζ1 (Proudfoot et al., 1982), pseudogene ψα2 (Hardison et al., 1986), intergenic sequence between ζ1 and ψα1 (Sawada et al., 1983), pseudogene ψα1 (Proudfoot & Maniatis, 1980), homology blocks containing α2 and α1 (Liebhaber et al., 1980; Michelson & Orkin, 1980, 1983; Hess et al., 1983, 1984), 3' flank of α1 (Hardison & Gelinas, 1986), intergenic sequence between α1 and θ1 (Bailey, 1990), and θ1 (Hsu et al., 1988).

(b) Analysis of the DNA sequence

Direct and inverted repeats, open reading frames and nucleotide strings were identified with the computer program DNA Inspector IIe (Textco) running on a Macintosh computer. Plots of G+C richness and CpG and GpC dinucleotides were made from the output of the BASIC computer program "Di-nt Frequency" (Krane, 1990) scanning windows 50 bp in length.

Local alignments of the 2 sequences were generated using the program SIM (Huang et al., 1990), run on a Sun4 workstation. SIM generates alignments between very long DNA sequences while using computer space efficiently, and it produces alignments that are optimized to parameters set by the user. All alignments discussed in this paper were obtained where matches count 1, mismatches count -1, the gap-open penalty is 4·0, and the gap-extension penalty is 0·4 per nucleotide. With the single exception of Fig. 7, the number of local alignments to be used was determined by using theoretical results (Karlin & Altschul, 1990) on the expected number of gap-free alignments. Specifically, we used only those alignments whose score exceeds a threshold r, defined so that the probability is 0·8 that random sequences matching the given sequences in length and in nucleotide composition have a gap-free alignment scoring at least r. Informally speaking, r is a threshold where we expect that 2 random sequences of the given length and composition would exhibit 1 or several gap-free alignments, i.e. a dot-plot of random sequences at these criteria would contain a few specks. The large number of local alignments generated by SIM were organized and viewed by a graphical user interface called LAV (local alignment viewer; Schwartz et al., 1991). Figs 6, 7 and 8 were drawn directly from the SIM alignments and from hand-generated files giving positions of sequence features (such as exons, introns and repeated sequences) using the program LAD (local alignment diagramer).
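The scoring parameters quoted for SIM (match 1, mismatch -1, gap-open 4·0, gap-extension 0·4 per nucleotide) can be illustrated with a short Python sketch. It is not the SIM program: it only scores an alignment that has already been written out as two gapped strings, and it assumes that a gap of k nucleotides costs the gap-open penalty once plus the extension penalty for each of the k positions. The function name `score_alignment` is hypothetical.

```python
def score_alignment(top, bottom, match=1.0, mismatch=-1.0,
                    gap_open=4.0, gap_extend=0.4):
    """Score a gapped pairwise alignment given as two equal-length strings.

    '-' marks a gap. Assumption (not taken from the paper): a gap run of
    length k costs gap_open + k * gap_extend. Comparison is case-sensitive.
    """
    if len(top) != len(bottom):
        raise ValueError("aligned strings must have equal length")
    score = 0.0
    gap_row = None  # which row holds the current gap run: 'top', 'bottom' or None
    for a, b in zip(top, bottom):
        if a == "-" and b == "-":
            raise ValueError("a column cannot be a gap in both rows")
        if a == "-" or b == "-":
            row = "top" if a == "-" else "bottom"
            if gap_row != row:        # a new gap run starts here
                score -= gap_open
            score -= gap_extend       # every gapped column pays the extension cost
            gap_row = row
        else:
            score += match if a == b else mismatch
            gap_row = None
    return score


if __name__ == "__main__":
    # Four matches and one gap of length 2 under the quoted parameters.
    print(score_alignment("ACGTAC", "AC--AC"))
```

Under this convention a two-nucleotide gap costs 4·8, so it is only worth opening when it lets the alignment pick up at least five additional net matches.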
`
`
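The threshold r described in Materials and Methods, section (b), can be sketched numerically. The Python below is a rough illustration under stated assumptions, not the authors' calculation: both sequences are assumed to share one nucleotide composition, scores are +1 for a match and -1 for a mismatch, and the number of gap-free alignments scoring at least r is treated as Poisson with mean K·m·n·exp(-λr), following the form of the Karlin & Altschul (1990) result. The constant K is not derived here; `k_const` is a hypothetical input that would have to come from that theory, and all function names are illustrative.

```python
import math


def match_probability(freqs):
    """Chance that two independent random nucleotides are identical."""
    return sum(p * p for p in freqs.values())


def lambda_for_unit_scores(p_match):
    """Solve p*exp(lam) + (1 - p)*exp(-lam) = 1 for scores of +1 and -1.

    For p < 0.5 the positive root is lam = ln((1 - p) / p).
    """
    if not 0.0 < p_match < 0.5:
        raise ValueError("expected score must be negative (0 < p_match < 0.5)")
    return math.log((1.0 - p_match) / p_match)


def score_threshold(len_a, len_b, freqs, k_const, prob_at_least_one=0.8):
    """Score r at which P(at least one gap-free alignment >= r) equals the target.

    k_const is the Karlin-Altschul constant K, supplied by the caller
    (hypothetical value in the usage example below).
    """
    lam = lambda_for_unit_scores(match_probability(freqs))
    expected = -math.log(1.0 - prob_at_least_one)  # Poisson mean giving that probability
    return math.log(k_const * len_a * len_b / expected) / lam


if __name__ == "__main__":
    # 62% G+C split evenly between G and C, 38% A+T split evenly (an assumption).
    composition = {"A": 0.19, "C": 0.31, "G": 0.31, "T": 0.19}
    r = score_threshold(10621, 19574, composition, k_const=0.1)  # K = 0.1 is a placeholder
    print(round(r, 1))
```

A probability of 0·8 corresponds to a Poisson mean of ln 5, about 1·6 expected chance alignments, which agrees with the informal description of "1 or several gap-free alignments" between random sequences.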
`

`

Figure 1. Strategies used to determine the DNA sequence of regions I, II and III. Arrows above the line indicate the extent of individual readings of the top strand, and arrows below the line correspond to readings of the lower strand. The dotted lines drawn above parts of regions II and III cover the sequence previously determined by Cheng et al. (1987). Open arrows indicate the positions of C repeats within the gene cluster; they point in the direction of their poly(A) tracts. Asterisks mark internal restriction sites through which the sequence has not been determined.
`
To obtain a quantitative estimate of the fraction of the α-like or β-like globin gene clusters that match between rabbit and human (see Results, section (h)), local alignments were optimally chained together to make "meta-alignments" using an algorithm for computing optimal paths in a directed acyclic graph (Cormen et al., 1990). In the meta-alignments, 1 SIM alignment follows another only if its starting positions in the 2 sequences follow the ending positions of the other alignment. The chaining was done so as to maximize the number of matches in the meta-alignment. For the rabbit and human α-globin genes, the divergence was determined from the aligned sequences and corrected for multiple substitutions at a single site (Jukes & Cantor, 1969) to obtain the number of substitutions per site. The time of divergence between rabbit and human was taken as the time of the mammalian radiation, about 80 million years ago (Romero-Herrera et al., 1973).

3. Results

(a) Nucleotide sequence of the rabbit α-like globin gene cluster

A diagram of the portion of the rabbit α-like globin gene cluster isolated in 38 kb of cloned DNA (Cheng et al., 1986, 1987, 1988) is shown in Figure 2. Analysis of a population of laboratory rabbits by genomic blot-hybridization shows that the gene cluster can extend farther 3' in some haplotypes as a result of additional duplications of the ζ-ζ-θ gene set (Cheng & Hardison, 1988). The homology blocks containing ζ and θ genes (Z blocks and T blocks, respectively) are bounded by a characteristic junction sequence called a J sequence (Fig. 2). In this paper they will be referred to by the name of the gene that they follow, e.g. Jζ1 is 3' to gene ζ1. As will be explained below, the J sequences extend from the C repeat at the 5' end through a sequence homologous to the 3' portions of the α-globin gene (Cheng et al., 1987).

New sequence data (Fig. 1) were combined with previously published sequences to make a contiguous sequence of 10,621 bp, beginning 2273 bp 5' to ζ1, extending through the α-globin gene and ending 204 bp 3' to the polyadenylation signal of the θ1 pseudogene. It is available from the GenBank database under accession number M35026. This three-gene set contains homologs to each of the three α-like globin genes found in mammalian species, and it contains most of the DNA in the basic set of genes that has duplicated to evolve this gene cluster.

Figure 2. Organization of the rabbit α-like globin gene cluster. Related genes are represented by boxes with the same shading, C repeats are shown as filled triangles and J sequences are shown as open, pointed boxes. A repetitive sequence found by hybridization experiments (Cheng et al., 1987) is shown as a stippled polygon between genes ζ2 and ζ3; current data do not exclude the possibility that it is one or several divergent C repeats. BamHI (B) sites within the gene cluster are indicated on the 2nd line, and sequenced segments are shown as thick regions on this line. The new sequences reported in this paper are shaded on the 2nd line. Boxes below the gene map show the T homology blocks containing θ genes and Z homology blocks containing ζ genes separated by junction, or J, sequences. Horizontal lines below the homology blocks indicate the positions of λ clones isolated from this region of the rabbit genome by Cheng et al. (1987, 1988).

(b) Short, interspersed C repeats

The rabbit α-like globin gene cluster contains at least 15 C repeats, the predominant short interspersed repeat in the rabbit genome (Cheng et al., 1984; Hardison & Printz, 1985). These repeats tend to insert into or nearby one another in groups (Krane et al., 1991); this is readily apparent in the 5' flanks of genes ζ1 and ζ2 (Fig. 2), although the segment from α through θ1 has remained free of C repeats. A total of eight C repeats are in the ζ1-α-θ1 sequence, comprising 2·7 of the 10·6 kb, or 25% of the contiguous sequence. At least seven additional C repeats have also been detected by sequence analysis and hybridization studies in the remainder of the gene cluster (Fig. 2), and it is likely that the unidentified repeats between ζ2 and ζ3 include more C repeats and possibly a J sequence. In contrast, no members of the predominant long interspersed family of repeated DNA, L1Oc (Demers et al., 1989), have been found in the α-like globin gene cluster either by sequence determination or by hybridization studies (Cheng et al., 1987).

(c) Base composition

The base composition of the ζ1-α-θ1 DNA sequence is 62% G+C and 38% A+T; this is essentially the reverse of the values reported for the whole rabbit genome, 44% G+C and 56% A+T (Sober, 1968), or for the rabbit β-like globin gene cluster, 39% G+C and 61% A+T (Margot et al., 1989). The high G+C content of the α-like globin gene cluster is uniform over most of the gene cluster (Fig. 3(a)), and regions of greater than 65% G+C content can be found throughout the sequence, not just in close association with functional genes. This is in striking contrast to the β-like globin gene cluster, which has a very low G+C content throughout, exemplified by the ψδ-β region shown in Figure 3(b). This A+T richness is interrupted mainly by a G+C-rich C repeat, C14, from a recently transposing subfamily (Krane et al., 1991). Seven segments of the rabbit α-like globin gene cluster are noticeably high in A+T content (> 60%) relative to the remainder of the cluster (Fig. 3(a)), but in five of these cases the A+T richness is derived from the poly(A) and (CT)n tracts found at the 3' end of C repeat sequences (Krane et al., 1991) that have transposed into this region of the rabbit genome. As previously noted (Cheng et al., 1986), the longest A+T-rich stretch (between α and θ1) is flanked by 10 bp-long inverted repeats, suggesting that it too may have entered this gene cluster by a transposition event.

As expected for a sequence with a low A+T content, many open reading frames are observed on both strands (Fig. 4). Some of the open reading frames correspond to the exons of the α-globin gene, the only functional gene in this region, but most do not correspond to regions that encode known polypeptides. This illustrates the difficulty in identifying potential genes by mapping open reading frames in G+C-rich sequences.

(d) CpG islands

Similar to the situation in human (Bird et al., 1987; Fischel-Ghodsian et al., 1987), the rabbit α-like globin gene cluster contains CpG islands, whereas the β-like globin gene cluster does not. The rabbit α-globin gene has many CpG dinucleotides in its 5' flank and internally (Fig. 3(a)). This abundance of CpGs is much greater than is seen for gene ζ1, which is equally rich in G+C content; thus, the cluster of CpG dinucleotides in the α-globin gene does not simply result from a high G+C content. The θ1-globin gene also has a strikingly high level of
`
[Fig. 3, panels (a) and (b): plots against nucleotide position of % G+C (upper panel) and of the numbers of CpG and GpC dinucleotides (lower 2 panels); plotted data not recoverable from this copy.]
`
`
Figure 4. Summary of sequence features in the rabbit α-like globin gene cluster. Genes and repeats are shown as in Fig. 2, except that the introns of the genes are shown as open boxes. Vertical bars below the line are BamHI restriction sites. ORFs →, open reading frames longer than 300 nucleotides on the top strand; ORFs ←, open reading frames longer than 300 nucleotides on the bottom strand; Y, strings of pyrimidine residues longer than 14 nucleotides on the top strand; RY, alternating purine and pyrimidine residues longer than 14 nucleotides; Dir Rpt, direct repeats longer than 18 nucleotides with no more than 2 mismatches; Inv Rpt, inverted repeats longer than 18 nucleotides with no more than 2 mismatches. The direct and inverted repeats are indicated by arrowheads either pointing upward for the first repeat or downward for the second repeat; a series of 5 26-nucleotide tandem repeats between the α and θ1 genes is shown as 5 triangles numbered 13. Repeats involving pairs of C repeats or homologous regions of the coding regions of genes are not shown.
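Three of the feature classes listed in the legend to Figure 4 are simple enough to sketch in Python. The code below is an illustration, not the DNA Inspector IIe program used by the authors. It assumes that an open reading frame means a stop-free stretch in one reading frame (the paper does not state its definition), and it reads "longer than 14" as at least 15 nucleotides and "longer than 300" as more than 300. All function names are hypothetical.

```python
import re

STOP_CODONS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Bottom strand of an uppercase A/C/G/T sequence, read 5' to 3'."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def pyrimidine_runs(seq, min_len=15):
    """(start, end) of runs of C/T at least min_len long (0-based, end exclusive)."""
    return [m.span() for m in re.finditer(r"[CT]{%d,}" % min_len, seq.upper())]


def alternating_ry_runs(seq, min_len=15):
    """(start, end) of maximal stretches where purines and pyrimidines alternate."""
    classes = ["R" if b in "AG" else "Y" if b in "CT" else "N" for b in seq.upper()]
    runs, start = [], 0
    for i in range(1, len(classes) + 1):
        alternates = (i < len(classes) and "N" not in (classes[i], classes[i - 1])
                      and classes[i] != classes[i - 1])
        if not alternates:                      # the current stretch ends before i
            if i - start >= min_len and classes[start] != "N":
                runs.append((start, i))
            start = i
    return runs


def open_frames(seq, longer_than=300):
    """Stop-free in-frame stretches longer than `longer_than` nt on the given strand."""
    seq = seq.upper()
    found = []
    for frame in range(3):
        start = frame
        for pos in range(frame, len(seq) - 2, 3):
            if seq[pos:pos + 3] in STOP_CODONS:
                if pos - start > longer_than:
                    found.append((start, pos))
                start = pos + 3
        end = frame + 3 * ((len(seq) - frame) // 3)  # end of the last complete codon
        if end - start > longer_than:
            found.append((start, end))
    return found


if __name__ == "__main__":
    toy = "ATG" + "GCC" * 101 + "TAA"
    print(open_frames(toy))                      # top strand
    print(open_frames(reverse_complement(toy)))  # bottom strand
```

Because a stop-free definition counts stretches in all three frames, a G+C-rich sequence with few TAA, TAG or TGA triplets will report many frames, which matches the abundance of open reading frames noted for this region.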
`
CpG dinucleotides (Fig. 3(a)), even though it does not encode a functional globin polypeptide (Cheng et al., 1986). Like the bulk of the genome, the rabbit β-like globin gene cluster has very few CpG dinucleotides (note the change of scale in Fig. 3(b)), and the small group of CpGs that are seen are in the recently inserted C14 repeat.

(e) Simple repeats in the rabbit α-like globin gene cluster

Only one string of pyrimidine nucleotides longer than 14 is found in the rabbit α-like globin gene cluster (Fig. 4), and no strings of polypurines are found, whereas such sequences are common in the β-like globin gene cluster (Margot et al., 1989). Most polypyrimidine strings within the α-like globin gene cluster occur in the (CT)n sequence in C repeats, and any such sequences involving C repeats have been omitted from this analysis. Tracts of tandemly repeating purine/pyrimidine dinucleotides, (RY)n, where n is greater than six, are found mainly around the θ1 gene; these tracts also occur frequently, but are relatively evenly dispersed in the β-like globin gene cluster (Margot et al., 1989). Two of the five tracts shown in Figure 4 correspond to tandem repeats of the dinucleotide ApC. These are found immediately 3' to the α gene and 570 bp 5' to θ1. No homologs of either of these sequences have been found in the human α-like globin gene cluster, unlike an (AC)13 tract within the rabbit β-like globin gene cluster that has been shown to have a homolog in human (Margot et al., 1989).

Direct repeats that contain at least 18 nucleotides with no more than two mismatches are also listed in Figure 4. One especially notable region is a segment of 134 nucleotides between the α and θ1 genes (number 13, Fig. 4) that contains five tandem, imperfect repeats of the 26 nucleotide sequence CACCGCCGTAGCCGGGAATGGTGGGG (Cheng et al., 1986). (AC)n tracts account for two of the direct repeats (numbers 6 and 11, Fig. 4). Direct repeats 8, 9, 10 and 12 contain imperfect copies of the sequence GCCC; note that these are found in the CpG-rich islands of the α and θ1 genes. An inverted repeat of 18 nucleotides separated by 37 nucleotides
`
Figure 3. Base composition and dinucleotide frequencies within the rabbit α and β-like gene clusters. Genes and C repeats are shown as in Fig. 2; L1 repeats within the β-like gene cluster are shown as striped arrowheads. The numbers of CpG and GpC dinucleotides in 50-nucleotide segments across the regions are plotted in the lower 2 panels of (a) and (b). The α-like gene cluster is shown in (a), and a region of the β-like gene cluster of equal size containing the ψδ and β genes is shown in (b). C repeats are numbered.
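The dinucleotide counts plotted in Figure 3 can be approximated with a short script. The sketch below is only an illustration, not the authors' procedure: it assumes that the 50-nucleotide segments are consecutive and non-overlapping, and that dinucleotides straddling a segment boundary are not counted. The function name `dinucleotide_counts` is hypothetical.

```python
def dinucleotide_counts(seq, window=50):
    """Count CpG and GpC dinucleotides in consecutive windows.

    Assumption: windows are non-overlapping, and a dinucleotide that
    straddles two windows is not counted in either of them.
    Returns a list of (1-based window start, CpG count, GpC count).
    """
    seq = seq.upper()
    rows = []
    for start in range(0, len(seq), window):
        segment = seq[start:start + window]
        # Slide over every adjacent pair of nucleotides in the segment.
        pairs = [segment[i:i + 2] for i in range(len(segment) - 1)]
        cpg = sum(1 for p in pairs if p == "CG")
        gpc = sum(1 for p in pairs if p == "GC")
        rows.append((start + 1, cpg, gpc))
    return rows


if __name__ == "__main__":
    # Toy sequence: a CpG-rich start followed by a GpC-rich second window.
    toy = "CGCG" + "A" * 46 + "GCGC" + "T" * 46
    for start, cpg, gpc in dinucleotide_counts(toy):
        print(start, cpg, gpc)
```

A CpG island would show up in such output as a run of windows in which the CpG count approaches the GpC count, rather than falling well below it.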
`
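The simple-sequence criteria used in the text (pyrimidine strings longer than 14 nucleotides, and (RY)n tracts with n greater than six) can be expressed as regular expressions. This is a minimal sketch under stated assumptions: the function names are hypothetical, ambiguous bases are ignored, and (RY)n is read as a purine followed by a pyrimidine repeated in tandem. It does not reproduce the analysis behind Figure 4, which also excluded sequences inside C repeats.

```python
import re


def pyrimidine_strings(seq, min_len=15):
    """Find runs of C/T at least min_len long ("longer than 14")."""
    pattern = re.compile(r"[CT]{%d,}" % min_len)
    # Report 1-based start positions together with the matched run.
    return [(m.start() + 1, m.group()) for m in pattern.finditer(seq.upper())]


def ry_tracts(seq, min_repeats=7):
    """Find (RY)n tracts with n >= min_repeats ("n greater than six").

    R is a purine (A or G) and Y is a pyrimidine (C or T).
    """
    pattern = re.compile(r"(?:[AG][CT]){%d,}" % min_repeats)
    return [(m.start() + 1, m.group()) for m in pattern.finditer(seq.upper())]


if __name__ == "__main__":
    # Toy sequence with one (CT)8 run and one (AC)7 tract.
    toy = "AAAA" + "CT" * 8 + "GG" + "AC" * 7 + "GG"
    print(pyrimidine_strings(toy))
    print(ry_tracts(toy))
```

Searching for polypurine strings would use the same approach with the character class `[AG]`.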
`
is found 5' to the θ1 gene (

(f) Rabbit J sequences

their 5' end, hence this C repeat can be considered part of the J sequence, but some J sequences, such as Jθ1 and Jζ3, do not have C repeats at their 3' ends (Fig. 2). The J sequence, therefore, begins with a C repeat and continues for a total of about 1000 nucleotides. Part of the sequence 5' to Jθ1 matches with the consensus C repeat, indicating an additional C repeat, C46, in the gene cluster.

(g) Homology with the human α-like globin gene cluster

(i) Overall patterns of matches

The program SIM (Huang et al., 1990) was used to align the sequence of the rabbit α-like globin gene cluster (10,621 nucleotides) with a composite sequence of the human gene cluster (19,574 nucleotides), containing ζ1-ψα2-ψα1-α2-α1-θ1. This program generates non-overlapping local alignments of very long sequences that are optimized to the scoring parameters specified by the user. These alignments are readily analyzed using a graphical user interface called LAV (local alignment viewer)

Figure 5. Schematic diagram showing matches among rabbit junction (J) sequences, human ψα1 and rabbit α. The percentages of matching nucleotides, determined from pairwise alignments generated by SIM, are given in the shaded areas between the diagrams of the genes and J sequences. C and Alu repeats are filled arrows, exons are grey boxes, and the 3'-untranslated region of the α-globin gene (3' UTα) or its homologs are cross-hatched boxes. A slanted line in a gene means non-matching sequences were omitted for clarity. nt, nucleotides. Diagram labels: Rabbit Jζ0, Human ψα1, Rabbit Jζ1, Rabbit Jθ1, 3' UTα, Rabbit α.
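The idea of a local alignment optimized to user-specified scoring parameters can be illustrated with the classic Smith-Waterman recurrence. The sketch below is not the SIM program: SIM reports many non-overlapping local alignments of very long sequences, whereas this example returns only the single best local alignment and uses a quadratic-size table. The function name and the scoring values (match, mismatch, and a linear gap penalty) are assumptions chosen for illustration.

```python
def smith_waterman(a, b, match=1, mismatch=-1, gap=-2):
    """Return (score, aligned_a, aligned_b) for the best local alignment.

    Scoring parameters are illustrative defaults, not those of SIM.
    """
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)

    # Fill the table; a cell never drops below zero (local alignment).
    for i in range(1, rows):
        for j in range(1, cols):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(
                0,
                score[i - 1][j - 1] + sub,  # align a[i-1] with b[j-1]
                score[i - 1][j] + gap,      # gap in b
                score[i][j - 1] + gap,      # gap in a
            )
            if score[i][j] > best:
                best, best_pos = score[i][j], (i, j)

    # Trace back from the best cell until a zero-scoring cell is reached.
    i, j = best_pos
    top, bottom = [], []
    while i > 0 and j > 0 and score[i][j] > 0:
        sub = match if a[i - 1] == b[j - 1] else mismatch
        if score[i][j] == score[i - 1][j - 1] + sub:
            top.append(a[i - 1])
            bottom.append(b[j - 1])
            i, j = i - 1, j - 1
        elif score[i][j] == score[i - 1][j] + gap:
            top.append(a[i - 1])
            bottom.append("-")
            i -= 1
        else:
            top.append("-")
            bottom.append(b[j - 1])
            j -= 1
    return best, "".join(reversed(top)), "".join(reversed(bottom))


if __name__ == "__main__":
    print(smith_waterman("TTTTGCCCGCCCAAAA", "GGGGCCCGCCCTTT"))
```

The percentage of matching nucleotides for such an alignment is the number of identical aligned positions divided by the alignment length.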
