GENOMICS 13, 741-760 (1992)

The β Globin Gene Cluster of the Prosimian Primate Galago crassicaudatus: Nucleotide Sequence Determination of the 41-kb Cluster and Comparative Sequence Analyses

DANILO A. TAGLE,*,1 MICHAEL J. STANHOPE,† DAVID R. SIEMIENIAK,‡,2 PHILIP BENSON,† MORRIS GOODMAN,† AND JERRY L. SLIGHTOM†,‡

Departments of *Molecular Biology and Genetics and †Anatomy and Cell Biology, Wayne State University School of Medicine, Detroit, Michigan 48201; and ‡Molecular Biology Unit 7242, The Upjohn Company, Kalamazoo, Michigan 49007

Received November 1, 1991; revised March 27, 1992

The nucleotide sequence of the β globin gene cluster of the prosimian Galago crassicaudatus has been determined. The total sequence of 41,101 bp contains and links together previously published sequences of the five galago β-like globin genes (5'-ε-γ-ψη-δ-β-3'). A computer-aided search for middle interspersed repetitive sequences identified 10 LINE (L1) elements, including a 5'-truncated repeat that is orthologous to the full-length L1 element found in the human ε-γ intergenic region. The SINE elements identified included one Alu type I repeat, four Alu type II repeats, and two methionine tRNA-derived Monomer (type III) elements. Alu type II and Monomer sequences are unique to the galago genome. Structural analyses of the cluster sequence reveal that it is relatively A + T rich (about 62%) and that regions with high G + C content are associated primarily with globin coding regions. Comparative analyses with the β globin cluster sequences of human, rabbit, and mouse reveal extensive sequence homologies in their genic regions, but only the human, galago, and rabbit sequences share extensive intergenic sequence homologies. Divergence analyses of aligned intergenic and flanking sequences from orthologous human, galago, and rabbit sequences show a gradation in the rate of nucleotide sequence evolution along the cluster: sequences 5' of the ε globin gene region show the least sequence divergence, and sequences just 5' of the β globin gene region show the greatest. © 1992 Academic Press, Inc.

Sequence data from this article have been deposited with the GenBank Data Library under Accession No. M73981.
¹ Present address: Department of Human Genetics, 4562 MSRB II, The University of Michigan Medical Center, Ann Arbor, MI 48109-0650.
² Howard Hughes Institute, 4522 MSRB I, The University of Michigan Medical Center, Ann Arbor, MI 48109-0650.

0888-7543/92 $5.00
Copyright © 1992 by Academic Press, Inc.
All rights of reproduction in any form reserved.

SKI Exhibit 2030 - Page 1 of 20

INTRODUCTION

The different developmentally expressed β-type globin genes of primates and other mammals can be traced back to a single progenitor gene, which tandemly duplicated some 150 to 200 million years ago (MYA) in the early mammals. By about 110 to 135 MYA, the two gene lines from the tandem duplication had differentiated into an embryonically expressed locus (proto-ε) and an ontogenetically later expressed locus (proto-β) (Efstratiadis et al., 1980; Czelusniak et al., 1982; Koop and Goodman, 1988). Further tandem duplications of the embryonic 5' locus and the adult 3' locus in the early eutherian mammals (85 to 100 MYA) led to a genomic domain of five developmentally regulated loci (5'-ε-γ-η-δ-β-3'). In this β globin gene cluster, ε, γ, and η originated from the 5' proto-ε locus and were embryonically expressed, while δ and β originated from the 3' proto-β locus and were expressed in the non-embryonic, later ontogenetic stages of life (Hardies et al., 1984; Hardison, 1984; Hill et al., 1984; Goodman et al., 1984, 1987). This ancestral eutherian β globin gene cluster underwent varying degrees of change during the evolution of different eutherian orders. For example, lagomorph (rabbit) and rodent (mouse) β globin gene clusters lack the η globin locus, whereas artiodactyl (goat, sheep, and ox) clusters lack the γ globin locus. However, each primate β globin gene cluster examined so far contains sequences from all five loci (ε, γ, η, δ, β) in the ancestral 5' to 3' arrangement.

The β globin gene cluster in mammals ranges in length from approximately 20 kb in lemurs to about 90 kb in goats (Harris et al., 1984, 1986; Townes et al., 1984). The complete nucleotide sequences of the β globin gene clusters of human (Collins and Weissman, 1984; Li et al., 1985), rabbit (Margot et al., 1989), and mouse (Shehee et al., 1989) have been determined, and they reveal that only about 10% of the DNA in a mammalian β globin gene cluster encodes globin mRNAs.

Very few orthologous sequence data sets that compare noncoding DNAs from a series of species are presently available. Of those that do exist, the most extensive are from closely related simian primates; they involve intergenic sequences that flank the ψα globin genes (Sawada et al., 1985), noncoding and flanking sequences that include
the ψη globin gene (Bailey et al., 1991; Fitch et al., 1988; Koop et al., 1986; Maeda et al., 1988; Miyamoto et al., 1987), and δ-β intergenic sequences (Savatier et al., 1985, 1987). Consequently, much remains to be learned about the sequence features, evolution, and functional constraints that act upon intergenic noncoding sequences. Comparative analyses of β globin gene cluster sequences from distantly related species can be used to identify evolutionarily conserved DNA elements, or phylogenetic footprints, which can provide insights into the evolution of gene clusters at the molecular level (Tagle et al., 1988; Gumucio et al., 1991).

In this context, we determined and analyzed the complete nucleotide sequence of the β globin gene cluster from a distantly related primate, the prosimian galago (Galago crassicaudatus). Fossil evidence indicates that the prosimian (galago)/simian (human) divergence dates back about 55 MYA (Fleagle, 1988). Here we report 41,101 bp of continuous sequence spanning the entire β globin gene cluster of the galago. This cluster contains five globin-like genes arranged in the order of their developmental expression: 5'-ε-γ (embryonic)-ψη (nonexpressed)-δ-β (fetal and postnatal)-3' (Tagle, 1990; Tagle et al., 1988, 1991). We present an analysis of the compositional and structural features of the complete galago cluster sequence. We have identified the DNA sequences that are orthologous (derived from the same DNA sequence in the last common ancestral species) among the galago, human, rabbit, and mouse β globin gene clusters, as well as the locations of short and long interspersed repetitive nuclear elements (SINEs and LINEs) within the galago cluster. We have also determined differences in the degree of sequence divergence among the orthologous noncoding regions of galago, human, and rabbit.

MATERIALS AND METHODS

Cloning and nucleotide sequencing. Twelve overlapping recombinant Charon 35 phage clones that span the galago β globin gene cluster (Fig. 1) were previously isolated and described (Tagle, 1990; Tagle et al., 1988, 1991). Phage clones λGcr18.3, λGcr11.9, λGcr15.4, λGcr16.1, λGcr15.2A, and λGcr15.2B were used to generate a complete and ordered set of EcoRI fragments (R1 to R22) that were subcloned into pUC-18 (Yanisch-Perron et al., 1985). Plasmid subclones R5 to R22 were sequenced by the dideoxynucleotide chain termination method (Sanger et al., 1977) and/or by the chemical cleavage method (Maxam and Gilbert, 1980). For the chemical cleavage method, restriction fragments from R7, R9, R12, and parts of R8 and R10 were end-labeled and sequenced as described in Tagle et al. (1988) for the ε and γ globin genes and in Koop et al. (1989) for the ψη gene. The sequences of the galago δ and β globin genes have been presented by Tagle et al. (1991). These previously published galago β globin gene sequences represent only about 9 kb of the β globin gene cluster sequence presented here. The remaining intergenic sequences were obtained by the dideoxynucleotide chain termination method, initially with universal forward and reverse vector primers and then with synthetic oligomers designed from previously determined sequence (oligomer walking). Nucleotide sequence readings ranged from 400 to 950 bp. Approximately 95% of the intergenic nucleotide sequence was determined on both DNA strands; the remaining 5% was determined from the same strand at least twice using independent sequencing reactions (see Fig. 2). Areas of compression or strong secondary structure were resolved by sequencing at a higher temperature (55°C) with Taq DNA polymerase. Selected BamHI- and HindIII-generated fragments (not shown) were also subcloned into pUC-18 and sequenced to verify nucleotide sequence contiguity across the cloning site junctions of certain EcoRI clones.

Computer-aided analyses of nucleotide sequences. Base composition and other sequence features [such as open reading frames (ORFs), strand asymmetry, subsequence breakdown, and repetitive elements] of the galago β globin gene cluster were identified and analyzed with the DNA analysis programs of the Genetics Computer Group package (GCG; Madison, WI; Devereux et al., 1984) and the MacVector package, Version 6.6, from IBI Technologies (New Haven, CT). In addition to the ORF search, the galago cluster sequence was submitted to GRAIL (Gene Recognition and Analysis Internet Link; Oak Ridge, TN; Uberbacher and Mural, 1991) for additional searches for potential protein coding regions. Regions of shared nucleotide sequence identity between the galago β globin gene cluster and the human (Collins and Weissman, 1984; Li et al., 1985), rabbit (Margot et al., 1989), or mouse (Shehee et al., 1989) β globin gene clusters were first identified by dot-plot comparisons using the GCG program COMPARE. Homology plots of the galago globin gene cluster sequence against itself were used to locate repeated regions. In addition, identity searches using SINE (Daniels and Deininger, 1983, 1985, 1991; Daniels et al., 1983) and L1 (Hattori et al., 1986; Scott et al., 1987) consensus sequences were used to further identify these repetitive elements and define their boundaries.

Pairwise alignments of the human, galago, and rabbit β globin gene cluster sequences in their matching regions were obtained using the alignment programs of Smith and Waterman (1981), Wilbur and Lipman (1983), Lipman and Pearson (1985), and Needleman and Wunsch (1970). These aligned sequences are available on diskette from the authors upon request. In all cases, gaps were inserted into the alignments to increase sequence identity. Because the orthologous gene loci are conserved, alignments in these regions were used as anchor points to align the more diverged intergenic regions. The Local Alignment Diagrammer program (LAD; Schwartz et al., 1991) was used to display the aligned sequences as plots. Each plot depicts pairwise interspecies alignments computed by the SIM program (Huang et al., 1990) with a score of 1 for matches, -1.5 for mismatches, -6 for opening a gap, and -0.2 for each symbol in the gap. An alignment is displayed only if its score exceeds a threshold (τ), chosen so that the probability is 0.05 that random sequences matching the given sequences in length and nucleotide composition have a gap-free alignment scoring at least τ.

Pairwise divergence values for the aligned sequences were calculated following the method of Nei and Gojobori (1986). Nucleotide substitutions (both transitions and transversions) and gaps (regardless of length) were counted as single events. Divergence values were corrected for hidden, superimposed substitutions by the method of Jukes and Cantor (1969).

RESULTS AND DISCUSSION

Organization and Nucleotide Sequence of the Galago β Globin Gene Cluster

A schematic diagram of the organization of the galago β globin gene cluster is shown in Fig. 1A, together with the 12 overlapping recombinant phage clones used to reconstruct the structure of the cluster. Restriction enzyme site maps of these 12 clones revealed the extent of overlap for each clone. Southern blot analysis of EcoRI restriction digests of the phage clones localized the five linked β-like globin genes, and their arrangement was confirmed by Southern blot analysis of EcoRI-digested galago genomic DNA (data not shown). Gene identities were confirmed by sequence orthology with other known primate and mammalian globin genes previously presented and characterized in Tagle et al. (1988) for the ε and γ globin genes, Tagle et al. (1991) for the δ and β globin genes, and Koop et al. (1989) and Bailey et al. (1991) for the ψη globin gene locus.

[Fig. 1 graphic not reproduced. Panel A: cluster map with EcoRI sites and fragment sizes (kb), plasmid subclones R1 to R22, and the overlapping λ phage clones. Panel B: EcoRI sites, β-like globin genes, plasmid clones R5 to R22, SINEs (Alu type I, Alu type II, Monomer), LINEs (L1), and Rn, Yn, and (RY)n tracts.]

FIG. 1. (A) Overview of the galago β globin gene cluster. The topmost line shows the organization of the galago β globin gene cluster. The β-like globin genes are denoted by the rectangles and are labeled on top according to their orthologous relationship to known mammalian globin genes. The approximate positions of the three exons separated by two introns, and of their 5' and 3' untranslated regions, are indicated inside the rectangles, where the filled areas denote the coding regions. EcoRI restriction sites along the cluster are indicated by arrows below the line, and the size of each EcoRI fragment is listed (in kilobases) between the arrows. The ordered set of EcoRI fragments that were subcloned into pUC-18 and numbered 1 through 22 is shown below the cluster map. The 12 overlapping recombinant λ clones used to reconstruct the cluster organization are also shown below the cluster map; their lengths and positions correspond directly to the galago linkage map. (B) Synopsis of sequence features of the galago β globin gene cluster. The extent of the galago β globin gene cluster region that was sequenced (EcoRI fragments 5 to 22) is indicated in the top line. The β-like globin genes are represented by the rectangles on the cluster map. The position, direction, and length of the interspersed repeat elements (SINEs and LINEs) are shown below the cluster map. Left-pointing arrowheads indicate that the repeat is oriented in the 3' to 5' direction relative to the direction of globin transcription; right-pointing arrowheads indicate a 5' to 3' direction. The locations of purine (R)n and pyrimidine (Y)n tracts (where n ≥ 10) and of alternating purine/pyrimidine (RY)n tracts (where n ≥ 5) are indicated by numbers corresponding to their lengths.

The nucleotide sequences of the remaining intergenic regions of the galago β globin gene cluster were determined from the ordered set of EcoRI plasmid subclones R5 through R22 that span the galago cluster (Fig. 1B). The complete sequence of the galago β globin gene cluster is presented in Fig. 2. In general, the organization and position of the β-type genes in the galago globin gene cluster (5'-ε-γ-ψη-δ-β-3') are similar to those hypothesized for the ancestral eutherian β globin gene cluster (Efstratiadis et al., 1980; Hardies et al., 1984; Hardison, 1984; Harris et al., 1984; Hill et al., 1984; Goodman et al., 1984, 1987). A general scheme depicting some of the major elements and molecular events in the evolution of the mammalian β globin gene cluster is presented in Fig. 3. The entire sequenced region of the galago β globin gene cluster spans 41,101 bp and includes sequences starting 4.3 kb 5' of the ε gene; intergenic DNAs spanning 6.1, 3.8, 11.1, and 3.1 kb between the ε and γ, γ and ψη, ψη and δ, and δ and β globin genes, respectively; and sequences extending 4.7 kb 3' of the β globin gene.

Galago Globin Genes

The nucleotide sequence positions (from Fig. 2) of each β-like globin gene are summarized in Table 1.
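To make the SIM scoring parameters quoted under Materials and Methods concrete (+1 per match, -1.5 per mismatch, -6 to open a gap, and -0.2 per gap symbol), the following minimal Python sketch scores an alignment that has already been computed. It assumes the alignment is given as two equal-length strings with '-' marking gaps, and it reads the gap parameters as an affine penalty of 6 + 0.2k for a run of k gap symbols; the function name sim_score is hypothetical. It is not the SIM program and performs neither the local alignment search nor the computation of the threshold τ.

```python
def sim_score(top, bottom, match=1.0, mismatch=-1.5,
              gap_open=6.0, gap_per_symbol=0.2):
    """Score two aligned rows of equal length ('-' marks a gap).

    Assumed affine reading of the SIM parameters quoted in the text:
    a run of k gap symbols in one row costs gap_open + gap_per_symbol * k.
    Only the given alignment is scored; no alignment search is performed.
    """
    if len(top) != len(bottom):
        raise ValueError("aligned rows must have equal length")
    score = 0.0
    gap_row = None  # row that carried a gap in the previous column, if any
    for a, b in zip(top.upper(), bottom.upper()):
        if a == "-" and b == "-":
            raise ValueError("a column cannot be a gap in both rows")
        if a == "-" or b == "-":
            row = "top" if a == "-" else "bottom"
            if row != gap_row:
                score -= gap_open        # a new gap run opens in this column
            score -= gap_per_symbol      # every gap symbol adds its own cost
            gap_row = row
        else:
            score += match if a == b else mismatch
            gap_row = None
    return score


if __name__ == "__main__":
    print(sim_score("ACGT", "ACGA"))      # three matches and one mismatch
    print(sim_score("ACGTTA", "ACG--A"))  # four matches and one two-symbol gap
```

The second call scores about -2.4: four matches (+4) minus one two-symbol gap (6 + 2 × 0.2 = 6.4). Because opening a gap costs as much as six matches, this kind of scoring favors a few long gaps over many short ones.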
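The divergence procedure described under Materials and Methods (each substitution and each gap of any length counted as one event, followed by the Jukes and Cantor correction) can be sketched as below. The Jukes-Cantor correction used here is d = -(3/4) ln(1 - 4p/3), where p is the observed proportion of differences. The text does not state how gaps enter the denominator of p, so the sketch assumes, as a labeled guess, that each gap run counts as one site alongside every ungapped column. The function name jc_divergence is hypothetical, and the sketch does not reproduce the Nei and Gojobori (1986) procedure in full.

```python
import math


def jc_divergence(top, bottom):
    """Return (p, d) for two aligned rows of equal length ('-' marks a gap).

    p is the uncorrected divergence and d the Jukes-Cantor corrected value.
    Assumptions (not stated in the text): each substitution is one event;
    each gap run, whatever its length, is one event and also one site;
    each ungapped column is one site. Consecutive gap columns form a single
    run even if the gap switches from one row to the other.
    """
    if len(top) != len(bottom):
        raise ValueError("aligned rows must have equal length")
    events = sites = 0
    in_gap = False
    for a, b in zip(top.upper(), bottom.upper()):
        if a == "-" or b == "-":
            if not in_gap:        # first column of a new gap run
                events += 1
                sites += 1
            in_gap = True
        else:
            in_gap = False
            sites += 1
            if a != b:            # a transition or a transversion
                events += 1
    if sites == 0:
        raise ValueError("empty alignment")
    p = events / sites
    if p >= 0.75:
        raise ValueError("p >= 0.75: Jukes-Cantor correction is undefined")
    d = -0.75 * math.log(1.0 - 4.0 * p / 3.0)  # d = -(3/4) ln(1 - 4p/3)
    return p, d


if __name__ == "__main__":
    p, d = jc_divergence("ACGTACGTAC", "ACGAAC--AC")
    print(f"p = {p:.3f}, d = {d:.3f}")
```

The example prints p = 0.222, d = 0.264: one substitution and one two-column gap give 2 events over 9 sites, and the correction raises the estimate to allow for superimposed substitutions that cannot be seen directly. The function rejects p ≥ 0.75, where the logarithm is undefined.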
The four expressed genes exhibit the basic exon-intron structure of globin genes, consisting of three exons separated by two intervening sequences (reviewed in Collins and Weissman, 1984), and each is structurally able to encode the 146-residue globin polypeptide (Tagle et al., 1988, 1991). Codon usage for these expressed galago β globin genes was analyzed (Tagle, 1990), and they appear to share the codon usage bias found for human globin genes for the amino acids Leu (the codon CTG is preferred, 51/67), Val (GTG is preferred, 39/63), and Gly (GGC is preferred, 22/47). This codon bias has also been observed for the genes of other mammalian and avian species that are available in GenBank (Wada et al., 1991). The preference for C and G in the third codon position is consistent with the location of most mammalian and avian genes in GC-rich genomic regions (Wada et al., 1991).

As in all other primate ψη globin genes studied thus far, structural anomalies have maintained the galago ψη locus as a nonfunctional gene, or pseudogene. These anomalies include two deletions (which resulted in a frameshift and six in-frame termination codons) and the loss of the consensus splice junction sequences for introns 1 and 2 (Tagle, 1990; Bailey et al., 1991). In addition to carrying deleterious mutations in its coding and 5' regulatory regions (Tagle, 1990; Bailey et al., 1991), the galago ψη gene is truncated by the insertion of L1 elements in intron 2 (discussed below).

Search for Other Protein Coding Regions

The galago β globin gene cluster sequence was searched by computer for open reading frames that are at least 30 amino acids in length [enough to detect the smallest globin exon (exon 1) of 91 bp] and that meet Fickett's criteria (Fickett, 1982) of G + C base composition (window of 200 with a probability of 0.92 or greater). This conventional search routine identified a total of 354 ORFs (Fig. 4). Of the 16 ORFs longer than 300 bp, 11 are associated with a functional globin gene (ε, γ, δ, or β). The longest ORF in the cluster is found in the β globin gene; it consists of 576 bp, beginning 34 bp into intron 1 and extending 225 bp into intron 2. The second longest ORF (489 bp) is in the δ globin gene. It also starts 34 bp into intron 1 and extends 141 bp into intron 2; the similarity of this ORF to that of the β globin gene is due to concerted evolution between the two loci (Tagle et al., 1991). Of the remaining five ORFs longer than 300 bp, three are associated with L1 elements (positions 20,796 to 21,128; 20,897 to 21,196; and 27,828 to 27,493 of Fig. 2). Another is associated with the Alu element GcAluII-3, which shows an ORF (positions 1847 to 2168 of Fig. 2) throughout its entirety and may represent a relatively recent insertion event (discussed below). The remaining ORF, at positions 32,071 to 32,391 on the complementary strand (not presented in Fig. 2), does not appear to be associated with any known structural feature of the β globin gene cluster sequence (i.e., with globin genes, repetitive elements, or simple repeat sequences). A search of GenBank with the translated sequence of this ORF did not reveal homology with any known gene.

Although this conventional search, which examines the positional and compositional bias of the sequence for ORFs, correctly identified all the expressed galago β-like globin genes, its background noise was too high. The galago sequence was therefore also analyzed for protein coding regions with GRAIL. This search routine combines a set of seven sensor algorithms that measure important attributes of coding versus noncoding DNA (such as statistical frame bias, Fickett's base composition, dinucleotide frequencies, and coding six-tuple word preferences) on a sliding window of 100 bases (Uberbacher and Mural, 1991). These sensors were applied to the known coding and noncoding human DNA sequences in GenBank, and the output was used to train a neural network to dissect potential protein coding regions from an unknown sequence, such as the galago globin cluster sequence, on the basis of these "learned" attributes. The results of this analysis on the galago cluster sequence are shown in Fig. 4. Both sense and antisense strands were searched for potential exons, but only the sense strand showed significant peaks. Peaks with scores greater than 0.8 were ranked as excellent candidates for coding sequences. The analysis correctly predicted 75% of the exons, including at least one exon of each transcribed galago globin gene (Fig. 4), indicating that GRAIL, trained on human gene sequences, can also be used to identify protein coding regions of other mammalian species. Of the transcribed exons longer than 100 bp (exons 2 and 3), 88% were identified; only exon 3 of the ε globin gene was missed. However, only about 50% of the exons shorter than 100 bp were found. As expected, the search did not identify the ψη globin pseudogene as a protein coding region, but it did identify segments of two L1 elements (L1Gc-5 and L1Gc-6; see Figs. 1B and 4) as potential exons. Two peaks (positions 7011 to 7091 and 33,804 to 33,851) were also identified as potential coding sequences where no
`
FIG. 2. The complete nucleotide sequence of the galago β globin gene cluster. The total sequence of 41,101 bp was obtained from 13 EcoRI subcloned fragments. The EcoRI recognition sites are indicated in lowercase letters. The coding regions of the expressed β-like globin genes are indicated on top of the nucleotide sequence by their encoded amino acid sequences. Exons 1 and 2 of the ψη gene are indicated by asterisks above the nucleotide sequence. The promoter elements (CACA, CCAAT, and TATA boxes) as well as the putative CAP sites are labeled as such above their nucleotide sequence. SINE elements are indicated by an overline above their nucleotide sequence, and LINE elements by a double overline above their corresponding nucleotide sequence. For both families of interspersed repeats, left and right arrows indicate the 3' to 5' or 5' to 3' directionality of the repeats, respectively. A series of arrowheads indicates where short direct repeats flank insertion elements. The known structural features of the sequence are indicated to the right. DNA regions sequenced multiple times on only one strand include positions 33,050 to 33,350; 40,600 to 41,101; 29,500 to 29,730; 26,450 to 27,100; and 22,430 to 22,960 presented in this figure.
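The ORF screen in Search for Other Protein Coding Regions requires reading frames of at least 30 codons. Because the ORFs reported there begin inside introns, the sketch below treats an ORF as a stop-codon-free stretch rather than a start-to-stop frame. That reading, the function names, and the coordinate convention are assumptions of this sketch, and Fickett's composition test, which the authors also applied, is omitted. The sketch scans all six reading frames of a DNA string.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}
_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string (uppercased)."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def stop_free_orfs(seq, min_codons=30):
    """Yield (strand, frame, start, end) for each maximal stop-free run of
    at least min_codons codons, scanning all six reading frames.

    Assumption: no start codon is required, so a run may begin in an intron.
    Coordinates are 0-based and end-exclusive on the scanned strand; for
    strand '-' they refer to the reverse complement.
    """
    seq = seq.upper()
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            run_start = pos = frame
            while pos + 3 <= len(s):
                if s[pos:pos + 3] in STOP_CODONS:
                    if (pos - run_start) // 3 >= min_codons:
                        yield strand, frame, run_start, pos
                    run_start = pos + 3       # next run starts after the stop
                pos += 3
            if (pos - run_start) // 3 >= min_codons:  # run reaching the end
                yield strand, frame, run_start, pos


if __name__ == "__main__":
    toy = "ATG" + "GCT" * 40 + "TAA"    # 41 sense codons, then a stop
    for hit in stop_free_orfs(toy):
        print(hit)
```

For the toy sequence, stop-free runs of at least 30 codons turn up in all six frames even though only one frame was built as a reading frame, a small-scale illustration of why a length cutoff alone yields many candidates.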
`V
`y
` V
`V
`V
`V
`V
`V
`V
`V
`V
`V
`V
`V
`V
`CUCTACCAGCAATCTAAAGTATAq••ttcffCA1'ACTAATAGTGCCTAAGGACU.'l"GCAATAGnGATTCTTCAGATTMlTAMflT1'ATAAATGAGCTAMJU&GlT'J'Tll.AAMCC:ACTGAA.T1''!'!THCCTGCTGU.UGAGACCT 1 so
`y
`
`V
`
`V
`
`V
`
`V
`
`V
`
`V
`
`Y
` V
`<<<<<<<---------
`
`V
`- ------
`
`V
`
`---
`
`V
`Gd luII-1
`300
`C'l"?ACA'IT'l'MIWUITl'CC'l"?CACAGMCTTACIWUIGTACAAATGCTAC'.TGCAMCTTTTGQAGGCAAGIUICTGATCAHATIWUIHTGllllGGTTC'l"?TTHHHTGMACMAGCCTTAAGCTTCCCCC'rGGGTMAGTGCCATG
`V
`
`"
`
`450
`GCTTCACTGCTCACAGCAACCTCCATCTCCTGGGC'l'CAAGCGAGTCTCCTGCCTCAGCCTCCCU.GTAGCTGGGACCAGACATGCCCGCCACAACACCTQGC'TATHTTTGGTC:GCAGCCTTTATTG'l'TGTTTGGe�GCC:CGGGCTGGA
`V
`V
`V
`V
`V
`V
`V
`V
`V
`V
`V
`V
`V
`V
`V
`GcAlull-2
`----------
`---------------<<<<<<<---------------------------
`----------
`
`
`TTCGAACCCACCACCTTAGGTGTATGTGGCTGGCACCTTAGTCACTTGl.GCCACIGGCACCGAGCCAAGGTTTTTTTTTTTTGAGACAGAGTCTCAC'lTTATCACC:CTCAGTAGAGTGCTGTGGCGTCACA.GCTCACAGAGACCTCAAAC
`60 0
`
`-----
`--------------------------------------------------------
`
`
`
`
`TCCTGGGC1'TAGGCGGT'J'CTCCTCCCTCAGCCTCCCMGTAGCTGGGACTACAGGCACCCGCACll1'GCC1CCCTATTTTTTTGTTGCAGTTTGGCTGGGGCTGGGT1'TGAAc:CCCCCATCetCA.CTATATGGGGCCGGGGCTGGTGCTA
`
`150
`
`... ----- --<
`900
`
`CTCAC:1'GAGCCATTTMJ.TGT-.TTAAGACllACACA.GTTTTTATTTTATTGATTT1't'AATtACATAGACAATAAGGUATATTCT'l'AC'TGATTAGTTTTTCTGACTrCTC1'TGl ATAACCTTGTCATTTCTTTG'lCTAAATTTTGGGCTT
`V
`V
`V
`V
`V
`V
`V
`V
`V
`V
`V
`V
`
`
`
`
`V
`V
`V
`V
`V
`V
`V
`Y
` V
`V
`V
`V
`V
`V
`V
`
`
`
`
`V
`V
`V
`V
`Y
` V
`V
`V
`V
`V
`V
`V
`V
`V
`
`
`V
`V
`V
`V
`t/
` V
`V
`\'
`V
`V
`V
`V
`V
`V
`V
`
`V
`V
`V
`V
`V
`V
`V
`V
`V
`V
`V
`V
`V
`
`TTCATCtAGAAATGGAAGAGAA1GATTACTTGT1CCGGGTATTTCATAGGG.M.AAAAAATG.Ucn'QCnUAAATGGGCU.CTGAGGGTATTTAAATTGMCCM.TAAGGACCTMGCMTAATGAGATTTCCCATAGGTTATCTGACTC 1050
`
`CAGAACCAGMTTTATAGCACTGACCTGCTGCGTnAAATCCTGTAAACC1'TCC1'GAATCTCTTCTCCATGACTTTTAGCCATATGAGAATI\TGAGATGGGCCAGG1'C1C1'GGAGTCTAGCCAGGTC1'GTTACAGGTACTATGGGCCATG 12 0 0
`
`MGMAQGGATATAGATGGGCCCCGTM.'l''l'GTATCTGAGGGAAGAAGAGGAAGATCAACCCCATACTCTCMTATMGAGMGACTGCCACATTCTAGGGTCAA'?T'?'?AGGGAGGMCATTTTCCCCAMCTCATGGCCTTGI\GAGACCT 1350
`
`AATCTAtllTCCT1'GGACT?TGGCAGTGTCTGACCCTTGCTTCCAAATCCTGACGAAGGTCACTGCCt'l'TAT.UGATAGATCCMGCllATTTCCCTTGMAACCCAGAGTGC'l'A'tC1GCATTGA1'AATAAACTCTTAACTCAAGAACTG 1500
`
`
`
`CTCTTCCTTGAGTAGTCJIGATGAHCTTGACTATTCATTAGTCTTCTGCTTCAATGATCAAJICATCTTCTCTCCCAACCTAGCATCC'IGTCCACTTCAGCTATAGATTCTAGCCCATGAMTCTTCATCTAGGGTCCTCAGAACACTACA 1650
`
`GAMATGGTGMATT1'1'ACGGGUMA.fAATCCCTCTTTATTTTCGCTAACTTTAAATGAAATTTAGO.TTTCCTTCCCTTTTGAATATAAATTCACAAACTAGTTTAATI\TCAGCI\GTTCCTCTGGCCTGTTTATAGAGAGAAATMAG 1800
`
`
`V
`V
`V
`V
`V
`'Ir
`V
`V
`V
`V
`V
`V
`V
`V
`V
`
`
`V
`V
`V
`
`----
`- GcAlult-3
`__
`____
`_ ,___ ___
`___
`____
`___
`<«<<<<<<<< _______
`
`
`
`TI\TT1'fCATAnACATTATAGTC'TTATAGATCTTTTGTTTTTCTTTCTTTCTTTCCTTCTTTCTTTC1'1'1'TT1'1'tttTTTTTTTTAAATTGTGTCACCCTCAGTAGAGTGCCAGGGCATCACI\GCTCGCACCTCACAGCAGCTCACAGCA 19 50
`V
`
`ACCTCA.UATCATGGGC'?TC:AGCAATTCTCTTGCCTC:AGCCTCCCUGTAGCTGGGAC'l'ACACGTGCCCATGATAACACCTGGCTATTtCTTGCTTOCAATTGCAATTGTTG1'1''1A'1C"-GGCCTOGGCAGGCTTGAACCCACCI\GCCTTG
`2100
`
`GTGCl\'l'C'TGGTCI\GTGCCCTATCACTGAGCTI\TGGGCCCCMGCACAGATATT'l'flMAT.M.CATTTCCTTATAtCI\CTI\C'l'TAATgHttcATA.M.GAMCATATAAAAG"TTI\TGCTATAMTTTGATGTTTTGATATTTGATAACTG 21SO
`
`...
`......
`
`TtTTTTAATATAACTGGTCTCCTTAGCMTMTATTTATATTATTTTATATAAATAAAATCCACATTATTTHAG'tACTAACAGTCTGCTllACTAGTCCATGCTACATTllAAGTTTAGgeattcCTGATTATGTCTATATTGGCTATT 2400
`
`_____
`_______________________
`········-....... ,.. ____ _
`---------------·-··--·-·---------
`. --
`...
`__
`________
`_ ...... .,. _____ .......................
`.., .......... -............
`
`
`<<<<<<<<<<
`-----------------<
`
`
`
`V
`V
`V
`V
`V
`V
`V
`V
`V
`V
`V
`V
`V
`V
`
`
`
`
`V
`V
`
`________
`
`V
`
`V
`<----.. • .. •••------------- .. --------------------• .... -........... --.. ---------------•- .... ------GcAlol-4
`
`
`
`
`GTTTGGMTGTATGCCTAGAAAGATATTCCTTTTTTTTTTTTTTGAMCAGAG1C1'CAC't'rTGTTGCCCTTGGTAGACTCCTATAATCTCAGAGCTTACAGCAMCTCAGACA.CTTGGGCTCAACTGATTCTC'l'TGCCtCAGC1'1'CCCCA 2550
`
`
`
`..,.., ___ ...............
`-... -..... ----------
`
`GGMTTGAGACTA.CAGGTGCCCACCACAATGCCCAGCAATTTTl'THTT'l'T?GAGACAGGGTCTC'l'CTCTTOCTCAGGCTGGTC'l'CAAACCTGTGAACTCAGACAA'l'TCACCCGTCTTGGCCCCCC.U.GTCC'l'GGGATTACAGGTGTGM
`27 0 0
`V
`
`_____ _
`
`CC"-TTGCTCTCGGC'CTCTAQ.AI.AGATAncnAATTAGMCCTllAAGAAAG'l'GGTAAAGAMTTCATTGGCCTGGAATACT-'CTAATTTGGAGAGAGTCltiCTAAGTCAGGTAAAGACTTCCAGAATCATAAGAGAAAGACAAAATATTT 2850
`
`TCATGTGCTACAGGTGTTTTTTTATCACACTCTTAAGllAAGATU.l'llATGCCAAGGTAGGGTTAAAGATAAGAGAAGAGTTCTCAG?AAATA1'GOAA1'ATATGGAGATCTCMGGGCACTAGACTGGGAAGGCATAATCCAAATGGTl' 3000
`
`--<
`
`
`
`V
`V
`V
`V
`V
`V
`V
`V
`V
`V
`V
`V
`V
`V
`V
`
`
`
`
`V
`V
`V
`V
`V
`V
`V
`V
`V
`V
`V
`Y
` V
`
`
`GAGTAAllGGGCCAAAA.ACCTCT GGGGCATCAAAATGCAAGAAATTGCTAGGAAAAGAAACTCCCAGGTCTGGllGGGGCCAATAGAGCACCAGAGAAAGAAAAGATAGAAATAGATG1'GGAAT1'AGCTAGAACAAGMCTGGGCAGAGC 31 5 0
`
`TCACTAGATT"TAGTGAACTGTAGCAAGACTTTTTTAGAATTAACAAAGGTGGAATCTGTGCI\TTCCTGGTTGCATTTTCCCCTAATCATCAATTTTGCCATCTCGACGGTTACACAACTTCTGTATCTTGTTTTCATAGTCAA.UAAAT 33 0 0
`
`V
`V
`V
`V
`V
`V
`V
`V
`V
`V
`V
`V
`V
`V
`
`
`
`
`V
`V
`V
`V
`V
`V
`V
`V
`V
`V
`V
`V
`V
`V
`
`AAGGGTAATGACCATTATATTACTAGGCACAGATGAAAAGATACACCACMAGAACATT<:TTCCTGCATATGGATGGAACTGAATATATATCAATGCC:AGCTCCAGT'l'TTAGC'l''l'CAACATATAAAGAGCTAACAAGTCA1"l'ACAAC1iTA
`34 50
`V
`V
`V
`V
`V
`V
`V
`V
`V
`V
`V
`V
`V
`y
`
`
`
`GI\CATGGGCACCTATGGC:'l'AGAGCTAACACTAAATTTAACCTGACTCTTCATCTGATTMCATATCAMlt.TACTTATTTCTCAGTTTGATCACAMTCTTGCTTT'TAMTAATTTTACATTTCTCAGCTCTCTCACTGATGAGAAGTATA JliOO
`
`
`
`GGCAGCACAC1.CAAGCI.CI.MGA'?ACCTCACGCAA>.1
`GGUAAGGCGGAATG,AGAGAAATCTTTGT1'ATTTTGTAAGGTGATGTGGGGAMGAAGAGAGTGAGTAGAGGGAATQTOCATGGAGGAGGACACMGGMAGAMGGTCTCAJCATCC:CCCACACATTATCTCAATGTGTGCCTACTTCA 4200
`
`AGTATTAC'AC'flTGG.UTTAAGOCTl'CAAU,'?CGGGGTGAATTTTTTACTAC'l'CTGTTCMTTTtTAGAAGCGAC:C:ATGTATGGTTTflATCTCCCTAGMAMC'?A.AGATCCAGAGGTTTGGGTACAAGTCAGTCACCMGA.GCACtltiG 3900
`
`AACTGGATAAAACTCCATG'TCAACAGCTTACCCTTTTTOTTTCTTOGCCAGOGCTGTCTTTGTCATTGTCACTI\TCCATAATCCAAATTTTMACTCATTTTGGCTGMGCAG'""TTTCCTA.CCTGAGGCTTACTTCAtTA'l'CAGACTCTT 4050
`
`'AGCCACJlUGCTCG.U1'TCCC'TCTATACTCACAGA.TA.AI\TGGMAGAGAAAA?GTTCCCTGGI\AGCACCAGGTGTA..,TCTTGTTCTTTCTGTCC1'efCCCAC.U..CCACG'l'CTT 3750
`
`CAP
`CCAAT
`CACA
`ATIIAA
`
`
`CTGCTGACCCTCTGCTGACCAGGCTCCACCCCTGAGGACAGAGCTTAGC:TnGACC:AA'tGACTTCTAACTACCACGGAG.U.CI\AGGGGCTAGAACTTCAGCAGTGCAGGATAAA.AOOCCAC:ATM3AAAATCAGCAGCATACACCTATTTC 4350
`Exon 1
`
`
`MetValHisPheThrAlaGluGluLysAlaIleIleMetSerLeuTrpGlyLysValAsnIleGluGluAlaGlyGlyGluAlaLeuGlyAr
`
`
`
`TGGTACAGCTGTGATCACCAGCMGCTCCCAGACT1'GACACCA1'GGTGCA'J'TT1'ACTGC1GAGGAAAAGGCTATTATCATGAGCCTGTGGGGAAAGGTGAATATAGAAGAGGCTGGAGGAGAAGCCTTGGGCAGGTAAACACTGGTTCTC 4500
`gLeuLeuValValTyrProTrpThrGlnArgPhePheGluThrP
`Exon 2
`
`
`
`
`AGTGCATGGGAATUAGGGGGAATATMCTCTGGCAAACTGACCAGGAAI\GTCCTAAAGATTTTGI\GCATCACtAAT'l'T1'C:CACCTGTTATGGTGJ.CGTATC1t.TAGGCTCCTTGTTGTCTACCCCTGGACCCAG1t.GATTCTTTGAAACCT 4650
`
`
`
`heGlyAsnLeuSerSerAlaSerAlaIleMetGlyAsnProLysValLysAlaHisGlyLysLysValLeuThrSerPheGlyGluAlaValLysAsnMetAspAsnLeuLysGlyAlaPheAlaLysLeuSerGluLeuHisCysAspL
`
`
`1'1'GGAAACCTGTCCTCTGCCTCTGCCATCA'l'GGGCAACCCCAAGGTCAAGGCCCA1'GGCMGAAAGTGCTGAC'.CTCCTT'?GGAGAAGCTGTCAMMCATGGACAACCTCAACGCTCC:C:TTTGC?MGCTGAOTGAGCTGCACTGTGACA 4800
`ysLeuHisValAspProGluAsnPheLys
`
`
`
`AGCTGCACGTGGATCCTGMAACTTCAAGGTAAGTTCAGGAAATGCTACTAGGCTCTTGGCTTTCACTTTGAGACAATAATGGAAGGTTACACTATGATTMAAGGATCAACAAMACGTCAGAAAACATAGG'l'CCAGTTTGGTCTTIIA
