`
Cell, Vol. 21, 627-638, October 1980. Copyright © 1980 by MIT
`
`
`
Human Fetal Gγ- and Aγ-Globin Genes: Complete Nucleotide Sequences Suggest That DNA Can Be Exchanged between These Duplicated Genes

Jerry L. Slightom, Ann E. Blechl and Oliver Smithies

Laboratory of Genetics
University of Wisconsin
Madison, Wisconsin 53706

Summary
`
We present the nucleotide sequences of the Gγ- and Aγ-globin genes from one chromosome (A) and of most of the Aγ gene from the other chromosome (B) of the same individual. All three genes have a small, highly conserved intervening sequence (IVS1) of 122 bp located between codons 30 and 31 and a large intervening sequence (IVS2) of variable length (866-904 bp) between codons 104 and 105. A stretch of simple sequence DNA occurs in IVS2 which appears to be a hot spot for recombination. On the 5' side of this simple sequence, the allelic Aγ genes differ considerably in IVS2 whereas the nonallelic Gγ and Aγ genes from chromosome A differ only slightly. Yet on the 3' side of the simple sequence, the allelic genes differ only slightly whereas the nonallelic genes differ considerably. We hypothesize that the 5' two thirds of the Aγ gene on chromosome A has been "converted" by an intergenic exchange to become more like the Gγ gene on its own chromosome A than it is like the allelic Aγ gene on the other chromosome B. Our sequence data suggest that intergenic conversions occur in the germ line. The DNA sequence differences between two chromosomes from a single individual strongly suggest that DNA sequence polymorphisms for localized deletions, additions and base substitutions are very common in human populations.

SKI Exhibit 2032 - Page 1 of 12

Introduction

The loci specifying the amino acid sequences of the globin chains of mammalian hemoglobins usually occur in duplicated nonallelic pairs which frequently differ in their relative expressions at different stages of development (Bunn, Forget and Ranney, 1977). For example, most humans have two α genes (Orkin, 1978); two adult β-type genes, δ and β (Lawn et al., 1978); and two fetal β-type genes, Gγ and Aγ (Fritsch, Lawn and Maniatis, 1979; Bernards et al., 1979; Little et al., 1979a; Ramirez et al., 1979; Tuan et al., 1979). In mice a similar situation exists; some strains have two adult β genes which code for different although closely related globins, βmajor and βminor (Tiemeier et al., 1978), while in other strains either the two adult β genes code for identical products, βs, or only one of the genes is expressed (Weaver et al., 1979). The degree of similarity between the products of these duplicated globin genes varies, but in many cases the products of a pair of duplicated genes within a given species are more like each other than either is like the comparable pair of globins in other species. For example, adult human δ- and β-globins, which differ in 10 amino acid residues out of 146, are more different than are the fetal human globins Gγ and Aγ, which differ in only 1 residue, but the adult human δ- and β-globins are still more similar to each other than either is to the adult mouse β-globins (the minimum human/mouse difference is 26 residues). The conventional explanation of these findings is that during evolution the globin genes have increased and decreased their number by unequal crossovers and/or duplication events. A high degree of similarity between the members of a duplicated pair within a species is usually taken to indicate a short evolutionary time since the last duplication occurred. A lack of similarity between the comparable members of duplicated pairs in two different species is taken to mean that the last duplications occurred after the species had diverged. (See Little et al., 1979b, for a discussion of the problems of this argument in relation to the fetal globin genes.)

Duplications probably first arise by rare breakage and reunion events at nonhomologous points on two chromosomes, resulting in an unequal exchange between them (Muller, 1936; Smithies, 1964). Duplications may span part of a gene with no intergenic DNA (such as the haptoglobin Hp2 allele; Smithies, Connell and Dixon, 1962) or may involve several genes and intergenic DNA (such as the Bar locus in Drosophila; Bridges, 1936). The DNA between duplicated genes which have recently arisen by a single nonhomologous breakage and reunion event must also be duplicated, either completely on one side or partly on both sides of the duplicated genes.

These considerations make it difficult to account, by events involving duplication followed by unequal but homologous crossing over, for the occurrence of closely related genes adjacent to DNA which does not itself appear to be duplicated. Such cases are known. For example, the two adult mouse β-globin genes appear to be the result of a relatively recent duplication, since their structural genes readily hybridize with each other to form a DNA heteroduplex (Tiemeier et al., 1978); however, the DNA flanking these two genes does not show sufficient homology to form a heteroduplex. The problem is that of understanding how duplicated genes can continue to share many species-specific features while their flanking DNA shows little or no evidence of once having had a common origin.

In this paper we present the complete nucleotide sequence of the two human fetal globin genes, Gγ and Aγ, from one chromosome (A), and of most of the Aγ gene from the other chromosome (B) of the same individual. These sequences appear to provide an
example at the molecular level of a mechanism permitting the co-evolution of related linked genes without the need to involve the DNA between these genes. Thus the data indicate that the Gγ- and Aγ-globin genes on a given chromosome can exchange DNA sequences by a recombinational event like a gene conversion, and that a stretch of simple sequence DNA in IVS2 is a hotspot for initiating these exchanges. The sequence data suggest that conversions take place in the germ line and can occur between chromosomes as well as within a single chromosome.

In an accompanying paper (Efstratiadis et al., 1980) we present a detailed analysis of the nucleotide sequences of the human β-type globin genes [embryonic ε (Baralle, Shoulders and Proudfoot, 1980), fetal Gγ and Aγ (this paper), adult δ (Spritz et al., 1980) and adult β (Lawn et al., 1980)], and we compare these sequences with published globin gene sequences from other species.

Results and Discussion

Chromosomal Maps and Clones Studied

We have isolated clones covering the fetal globin region of both chromosomes of a diploid female donor and have found that the chromosomes differ in a number of places. Restriction enzyme sites used in defining the two chromosomes, arbitrarily labeled A and B, are presented in Figure 1 together with the code names and extents of the clones. Asterisks emphasize restriction sites which differ between the two chromosomes. The coding regions for the Gγ and Aγ genes are shown by heavy raised bars. About 14 kb to the 5' side of the Gγ gene of chromosome B we have identified another globin gene by hybridization. We presume that it is an ε-globin gene on the basis of restriction maps and sequence data presented by Proudfoot and Baralle (1979) and Fritsch, Lawn and Maniatis (1980), and have labeled it accordingly.

Clone 165.24 is the key clone from chromosome A. The restriction map of 165.24 (and the sequence data presented below) establish that it contains 14.3 kb of DNA which include the coding sequences for the expected two fetal globins, with Gγ on the 5' side of Aγ. The restriction map of 165.24 agrees within experimental error with the fetal portion of more extensive maps of the human fetal and adult globin region published by Bernards et al. (1979), Fritsch et al. (1979), Little et al. (1979a), Ramirez et al. (1979) and Tuan et al. (1979), using DNA from different donors. These previous studies showed one copy each of the Gγ and Aγ genes per chromosome. Since clone 165.24 contains one copy of each fetal globin gene in the same DNA environment found in the earlier studies, as judged by the restriction maps, we conclude that 165.24 contains DNA corresponding to the entire fetal globin region from one of the two relevant #11 chromosomes (Deisseroth et al., 1978) of our diploid donor.

Clone 164.6 is the key clone for chromosome B. It was an independent isolate from the same unamplified collection of in vitro packaged phages as 165.24. We identify it as being from chromosome B because its restriction map (Figure 1) and several critical sequenced regions (see below) differ from those of 165.24.

Clone 51.1 was isolated and initially characterized in earlier studies from this laboratory (Blattner et al., 1978; Smithies et al., 1978, 1979) using DNA from the same donor as for 165.24. DNA sequence data presented below show that the sequence of the Aγ gene in 51.1 is substantially different from that of the Aγ gene in 165.24, but is identical to the sequenced regions of the Aγ gene in clone 164.6. Consequently we identify the Aγ gene in clone 51.1 as being from chromosome B.

Organization of the Human Fetal Globin Genes

The restriction sites and strategy used in the DNA sequencing are shown in Figure 2. The same basic strategy was applicable to all three γ-globin genes. Figure 3 presents and compares the complete DNA sequences of the Gγ- and Aγ-globin genes from chromosome A and most of the DNA sequence of the Aγ-globin gene from chromosome B of our donor.

The data obtained from this sequence analysis support our earlier finding (Smithies et al., 1978) that the coding region of the human fetal Aγ-globin gene is divided into three segments by two noncoding intervening sequences, and extend this finding to the human fetal Gγ-globin gene. The smaller intervening sequence (IVS1) interrupts the coding region between codons 30 and 31, and the larger intervening sequence (IVS2) interrupts the coding region between codons 104 and 105. The arrows at the 5' and 3' boundaries of both IVS1 and IVS2 in Figure 3 point out a possible splicing frame for the removal of these intervening sequences which would conform to the GT/AG rule observed at the boundaries of the intervening sequences of many other genes (Breathnach et al., 1978). The human fetal globin gene intervening sequences occur in exactly the same positions as they occur in adult globin genes from mouse (Konkel, Tilghman and Leder, 1978; Konkel, Maizel and Leder, 1979), rabbit (van den Berg et al., 1978; van Ooyen et al., 1979) and human (Lawn et al., 1980; Spritz et al., 1980). Clearly the difference in expression between adult and fetal globins cannot be the result either of the fetal genes lacking these intervening sequences or of their presence in different places in the coding region.

IVS1 is 122 bases in length in all three γ-globin genes. The length of IVS2 is different in each of the three γ-globin genes sequenced: 886 bases in the Gγ gene of chromosome A, 866 bases in the Aγ gene
of chromosome A and 876 bases in the Aγ gene of chromosome B. Preliminary data (not presented here) indicate that the IVS2 in the Gγ gene from chromosome B is the largest of the four, having about 904 bases. A difference in the size of IVS2 in the nonallelic Gγ- and Aγ-globin genes is not surprising, but we did not expect that the lengths of the allelic globin genes would differ.

Figure 1. Maps Outlining Restriction Enzyme Sites Used to Define Chromosomes A and B of the DNA Donor
The coding regions of the Gγ- and Aγ-globin genes and the general location of a gene presumed to code for ε-globin are shown by heavy raised bars. The direction of transcription of the fetal globin genes is shown. Asterisks emphasize restriction sites which differ in the two chromosomes. The brackets under the two chromosome maps show the extents of the clones considered in this paper; their respective code numbers are listed alongside (chromosome A: 165.12, 165.24, 166.1 and 242.7; chromosome B: 51.1, 164.1, 164.2 and 164.6). The scale is in kilobase pairs (0-24 kbp). The restriction enzyme sites are (B) Bam HI; (Bg) Bgl II; (E) Eco RI; (Hf) Hinf I; (Hh) Hha I; (Hp) Hph I; (Th) Tha I.

Comparison and Analysis of Sequence Data

The general similarities and differences in the three γ-globin genes are illustrated in Figure 4 by a bar diagram showing the distribution of the differences. Two striking features are revealed by this comparison. First, substantial portions of the three genes have virtually identical sequences: the 5' flanking and 5' untranslated regions, the complete coding sequences, all of IVS1, and three regions at the ends and middle of IVS2. Second, in the 3' third of the genes there are more differences between the nonallelic Gγ and Aγ genes on the same chromosome (hatched areas) than between the allelic Aγ genes (unhatched areas); however, in the 5' two thirds of the genes the distribution of the differences is reversed.

5' Flanking and 5' Untranslated Region

Only one difference occurs in the DNA sequences of the 5' flanking untranscribed region and 5' transcribed but untranslated region of the three genes; this is at position 25, where the Aγ gene from chromosome A has an adenine residue whereas the other two genes have a guanine. Chang et al. (1978) have published a sequence for (presumably mixed G and A) γ-globin mRNA; they only report a guanine at this position. At position 19 they report both guanine and cytosine, where we find only guanine. It is not known whether these differences are due to genetic polymorphisms or to problems in sequencing. In the remainder of the 5' untranslated region of mRNA, the two sets of sequence data are in complete agreement.

Included in the 5' flanking region are some sequences with features common to many globin genes and to other eucaryotic genes. These sequences are considered in detail in the accompanying comparison paper (Efstratiadis et al., 1980). They include a hexanucleotide sequence (AATAAA) starting 31 bases before the first nucleotide of the mRNA (overlined sequence in Figure 3) which is similar in sequence and position to that first recognized by Goldberg
(1979) in the Drosophila histone genes. [The hexanucleotide sequence (AATAAA) from the 5' side of the γ-globin genes is also the same as the poly(A) addition signal found about 20 bases before the 3' end of many eucaryotic mRNAs (Proudfoot and Brownlee, 1976), including our γ-globin genes. This identity is probably coincidental, but we cannot exclude the possibility that these sequences on the 5' side of the γ-globin genes are part of the 3' end of mRNAs transcribed from DNA on the 5' side of the γ-globin genes.]

Figure 2. Detailed Map of the Restriction Enzyme Sites Used in Sequencing the Gγ and Aγ Genes of Clone 165.24 and the Aγ Gene of Clone 51.1
Restriction enzyme sites are shown by vertical arrows. The Aγ above the Pst I site and the Gγ above the Sac I site indicate that each site occurs only in the specified gene. The directions of sequencing shown by the solid horizontal arrows apply to all three genes; the dotted arrows apply only to the genes in 165.24. The positive and negative numbers on the scale refer to nucleotide positions; position 1 is the first adenine of the fetal globin mRNAs. The coding regions (common to all three genes) are shown by heavy bars.

A second region of possible functional importance in the 5' untranslated region of many eucaryotic genes is the "capping box," first pointed out by Ziff and Evans (1978). The capping box consists of 12 nucleotides, 10 of which are on the 5' side of the first nucleotide of the mRNA. The capping boxes in the three γ-globin genes all have the same sequence (GCAGTTCCACAC), which is underlined in Figure 3.

Gγ- and Aγ-Globin Coding Sequences

It is most remarkable that only one nucleotide difference occurs in the 438 bases comprising the coding region of the three sequenced γ-globin genes. This difference is in codon 136, where the Gγ-globin codon is GGA (coding for glycine) and the Aγ-globin codon is GCA (coding for alanine). The fact that the DNA sequences in the coding regions of the Gγ and Aγ genes are otherwise identical clearly indicates either that strong selection pressures are being exerted at the nucleotide level, and/or that some type of molecular mechanism exists whereby these duplicated genes have avoided evolutionary divergence.

The nucleotide sequence for Gγ mRNA has previously been determined by Forget et al. (1979) from cDNA clones. A comparison of our Gγ-globin coding sequence with their recently revised sequence (B. Forget, personal communication) reveals no differences.

IVS1 Is Very Conserved

Comparison of the 122 bp IVS1 from the three γ-globin genes shows their DNA sequences to be almost as highly conserved as the sequences of the coding regions. The two nonallelic Gγ and Aγ genes from chromosome A have identical IVS1 DNA sequences, and there is only one difference between the IVS1 of the allelic Aγ genes: this difference is at position 210, about in the middle of the IVS. The strict conservation of these nucleotide sequences again suggests that the sequences, like the coding region, must either be under strong selection pressure or else are maintained in identical form by some other mechanism. A result similar to ours has been reported by Konkel et al. (1979) for the IVS1 sequences from the mouse βmajor and βminor globin genes; three nucleotide differences were found in the 116 nucleotides of the IVS1 sequences of their mouse β-globin genes.

The 3' Untranslated Region Contains a Considerable Number of Differences

The nucleotide sequences of the 3' untranslated regions of both Gγ mRNA (Forget et al., 1979) and Aγ mRNA (Poon, Kan and Boyer, 1978) have already been determined. Both are 90 bp in length, but their sequences differ at six positions. Our Aγ sequence agrees completely with that of Poon et al. (1978). We also find six differences between the two nonallelic genes at the same positions as in these previous reports. Four of the differences are in a block just after the terminator codon (positions 1508-1511 in Figure 3); the other two occur at positions 1522 and 1583. Whether these differences are related to the varied synthesis of the Gγ- versus Aγ-globins during development (Bunn et al., 1977) cannot be determined from the data currently available. Our Gγ sequence differs in two positions from the Gγ mRNA sequence reported by Forget et al. (1979). The first difference is at position 1510 (where a G was found in the mRNA and we find an A), and the second is at position 1583 (where a T was found in the mRNA and we find an A). These two positions also differ between the nonallelic genes.

Proudfoot and Brownlee (1976) have suggested, as mentioned above, that the hexanucleotide (AATAAA) about 20 bp 5' to the first nucleotide of poly(A) forms a signal for poly(A) addition in many mRNAs. We find, as did Forget et al. (1979) and Poon et al. (1978), that exactly this sequence occurs in both Gγ and Aγ
genes of chromosome A at the doubly underlined positions 1565-1571 in Figure 3, which is 21 bp from where mRNA has poly(A) attached. We do not have the corresponding data for chromosome B.

Figure 3. Nucleotide Sequences of the Gγ and Aγ Genes from Clone 165.24 and of the Aγ Gene from Clone 51.1
The numbering system is taken from the Gγ gene of 165.24, the largest of the three sequenced genes, with position 1 corresponding to the first adenine of globin mRNA to which the cap, 7mGppp, is added (Chang et al., 1978). The fully listed sequence is that of the Aγ gene of 165.24; nucleotides which are different in the Gγ gene of 165.24 or in the Aγ gene of 51.1 are shown respectively above or below the sequence of the Aγ gene of 165.24. Asterisks indicate gaps. Underlined and overlined nucleotides denote regions of possible biological importance (see text). The amino acids printed below the dashed counting line refer to coding nucleotides above the line. The initiator codon is printed as Met, and the terminator codon as Ter. Arrows indicate splicing sites which conform to the GT/AG rule (Breathnach et al., 1978). The listing runs from position -56 to position 1592, and the end of clone 51.1 is marked.

IVS2 Contains Conserved, Nonconserved and Simple Sequence DNA

We have completely sequenced the IVS2 from three of the four γ-globin genes of our donor (Figure 3) and have partially sequenced the fourth (see Figure 7). As pointed out earlier, the lengths of these IVS2 sequences vary between 866 and 904 bp.

A comparison of the sequences of IVS2 from these three γ-globin genes shows a great deal of homology, which was expected from the restriction enzyme site analysis presented in Figure 2. Using the same method for counting differences as was used for Figure 4, the nonallelic Gγ- and Aγ-globin IVS2 sequences are 98.3% homologous and the allelic Aγ genes are 97.3% homologous. The homology found here between the nonallelic Gγ and Aγ IVS2 sequences is very much greater than the 59% found by Konkel et al. (1979) for the nonallelic mouse βmajor and βminor genes.

Figure 4 shows that the IVS2 sequences of these three γ genes are virtually identical close to the 5' and 3' splice points of IVS2. There is only one base pair difference in the 110 bp of IVS2 adjoining the 5' splice point, and no difference in the 80 bp of IVS2 adjoining the 3' splice point (Figure 3). Conservation of sequences around splice points may be important for the proper removal of intervening sequences, a possibility already pointed out by Konkel et al. (1979), who made similar observations for the mouse βmajor and βminor genes.

Figure 4. Bar Diagram Illustrating the Distribution of Differences between the Nonallelic Gγ and Aγ Genes and between the Allelic Aγ Genes
The hatched areas show the differences between the Gγ gene of chromosome A (from clone 165.24) and the Aγ gene of the same chromosome (also from clone 165.24). The unhatched areas show the differences between the Aγ gene of chromosome A (from clone 165.24) and the allelic Aγ gene of chromosome B (from clone 51.1). Base substitutions are counted as one difference; gaps, regardless of their length, are arbitrarily counted as three differences so as not to give gaps overemphasis while still indicating that they usually occur less frequently than substitutions. The horizontal scale shows nucleotide sequence positions. Each bar shows the differences found in approximately 100 nucleotides, but the bars have been adjusted in width to coincide with rational boundaries. The coding regions are indicated with the relevant amino acid numbers in parentheses (1-30, 31-104 and 105-146).

Figure 4 also shows that the three fetal globin genes have an invariant region of 285 bp close to the center of IVS2 (positions 795-1079, Figure 3). We shall consider later a possible explanation of this invariant region in the middle of IVS2. The data of Konkel et al. (1979) on the mouse adult β-globin genes do not show a similar invariant region.

We expected that the allelic forms of the Aγ-globin gene from the two chromosomes of our donor would be very similar at the DNA sequence level, even in the intervening sequences, and that the nonallelic globin genes on the same chromosome might be less similar. Although we found this expectation to be correct in some parts of the genes, it is not correct in other parts, as clearly demonstrated in Figure 4. On the 5' side of the invariant IVS2 sequence the allelic Aγ genes from the two chromosomes unexpectedly show many differences in the IVS2 sequence, while in the same region there are no differences between the nonallelic Gγ and Aγ genes from chromosome A. On the 3' side of the invariant region the expected tendency is found; there are more differences between the IVS2 sequences in the nonallelic Gγ and Aγ genes than between the allelic Aγ genes.

A clue to solving this puzzle is provided by our finding a region of "simple sequence" DNA (positions 1062-1107 in Figure 3) at the 3' boundary of the invariant region of IVS2. The DNA sequences of the three genes at and near this region of simple sequence DNA are reproduced in Figure 5. As shown in the figure, the dinucleotide TG is the most common element of the simple sequence (large letters in Figure 5). It is repeated 19 times in stretches of 11, 2 and 6 dinucleotides in the Gγ gene of chromosome A, 13 times in the Aγ gene of chromosome A in a single stretch, and 17 times in the Aγ gene of chromosome B in stretches of 9 and 8 dinucleotides. The dinucleotide CG is also repeated in two of the genes (underlined letters in Figure 5). The simple sequences on chromosome A can formally be related to each other by a deletion resulting from an unequal exchange between the two larger stretches of TG dinucleotides with the loss of 20 bases.

Figure 5 reemphasizes our finding that on the 5' side of the simple sequence region the nonallelic Gγ and Aγ genes are virtually identical (only 1 base substitution out of 1135 bases) whereas the allelic Aγ genes differ considerably (13 substitutions and 2 four-base gaps). Yet on the 3' side of the simple sequence region the relationships are reversed, with the allelic
genes being very similar (2 base substitutions out of 333) and the nonallelic genes being more different (12 base substitutions).

Figure 5. Detailed Comparison of the Nucleotide Sequences of the Three Sequenced Genes at a Region of Simple Sequence DNA (Positions 1062-1107) near the Middle of IVS2, and a Summary of the Differences between the Three Genes on Either Side of This Simple Sequence Region
Repeated TG dinucleotides in the simple sequence region are shown in large letters; repeated CG dinucleotides are underlined; asterisks represent gaps. The brackets and fractions indicate the fractional sequence differences on either side of the simple sequence between the nonallelic Gγ and Aγ genes on chromosome A, and between the allelic Aγ genes on the two chromosomes. The boxed fractions emphasize that on the 5' side the nonallelic genes are more similar than the allelic genes, while on the 3' side the allelic genes are more similar.

Because the DNA in the simple sequence region of the Gγ and Aγ genes of chromosome A differs in

occurred by strand transfer without isomerization and branch migration. We do, however, exclude participation of chromosome B in the particular exchange which led to the Aγ gene of chromosome A, because of the Gγ gene on chromosome B