Proc. Natl Acad. Sci. USA
Vol. 78, No. 11, pp. 7064-7068, November 1981
Genetics

Allelic forms of rat κ chain genes: Evidence for strong selection at the level of nucleotide sequence
(DNA sequences/gene cloning/immunoglobulin genes/molecular evolution)

H. W. SHEPPARD AND G. A. GUTMAN

Department of Microbiology, College of Medicine, University of California, Irvine, California 92717

Communicated by Ray D. Owen, August 5, 1981

ABSTRACT The genes that code for two allotypic forms of the rat κ light chain constant region (Cκ) have been cloned, and the nucleotide sequence of 1172 base pairs of coding and flanking sequence has been determined for both alleles. These sequences have been compared to each other and to the corresponding sequences found in the mouse and human. Comparison of the LEW allele with mouse Cκ reveals two surprising features. (i) There is an unusually large number of amino acid substitutions (21) relative to the total number of nucleotide changes (37) in the coding region; comparisons among several other mammalian genes reveal a larger proportion of "silent" changes. (ii) The rate of accumulation of base substitutions is the same within the coding region as it is in some 870 base pairs of noncoding sequence (including 3' untranslated, 3' flanking, and 5' intervening sequences). Comparison of the two allelic forms of rat Cκ shows the same unusual features in more extreme form. (i) Twelve base substitutions in the coding region determine 11 amino acid differences; only one "silent" change exists. (ii) There are 12 base substitutions in the 318 base pairs of coding sequence (3.7% difference) and only 9 in the remaining 854 base pairs of noncoding DNA (1.1%), a highly significant difference. This degree of conservation of noncoding sequences and of "silent" sites within the coding region is unique among the mammalian genes studied thus far. These patterns suggest that there has been strong selection for conservation of nucleotide sequences, both inside and outside the coding region, independent of the selection required to maintain the function and characteristic structure of the immunoglobulin domain itself. The functions of the nucleotide sequences that account for this selective pressure are unclear at present.

There are now a number of mammalian genes for which nucleotide sequences have been determined, including insulin (1), growth hormone (2), various globin genes (3), dihydrofolate reductase (4), and immunoglobulin heavy and light chains (5). For some of these genes the sequence has been determined in several species, allowing interspecific comparisons (3, 6-8). This kind of analysis has led to the identification of short, highly conserved elements that are thought to represent the functional signals for several DNA and RNA processing events. In addition, conclusions have been drawn about phylogenetic relationships and patterns of divergence on the basis of the comparative structure of these genes.
With the exception of the intraspecies comparison of the βmajor/βminor subunits of mouse hemoglobin, all comparisons to date have been made between genes that are thought to be separated by more than 80 million years (the time separating mammalian families). Because the mouse and rat are thought to be only 10 million years apart, and the two allotypic forms of rat κ chains even closer, it was of interest to determine the nucleotide sequences of the two allotypic genes (RI-1a and RI-1b) and to compare them with each other and with the published mouse κ chain constant region (Cκ) sequence (9). In addition, the recent publication of the human Cκ gene sequence (10) made it possible to compare the same gene between species that vary widely in their phylogenetic relatedness.

MATERIALS AND METHODS
Mouse Probe. The probe used for the isolation of the rat Cκ genes was obtained from Jonathan Seidman (Philip Leder's laboratory, National Institutes of Health) and is designated κ41-C (11). This probe consists of the plasmid pBR322 with inserted mouse cDNA corresponding to the constant region of the κ-type immunoglobulin light chain from the MOPC-41 plasmacytoma. The insert, cloned at the BamHI restriction endonuclease site of pBR322, consists of the coding region from amino acid 125 through the 3' end of the mRNA [including the untranslated region and poly(A)], with synthetic BamHI linkers at each end. For these studies, the BamHI insert was purified and labeled by nick translation to a specific activity greater than 10⁸ cpm/µg.
Genomic Blot. High molecular weight DNA was prepared from adult rat liver according to Blin and Stafford (12). About 40 µg of this DNA was digested to completion with EcoRI restriction endonuclease and electrophoresed in a 0.8% agarose gel. The DNA was denatured in NaOH, neutralized, and transferred to nitrocellulose according to Southern (13). Hybridization was carried out at 65°C for 24 hr in hybridization buffer (0.75 M NaCl/50 µM NaPO4/5 mM EDTA/0.01% sodium dodecyl sulfate/0.1% Ficoll/0.1% polyvinylpyrrolidone/0.1% bovine serum albumin, pH 7.0) with 10" cpm of nick-translated κ41-C probe.
Preparation of DNA for Gene Cloning. High molecular weight liver DNA was digested under conditions in which about one in four EcoRI sites is cut. This DNA was then size-fractionated by centrifugation in a 5-25% NaCl gradient. Fractions containing DNA between 5 and 14 kilobases (kb) long were pooled, dialyzed, and concentrated by ethanol precipitation. Similarly, the left and right arms of the λgtWES vector (14) were prepared by EcoRI digestion, centrifugation in a 5-25% NaCl gradient to remove the λB insert, dialysis, and ethanol precipitation.
Molecular Cloning. The size-selected rat DNA fragments were ligated into the purified λgtWES arms at a 1:1 molar ratio under standard ligation conditions at 25°C for 4 hr. The recombinant phage DNA was packaged in vitro by a modification of the procedure of Hohn and Murray (15) and screened by in situ hybridization (16).
Subcloning. After Southern blot analysis, the Cκ-containing EcoRI inserts were inserted into the EcoRI site of pBR322 and subcloned (17, 18). After additional restriction mapping and Southern blot analysis, the 1.2-kb BspRI fragment was identified and inserted into the EcoRI site of pBR322 by filling in the staggered ends of the EcoRI site and blunt-end ligation with phage T4 ligase.
DNA Sequence Analysis. Nucleotide sequences were determined by the method of Maxam and Gilbert (19). DNA fragments were labeled at the 5' phosphate using [γ-32P]ATP and polynucleotide kinase, or by filling in the 3' end of staggered restriction sites with DNA polymerase and [α-32P]NTP. Labeled fragments were separated on polyacrylamide gels crosslinked with N,N'-bisacrylylcystamine (Bio-Rad) and recovered by reduction of the gel and DEAE-cellulose ion-exchange chromatography.

The publication costs of this article were defrayed in part by page charge payment. This article must therefore be hereby marked "advertisement" in accordance with 18 U.S.C. §1734 solely to indicate this fact.

Abbreviations: Cκ, constant region of κ immunoglobulin light chain; kb, kilobase(s); bp, base pair(s).

RESULTS AND DISCUSSION
Cloning of the Rat Cκ Gene. To clone the rat Cκ genes, we took advantage of the similarity between the mouse and rat (80% amino acid identity) by using a cloned mouse cDNA (11) as a hybridization probe. We used this probe to isolate 11 independent Cκ clones: 6 from the LOU strain (RI-1b), 2 from the LEW strain (RI-1b), and 3 from the DA strain (RI-1a). DNA from each clone was digested with EcoRI, separated on a 0.8% agarose gel, and analyzed by Southern blot. The results shown in Fig. 1 demonstrate that every clone contained a 6.5-kb fragment bearing the Cκ gene, which corresponds to the fragment identified in the analysis of the rat genome (20).
A restriction map of the 6.5-kb EcoRI fragment, and of a 1172-base pair (bp) BspRI fragment that was found to contain all the hybridizing sequence, is shown in Fig. 2. The complete nucleotide sequence of the BspRI fragment was determined for both allelic forms; that of LOU clone 6 (RI-1b) is shown in Fig. 3.
Amino Acid Sequence of RI-1b Cκ. The amino acid sequence of the RI-1b protein deduced from the nucleotide sequence is consistent with the published amino acid sequences of LOU and LEW κ chains (21-23), with the exception of the arginine at amino acid position 155 (nucleotides 618-620). Our sequence reads Glu-Arg-Arg-Asp at this point, whereas the previous studies indicated Glu-Arg-Asp. Two of these studies (21, 22) were based solely on tryptic peptides and therefore would not have detected the second Arg. The third study, however, was based on an automated sequencing run through this region (23). This discrepancy may represent yet another gene in the LOU strain, similar to the variant S211 protein (22), or a propagation of sequencing errors such as has been seen with mouse Cκ regions (24). Furthermore, translation of the nucleotide sequence of the LOU clone gives an amino acid sequence identical to that of RI-1b rather than to the S211 myeloma protein sequence (21). This result, together with the evidence that there is a single Cκ gene in the LOU genome (20), supports the hypothesis that the S211 myeloma represents a third RI-1 allele, present among rat myelomas because of residual heterozygosity in the LOU strain at the time the myelomas were isolated.
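The argument about the tryptic-peptide studies rests on trypsin's specificity: it cleaves on the carboxyl side of Lys and Arg (conventionally not before Pro), so a Glu-Arg-Arg-Asp stretch releases the second Arg as a free amino acid rather than as part of a peptide. A minimal in-silico digest, assuming that simplified cleavage rule, complete digestion, and a short hypothetical peptide in one-letter code (E-R-R-D stands for Glu-Arg-Arg-Asp; this is not the rat Cκ sequence), illustrates the point:

```python
def tryptic_digest(protein: str) -> list[str]:
    """Cleave a one-letter protein sequence after K or R unless the next
    residue is P (a simplified trypsin rule that assumes complete digestion)."""
    fragments: list[str] = []
    current = ""
    for i, residue in enumerate(protein):
        current += residue
        next_residue = protein[i + 1] if i + 1 < len(protein) else ""
        if residue in "KR" and next_residue != "P":
            fragments.append(current)
            current = ""
    if current:  # C-terminal fragment that does not end in K/R
        fragments.append(current)
    return fragments


# Hypothetical peptides: E-R-R-D versus E-R-D.
print(tryptic_digest("QLERRDGK"))  # ['QLER', 'R', 'DGK']
print(tryptic_digest("QLERDGK"))   # ['QLER', 'DGK']
```

Apart from the free Arg, both digests yield the same peptides, which is why an extra Arg is easy to overlook when a sequence is assembled from tryptic peptides alone.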
[Fig. 1 image: (A) ethidium-stained gel and (B) Southern blot autoradiograph; size standards of 21.80, 7.55, 5.54, 4.80, 3.38, 1.35, 1.08, and 0.87 kb; Cκ-containing fragment at 6.50 kb.]
FIG. 1. Analysis of cloned EcoRI fragments. λgtWES DNA containing Cκ clones was digested with EcoRI, electrophoresed on a 0.8% agarose gel, and subjected to Southern blotting and hybridization with the Cκ probe (see text). (A) Ethidium-stained agarose gel. Lanes 1, 7, and 14 contain size standards, with sizes in kb indicated on the left. The remaining lanes contain DNA from clones 1, 2, 3, 5, 6, and 7 (LOU); 8 and 9 (LEW); and 26, 28, and 40 (DA). (B) Autoradiograph of Southern blot. Only the 11 Cκ-containing fragments are visible.

[Fig. 2 image: restriction map of the 6.5-kb EcoRI fragment (scale bar, 1 kb) and expanded map of the 1.2-kb BspRI fragment (scale bar, 100 bp) showing the Cκ coding region and the 3' untranslated region.]
FIG. 2. Restriction map of the Cκ-containing 6.5-kb EcoRI fragment from clone 6 (LOU, RI-1b). A more detailed map of the 1.2-kb BspRI fragment is also shown, and the localization and orientation of the Cκ gene are indicated. The arrows below indicate the nucleotide sequence determination strategy used. Two restriction sites marked by asterisks are present in LOU and LEW clones (RI-1b) but absent from DA (RI-1a). RI, EcoRI; Ps, PstI; Pv, PvuII; Hc, HincII; Bs, BspRI; A, AluI; RII, EcoRII; D, DdeI; S3, Sau3A; Hf, HinfI; Hp, HpaI; S9, Sau96; Rs, RsaI; 3'U.T., 3' untranslated.

Recognition Sequences and Repetitive Sequences. Examination of the rat nucleotide sequences reveals two features predicted from studies on other eukaryotic genes. First, the dinucleotide A-G at position 441 follows the G-T/A-G rule for RNA splicing (25, 26). Second, the hexanucleotide A-A-T-A-A-A is present at position 940, 19 bp upstream from the presumed site of polyadenylylation (position 964); this sequence is thought to be a required signal for poly(A) addition to the message (27).
Several short repetitive sequences whose significance is not clear have also been discerned. The hexanucleotide A-C-A-G-C-A is repeated four times (with one base change in the fourth repeat) between positions 611 and 634; this repeat has also been identified in the mouse and human Cκ genes (9, 10). The tetranucleotides G-T-G-T and G-T-C-T are each repeated three times from positions 336 to 367. The sequence T-C-C-T is repeated four times from 403 to 440, including a tandem repeat three bases away from the 3' end of the intervening sequence; all of these are conserved in the mouse, but only two in the human. This same tetranucleotide is repeated six times between 811 and 948, as well as once each at 83 and 1173. Last, the sequence T-T-T-G is found four times between 953 and 1114, a region spanning the site of polyadenylylation of the message. These repeated sequences might be part of extended recognition sequences for rearrangements, RNA splicing, or other functions (see below), or may be the fortuitous result of short duplication events (in the case of less highly conserved repeats). Very similar sequences have been observed in globin gene intervening sequences, and it has been postulated that they are responsible for DNA recombination events that serve to maintain the DNA homology between duplicated gene pairs (28).
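Locating short repeats such as A-C-A-G-C-A or T-C-C-T, including a copy that carries a single base change, amounts to a sliding-window scan with a mismatch allowance. The sketch below assumes 1-based coordinates (as in Fig. 3) and uses a made-up demonstration sequence rather than the rat sequence itself; the function name `find_motif` is hypothetical.

```python
def find_motif(seq: str, motif: str, max_mismatches: int = 0) -> list[int]:
    """Return 1-based start positions of every (possibly overlapping) window
    of seq that differs from motif at no more than max_mismatches positions."""
    k = len(motif)
    hits = []
    for start in range(len(seq) - k + 1):
        window = seq[start:start + k]
        mismatches = sum(a != b for a, b in zip(window, motif))
        if mismatches <= max_mismatches:
            hits.append(start + 1)  # convert 0-based index to 1-based position
    return hits


# Hypothetical sequence: three exact ACAGCA repeats and a fourth with one change (ACAGTA).
demo = "ACAGCAACAGCAACAGCAACAGTA"
print(find_motif(demo, "ACAGCA"))                    # [1, 7, 13]
print(find_motif(demo, "ACAGCA", max_mismatches=1))  # [1, 7, 13, 19]
```

Allowing one mismatch recovers the variant fourth repeat, mirroring the "one base change in the fourth repeat" noted above; larger allowances quickly admit chance matches in a sequence this short.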
[Fig. 3 image: alignment of the rat (LOU) and mouse nucleotide sequences of the 1.2-kb BspRI fragment, with the rat coding region translated above and mouse amino acid differences below.]
FIG. 3. Nucleotide sequence of the 1.2-kb BspRI fragment of clone 6 (LOU, RI-1b), aligned with the corresponding sequence of the mouse (9). The bar between the sequences indicates the extent of identity between the two; the coding region of the rat is translated above the sequence, and amino acid differences in the mouse are shown below. Filled circles above the bar indicate positions at which the DA (RI-1a) sequence differs from the LOU sequence (see Table 2). Hash marks indicate every tenth base in each sequence.

Paucity of "Silent" Changes Between Rat and Mouse Cκ Sequences. Comparison of the LOU and mouse amino acid sequences, and of the nucleotide sequences of their coding regions, reveals 21 amino acid substitutions (80.0% homology) and 37 nucleotide changes (88.5% homology) (Table 1). If every base pair were selectively neutral, and if nucleotide changes occurred at random, approximately three coding changes would be expected for every silent change. In most genes, however, many of the coding changes are clearly not neutral, because of the selective pressure on the amino acid sequence of the encoded polypeptide. This would certainly be expected in the case of the Cκ domain, which has maintained its characteristic "immunoglobulin fold" structure and its ability to interact with heavy chains throughout vertebrate evolution. Therefore many of the coding changes that occur at random would be expected to be eliminated, leaving silent changes over-represented relative to the neutral expectation.
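The neutral expectation of roughly three coding changes per silent change follows from the structure of the genetic code. The sketch below, an illustration rather than the authors' calculation, enumerates every single-base substitution in the 61 sense codons of the standard code and classifies each as silent, replacement (missense), or nonsense. It assumes equal codon usage and equal rates for all base substitutions, both simplifications.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons in TCAG order at positions 1, 2, 3 ("*" = stop).
AMINO_ACIDS = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
               "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def classify_single_base_changes(code: dict[str, str]) -> tuple[int, int, int]:
    """Count (silent, replacement, nonsense) outcomes over all single-base
    substitutions in all sense codons, weighting every substitution equally."""
    silent = replacement = nonsense = 0
    for codon, aa in code.items():
        if aa == "*":
            continue  # start only from sense codons
        for pos in range(3):
            for base in BASES:
                if base == codon[pos]:
                    continue
                new_aa = code[codon[:pos] + base + codon[pos + 1:]]
                if new_aa == aa:
                    silent += 1
                elif new_aa == "*":
                    nonsense += 1
                else:
                    replacement += 1
    return silent, replacement, nonsense


silent, replacement, nonsense = classify_single_base_changes(CODE)
print(silent, replacement, nonsense)  # 134 392 23
print(f"{replacement / silent:.2f}")  # 2.93
```

Because transitions are in practice more frequent than transversions, and transitions at third positions are often silent, real neutral expectations lean somewhat further toward silent changes than this equal-weight count suggests.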
Recently, Perler et al. (3) compared the DNA sequences of the globin and proinsulin genes of several species and found a great predominance of silent changes. Their analysis led them to the hypothesis that silent base changes accumulate 5-7 times more rapidly than nonsilent changes during the first 100 million years of divergence from an ancestral gene. After this initial "saturation" of the neutral silent sites, the rate of change at silent positions is roughly equivalent to the rate at nonsilent positions, although the variability around the regression line increases.
The results presented here show a distinct paucity of silent base changes between the rat and mouse Cκ genes, which are thought to be separated by about 10 million years. Using the calculation of Perler et al. (3) for corrected percent divergence, the ratio of silent to replacement changes is only 1.9, in contrast to an average ratio of 7 for globin and preproinsulin genes.
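Corrected divergence adjusts an observed proportion of differing sites for multiple substitutions at the same site before the silent/replacement ratio is formed. The exact procedure of Perler et al. is not reproduced here; as a stand-in, the sketch below assumes the common Jukes-Cantor correction, d = -(3/4) ln(1 - 4p/3), applied separately to silent and replacement sites. The site and difference counts in the usage line are hypothetical placeholders, not values from this paper, and how silent sites are counted is itself a modeling choice.

```python
import math


def jukes_cantor(p: float) -> float:
    """Multiple-hit-corrected divergence for an observed proportion p of
    differing sites (Jukes-Cantor model); undefined for p >= 0.75."""
    if not 0.0 <= p < 0.75:
        raise ValueError("p must lie in [0, 0.75)")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


def silent_to_replacement_ratio(silent_diffs: int, silent_sites: float,
                                repl_diffs: int, repl_sites: float) -> float:
    """Ratio of corrected divergence at silent vs. replacement sites."""
    d_silent = jukes_cantor(silent_diffs / silent_sites)
    d_repl = jukes_cantor(repl_diffs / repl_sites)
    return d_silent / d_repl


print(round(jukes_cantor(0.10), 4))  # 0.1073
# Hypothetical counts for illustration only (not the rat/mouse data):
ratio = silent_to_replacement_ratio(10, 80, 12, 240)
print(f"silent/replacement corrected divergence ratio = {ratio:.2f}")
```

For small p the correction is minor (0.10 becomes about 0.107), so for genes as closely related as these the corrected and uncorrected ratios differ little; the correction matters mainly for the more distant comparisons cited from Perler et al.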
The only other example of DNA sequence studies on closely related genes is the βmajor and βminor genes of the mouse globin gene family (29), which are thought to be the result of a gene duplication that occurred about 50 million years ago. In this case as well, the coding regions have fewer silent base changes (i.e., 9 silent and 9 nonsilent base changes) than expected on the basis of the findings of Perler et al., although the deficit is less marked than in the κ chain genes; the ratio of corrected divergence at silent vs. replacement sites is 3.4.
High Degree of Homology Between Noncoding Regions. An equally striking finding is that the high degree of homology between the rat and mouse coding regions extends through the 3' untranslated region and into the 5' and 3' flanking regions. As shown in Table 1, the homology between the 3' untranslated regions is indistinguishable from that seen between the coding regions (88.4% vs. 88.5%), the only significant difference being the appearance of three small deletions in the rat untranslated sequence. [Size differences are a common feature of noncoding regions of other eukaryotic genes (30) and are presumably related to the absence of selection against frameshifts.] If the deletions are viewed as single changes (because they presumably reflect single events), then the homology between 3' untranslated regions becomes even greater than the coding region homology (91.5% vs. 88.5%) (Table 1). The homology between the 200 bp of 3' flanking sequences (87.4%) and the 450 bp of 5' flanking sequences (80.3%) is also high despite the presence of deletions. Once again, if gaps are scored as single differences, the homology of the 5' flanking region becomes 88.0%, which is equivalent to the coding region homology. In fact, several of the longest segments of perfect identity occur outside the coding region (see Fig. 3).

Table 1. Degree of homology between nucleotide sequences of rat (LOU) and mouse Cκ genes

                                % homology*
Region of comparison            Total    Excluding gaps†
1172-bp BspRI fragment          85.1     88.7
Cκ coding region                88.5     88.5
3' untranslated region          88.4     91.5
5' intervening region           80.3     88.0
3' flanking region              87.4     87.4

* The percent homology was determined by using the formula [(no. of homologous base pairs)/(no. of base pairs compared)] × 100.
† The percent homology excluding gaps was determined with the same formula, with each gap scored as one nonidentical base pair.
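The two columns of Table 1 differ only in how alignment gaps are counted. A minimal sketch of both scorings, assuming pairwise-aligned sequences of equal length with "-" marking gaps and short hypothetical fragments in place of the real Cκ sequences, follows the footnote formulas directly:

```python
def percent_homology(a: str, b: str, collapse_gaps: bool = False) -> float:
    """Percent homology between two aligned sequences ('-' = gap).

    collapse_gaps=False: every aligned column counts (Table 1, "Total").
    collapse_gaps=True: each run of consecutive gap columns is scored as a
    single nonidentical position (Table 1, "Excluding gaps"). Adjacent gap
    columns are merged regardless of which sequence carries the gap.
    """
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    identical = compared = 0
    previous_was_gap = False
    for x, y in zip(a, b):
        is_gap = x == "-" or y == "-"
        # With collapse_gaps, only the first column of a gap run is scored.
        if not (is_gap and collapse_gaps and previous_was_gap):
            compared += 1
            if not is_gap and x == y:
                identical += 1
        previous_was_gap = is_gap
    return 100 * identical / compared


rat = "ACGTTTACGA"    # hypothetical aligned fragments
mouse = "ACG---ACGA"  # one 3-bp deletion
print(percent_homology(rat, mouse))                      # 70.0
print(percent_homology(rat, mouse, collapse_gaps=True))  # 87.5
```

Scoring a deletion as one event, rather than one mismatch per missing base, is what raises the 3' untranslated and 5' flanking figures in Table 1 while leaving the gap-free coding and 3' flanking comparisons unchanged.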
Comparison Between Two Rat Cκ Alleles. Table 2 shows the allotype-associated nucleotide sequence differences between the LOU and the DA genes and their distribution. Comparison of the LOU and DA sequences shows the same unusual features evident in the LOU/mouse comparison, in even more extreme form. Whereas there are 11 amino acid changes between LOU and DA, there are only 12 nucleotide differences in the coding region (i.e., only one silent change). With the corrected percent divergence, this represents a ratio of silent to replacement changes of 0.45. While there are 12 nucleotide changes in the coding region (318 bp), there is only one nucleotide change in the 3' untranslated region, two in the 3' flanking region, and six in the 5' flanking region, a total of 9 changes in 854 bp. Thus the divergence inside the coding region (12/318, 3.77%) is several times higher than that found outside it (9/854, 1.05%) (Table 2). This difference between coding and noncoding region homology is significant at the 0.0025 level.
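The text does not name the test behind the 0.0025 figure. One plausible reconstruction, assumed here, is a 2 × 2 comparison of substituted vs. unchanged positions in the coding region (12 of 318 bp) and in the noncoding region (9 of 854 bp). The sketch computes a Pearson chi-square (1 degree of freedom, no continuity correction) and a one-sided Fisher exact test with only the Python standard library; the function names are hypothetical.

```python
import math


def chi_square_2x2(a: int, b: int, c: int, d: int) -> tuple[float, float]:
    """Pearson chi-square (no continuity correction) for [[a, b], [c, d]];
    for 1 df, P(chi2 > x) = erfc(sqrt(x / 2))."""
    n = a + b + c + d
    rows, cols = (a + b, c + d), (a + c, b + d)
    observed = ((a, b), (c, d))
    stat = 0.0
    for i in range(2):
        for j in range(2):
            expected = rows[i] * cols[j] / n
            stat += (observed[i][j] - expected) ** 2 / expected
    return stat, math.erfc(math.sqrt(stat / 2))


def fisher_one_sided(a: int, b: int, c: int, d: int) -> float:
    """P(X >= a) for the top-left cell under the hypergeometric null
    with all margins fixed (excess of substitutions in row 1)."""
    row1, col1, n = a + b, a + c, a + b + c + d
    tail = sum(math.comb(col1, x) * math.comb(n - col1, row1 - x)
               for x in range(a, min(row1, col1) + 1))
    return tail / math.comb(n, row1)


coding = (12, 318 - 12)     # substituted, unchanged positions
noncoding = (9, 854 - 9)
stat, p_chi = chi_square_2x2(*coding, *noncoding)
p_fisher = fisher_one_sided(*coding, *noncoding)
print(f"chi-square = {stat:.2f}, p = {p_chi:.4f}")
print(f"one-sided Fisher exact p = {p_fisher:.4f}")
```

Under these assumptions both tests put the coding/noncoding difference well below the 0.01 level; a Yates continuity correction would make the chi-square p-value somewhat larger, so the choice of test matters for the exact figure quoted.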
The low proportion of silent base changes in the coding region cannot be explained solely on the basis of a similar strong codon preference in rat and mouse. The only way in which such preference would be expressed is by selection against "forbidden" changes, which would have the effect of increasing the overall homology between rat and mouse inside the coding region. The fact that the rate of accumulation of substitutions appears to be the same or greater in DNA within the coding region compared with nearby flanking sequences speaks against this possibility.

Table 2. Nucleotide sequence differences between LEW and DA Cκ alleles

Region / position                              LEW     DA
5' intron (6/447 = 1.34%)*, base
   38                                          G       A
   76                                          C       T
   264                                         G       A
   266                                         G       A
   276                                         A       G
   428                                         T       C
Coding region (12/318 = 3.77%)*, amino acid
   468                                         Ser     Ser
   485                                         Met     Thr
   496                                         Thr     Ala
   499                                         Ser     Thr
   511                                         Thr     Ser
   523                                         Phe     Leu
   526                                         Val     Met
   578                                         Ser     Thr
   589                                         Gln     Arg
   671                                         Val     Ala
   675                                         Glu     Asp
   689                                         Arg     Ser
3' untranslated region (1/200 = 0.50%)*, base
   800                                         C       T
3' flanking region (2/207 = 0.97%)*, base
   967                                         A       C
   1012                                        A       T

* No. of substitutions/total bases compared = % difference.
Nor can it be argued that the paucity of "silent" changes in the coding region is due to a lack of selection at the level of the protein. The high degree of conservation of the three-dimensional structure of Ig domains (31) indicates that many random changes must be "forbidden." In addition, as has been pointed out previously (21), the spatial distribution of amino acid substitutions between the two RI-1 alleles is highly nonrandom.
The paucity of "silent" changes in the coding region, as well as the pattern of flanking region homology between Cκ genes, indicates that there may be powerful constraints on the nucleotide sequences of coding and noncoding DNA based upon some function of these DNA segments. Presumably this includes the conservation of DNA and RNA processing and splicing signals, as well as possible regulatory elements analogous to the operators, promoters, and attenuators found in prokaryotic DNA. However, known examples of such signal sites are usually encoded by small numbers of nucleotides relative to the large stretches of high homology found in the Cκ genes. It may be that the sequences required for these signals are considerably more extensive than currently recognized, or that extended tertiary structure is important in the alignment of these sequences. On the other hand, there may be additional constraints on the DNA sequence for which there are no clear explanations at present. It has been suggested (11) that high homology in the flanking regions of immunoglobulin V region genes might be advantageous by providing a large target for intergenic recombination, resulting in increased antibody diversity. In any case, there appear to be powerful constraints on the nucleotide sequence of the Cκ flanking regions and of the silent sites of the coding region that are unrelated to the amino acid sequence of the encoded polypeptide.
Because these Cκ genes are very closely related, it is possible that relatively few silent changes occur during the early stages of divergence of all genes. However, this unusual pattern is not restricted to closely related Cκ genes. A comparison of human and mouse Cκ genes (9, 10), or of human and mouse β-globin genes, also reveals a smaller proportion of silent changes than is found in the comparison of human and murine α-globin or preproinsulin genes (3). The ratio of corrected divergence at silent vs. replacement sites is 3.3 for Cκ, 3.8 for β-globin, 9.9 for α-globin, and 14.6 for preproinsulin. Either Cκ and β-globin have a different pattern of divergence, or the pattern that is characteristic of closely related genes is maintained during the divergence of some genes and lost in others because of differences in the constraints on their coding sequences.
The maintenance of flanking region homology, on the other hand, seems to be more characteristic of immunoglobulin genes. For example, the ratio of flanking region divergence to replacement site divergence is 1.4 for Cκ genes (mouse vs. human or mouse vs. rat), whereas the ratio among β-globin genes is about 5 [average of ratios derived from published data (3)]. Thus Cκ genes may have far fewer "neutral" positions that are free to accumulate changes at an accelerated rate, both inside and outside the coding region, which in turn may be related to the DNA rearrangement and RNA processing events characteristic of Ig genes.

We thank Ms. Susan Jasinski for excellent technical assistance, Dr. J. Seidman for providing the Cκ probe (M-41C) used in the cloning, Dr. E. Max (National Institutes of Health) for kindly supplying us with unpublished portions of the mouse Cκ sequence, and Dr. T. Hunkapiller (California Institute of Technology) for providing the computer-generated Fig. 3. This work was supported in part by U.S. Public Health Service Grant AI-14774. H.W.S. was supported by National Research Service Award Predoctoral Training Grant GM-07134. G.A.G. is the recipient of Research Career Development Award AI-00286 from the U.S. Public Health Service.

1. Bell, G. I., Pictet, R. L., Rutter, W. J., Cordell, B., Tischer, E. & Goodman, H. M. (1980) Nature (London) 284, 26-32.
2. Seeburg, P. H., Shine, J., Martial, J. A., Baxter, J. D. & Goodman, H. M. (1978) Nature (London) 270, 486-494.
3. Perler, F., Efstratiadis, A., Lomedico, P., Gilbert, W., Kolodner, R. & Dodgson, J. (1980) Cell 20, 555-566.
4. Nunberg, J. H., Kaufman, R. J., Chang, A. C. Y., Cohen, S. N. & Schimke, R. T. (1980) Cell 19, 355-364.
5. Gottlieb, P. D. (1980) Mol. Immunol. 17, 1423-1435.
6. Efstratiadis, A., Posakony, J. W., Maniatis, T., Lawn, R. M., O'Connell, C., Spritz, R. A., DeRiel, J. K., Forget, B. G., Weissman, S. M., Slightom, J. L., Blechl, A. E., Smithies, O., Baralle, F. E., Shoulders, C. C. & Proudfoot, N. J. (1980) Cell 21, 653-668.
7. Lomedico, P., Rosenthal, N., Efstratiadis, A., Gilbert, W., Kolodner, R. & Tizard, R. (1979) Cell 18, 545-558.
8. Nishioka, Y. & Leder, P. (1979) Cell 18, 875-882.
9. Max, E. E., Maizel, J. V. & Leder, P. (1981) J. Biol. Chem. 256, 5116-5120.
10. Hieter, P. A., Max, E. E., Seidman, J. G., Maizel, J. V. & Leder, P. (1980) Cell 22, 197-207.
11. Seidman, J. G., Leder, A., Nau, M., Norman, B. & Leder, P. (1978) Science 202, 11-17.
12. Blin, N. & Stafford, D. W. (1976) Nucleic Acids Res. 3, 2303-2308.
13. Southern, E. M. (1975) J. Mol. Biol. 98, 503-517.
14. Enquist, L., Tiemeier, D., Leder, P., Weisberg, R. & Sternberg, N. (1976) Nature (London) 259, 596-598.
15. Hohn, B. & Murray, K. (1977) Proc. Natl Acad. Sci. USA 74, 3259-3263.
16. Benton, W. D. & Davis, R. W. (1977) Science 196, 180-182.
17. Barnes, W. M. (1977) Science 195, 393-394.
18. Meyers, J. A., Sanchez, D., Elwell, L. P. & Falkow, S. (1976) J. Bacteriol. 127, 1529-1537.
19. Maxam, A. M. & Gilbert, W. (1977) Proc. Natl Acad. Sci. USA 74, 560-564.
20. Sheppard, H. W. & Gutman, G. A. (1981) Nature (London), in press.
21. Gutman, G. A., Loh, E. & Hood, L. (1975) Proc. Natl Acad. Sci. USA 72, 5046-5050.
22. Starace, V. & Querinjean, P. (1975) J. Immunol. 115, 59-62.
23. Wang, A. C., Fudenberg, H. H. & Bazin, H. (1976) Biochem. Genet. 14, 209-223.
24. Hamlyn, P. H., Brownlee, G. G., Cheng, C. C., Gait, M. N. & Milstein, C. (1978) Cell 15, 1067-1075.
25. Breathnach, R., Benoist, C., O'Hare, K., Gannon, F. & Chambon, P. (1978) Proc. Natl Acad. Sci. USA 75, 4853-4857.
26. Catterall, J. F., O'Malley, B. W., Robertson, M. A., Staden, R., Tanaka, Y. & Brownlee, G. G. (1978) Nature (London) 275, 510-513.
27. Proudfoot, N. J. & Brownlee, G. G. (1978) Nature (London) 263, 211-214.
28. Slightom, J. L., Blechl, A. E. & Smithies, O. (1980) Cell 21, 627-638.
29. Konkel, D. A., Maizel, J. V. & Leder, P. (1979) Cell 18, ~73.
30. Nishioka, Y. & Leder, P. (1979) Cell 18, 875-882.
31. Poljak, R. J. (1978) Crit. Rev. Biochem. 5, 45-84.