`Vol. 78, No. 11, pp. 7064-7068, November 1981
`Genetics
`
Allelic forms of rat κ chain genes: Evidence for strong selection at
the level of nucleotide sequence
`(DNA sequences/gene cloning/immunoglobulin genes/molecular evolution)
`
`H. W. SHEPPARD AND G. A. GUTMAN
`
`Department of Microbiology, College of Medicine, University of California, Irvine, California 92717
`
`Communicated by Ray D. Owen, August 5, 1981
`
ABSTRACT
The genes that code for two allotypic forms of the rat κ light chain
constant region (Cκ) have been cloned and the nucleotide sequence of
1172 base pairs of coding and flanking sequence has been determined for
both alleles. These sequences have been compared to each other and to
the corresponding sequences found in the mouse and human. Comparison of
the LEW allele with mouse Cκ reveals two surprising features: (i) There
is an unusually large number of amino acid substitutions (21) relative
to the total number of nucleotide changes (37) in the coding region.
Comparison among several other mammalian genes reveals a larger
proportion of "silent" changes. (ii) The rate of accumulation of base
substitutions is the same within the coding region as it is in some 870
base pairs of noncoding sequence (including 3' untranslated, 3'
flanking, and 5' intervening sequences). Comparison of the two allelic
forms of rat Cκ shows the same unusual features in more extreme form.
(i) Twelve base substitutions in the coding region determine 11 amino
acid differences; only one "silent" change exists. (ii) There are 12
base substitutions in the 318 base pairs of coding sequence (3.7%
difference) and only 9 in the remaining 854 base pairs of noncoding DNA
(1.1%), a highly significant difference. This degree of conservation of
noncoding sequences and of "silent" sites within the coding region is
unique among the mammalian genes studied thus far. These patterns
suggest that there has been strong selection for conservation of
nucleotide sequences, both inside and outside the coding region,
independent of the selection required to maintain the function and
characteristic structure of the immunoglobulin domain itself. The
functions of the nucleotide sequences that account for this selective
pressure are unclear at the present time.
`
There are now a number of mammalian genes for which nucleotide
sequences have been determined, including insulin (1), growth hormone
(2), various globin genes (3), dihydrofolate reductase (4), and
immunoglobulin heavy and light chains (5). For some of these genes the
sequence has been determined in several species, allowing interspecific
comparisons (3, 6-8). This kind of analysis has led to the
identification of short highly conserved elements, which are thought to
represent the functional signals for several DNA and RNA processing
events. In addition, conclusions have been drawn about phylogenetic
relationships and patterns of divergence on the basis of the
comparative structure of these genes.
With the exception of the intraspecies comparison of the βmajor/βminor
subunits of mouse hemoglobin, all comparisons to date have been made
between genes that are thought to be separated by greater than 80
million years (the time separating mammalian families). Because the
mouse and rat are thought to be only 10 million years apart, and the
two allotypic forms of rat κ chains even closer, it was of interest to
determine the nucleotide sequence of the two allotypic genes (RI-1a and
RI-1b) and compare these to each other and to the published mouse κ
chain constant region (Cκ) sequences (9). In addition, the recent
publication of the human Cκ gene sequence (10) made it possible to
compare the same gene between species that vary widely in their
phylogenetic relatedness.
`
`MATERIALS AND METHODS
Mouse Probe. The probe used for the isolation of the rat Cκ genes was
obtained from Jonathan Seidman (Philip Leder's laboratory, National
Institutes of Health) and is designated K41-C (11). This probe consists
of the plasmid pBR322 with inserted mouse cDNA corresponding to the
constant region of the κ type immunoglobulin light chain from the
MOPC-41 plasmacytoma. The DNA, inserted at the BamHI restriction
endonuclease site of pBR322, consists of the coding region starting at
amino acid 125, through the 3' end of the mRNA [including the
untranslated region and poly(A)] with BamHI synthetic linkers at each
end. For these studies, the BamHI insert was purified and labeled by
nick translation to a specific activity of greater than 10^8 cpm/µg.
Genomic Blot. High molecular weight DNA was prepared from adult rat
liver according to Blin and Stafford (12). About 40 µg of this DNA was
digested to completion with EcoRI restriction endonuclease and
electrophoresed in a 0.8% agarose gel. The DNA was denatured in NaOH,
neutralized, and transferred to nitrocellulose according to Southern
(13). Hybridization was carried out at 65°C for 24 hr in hybridization
buffer (0.75 M NaCl/50 µM NaPO4/5 mM EDTA/0.01% sodium dodecyl
sulfate/0.1% Ficoll/0.1% polyvinylpyrrolidone/0.1% bovine serum
albumin, pH 7.0) and 10" cpm of nick translated K41-C probe.
Preparation of DNA for Gene Cloning. High molecular weight liver DNA
was digested under conditions in which about one in four EcoRI sites is
cut. This DNA was then separated by size by centrifugation in a 5-25%
NaCl gradient. Fractions containing DNA between 5 and 14 kilobases (kb)
long were pooled, dialyzed, and concentrated by ethanol precipitation.
Similarly, the left and right arms of the λgtWES vector (14) were
prepared by EcoRI digestion, centrifugation in a 5-25% NaCl gradient to
remove the λB insert, dialysis, and ethanol precipitation.
Molecular Cloning. The size-selected rat DNA fragments were ligated
into the purified λgtWES arms at 1:1 molar ratio under standard
ligation conditions at 25°C for 4 hr. The recombinant phage DNA was
packaged in vitro by a modification of the procedure of Hohn and Murray
(15) and screened by in situ hybridization (16).
Subcloning. After Southern blot analysis, the Cκ-containing EcoRI
inserts were inserted into the EcoRI site of pBR322 and subcloned (17,
18). After additional restriction map and Southern blot analysis, the
1.2-kb BspRI fragment was identified and inserted into the EcoRI site
of pBR322 by filling in the staggered ends of the EcoRI site and
blunt-end ligation with phage T4 ligase.

The publication costs of this article were defrayed in part by page charge
payment. This article must therefore be hereby marked "advertisement"
in accordance with 18 U. S. C. §1734 solely to indicate this fact.

Abbreviations: Cκ, constant region of κ immunoglobulin light chain; kb,
kilobase(s); bp, base pair(s).

Genetics: Sheppard and Gutman

Proc. Natl Acad. Sci. USA 78 (1981)

DNA Sequence Analysis. Nucleotide sequences were determined by the
method of Maxam and Gilbert (19). DNA fragments were labeled at the 5'
phosphate using [γ-32P]ATP and polynucleotide kinase or by filling in
the 3' end of staggered restriction sites with DNA polymerase and
[α-32P]NTP. Labeled fragments were separated on polyacrylamide gels
crosslinked with N,N'-bisacrylylcystamine (Bio-Rad), and recovered by
reduction of the gel and DEAE-cellulose ion-exchange chromatography.
`
`RESULTS AND DISCUSSION
Cloning of the Rat Cκ Gene. In order to clone the rat Cκ genes we took
advantage of the similarity between the mouse and rat (80% amino acid
identity) by using a cloned mouse cDNA (11) as a hybridization probe.
We have successfully used this probe to isolate 11 independent Cκ
clones, 6 from the LOU strain (RI-1b), 2 from the LEW strain (RI-1b),
and 3 from the DA strain (RI-1a). DNA from each clone was digested with
EcoRI, separated on a 0.8% agarose gel, and analyzed by Southern blot.
The results shown in Fig. 1 demonstrate that every clone contained a
6.5-kb fragment bearing the Cκ gene, which corresponds to the fragment
identified in the analysis of the rat genome (20).
A restriction map of the 6.5-kb EcoRI fragment, and of an 1172-base
pair (bp) BspRI fragment that was found to contain all the hybridizing
sequence, is shown in Fig. 2. The complete nucleotide sequence of the
BspRI fragment was determined for both allelic forms, the complete
sequence of LOU clone 6 (RI-1b) being shown in Fig. 3.
Amino Acid Sequence of RI-1b Cκ. The amino acid sequence of the RI-1b
protein deduced from the nucleotide sequence is consistent with the
published amino acid sequences of LOU and LEW κ chains (21-23), with
the exception of the arginine at amino acid position 155 (nucleotides
618-620). Our sequence is Glu-Arg-Arg-Asp at this point, where the
previous studies indicated Glu-Arg-Asp. Two of these studies (21, 22)
were based solely on tryptic peptides, and therefore would not have
detected the second Arg. The third study, however, was based on an
automated sequencing run through this region (23). This discrepancy may
represent yet another gene in the LOU strain, similar to the variant
S-211 protein (22), or a propagation of sequencing errors as has been
seen with mouse Cκ regions (24).
Furthermore, translation of the nucleotide sequence of the LOU clone
gives rise to an amino acid sequence identical to that of RI-1b rather
than the S-211 myeloma protein sequence (21). This result, together
with the evidence that there is a single Cκ gene in the LOU genome
(20), supports the hypothesis that the S-211 myeloma represents a third
RI-1 allele, present among rat myelomas due to residual heterozygosity
in the LOU strain at the time the myelomas were isolated.
FIG. 1. Analysis of cloned EcoRI fragments. λgtWES DNA containing Cκ
clones was digested with EcoRI, electrophoresed on a 0.8% agarose gel,
and subjected to Southern blotting and hybridization with Cκ probe (see
text). (A) Ethidium-stained agarose gel. Lanes 1, 7, and 14 contain
size standards, with kb indicated on the left (21.80, 7.55, 5.54, 4.80,
3.38, 1.35, 1.08, and 0.87). The remaining lanes contain DNA from
clones 1, 2, 3, 5, 6, and 7 (LOU); 8 and 9 (LEW); and 26, 28, and 40
(DA). (B) Autoradiograph of Southern blot. Only the 11 Cκ-containing
fragments (6.50 kb) are visible.

FIG. 2. Restriction map of Cκ-containing 6.5-kb EcoRI fragment from
clone 6 (LOU, RI-1b). A more detailed map of the 1.2-kb BspRI fragment
is also shown, and the localization and orientation of the Cκ gene are
indicated. The arrows below indicate the nucleotide sequence
determination strategy used. Two restriction sites marked by asterisks
are present in LOU and LEW clones (RI-1b) but absent from DA (RI-1a).
RI, EcoRI; Ps, Pst I; Pv, Pvu II; Hc, HincII; Bs, BspRI; A, Alu I; RII,
EcoRII; D, Dde I; S3, Sau3A; Hf, HinfI; Hp, Hpa I; S9, Sau96; Rs, Rsa
I; 3'U.T., 3' untranslated.

Recognition Sequences and Repetitive Sequences. Examination of the rat
nucleotide sequences reveals two features predicted from studies on
other eukaryotic genes. First, the dinucleotide A-G at position 441
follows the G-T/A-G rule for RNA splicing (25, 26). Second, the
hexanucleotide A-A-T-A-A-A is present at position 940, 19 bp upstream
from the presumed site of polyadenylylation (position 964); this
sequence is thought to be a required signal for poly(A) addition to the
message (27).
Several short repetitive sequences have also been discerned whose
significance is not clear. The hexanucleotide A-C-A-G-C-A is repeated
four times (with one base change in the fourth repeat) between
positions 611 and 634; this repeat has been identified in mouse and
human Cκ genes as well (9, 10). The tetranucleotides G-T-G-T and
G-T-C-T are each repeated three times from positions 336 to 367. The
sequence T-C-C-T is repeated four times from 403 to 440, including a
tandem repeat three bases away from the 3' end of the intervening
sequence; all of these are conserved in the mouse, only two in the
human. This same tetranucleotide is repeated six times between 811 and
948, as well as once each at 83 and 1173. Last, the sequence T-T-T-G is
found four times between 953 and 1114, a region spanning the site of
polyadenylylation of the message. These repeated sequences might be
part of extended recognition sequences for rearrangements, RNA
splicing, or other functions (see below), or may be the fortuitous
results of short duplication events (in the case of less highly
conserved repeats). Very similar sequences have been observed in globin
gene intervening sequences, and it has been postulated that they are
responsible for DNA recombination events, which serve to maintain the
DNA homology between duplicated gene pairs (28).
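The kind of signal and repeat search described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the helper name `find_motif` and the toy sequence are hypothetical, and the toy sequence is not the rat Cκ sequence. Positions are reported 1-based, as in the text.

```python
def find_motif(sequence: str, motif: str) -> list[int]:
    """Return the 1-based start position of every match of motif,
    including overlapping matches."""
    sequence, motif = sequence.upper(), motif.upper()
    positions = []
    start = sequence.find(motif)
    while start != -1:
        positions.append(start + 1)  # convert 0-based index to 1-based
        start = sequence.find(motif, start + 1)
    return positions


# Hypothetical toy sequence for illustration only (not the rat sequence).
toy = "GGTCCTTCCTCAGGCTAATAAAGTGAATCTTTGCACTTGA"

poly_a_signals = find_motif(toy, "AATAAA")  # poly(A) addition signal
tcct_repeats = find_motif(toy, "TCCT")      # short tetranucleotide repeat
```

Advancing the search by one base rather than by the motif length is a deliberate choice, so that overlapping copies of a short repeat are all reported.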
FIG. 3. Nucleotide sequence of the 1.2-kb BspRI fragment of clone 6
(LOU, RI-1b), aligned with the corresponding sequence of the mouse (9).
The bar between the sequences indicates the extent of identity between
the two; the coding region of the rat is translated above the sequence,
amino acid differences in the mouse are shown below. Filled circles
above the bar indicate positions at which the DA (RI-1a) sequence
differs from the LOU sequence (see Table 2). Hash marks indicate every
tenth base in each sequence.
[Aligned rat and mouse nucleotide sequences with amino acid translation.]

Paucity of "Silent" Changes Between Rat and Mouse Cκ Sequences.
Comparison of the LOU and mouse amino acid sequences, and of the
nucleotide sequences of their coding regions, reveals that there are 21
amino acid substitutions (80.0% homology) and 37 nucleotide changes
(88.5% homology) (Table 1). If every base pair were selectively
neutral, and if nucleotide changes occurred at random, approximately
three coding changes for every silent change would be expected.
However, in most genes many of the coding changes are clearly
nonneutral due to the selective pressure on the amino acid sequence of
the coded polypeptide. This would certainly be expected in the case of
the Cκ domain, which has maintained its characteristic "immunoglobulin
fold" structure and its ability to interact with heavy chains
throughout vertebrate evolution. Therefore many of the coding changes
that occur at random would be expected to be eliminated.
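The expectation of roughly three coding changes per silent change can be checked by enumerating every single-base substitution in the standard genetic code. The sketch below assumes all sense codons and all substitutions are equally likely (no codon-usage or transition/transversion weighting), which is a simplification and not necessarily the authors' calculation; the function and variable names are illustrative.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ...
AMINO_ACIDS = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
               "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def count_single_base_changes() -> tuple[int, int, int]:
    """Classify every single-base substitution in the 61 sense codons
    as silent, replacement (amino acid change), or nonsense (stop)."""
    silent = replacement = nonsense = 0
    for codon, amino_acid in CODON_TABLE.items():
        if amino_acid == "*":
            continue  # start only from sense codons
        for position in range(3):
            for base in BASES:
                if base == codon[position]:
                    continue
                mutant = codon[:position] + base + codon[position + 1:]
                new_amino_acid = CODON_TABLE[mutant]
                if new_amino_acid == "*":
                    nonsense += 1
                elif new_amino_acid == amino_acid:
                    silent += 1
                else:
                    replacement += 1
    return silent, replacement, nonsense


silent, replacement, nonsense = count_single_base_changes()
ratio = replacement / silent  # replacement changes per silent change
```

Under these assumptions the enumeration gives 134 silent and 392 replacement substitutions, a ratio of about 2.9, in line with the "approximately three" stated in the text.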
Recently, Perler et al. (3) have compared the DNA sequences of the
globin and proinsulin genes of several species, finding a great
predominance of silent changes. Their analysis led them to the
hypothesis that silent base changes accumulate 5-7 times more rapidly
than nonsilent changes during the first 100 million years of divergence
from an ancestral gene. After this initial "saturation" of the neutral
silent sites, the rate of change in silent positions is roughly
equivalent to the rate of change at nonsilent positions, despite an
increased variability around the regression line.
The results presented here show a distinct paucity of silent base
changes between rat and mouse Cκ genes, which are thought to be
separated by about 10 million years. Using the calculation of Perler et
al. (3) for corrected percent divergence, the ratio of silent to
replacement changes is only 1.9, in contrast to an average ratio of 7
for globin and preproinsulin genes.
The only other example of DNA sequence studies on closely related genes
is the βmajor and βminor genes in the mouse globin gene family (29),
which are thought to be the result of a gene duplication that occurred
about 50 million years ago. In this case as well, the coding regions
have fewer silent base changes (i.e., 9 silent and 9 nonsilent base
changes) than expected on the basis of the findings of Perler et al.,
although less so than in the κ chain genes; the ratio of corrected
divergence at silent vs. replacement sites is 3.4.
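A "corrected" divergence adjusts the observed fraction of differing sites for multiple substitutions at the same site. The exact procedure of Perler et al. (3) is not reproduced in this article, so the sketch below substitutes the simpler and widely used Jukes-Cantor correction as a stand-in. It illustrates the idea only and will not reproduce the silent/replacement ratios quoted above.

```python
import math


def jukes_cantor_distance(observed_fraction: float) -> float:
    """Estimated substitutions per site under the Jukes-Cantor model,
    given the observed fraction of sites that differ (0 <= p < 0.75)."""
    if not 0 <= observed_fraction < 0.75:
        raise ValueError("observed fraction must be in [0, 0.75)")
    return -0.75 * math.log(1 - 4 * observed_fraction / 3)


# Rat vs. mouse coding region: 88.5% homology, i.e. 11.5% observed difference.
observed = 1 - 0.885
corrected = jukes_cantor_distance(observed)
```

For divergences as small as these the correction is modest (about 0.125 substitutions per site for an observed difference of 0.115), which is why closely related genes are useful for this comparison.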
High Degree of Homology Between Noncoding Regions. An equally striking
finding is that the high degree of homology between the rat and mouse
coding regions extends through the 3' untranslated region and into the
5' and 3' flanking regions. As shown in Table 1, the homology between
the 3' untranslated regions is indistinguishable from that seen between
coding regions (88.5% vs. 88.4%), the only significant difference being
the appearance of three small deletions in the rat untranslated
sequence. [Size differences are a common feature of noncoding regions
of other eukaryotic genes (30) and are presumably related to the
absence of selection against frameshifts.] If the deletions are viewed
as single changes (because they presumably reflect single events), then
the homology between 3' untranslated regions becomes even greater than
the coding region homology (91.5% vs. 88.5%) (Table 1). The homology
between the 200 bp of 3' flanking sequences (87.4%) and the 450 bp of
5' flanking sequences (80.3%) is also high despite the presence of
deletions. Once again, if gaps are scored as single differences, then
the homology of the 5' flanking region becomes 88.0%, which is
equivalent to the coding region homology. In fact, several of the
longest segments of perfect identity occur outside the coding region
(see Fig. 3).

Table 1. Degree of homology between nucleotide sequences of
rat (LOU) and mouse Cκ genes

                                  % homology*
Region of comparison          Total    Excluding gaps†
1172-bp BspRI fragment        85.1     88.7
Cκ coding region              88.5     88.5
3' untranslated region        88.4     91.5
5' intervening region         80.3     88.0
3' flanking region            87.4     87.4

* The percent homology was determined by using the formula [(no. of
homologous base pairs)/(no. of base pairs compared)] x 100.
† The percent homology, excluding gaps, was determined with the
same formula, with each gap scored as one nonidentical base pair.
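The two percent-homology measures in Table 1 can be illustrated with a short Python sketch. It assumes that the "Total" figure counts every gapped position as a nonidentical base pair, while the "Excluding gaps" figure counts each contiguous gap as a single nonidentical base pair; the first assumption is an interpretation of the footnote, not something the article states. The function name and the toy alignment are hypothetical.

```python
def percent_homology(aligned_a: str, aligned_b: str,
                     gap: str = "-") -> tuple[float, float]:
    """Return (total, gaps_as_single_events) percent homology for two
    aligned sequences of equal length."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    matches = mismatches = gap_columns = gap_runs = 0
    in_gap = False
    for base_a, base_b in zip(aligned_a, aligned_b):
        if base_a == gap or base_b == gap:
            gap_columns += 1
            if not in_gap:
                gap_runs += 1  # first column of a new gap
            in_gap = True
        else:
            in_gap = False
            if base_a == base_b:
                matches += 1
            else:
                mismatches += 1
    total = 100 * matches / (matches + mismatches + gap_columns)
    collapsed = 100 * matches / (matches + mismatches + gap_runs)
    return total, collapsed


# Hypothetical toy alignment: one mismatch and one 2-base deletion.
total, collapsed = percent_homology("ACGTACGT--AC", "ACGAACGTTTAC")
```

In the toy alignment, 9 of 12 columns are identical (75%), and collapsing the two-base gap into one event raises the figure to 9 of 11 (about 81.8%), the same direction of change seen for the gapped regions in Table 1.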
Comparison Between Two Rat Cκ Alleles. Table 2 shows the
allotype-associated nucleotide sequence differences between the LOU and
the DA genes and their distribution. Comparison of the LOU and DA
sequences shows the same unusual features evident in the LOU/mouse
comparison, in even more extreme form. While there are 11 amino acid
changes between LOU and DA, there are only 12 nucleotide differences in
the coding region (i.e., only one silent change). With the corrected
percent divergence, this represents a ratio of silent to replacement
changes of 0.45. While there are 12 nucleotide changes in the coding
region (318 bp), there is only one nucleotide change in the 3'
untranslated region, two in the 3' flanking region, and six in the 5'
flanking region, or a total of 9 changes in 854 bp. Thus the homology
outside the coding region is several times higher than that found
inside the coding region (Table 2). This difference between coding and
noncoding region homology is significant at the 0.0025 level.
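The significance of the coding vs. noncoding difference can be checked with a 2 x 2 contingency test on the counts given above (12 changed and 306 unchanged coding positions; 9 changed and 845 unchanged noncoding positions). The article does not say which test was used; the sketch below assumes a Pearson chi-square test with one degree of freedom and no continuity correction.

```python
import math


def chi_square_2x2(a: int, b: int, c: int, d: int) -> tuple[float, float]:
    """Pearson chi-square statistic and p-value (1 degree of freedom)
    for the 2 x 2 table [[a, b], [c, d]], without continuity correction."""
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / (
        (a + b) * (c + d) * (a + c) * (b + d)
    )
    # For 1 degree of freedom, P(X > chi2) = erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


coding_changed, coding_total = 12, 318
noncoding_changed, noncoding_total = 9, 854

chi2, p_value = chi_square_2x2(
    coding_changed, coding_total - coding_changed,
    noncoding_changed, noncoding_total - noncoding_changed,
)
```

Under this assumption the statistic is about 9.7 and the p-value about 0.002, consistent with the stated 0.0025 level. A continuity-corrected test would give a somewhat larger p-value.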
Table 2. Nucleotide sequence differences between LEW and DA
Cκ alleles

                  LEW                   DA
Position    Amino acid   Base     Base   Amino acid

5' intron (6/447 = 1.34%)*
38                       G        A
76                       C        T
264                      G        A
266                      G        A
276                      A        G
428                      T        C

Coding region (12/318 = 3.77%)*
468         Ser          T        C      Ser
485         Thr          C        T      Met
496         Ala          G        A      Thr
499         Thr          A        T      Ser
511         Ser          T        A      Thr
523         Leu          C        T      Phe
526         Met          A        G      Val
578         Thr          C        G      Ser
589         Arg          G        A      Gln
671         Ala          C        T      Val
675         Asp          C        A      Glu
689         Ser          T        G      Arg

3' untranslated region (1/200 = 0.50%)*
800                      C        T

3' flanking region (2/207 = 0.97%)*
967                      A        C
1012                     A        T

* No. substitutions/total bases compared = % difference.

The low proportion of silent base changes in the coding region cannot
be explained solely on the basis of similar strong codon preference
between rat and mouse. The only way in which such preference would be
expressed is by selection against "forbidden" changes, which would have
the effect of increasing the overall homology between rat and mouse
inside the coding region. The fact that the rate of accumulation of
substitutions appears to be the same or greater in DNA within the
coding region compared with nearby flanking sequences speaks against
this possibility.
Nor can the argument be made that the paucity of "silent" changes in
the coding region is due to a lack of selection at the level of the
protein. The high degree of conservation of the three-dimensional
structure of Ig domains (31) indicates that many random changes must be
"forbidden." In addition, as has been pointed out previously (21), the
spatial distribution of amino acid substitutions between the two RI-1
alleles is highly nonrandom.
The paucity of "silent" changes in the coding region, as well as the
pattern of flanking region homology between Cκ genes, indicates that
there may be powerful constraints on the nucleotide sequences of coding
and noncoding sequences based upon some function of these DNA segments.
Presumably this includes the conservation of DNA and RNA processing and
splicing signals, as well as possible regulatory elements analogous to
the operators, promoters, and attenuators found in prokaryotic DNA.
However, known examples of such signal sites are usually coded by small
numbers of nucleotides relative to the large stretches of high homology
found in the Cκ genes. It may be that the sequences required for these
signals are considerably more extensive than currently recognized or
that extended tertiary structure is important in the alignment of these
sequences. On the other hand, there may be additional constraints on
the DNA sequence for which there are no clear explanations at the
present time. It has been suggested (11) that high homology in the
flanking regions of immunoglobulin V region genes might be advantageous
by providing a large target for intergenic recombination resulting in
increased antibody diversity. In any case, there appear to be powerful
constraints on the nucleotide sequence of the Cκ flanking regions and
the silent sites of the coding region that are unrelated to the amino
acid sequence of the coded polypeptide.
Because these Cκ genes are very closely related, it is possible that
there are relatively few silent changes during the early stages of
divergence of all genes. However, this unusual pattern is not
restricted to closely related Cκ genes. A comparison of human and mouse
Cκ genes (9, 10) or human and mouse β-globin also reveals a smaller
proportion of silent changes than is found in the comparison of human
and murine α-globin or preproinsulin genes (3). The ratio of corrected
divergence at silent vs. replacement sites is 3.3 for Cκ, 3.8 for
β-globin, 9.9 for α-globin, and 14.6 for preproinsulin. Either Cκ and
β-globin have a different pattern of divergence or the pattern that is
characteristic of closely related genes is maintained during divergence
of some genes and lost in others due to differences in the constraints
on their coding sequences.
The maintenance of flanking region homology, on the other hand, seems
to be more characteristic of immunoglobulin genes. For example, the
ratio of flanking region divergence vs. replacement site divergence is
1.4 for Cκ genes (mouse vs. human or mouse vs. rat), whereas the ratio
among β-globin genes is about 5 [average of ratios derived from
published data (3)]. Thus Cκ genes may have far fewer "neutral"
positions that are free to accumulate changes at an accelerated rate,
both inside and outside the coding region, which in turn may be related
to the DNA rearrangement and RNA processing events characteristic of Ig
genes.
`
We thank Ms. Susan Jasinski for excellent technical assistance, Dr.
J. Seidman for having provided the Cκ probe (M-41C) used in the
cloning, Dr. E. Max (National Institutes of Health) for kindly
supplying us with unpublished portions of the mouse Cκ sequence, and
Dr. T. Hunkapillar (California Institute of Technology) for providing
the computer-generated Fig. 3. This work was supported in part by U.S.
Public Health Service Grant AI-14774. H.W.S. was supported by National
Research Service Award Predoctoral Training Grant GM-07134. G.A.G. is
the recipient of Research Career Development Award AI-00286 from the
U.S. Public Health Service.
`
1. Bell, G. I., Pictet, R. L., Rutter, W. J., Cordell, B., Tischer, E.
& Goodman, H. M. (1980) Nature (London) 284, 26-32.
2. Seeburg, P. H., Shine, J., Martial, J. A., Baxter, J. D. &
Goodman, H. M. (1978) Nature (London) 270, 486-494.
3. Perler, F., Efstratiadis, A., Lomedico, P., Gilbert, W.,
Kolodner, R. & Dodgson, J. (1980) Cell 20, 555-566.
4. Nunberg, J. H., Kaufman, R. J., Chang, A. C. Y., Cohen, S. N.
& Schimke, R. T. (1980) Cell 19, 355-364.
5. Gottlieb, P. D. (1980) Mol. Immunol. 17, 1423-1435.
6. Efstratiadis, A., Posakony, J. W., Maniatis, T., Lawn, R. M.,
O'Connell, C., Spritz, R. A., DeRiel, J. K., Forget, B. G.,
Weissman, S. M., Slightom, J. L., Blechl, A. E., Smithies, O.,
Baralle, F. E., Shoulders, C. C. & Proudfoot, N. J. (1980) Cell
21, 653-668.
7. Lomedico, P., Rosenthal, N., Efstratiadis, A., Gilbert, W.,
Kolodner, R. & Tizard, R. (1979) Cell 18, 545-558.
8. Nishioka, Y. & Leder, P. (1979) Cell 18, 875-882.
9. Max, E. E., Maizel, J. V. & Leder, P. (1981) J. Biol. Chem. 256,
5116-5120.
10. Hieter, P. A., Max, E. E., Seidman, J. G., Maizel, J. V. &
Leder, P. (1980) Cell 22, 197-207.
11. Seidman, J. G., Leder, A., Nau, M., Norman, B. & Leder, P.
(1978) Science 202, 11-17.
12. Blin, N. & Stafford, D. W. (1976) Nucleic Acids Res. 3,
2303-2308.
13. Southern, E. M. (1975) J. Mol. Biol. 98, 503-517.
14. Enquist, L., Tiemeier, D., Leder, P., Weisberg, R. &
Sternberg, N. (1976) Nature (London) 259, 596-598.
15. Hohn, B. & Murray, K. (1977) Proc. Natl Acad. Sci. USA 74,
3259-3263.
16. Benton, W. D. & Davis, R. W. (1977) Science 196, 180-182.
17. Barnes, W. M. (1977) Science 195, 393-394.
18. Meyers, J. A., Sanchez, D., Elwell, L. P. & Falkow, S. (1976) J.
Bacteriol. 127, 1529-1537.
19. Maxam, A. M. & Gilbert, W. (1977) Proc. Natl Acad. Sci. USA
74, 560-564.
20. Sheppard, H. W. & Gutman, G. A. (1981) Nature (London), in
press.
21. Gutman, G. A., Loh, E. & Hood, L. (1975) Proc. Natl Acad. Sci.
USA 72, 5046-5050.
22. Starace, V. & Querinjean, P. (1975) J. Immunol. 115, 59-62.
23. Wang, A. C., Fudenberg, H. H. & Bazin, H. (1976) Biochem.
Genet. 14, 209-223.
24. Hamlyn, P. H., Brownlee, G. G., Cheng, C. C., Gait, M. N. &
Milstein, C. (1978) Cell 15, 1067-1075.
25. Breathnach, R., Benoist, C., O'Hare, K., Gannon, F. &
Chambon, P. (1978) Proc. Natl Acad. Sci. USA 75, 4853-4857.
26. Catterall, J. F., O'Malley, B. W., Robertson, M. A., Staden, R.,
Tanaka, Y. & Brownlee, G. G. (1978) Nature (London) 275,
510-513.
27. Proudfoot, N. J. & Brownlee, G. G. (1978) Nature (London) 263,
211-214.
28. Slightom, J. L., Blechl, A. E. & Smithies, O. (1980) Cell 21,
627-638.
29. Konkel, D. A., Maizel, J. V. & Leder, P. (1979) Cell 18, 865-873.
30. Nishioka, Y. & Leder, P. (1979) Cell 18, 875-882.
31. Poljak, R. J. (1978) Crit. Rev. Biochem. 5, 45-84.
`
`