`0022-538X/81/080519-10$02.00/0
`
`Vol. 39, No. 2
`
`Nucleotide Sequences of the mRNA's Encoding the Vesicular
`Stomatitis Virus G and M Proteins Determined from cDNA
`Clones Containing the Complete Coding Regions
`JOHN K. ROSE* AND CAROL J. GALLIONE
`Tumor Virology Laboratory, The Salk Institute, San Diego, California 92138
`
`Received 24 March 1981/Accepted 20 April 1981
`
`The complete nucleotide sequences of the vesicular stomatitis virus mRNA's
`encoding the glycoprotein (G) and the matrix protein (M) have been determined
`from cDNA clones that contain the complete coding sequences from each mRNA.
`The G protein mRNA is 1,665 nucleotides long, excluding polyadenylic acid, and
`encodes a protein of 511 amino acids including a signal peptide of 16 amino acids.
`G protein contains two large hydrophobic domains, one in the signal peptide and
`the other in the transmembrane segment near the COOH terminus. Two sites of
`glycosylation are predicted at amino acid residues 178 and 335. The close corre-
`spondence of the positions of these sites with the reported timing of the addition
`of the two oligosaccharides during synthesis of G suggests that glycosylation
`occurs as soon as the appropriate asparagine residues traverse the membrane of
`the rough endoplasmic reticulum. The mRNA encoding the vesicular stomatitis
`virus M protein is 831 nucleotides long, excluding polyadenylic acid, and encodes
`a protein of 229 amino acids. The predicted M protein sequence does not contain
`any long hydrophobic or nonpolar domains that might promote membrane
`association. The protein is rich in basic amino acids and contains a highly basic
`amino terminal domain. Details of construction of the nearly full-length cDNA
`clones are presented.
`
`Vesicular stomatitis virus (VSV) buds from
`the surface of the host cell and thereby acquires
`a membrane that contains spikes composed of
`the single viral glycoprotein (G). Our previous
`studies have shown that G protein has a COOH-
`terminal basic domain of 29 amino acids which
`is internal to the lipid bilayer of the virion and
`an adjacent hydrophobic segment of 20 amino
`acids which spans the bilayer (27). The NH2-
`terminal 95% of G is external to the lipid bilayer
`and contains two asparagine-linked complex ol-
`igosaccharides (10, 21). G plays an essential role
`in binding of the virus to the host cell and
`presumably plays a role in directing budding of
`virus from the plasma membrane of the host cell
(2, 5).
`The G protein is synthesized on membrane-
`bound polyribosomes (18) and inserted into the
`rough endoplasmic reticulum as a nascent pro-
`tein chain (29, 34). A short hydrophobic signal
`sequence of 16 amino acids is cleaved from the
`NH2 terminus after insertion into the rough
`endoplasmic reticulum (12, 15, 22). G assumes a
`transmembrane configuration in the rough en-
`doplasmic reticulum (6, 13) with only a small
COOH-terminal segment exposed on the cyto-
plasmic face of the rough endoplasmic reticulum.
Transport of G to the plasma membrane
occurs via the Golgi apparatus (1a), with inter-
organelle transport probably occurring via
coated vesicles (28). At a late stage of transport
`to the plasma membrane, one or two molecules
`of fatty acid are esterified to G (31).
`The VSV matrix protein (M) is thought to be
`a peripheral membrane protein that lines the
`inner surface of the virion envelope, perhaps
`interacting with the lipid bilayer, the internal
`portion of G, and the nucleocapsid core (re-
`viewed by Wagner [35]). In addition to a struc-
`tural function, M plays a role in directing bud-
`ding of virus from infected cells (14) and may be
`involved in regulating transcription of mRNA
`from the single negative strand of genomic RNA
`(4, 7, 9, 16).
`In contrast to G, M is synthesized on free
`polyribosomes (18) and associates rapidly with
`the plasma membrane fraction after synthesis
`(14). The nature of the association of M with the
`plasma membrane fraction is not clear. It could
`be by association with the lipid bilayer in regions
containing G protein or by association with other
`proteins such as those in nucleocapsids which
`are already associated with the plasma mem-
`brane (14).
`
`KELONIA EXHIBIT 1028
`
`
`
`ROSE AND GALLIONE
`
`Knowledge of the complete primary amino
`acid sequences of G and M proteins is clearly
`critical to a molecular understanding of their
`interactions with membranes and other cellular
`and viral components. Our previous studies have
`employed VSV cDNA clones that contain only
`fractions of the coding sequences from each
`mRNA (24, 27), and only terminal mRNA and
`protein sequences were reported. We report here
`the isolation and the complete nucleotide se-
`quences of cDNA clones that contain the entire
`coding sequences of G and M protein mRNA's.
`From these sequences we predict the amino acid
`sequences of the G and M proteins and discuss
`features of these predicted sequences.
`MATERIALS AND METHODS
`Materials. Reverse transcriptase was supplied by
`J. Beard, St. Petersburg, Fla. The Klenow fragment of
`DNA polymerase I and most restriction endonucleases
`were purchased from New England Biolabs, Beverly,
`Mass. Nuclease S1, PstI, and T4 polynucleotide kinase
`were from Boehringer Mannheim, Indianapolis, Indi-
`ana, and terminal deoxynucleotidyl transferase was
`from Bethesda Research Laboratories, Bethesda, Md.
`Oligo(dT)12-18 and oligo(dT) cellulose (T3) were from
Collaborative Research, Waltham, Mass. [α-32P]dCTP
`and dGTP were from Amersham/Searle, Chicago, Ill.,
`and unlabeled dNTP's were from P-L Biochemicals,
`Milwaukee, Wis.
`RNA synthesis. VSV mRNA was synthesized in
`vitro in a 10-ml reaction exactly as described previ-
`ously (26), with the following modification. After 4 h
`the reaction was stopped by addition of sodium dode-
`cyl sulfate and sodium acetate to final concentrations
`of 1% and 0.5 M, respectively. The entire mixture was
`then passed through a column containing 0.5 g of
`oligo(dT)-cellulose, and the column was washed with
`10 ml of 0.4 M sodium acetate. The bound mRNA was
`eluted with distilled water and precipitated with
`ethanol. All mRNA was from the San Juan strain of
`the Indiana serotype of VSV.
`First-strand DNA synthesis and purification.
`First-strand DNA copies of the mRNA's encoding the
`N, NS, M, and G proteins were synthesized in a 4-ml
reaction containing 300 µg of total VSV mRNA, 0.5
mM dATP, dGTP, and dTTP, 0.25 mM [α-32P]-
dCTP (10 Ci/mmol), 30 mM β-mercaptoethanol, 120
mM KCl, 10 mM MgCl2, 4,000 U of reverse transcrip-
tase, 300 µg of oligo(dT)12-18, and a cytoplasmic extract
(1 mg of total protein) from baby hamster kidney
(BHK-21) cells prepared as described below. After
incubation for 30 min at 42°C, the reaction mixture
was extracted with phenol. Unincorporated dNTP's
were separated from cDNA by chromatography of the
aqueous phase on Sephadex G-50. The excluded frac-
tion was lyophilized in a silicated glass tube. The yield
of total cDNA from 300 µg of VSV mRNA was ap-
proximately 40 µg as determined from the incorpora-
tion of [α-32P]dCTP. The full-length cDNA copies of
`each mRNA species were purified by electrophoresis
`on a 1.5% alkaline agarose gel (30 mM NaOH, 5 mM
`
`J. VIROL.
`
`EDTA; 1.5 mm by 18 cm by 20 cm). The cDNA's were
`located by autoradiography (Fig. 1) or by staining with
`ethidium bromide and UV illumination. Stained bands
`were excised from the gel, electroeluted, and precipi-
`tated with ethanol. The yield of full-length cDNA's
was approximately 0.7 µg of G, 2 µg of N, and 3 µg of
`both NS and M. Reaction conditions given were opti-
`mized with respect to the concentrations of reverse
`transcriptase, KCl, and cell extracts to give maximal
`sizes of cDNA's.
Cell extract preparation. BHK cells (8 × 10^8
cells) growing in a 1-liter suspension were pelleted by
centrifugation and suspended in 1.5 times the packed
cell volume of 10 mM HEPES (N-2-hydroxyethylpi-
perazine-N'-2-ethanesulfonic acid) (pH 7.5), 10 mM
KCl, 1.5 mM magnesium acetate, and 7 mM β-mer-
captoethanol. Cells were broken by 50 strokes of a
Dounce homogenizer and centrifuged at 15,000 × g for
20 min. The supernatant was dialyzed for 16 h against
1 liter of 10 mM HEPES (pH 7.5), 90 mM KCl, 1.5
mM magnesium acetate, and 7 mM β-mercaptoetha-
nol. The total protein concentration in the extract was
`15 to 20 mg/ml. All steps were carried out at 4°C, and
`samples of the extract were stored at -80°C.
`Second-strand cDNA synthesis. Reactions for
second-strand DNA synthesis (200 µl) included 100
mM HEPES buffer (pH 6.9), 10 mM MgCl2, 60 mM
KCl, 1 mM dATP, dCTP, dGTP, and dTTP, 0.5 µg of
a purified cDNA, and 5 U of the Klenow fragment of
DNA polymerase I. Reactions were for 15 h at 15°C
and were terminated by extraction with phenol fol-
lowed by precipitation with ethanol. Samples were
then resuspended in 75 µl of a solution containing 3
mM ZnCl2, 30 mM sodium acetate (pH 4.5), 300 mM
NaCl, and 75 U of S1 nuclease. Digestion was for 30
min at 37°C. S1-treated DNAs were extracted with
phenol and precipitated with ethanol and then purified
by electrophoresis on a 6% polyacrylamide gel. The
nearly full-length double-stranded DNAs were de-
tected by autoradiography or staining with ethidium
bromide. The amounts of double-stranded DNAs re-
covered were 100 ng (G size class), 180 ng (N size
class), and 450 ng (NS and M size class), as calculated
from the cpm of 32P in the first-strand DNA. Double-
stranded DNAs were electroeluted and precipitated
`with ethanol.
`Homopolymer addition and cloning. Homopol-
`ymer addition was by a modification of a published
procedure (30). Addition of dCMP residues to double-
stranded cDNA was for 10 to 30 min at 37°C in a 25-
µl reaction containing 100 to 500 ng of DNA, 0.2 mM
dCTP, and an amount of terminal deoxynucleotidyl
transferase (Tdt) that was known to add approxi-
mately 10 to 20 dC residues to HaeIII fragments (5
pmol of total ends) of pBR322 DNA. The number of
dC residues added was calculated from the incorpo-
ration of [α-32P]dCTP (100 Ci/mmol) that was in-
cluded in test reactions. Units reported by the supplier
were generally 10- to 50-fold greater than what we
observed. Addition of 10 to 20 dG residues to PstI-
cleaved pBR322 DNA was carried out as above (0.2
mM dGTP) using 1 to 2 µg of DNA. The linear form
`of PstI-cleaved pBR322 was always purified by agarose
`gel electrophoresis and electroelution before tailing.
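
The tail-length estimate described above amounts to dividing the nucleotide incorporated by the number of DNA ends in the reaction. The following is a minimal sketch of that arithmetic, not the authors' procedure. The function names, the counts, and the specific activity are hypothetical; only the 5 pmol of ends comes from the text.

```python
def pmol_incorporated(cpm_incorporated: float, cpm_per_pmol: float) -> float:
    """Convert incorporated counts to pmol of nucleotide, given the specific
    activity of the labeled dNTP in the reaction (cpm per pmol)."""
    if cpm_per_pmol <= 0:
        raise ValueError("specific activity must be positive")
    return cpm_incorporated / cpm_per_pmol


def mean_tail_length(pmol_nucleotide: float, pmol_ends: float) -> float:
    """Average number of residues added per 3' end."""
    if pmol_ends <= 0:
        raise ValueError("pmol_ends must be positive")
    return pmol_nucleotide / pmol_ends


# Hypothetical test reaction: 5 pmol of HaeIII fragment ends (as in the text),
# 150,000 cpm incorporated at an assumed 2,000 cpm per pmol of dCTP.
added = pmol_incorporated(150_000, 2_000)   # pmol of dCMP incorporated
tail = mean_tail_length(added, 5.0)         # residues per end
print(f"{tail:.0f} dC residues per end")
```

With these assumed numbers the estimate falls inside the 10 to 20 residue range that the text targets.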
Equimolar quantities of dC-tailed insert DNA and dG-
tailed plasmid DNA were annealed for 15 min at 4°C
in 10 µl of 0.5 M NaCl-10 mM Tris (pH 7.4) and used
to transform Escherichia coli strain C600 to tetracy-
cline resistance (30). From 10 to 40 colonies were
obtained per ng of insert DNA. Approximately 80% of
the colonies obtained were ampicillin sensitive and
had inserts at the PstI site. Small preparations of
plasmid DNA (1 to 2 µg) were analyzed for the size of
the insert DNA by PstI digestion. HaeIII and HinfI
digestions gave partial restriction maps for each insert.
By comparing these maps with previous data (24, 25,
27) we were able to assign the clones unambiguously
to the appropriate mRNA's. Although apparently full-
length double-stranded DNAs were used as starting
material, a large fraction (80% for G) of the inserts had
deletions at one or both ends.
DNA sequence analysis. All sequence analysis
was by the Maxam-Gilbert procedure as described
previously (17, 25). DNA fragments (25 to 50 pmol)
were prepared by restriction enzyme digestion, alka-
line phosphatase treatment, polyacrylamide gel elec-
trophoresis, and electroelution (25). End labeling was
with 500 µCi of [γ-32P]ATP and 2 U of polynucleotide
kinase (24). Labeled DNA strands were separated by
electrophoresis on thin 5% polyacrylamide gels (0.35
mm by 16 cm by 40 cm) rather than the thicker gels
suggested by Maxam and Gilbert (17). Thin gels gave
better resolution of closely spaced DNA strands. Gels
were run at 1,200 V with an electric fan for cooling,
and strand separation of fragments up to 800 base
pairs long was accomplished by using electrophoresis
times of less than 8 h. Occasionally fragments for
sequencing were obtained by cleavage with a second
restriction enzyme followed by gel electrophoresis.
Sequencing gels (0.35 mm by 16 cm by 40, 85, or 152
cm) were electrophoresed at 1,500 to 6,000 V, such
that the gel remained warm (>40°C) throughout the
run. Reaction times were reduced substantially for
samples on which the sequence was to be read for
more than 300 nucleotides from the labeled end. Gen-
erally 100,000 to 500,000 dpm of 32P-labeled fragment
were loaded per lane on 152-cm gels so that exposure
could be carried out at high resolution without flu-
orescent screens. Long (85 or 152 cm) gels were cut in
sections, transferred to old exposed sheets of X-ray
film for backing, covered with plastic wrap, and ex-
posed to Kodak XR-5 film for 1 to 4 days.
RESULTS
The basic strategy we employed to obtain
nearly full-length cDNA clones was: (i) to opti-
mize reverse transcription of full-length first-
strand DNA copies of each mRNA; (ii) to use
isolated, full-length first strands as templates for
second-strand DNA synthesis; and (iii) to purify
the nearly full-length double-stranded DNAs
(obtained after S1 nuclease treatment) by gel
electrophoresis before cloning. All cloning was
of dC-tailed inserts into the dG-tailed PstI site
of pBR322 (3).
In previous experiments we and others found
that addition of a crude cell extract greatly en-
hanced the synthesis of full-length VSV
mRNA's by the VSV virion transcriptase (1, 26).
Because our initial reverse transcription experi-
ments showed that the yields of full-length
cDNA's of VSV G and N mRNA's were low
relative to the yields of cDNA's of the smaller
mRNA's (NS and M), we examined the effect of
such an extract on reverse transcription. Figure
1 shows the products of reverse transcription of
total VSV mRNA synthesized in the absence
and presence of the extract. The yields of full-
length G and N cDNA's were increased 5- and
2-fold, respectively, whereas the M and NS
cDNA's were increased only marginally (1.2-
fold). Because of the dramatic enhancement of
the yield of large cDNA, the cell extract was
included in all syntheses of single-stranded
cDNA's for cloning. The mechanism of action of
the cell extract is not known, but it presumably
acts by inhibiting RNase or by promoting un-
folding of mRNA secondary structure.

FIG. 1. Reverse transcripts of VSV mRNA synthe-
sized in the presence (lane 2) or absence (lane 1) of a
cytoplasmic cell extract. Reverse transcripts were la-
beled with [α-32P]dCTP and analyzed by electropho-
resis on an alkaline agarose gel. Identities of the
cDNA bands in the autoradiogram are indicated.

G and M cDNA clones contain the com-
plete coding sequences. Two recombinant
plasmids, pG1 and pM309, were obtained which
had insert sizes of ca. 1,700 and 850 nucleotides,
respectively, consistent with their having nearly
complete copies of the G and M mRNA se-
quences. To determine the exact 5' endpoints of
each cloned sequence relative to the mRNA, we
determined the nucleotide sequence at the end
of the insert corresponding to the 5' end of the
RNA. The sequencing gel showing this region
from the pG1 insert is shown in Fig. 2. The
sequence preceding the homopolymer tails is
complementary to all but nine nucleotides at the
5' end of the G mRNA sequence (24). Similarly,
the sequence of the pM309 insert (Fig. 2) showed
that it contained all but 19 nucleotides from the
5' M mRNA sequence (24). Because the trans-
lation initiation codons for the G and M mRNA's
are located 30 and 42 nucleotides from the 5'
ends of the mRNA's (23, 24), these clones clearly
contain the sequences encoding the NH2 termini
of G and M proteins. Subsequent sequence anal-
ysis showed that these clones contained the 3'-
terminal mRNA sequences as well.
`
FIG. 2. Sequencing gels showing the sequences in
the regions of pG1 and pM309 that correspond to the
5'-proximal regions of G and M mRNA's. Panel A,
sequence from the 5'-proximal AluI site of the pG1
insert through the C-tails. Panel B, sequence from
the pBR322 HpaII site proximal to the 5' end of
pM309 reading into the insert DNA. Numbers indi-
cate the alignment of the sequences with the mRNA
sequence (see Fig. 4 and 6).
`
The mechanism of formation of pG1 and
`pM309 can be explained from the 5'-terminal
`sequences of the mRNAs (Fig. 4 and 6 of refer-
`ence 24). The priming of second-strand DNA
`synthesis in pG1 presumably occurred from the
`3' end of the cDNA looped back to base pair
`perfectly with positions 13 through 18, leaving
`an unpaired loop of six nucleotides. After second
`strand synthesis, digestion of the loop with nu-
`clease S1 must have left the unpaired nucleo-
`tides 10 through 12 intact because they appear
`in the clone. In the formation of pM309, the 3'
`end of the cDNA presumably looped back to
`pair with positions 21 through 24 and began
`priming. Nuclease S1 digestion was complete or
`nearly complete, generating a clone lacking 19
`nucleotides of the 5' mRNA sequence.
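
The loop-back priming described above requires that the first few nucleotides of the mRNA be the reverse complement of a short site further downstream, so that the 3' end of the cDNA can fold back and pair with itself. The sketch below searches for such a site in mRNA-sense coordinates. The function names are hypothetical, and the 18-nucleotide string is an illustrative 5'-terminal sequence that should be treated as an assumption rather than as a transcription of the published sequence.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA-alphabet sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def loopback_site(mrna_5prime: str, stem_len: int, min_loop: int = 3):
    """Locate a downstream site able to pair with the cDNA 3' end.

    Coordinates are 1-based positions in the mRNA sense. The 3' end of the
    cDNA is the complement of mRNA positions 1..stem_len, so it can pair
    with the cDNA copied from any downstream site whose mRNA-sense sequence
    is the reverse complement of those first stem_len nucleotides.
    Returns (site_start, site_end, loop_length), or None if no site exists.
    """
    target = reverse_complement(mrna_5prime[:stem_len])
    idx = mrna_5prime.find(target, stem_len + min_loop)
    if idx == -1:
        return None
    return idx + 1, idx + stem_len, idx - stem_len


# Assumed 5'-terminal sequence, for illustration only.
g_5prime = "AACAGAGATCGATCTGTT"
print(loopback_site(g_5prime, stem_len=6))
```

For this assumed input the search reports pairing with positions 13 through 18 and an unpaired loop of six nucleotides (positions 7 through 12), the geometry the text proposes for pG1.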
Sequencing strategy for pG1. Initial re-
striction mapping combined with previous par-
tial sequences (24) allowed us to locate the two
HaeIII sites within the pG1 insert (Fig. 3). Initial
`sequences were then established from the cen-
`tral HaeIII site reading 520 nucleotides toward
`the 5' end of the mRNA and 350 nucleotides
`toward the 3' end. Additional sequence was de-
`termined from the 3' HaeIII site for 520 nucleo-
`tides toward the 5' end of the mRNA. This
`approach, with sequencing gels 152 cm long,
`allowed us to establish a tentative sequence for
`almost the entire G gene using only two restric-
`tion sites. A restriction map was then established
`from this sequence by computer analysis (32).
`Subsequently we were able to identify specific
`restriction fragments required to complete the
`sequence on both DNA strands and correct a
`few errors that were generated by reading se-
`quences more than 450 nucleotides from the
`labeled end. Figure 3 shows the locations of sites
`for restriction enzymes that were used in the
sequence analysis and the specific regions se-
quenced. Several EcoRII sites (CC[A/T]GG) were
`encountered in this analysis. These sequences
`showed gaps in the sequence ladder at the sec-
`ond C residue (20), and the sequences at these
`sites were always confirmed by sequencing the
`complementary strand.
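
Deriving a restriction map from a tentative sequence, as described above, is a pattern search over the sequence. The program cited in reference 32 is not described here, so the following is only a generic sketch using regular expressions. The toy sequence and the function name are hypothetical. The recognition sequences are the standard ones for HaeIII (GGCC), AluI (AGCT), and EcoRII (CC followed by A or T, then GG).

```python
import re

# Recognition sequences for enzymes named in the text.
SITES = {
    "HaeIII": "GGCC",
    "AluI": "AGCT",
    "EcoRII": "CC[AT]GG",
}


def map_sites(seq: str) -> dict[str, list[int]]:
    """Return the 1-based start position of every recognition site."""
    seq = seq.upper()
    found = {}
    for enzyme, pattern in SITES.items():
        # A lookahead reports overlapping occurrences as well.
        found[enzyme] = [m.start() + 1 for m in re.finditer(f"(?={pattern})", seq)]
    return found


toy = "TTAGCTAAGGCCTTCCTGGAACCAGGTTGGCC"  # hypothetical sequence
site_map = map_sites(toy)
print(site_map)
```

Positions are reported for the strand supplied; all three recognition sequences are palindromic, so the same sites occur on the complementary strand.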
The complete sequence of the G mRNA pre-
dicted from the sequence of the pG1 clone (and
`previous data, references 24 and 27) is shown in
`Fig. 4. The sequence contains a single open
`reading frame for translation beginning at the
`5'-proximal ATG codon (positions 29 through
`31) and ending 100 nucleotides from the 3' end
`of the mRNA. That this is the reading frame
`encoding G protein has been confirmed by direct
`sequencing of G protein at both ends of the
`molecule (12, 15, 27).

FIG. 3. Restriction map of the pG1 insert DNA. Sites for restriction enzymes that were used in sequencing
are shown. The left end corresponds to the 5' end of G mRNA. Arrows indicate the regions sequenced and the
direction of sequencing. Numbers are in hundreds of nucleotides starting from the 5' end of the mRNA.
Terminal PstI sites are at the junctions of pBR322 sequences with the dG:dC tails. The 5'-terminal sequence
of G mRNA determined previously from the genomic sequence is indicated (27). The sequence of the 3'-
terminal 470 nucleotides was reported previously from a smaller clone, but also was sequenced on at least
one DNA strand in pG1.

Nucleotide sequencing of the M mRNA. A
sequence from the 5' end of the M protein
`mRNA had been determined from the sequence
`of a DNA primer extended from the adjacent
`NS gene along the genome into the M mRNA-
`coding region (24). This sequence overlaps and
`agrees with the sequences determined from the
`pM309 insert (Fig. 5) and with the sequence of
`a partial M clone (pM101; L. Iverson, unpub-
`lished results; 12a). The remainder of the M
`mRNA sequence was determined by sequencing
`from the restriction sites indicated in Fig. 5. All
`regions were sequenced on both DNA strands or
`sequenced at least twice (from different sites) to
`ensure accuracy.
`The M mRNA is 831 nucleotides long without
`polyadenylic acid and contains a single open
`reading frame for translation extending from the
`5'-proximal ATG codon at positions 42 through
44 to a TAG termination codon at positions 729
through 731 (Fig. 6). Other reading frames are
`blocked extensively throughout the mRNA se-
`quence, leaving little doubt that the predicted
`M protein sequence is correct. Although no di-
`rect protein sequence is available for M, the
`sequence of the ribosome binding site deter-
`mined directly from the mRNA (23) is in com-
`plete agreement with the sequence predicted
`here from the cDNA clone. The sequence also
`matches the partial RNA sequence of the M
`mRNA determined by priming on genomic RNA
`from the adjacent NS gene (24).
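
The coordinates of the M open reading frame follow from three numbers given in the text: the first nucleotide of the initiation codon (42), the length of the predicted protein (229 amino acids), and the length of the mRNA without polyadenylic acid (831). A minimal sketch of that arithmetic follows; the function name is hypothetical.

```python
def orf_coordinates(start: int, n_codons: int, mrna_length: int) -> dict:
    """1-based coordinates of an open reading frame.

    start       -- position of the first nucleotide of the initiation codon
    n_codons    -- number of amino acid codons, initiator Met included
    mrna_length -- mRNA length excluding polyadenylic acid
    """
    coding_end = start + 3 * n_codons - 1
    stop = (coding_end + 1, coding_end + 3)
    return {
        "coding": (start, coding_end),
        "stop": stop,
        "utr3_length": mrna_length - stop[1],
    }


m_orf = orf_coordinates(start=42, n_codons=229, mrna_length=831)
print(m_orf)
```

With these inputs the termination codon occupies positions 729 through 731, leaving 100 nucleotides of 3' untranslated sequence before the polyadenylic acid.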
`DISCUSSION
`VSV G protein structure and modifica-
`tion. The nucleotide sequence presented here
`for the VSV glycoprotein mRNA predicts a pro-
`tein of 511 amino acids (57,416 daltons), includ-
`ing the NH2-terminal signal sequence of 16
`amino acids that is not present on the mature
`protein (12, 15, 27). The predicted COOH-ter-
`minal sequence of G contains a hydrophobic
domain of 20 amino acids (residues 463 through
482) followed by a hydrophilic domain (residues
483 through 511). We have presented evidence
previously that the hydrophobic domain spans
the lipid bilayer of the viral envelope, and that
the COOH-terminal charged domain resides
inside the viral envelope leaving more than 95%
of the protein protruding from the virus (27).
The COOH-terminal basic domain probably in-
teracts with internal virion proteins and cellular
`proteins during budding of the virus from the
`plasma membrane. This portion of G could also
`play a role in targeting the protein to the plasma
`membrane.
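
A hydrophobic stretch such as the 20-residue domain described above can be located by scanning a protein sequence with a sliding window. The sketch below uses the Kyte-Doolittle hydropathy scale, which was published after this work and is not the method used here; it is swapped in purely for illustration. The toy sequence and the function name are hypothetical.

```python
# Kyte-Doolittle hydropathy values (one-letter amino acid codes).
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def most_hydrophobic_window(protein: str, width: int = 20):
    """Return (1-based start, mean hydropathy) of the highest-scoring window."""
    if len(protein) < width:
        raise ValueError("sequence is shorter than the window")
    means = [
        sum(KD[aa] for aa in protein[i:i + width]) / width
        for i in range(len(protein) - width + 1)
    ]
    best = max(range(len(means)), key=means.__getitem__)
    return best + 1, means[best]


# Hypothetical protein: charged flanks around a 20-residue nonpolar core.
toy = "DEKRDEKR" + "ILVF" * 5 + "KRKRKR"
start, mean = most_hydrophobic_window(toy)
print(start, round(mean, 3))
```

A window of 20 residues is chosen to match the length of the membrane-spanning segment discussed in the text.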
`The VSV G protein contains two apparently
`identical Asn-linked complex oligosaccharides
`(10, 21). The precise locations of the oligosac-
`charide attachment sites in G are of special
`interest because the timing of the glycosylation
`events relative to the timing of G protein syn-
`thesis has been examined in detail in vitro. Ad-
`dition of oligosaccharide chains by transfer from
`a lipid carrier occurs on the nascent polypeptide
`chain (29, 34). The first addition occurs when
`about 38% of the chain has been synthesized,
`and the second occurs when about 70% has been
`synthesized (29), assuming that the rate of pro-
`tein synthesis is uniform in the in vitro system.
`N-glycosidic linkage of oligosaccharides to pro-
`teins occurs at Asn-X-Ser or Asn-X-Thr se-
`quences (19). Inspection of the predicted G pro-
`tein sequence reveals 18 Asn residues, but only
`two of these (amino acid residues 178 and 335,
`Fig. 4) occur in canonical glycosylation se-
`quences. We therefore presume that these are
`the actual glycosylation sites. These sites are at
`fractional distances of 0.35 and 0.66 from the
`NH2 terminus. The nearly exact correspondence
`of the positions of these sites with fraction of G
`synthesized when glycosylation occurs suggests
`that transfer of oligosaccharides to the nascent
`protein chain occurs when the appropriate Asn
residues traverse the rough endoplasmic reticulum.
Extensive folding of the polypeptide chain
on the COOH-terminal side of the glycosylation
sites is apparently not required to specify gly-
cosylation.

[FIG. 4 sequence listing. Only the nucleotide rows below are legible; the remaining rows and the amino acid lines are not recoverable.]
61-120      TCATTGGGGTGAATTGCAAGTTCACCATAGTTTTTCCACACAACCAAAAAGGAAACTGGA
121-180     AAAATGTTCCTTCTAATTACCATTATTGCCCGTCAAGCTCAGATTTAAATTGGCATAATG
241-300     ACGGTTGGATGTGTCATGCTTCCAAATGGGTCACTACTTGTGATTTCCGCTGGTATGGAC
361-420     GCATTGAACAAACGAAACAAGGAACTTGGCTGAATCCAGGCTTCCCTCCTCAAAGTTGTG
481-540     TGGTTGATGAATACACAGGAGAATGGGTTGATTCACAGTTCATCAACGGAAAATGCAGCA
601-660     GGCTATGTGATTCTAACCTCATTTCCATGGACATCACCTTCTTCTCAGAGGACGGAGAGC
721-780     GAGGCAAGGCCTGCAAAATGCAATACTGCAAGCATTGGGGAGTCAGACTCCCATCAGGTG
841-900     AAGGGTCAAGTATCTCTGCTCCATCTCAGACCTCAGTGGATGTAAGTCTAATTCAGGACG
961-1020    TTCCAATCTCTCCAGTGGATCTCAGCTATCTTGCTCCTAAAAACCCAGGAACCGGTCCTG
1081-1140   TTGCTGCTCCAATCCTCTCAAGAATGGTCGGAATGATCAGTGGAACTACCACAGAAAGGG
1201-1260   GGACCAGTTCAGGATATAAGTTTCCTTTATACATGATTGGACATGGTATGTTGGACTCCG
1321-1380   CGCAACTTCCTGATGATGAGAGTTTATTTTTTGGTGATACTGGGCTATCCAAAAATCCAA
1561-1620   AGTAACTCAAATCCTGCACAACAGATTCTTCATGTTTGGACCAAATCAACTTGTGATACC
1621-1665   ATGCTCAAAGAGGCCTCAATTATATTTGAGTTTTTAATTTTTATG

FIG. 5. Restriction map of the pM309 insert DNA. Sites for restriction enzymes used in sequencing are
shown. The left end corresponds to the 5' end of M mRNA. Arrows indicate the regions and directions
sequenced; numbers represent hundreds of nucleotides. Arrows labeled A and B represent sequences deter-
mined from cDNA clones pM32 and pM37 which contain smaller portions of the M sequence than pM309.
Terminal PstI sites are at the junctions of pBR322 sequences with the dG:dC tails. The 5'-terminal mRNA
sequence (genomic sequence) reported previously is indicated (24).
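
The site prediction for G discussed above rests on two simple operations: scanning the protein for Asn-X-Ser or Asn-X-Thr, and expressing each site as a fraction of the chain length. A minimal sketch of both follows. The function names and the toy peptide are hypothetical; the residue numbers 178 and 335 and the length of 511 are taken from the text.

```python
import re


def sequon_positions(protein: str) -> list[int]:
    """1-based positions of Asn residues in Asn-X-Ser or Asn-X-Thr."""
    # A lookahead reports overlapping sequons as well.
    return [m.start() + 1 for m in re.finditer(r"(?=N.[ST])", protein.upper())]


def fractional_position(residue: int, length: int) -> float:
    """Distance of a residue from the NH2 terminus as a fraction of the chain."""
    return residue / length


toy = "MKNCSAANGTLLNPQ"  # hypothetical peptide, one-letter code
toy_sites = sequon_positions(toy)

# Fractional distances of the two predicted sites in the 511-residue G protein.
g_fractions = [round(fractional_position(r, 511), 2) for r in (178, 335)]
print(toy_sites, g_fractions)
```

The two fractions reproduce the values of 0.35 and 0.66 quoted in the text.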
`One to two molecules of fatty acid are esteri-
`fied to G protein at a late stage in passage of G
`from the rough endoplasmic reticulum to the
`plasma membrane (31). The evidence suggests
`that the fatty acid is esterified to serine residues
`(31). Our preliminary evidence (W. J. Welch, B.
`M. Sefton, and J. K. Rose, unpublished data)
`indicates that [3H]palmitate label can be found
`associated specifically with the 64-amino acid,
`membrane-protected tailpiece of G isolated after
`proteolysis of intact virions (27). There are only
`five serine residues in this portion of G, and they
`are clustered around the NH2-terminal side of
`the hydrophobic domain (Fig. 4). Presumably,
`one or more of these residues are esterified to
`fatty acid. Fatty acid esterified in this region
`could serve the obvious function of promoting
`stable association of the hydrophobic protein
`domain with the membrane.
`Structure and function of VSV M. The
`nucleotide sequence of the VSV M mRNA pre-
`sented here predicts a protein sequence of 229
`amino acids (26,064 daltons). The predicted
`amino-terminal sequence of M is highly basic.
`Eight lysine residues occur in the first 19 posi-
`tions, and these are the only charged residues in
`this region (Fig. 6). A triple proline sequence
separates this domain from the remainder of the
molecule. There is evidence from analysis of M
`protein mutants and from in vitro studies that
`M may regulate VSV transcription (4, 7, 9, 16).
`This basic domain might play a role in such
`regulation, perhaps by interacting with the ge-
`nomic RNA. The predicted sequence indicates
`that M is the most basic of the VSV N, NS, M,
`and G proteins, having 21 Lys, 10 Arg, and 8 His
`residues and only 13 Asp and 13 Glu residues.
`Although there is no direct amino acid sequence
`analysis available for M protein, this basic char-
`acter is consistent with its isoelectric point de-
`termined by gel electrophoresis (4). Our at-
`tempts to obtain an NH2-terminal sequence from
`M have been unsuccessful, suggesting that the
`NH2 terminus is blocked.
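
The basic character of M can be checked directly from the residue counts given above. The sketch below tallies basic and acidic residues and computes the mean residue mass from the quoted molecular weight. It assumes, for simplicity, that histidine is uncharged at neutral pH; the variable names are illustrative.

```python
# Residue counts for the predicted M protein, as stated in the text.
counts = {"Lys": 21, "Arg": 10, "His": 8, "Asp": 13, "Glu": 13}

basic = counts["Lys"] + counts["Arg"]     # His left out (assumed uncharged)
acidic = counts["Asp"] + counts["Glu"]
net_charge = basic - acidic               # approximate net charge at neutral pH

# Mean residue mass from the predicted molecular weight and chain length.
mean_residue_mass = 26064 / 229

print(basic, acidic, net_charge, round(mean_residue_mass, 1))
```

Under that assumption the protein carries 31 basic and 26 acidic residues, a net excess of five positive charges.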
`Evidence indicates that M protein is released
`into the soluble fraction of infected cells after
`synthesis. It then associates rapidly with the
`plasma membrane fraction (14). It is not clear,
`however, whether there is any direct association
`of M with the lipid bilayer of the plasma mem-
`brane. There is evidence that M will associate
`with membrane fractions from uninfected cells,
`although it is not clear that this interaction is
`biologically significant (8). Inspection of the pre-
`dicted M protein sequence (Fig. 6) does not
`reveal any long hydrophobic or nonpolar do-
`mains that might be inserted into the mem-
`brane. This situation contrasts with that of the
`influenza virus matrix protein which has a cen-
`tral hydrophobic domain that may be membrane
`
`FIG. 4. Nucleotide sequence of the VSV G mRNA and the predicted protein sequence. The shaded
`nucleotide sequence is the ribosome binding site. NH2-terminal and COOH-terminal hydrophobic domains
`are shaded, as are the two potential glycosylation sites and basic residues in the COOH-terminal hydrophilic
`domain. The bracketed G residue (317) appeared clearly as a G on two sequencing gels of the DNA strand
`shown. The sequence of the complementary DNA strand using the same DNA preparation showed a clear A
`residue instead of the expected C in this position in two separate experiments. This residue and the amino
`acid encoded should therefore be considered tentative. Nucleotide number one is linked to the 5' cap in the
`mRNA sequence.
`
[Sequence listing figure. Only the first row is legible: AACAGATATCACGATCTAAGT (nucleotides 1 through 21).]