`Determination of Nucleotide
`Sequences in DNA
`
`Frederick Sanger
`
In spite of the important role played by DNA sequences in living matter, it is only relatively recently that general methods for their determination have been developed. This is mainly because of the very large size of DNA molecules, the smallest being those of the simple bacteriophages such as ΦX174 (which contains about 5000 base pairs). It was therefore difficult to develop methods with such complicated systems. There are, however, some relatively small RNA molecules, notably the transfer RNA's of about 75 nucleotides, and these were used for the early studies on nucleic acid sequences (1).
Following my work on amino acid sequences in proteins (2) I turned my attention to RNA and, with G. G. Brownlee and B. G. Barrell, developed a relatively rapid small-scale method for the fractionation of ³²P-labeled oligonucleotides (3). This became the basis for most subsequent studies of RNA sequences. The general approach used in these studies, and in those on proteins, depended on the principle of partial degradation. The large molecules were broken down, usually by suitable enzymes, to give smaller products which were then separated from each other, and their sequence was determined. When sufficient results had been obtained they were fitted together by a process of deduction to give the complete sequence. This approach was necessarily rather slow and tedious, often involving successive digestions and fractionations, and it was not easy to apply it to the larger DNA molecules. When we first studied DNA some significant sequences of about 50 nucleotides in length were obtained with this method (4, 5), but it seemed that to be able to sequence genetic material a new approach was desirable and we turned our attention to the use of copying procedures.

Copying Procedures

In the RNA field these procedures had been pioneered by C. Weissmann and his colleagues (6) in their studies on the RNA sequence of the bacteriophage Qβ. Phage Qβ contains a replicase that will synthesize a complementary copy of the single-stranded RNA chain, starting from its 3' end. These workers devised elegant procedures involving pulse-labeling with radioactively labeled nucleotides, from which sequences could be deduced.
For DNA sequences we have used the enzyme DNA polymerase, which copies single-stranded DNA as shown in Fig. 1. The enzyme requires a primer, which is a single-stranded oligonucleotide having a sequence that is complementary to, and therefore able to hybridize with, a region on the DNA being sequenced (the template). Mononucleotide residues are add-
SCIENCE, VOL. 214, 11 DECEMBER 1981
`
`Copyright © by the Nobel Foundation.
The author is head of the Division of Protein and Nucleic Acid Chemistry at the MRC Laboratory of
`Molecular Biology, Hills Road, Cambridge CB2 2QH, England. This article is the lecture he delivered in
`Stockholm, Sweden, on 8 December 1980, when he received the Nobel Prize in Chemistry, a prize he shared
`with Walter Gilbert and Paul Berg. Minor corrections and additions have been made by the author. The
`article is published here with the permission of the Nobel Foundation and will also be included in the
`complete volume of Les Prix Nobel en 1980 as well as in the series Nobel Lectures (in English) published by
`the Elsevier Publishing Company, Amsterdam and New York. Dr. Berg’s lecture appeared in the 17 July
`issue, page 296. Dr. Gilbert’s lecture will be published in a subsequent issue.
0036-8075/81/1211-1205$01.00/0 Copyright © 1981 AAAS
`
`Oxford, Exh. 1012, p. 1
`
ed sequentially to the 3' end of the primer from the corresponding deoxynucleoside triphosphates, making a complementary copy of the template DNA. By using triphosphates containing ³²P in the α position, the newly synthesized DNA can be labeled. In the early experiments synthetic oligonucleotides were used as primers, but after the discovery of restriction enzymes it was more convenient to use fragments resulting from their action as they were much more easily obtained.

Fig. 1. Specificity requirements for DNA polymerase.

The copying procedure was used initially to prepare a short, specific region of labeled DNA which could then be subjected to partial digestion procedures. One of the difficulties of sequencing DNA was to find specific methods for breaking it down into small fragments. No suitable enzymes were known that would recognize only one nucleotide. However, Berg, Fancher, and Chamberlin (7) had shown earlier that under certain conditions it was possible to incorporate ribonucleotides, in place of the normal deoxyribonucleotides, into DNA chains with DNA polymerase. Thus, for instance, if copying were carried out with ribo CTP (7a) and the other three deoxynucleoside triphosphates, a chain could be built up in which the C residues were in the ribo form. Bonds involving ribonucleotides could be broken by alkali under conditions where those involving the deoxynucleotides were not, so that a specific splitting at C residues could be obtained. Using this method we were able to extend our sequencing studies to some extent (8). However, extensive fractionations and analyses were still required.

The "Plus and Minus" Method

In the course of these experiments we needed to prepare DNA copies of high specific radioactivity, and in order to do this the highly labeled substrates had to be present in low concentrations. Thus if [α-³²P]dATP was used for labeling, its concentration was much lower than that of the other three triphosphates and frequently when we analyzed the newly synthesized DNA chains we found that they terminated at a position immediately before that at which an A should have been incorporated. Consequently a mixture of products was produced all having the same 5' end (the 5' end of the primer) and terminating at the 3' end at the position of the A residues. If these products could be fractionated on a system that separated only on the basis of chain length, the pattern of their distribution on fractionation would be proportional to the distribution of the A's along the DNA chain. And this, together with the distribution of the other three mononucleotides, is the information required for sequence determination. Initial experiments carried out with J. E. Donelson suggested that this approach could be the basis for a more rapid method, and it was found that good fractionations according to size could be obtained by ionophoresis on acrylamide gels.
The method described above met with only limited success but we were able to develop two modified techniques that depended on the same general principle, and these provided a simpler and much more rapid method of DNA sequence determination than anything we had used before (9). This, which is known as the "plus and minus" technique, was used to determine the almost complete sequence of the DNA of bacteriophage ΦX174, which contains 5386 nucleotides (10).

The "Dideoxy" Method

More recently we have developed another similar method that uses specific chain-terminating analogs of the normal deoxynucleoside triphosphates (11). This method is both quicker and more accurate than the plus and minus technique. It was used to complete the sequence of ΦX174 (12), to determine the sequence of a related bacteriophage, G4 (13), and has now been applied to mammalian mitochondrial DNA (mtDNA).
The analogs most widely used are the dideoxynucleoside triphosphates (Fig. 2). They are the same as the normal deoxynucleoside triphosphates but lack the 3' hydroxyl group. They can be incorporated into a growing DNA chain by DNA polymerase but act as terminators because, once they are incorporated, the chain contains no 3' hydroxyl group and so no other nucleotide can be added.

Fig. 2. Diagram showing chain termination with dideoxythymidine triphosphate (ddTTP). The top line shows the DNA polymerase-catalyzed reaction of the normal deoxynucleoside triphosphate (TTP) with the 3' terminal nucleotide of the primer; the bottom line the corresponding reaction with ddTTP.

The principle of the method is summarized in Fig. 3. Primer and template are denatured to separate the two strands of the primer, which is usually a restriction enzyme fragment, and then annealed to form the primer-template complex. The mixture is then divided into four samples. One (the T sample) is incubated with DNA polymerase in the presence of a mixture of ddTTP (dideoxythymidine triphosphate) and a low concentration of TTP, together with the other three deoxynucleoside triphosphates (one of which is labeled with ³²P) at normal concentration. As the DNA chains are built up on the 3' end of the primer, the position of the T's will be filled, in most cases by the normal substrate T, and extended further, but occasionally by ddT and terminated. Thus at the end of incubation there remains a mixture of chains terminating with ddT at their 3' end but all having the same 5' end (the 5' end of the primer). Similar incubations are carried out in the presence of each of the other three dideoxy derivatives, giv-

Fig. 3. Principle of the chain-terminating method. [Diagram: a template (segment shown: TAGCAACT) annealed to the primer is copied by DNA polymerase in four mixtures, each containing ATP, GTP, CTP, and TTP plus one dideoxy analog (ddATP, ddGTP, ddCTP, or ddTTP); the resulting chains end in ddA, ddG, ddC, or ddT.]
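
As a rough illustration of the principle in Fig. 3, the following Python sketch idealizes the four reactions. It assumes that every possible terminated chain is produced, that the gel resolves chains differing by a single nucleotide, and that the primer can be ignored. The function names and the example strand (taken here to be the copy of the template segment TAGCAACT) are illustrative assumptions, not part of the method as described.

```python
def terminated_lengths(new_strand):
    """For each dideoxy reaction, list the chain lengths that end in that base."""
    lanes = {base: [] for base in "ACGT"}
    for position, base in enumerate(new_strand, start=1):
        lanes[base].append(position)
    return lanes


def read_gel(lanes):
    """Read the four lanes together, from the shortest chain to the longest."""
    bands = sorted(
        (length, base) for base, lengths in lanes.items() for length in lengths
    )
    return "".join(base for _, base in bands)


new_strand = "ATCGTTGA"  # assumed copy of the template segment TAGCAACT
lanes = terminated_lengths(new_strand)
print(lanes["T"])       # chain lengths present in the T lane
print(read_gel(lanes))  # sequence recovered from the order of the bands
```

Each lane holds only the chains ending in one base, so ordering all bands by length across the four lanes gives back the sequence of the newly synthesized strand.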
`
`
ing mixtures terminating at the positions of C, A, and G, respectively, and the four mixtures are fractionated in parallel by electrophoresis on acrylamide gel under denaturing conditions. This system separates the chains according to size, the small ones moving quickly and the large ones slowly. As all the chains in the T mixture end at T, the relative position of the T's in the chain will define the relative sizes of the chains, and therefore their relative positions on the gel after fractionation. The actual sequence can then simply be read off from an autoradiograph of the gel (Fig. 4). The method is comparatively rapid and accurate, and sequences of up to about 300 nucleotides from the 3' end of the primer can usually be determined. Considerably longer sequences have been read off but these are usually less reliable.

Fig. 4. Autoradiograph of a DNA sequencing gel. The origin is at the top and migration of the DNA chains, according to size, is downward. The gel on the left has been run for 2.5 hours and that on the right for 5 hours with the same polymerization mixtures. [Autoradiograph not reproduced; the lanes are labeled G, A, T, and C.]

One problem with the method is that it requires single-stranded DNA as template. This is the natural form of the DNA in the bacteriophages ΦX174 and G4, but most DNA is double-stranded and it is frequently difficult to separate the two strands. One way of overcoming this was devised by A. J. H. Smith (14). If the double-stranded linear DNA is treated with exonuclease III (a double-strand specific 3' exonuclease) each chain is degraded from its 3' end, as shown in Fig. 5, giving rise to a structure that is largely single-stranded and can be used as template for DNA polymerase with suitable small primers. This method is particularly suitable for use with fragments cloned in plasmid vectors and has been used extensively in the work on human mtDNA.

Cloning in Single-Stranded Bacteriophage

Another method of preparing suitable template DNA that is being more widely used is to clone fragments in a single-stranded bacteriophage vector (15-17). This approach is summarized in Fig. 6. Various vectors have been described. We have used a derivative of bacteriophage M13 developed by Gronenborn and Messing (16), which contains an insert of the β-galactosidase gene with an Eco RI restriction enzyme site in it. The presence of β-galactosidase in a plaque can be readily detected by using a suitable color-forming substrate (X-gal). The presence of an insert in the Eco RI site destroys the β-galactosidase gene, giving rise to a colorless plaque.
Besides being a simple and general method of preparing single-stranded DNA, this approach has other advantages. One is that it is possible to use the same primer on all clones. Heidecker et al. (18) prepared a 96-nucleotide-long restriction fragment derived from a position in the M13 vector flanking the Eco RI site (Fig. 6). This can be used to prime into, and thus determine, a sequence of about 200 nucleotides in the inserted DNA. Smaller synthetic primers have now been prepared (19, 20) which allow longer sequences to be determined. The approach that we have used is to prepare clones at random from restriction enzyme digests and determine the sequence with the flanking primer. Computer programs (21) are then used to store, overlap, and arrange the data.
Another important advantage of the cloning technique is that it is a very efficient and rapid method of fractionating fragments of DNA. In all sequencing techniques, both for proteins and nucleic acids, fractionation has been an important step, and major progress has usually been dependent on the development of new fractionation methods. With the new rapid methods for DNA sequencing, fractionation is still important; and, as the sequencing procedure itself is becoming more rapid, more of the work has involved fractionation of the restriction enzyme fragments by electrophoresis on acrylamide. This becomes increasingly difficult as larger DNA molecules are studied and may involve several successive fractionations before pure fragments are obtained. In the new method these fractionations are replaced by a cloning procedure. The mixture is spread on an agar plate and grown. Each clone represents the progeny of a single molecule and is therefore pure, irrespective of how complex the original mixture was. It is particularly suitable for studying large DNA's. In fact, there is no theoretical limit to the size of DNA that could be sequenced by this method.
We have applied the method to fragments from mtDNA (22, 23) and to bacteriophage λ DNA. Initially new data can be accumulated very quickly (under ideal conditions at about 500 to 1000 nucleotides a day). However, at later stages much of the data produced will be in regions that have already been sequenced, and progress then appears to be much slower. Nevertheless, we find that most new clones studied give some useful data, either for correcting or confirming old sequences. Thus, in the work with bacteriophage λ DNA, we have about 90 percent of the molecule identified in sequences and most of the new clones we study contribute some new information. In most studies on DNA one is concerned with identifying the reading frames for protein genes, and to do this the sequence must be correct. Errors can readily occur in such extensive sequences and confirmation is always necessary. We usually consider it necessary to determine the sequence of
`
`
each region on both strands of the DNA.
Although in theory it would be possible to complete a sequence determination solely by the random approach, it is probably better to use a more specific method to determine the final remaining nucleotides in a sequencing study. Various methods are possible (22, 24), but all are slow compared with the random cloning approach.
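
The random approach depends on finding overlaps between sequences read from independent clones. As a toy sketch only, joining two reads by their longest shared end might look like the following Python. The programs cited as (21) are not described in this text, so the function name, the `min_overlap` parameter, and the two example reads are hypothetical.

```python
def merge_reads(left, right, min_overlap=3):
    """Join two reads where a suffix of `left` matches a prefix of `right`.

    Returns the merged sequence, or None when no overlap of at least
    `min_overlap` nucleotides exists.
    """
    longest = min(len(left), len(right))
    for size in range(longest, min_overlap - 1, -1):
        if left[-size:] == right[:size]:
            return left + right[size:]
    return None


print(merge_reads("ATCGTTGA", "TTGACCA"))  # reads sharing the end TTGA
print(merge_reads("ATCGTTGA", "CCCGGG"))   # no usable overlap
```

A working program would also have to compare reads from both strands and tolerate reading errors, which this sketch does not attempt.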
`
Bacteriophage ΦX174 DNA

The first DNA to be completely sequenced by the copying procedures was from bacteriophage ΦX174 (10, 12), a single-stranded circular DNA, 5386 nucleotides long, which codes for ten genes. The most unexpected finding from this work was the presence of
`
Fig. 5. Degradation of double-stranded DNA with exonuclease III.

Fig. 6. Diagram illustrating the cloning of DNA fragments in the single-stranded bacteriophage vector M13mp2 (16) and sequencing the insert with a flanking primer. [Diagram: the vector carries a restriction enzyme site within the galactosidase (lac) gene and gives blue plaques; after cutting with the restriction enzyme and inserting a DNA fragment it gives white plaques, and the primer anneals beside the insert.]
`
Table 1. Coding changes in mitochondria.

                      Amino acid coded
Codon       Normal    Mammalian       Yeast
                      mitochondria    mitochondria
TGA         Term*     Trp             Trp
ATA         Ile       Met             Ile
CTN         Leu       Leu             Thr
AGA, AGG    Arg       Term?           Arg
CGN         Arg       Arg             Term?

*Terminating.
`
"overlapping" genes. Previous genetic studies had suggested that genes were arranged in a linear order along the DNA chains, each gene being encoded by a unique region of the DNA. The sequencing studies indicated, however, that there were regions of the ΦX DNA that were coding for two genes. This is made possible by the nature of the genetic code. Since a sequence of three nucleotides (a codon) codes for one amino acid, each region of DNA can theoretically code for three different amino acid sequences, depending on where translation starts (Fig. 7). The reading frame or phase in which translation takes place is defined by the position of the initiating ATG codon, following which nucleotides are simply read off three at a time. In ΦX there is an initiating ATG within the gene coding for the D protein, but in a different reading frame. Consequently this initiates an entirely different sequence of amino acids, which is that of the E protein. Figure 8 shows the genetic map of ΦX. The E gene is completely contained within the D gene, and the B gene is within the A.
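
To make the idea of three reading frames concrete, the short Python sketch below translates one stretch of DNA from each of its three possible starting positions. The codon table is the normal genetic code written in a compact form, and the 18-nucleotide example is the sequence of Fig. 7 as far as it can be read from the scan, so both should be treated as assumptions of this sketch.

```python
from itertools import product

BASES = "TCAG"
# Normal genetic code, one letter per codon, in TCAG order ("*" = terminator).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
STANDARD_CODE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna, frame):
    """Translate `dna` starting at offset `frame` (0, 1, or 2).

    Incomplete codons at the end of the sequence are ignored.
    """
    codons = (dna[i:i + 3] for i in range(frame, len(dna) - 2, 3))
    return "".join(STANDARD_CODE[codon] for codon in codons)


sequence = "GGCCACGCATGCAGGCGG"  # assumed reading of the sequence in Fig. 7
for frame in range(3):
    print(frame, translate(sequence, frame))
```

The three outputs are unrelated amino acid sequences, which is why a second initiating ATG in a different frame can specify a second protein from the same region.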
Further studies (25) on the related bacteriophage G4 revealed the presence of a previously unidentified gene, which was called K. This overlaps both the A and C genes, and there is a sequence of four nucleotides that codes for part of all three proteins, A, C, and K, using all of its three possible reading frames.
It is uncertain whether overlapping genes are a general phenomenon or whether they are confined to viruses, whose survival may depend on their rate of replication and therefore on the size of the DNA: with overlapping genes more genetic information can be concentrated in a given-sized DNA. Further details of the sequence of bacteriophage ΦX174 DNA have been described (10, 12).
`
`Mammalian Mitochondrial DNA
`
Mitochondria contain a small double-stranded DNA (mtDNA) which codes for two ribosomal RNA's (rRNA's), 22 or 23 transfer RNA's (tRNA's) and about 10 to 13 proteins which appear to be components of the inner mitochondrial membrane and are somewhat hydrophobic. Other proteins of the mitochondria are encoded by the nucleus of the cell and specifically transported into the mitochondria. Using the above methods we have determined the nucleotide sequence of human mtDNA (23) and almost the complete sequence of bovine mtDNA. The sequence revealed a number of unexpected features which indicated that the transcription and translation machinery of mitochondria is rather different from that of other biological systems.
The genetic code in mitochondria. Hitherto it has been believed that the genetic code was universal. No differences were found in the Escherichia coli, yeast, or mammalian systems that had been studied. Our initial sequence studies were on human mtDNA. No amino acid sequences of the proteins that were encoded by human mtDNA were known. However, Steffans and Buse (26) had determined the sequence of subunit II of cytochrome oxidase (COII) from bovine mitochondria, and Barrell, Bankier, and Drouin (27) found that a region of the human mtDNA that they were studying had a sequence that would code for a protein homologous to this amino acid sequence, indicating that it most probably was the gene for the human COII. Surprisingly the DNA sequence con-
`
Gly His Ala Cys Arg Arg
GGC·CAC·GCA·TGC·AGG·CGG

  Ala Thr His Ala Gly
G·GCC·ACG·CAT·GCA·GGC·GG

   Pro Arg Met Gln Ala
GG·CCA·CGC·ATG·CAG·GCG·G

Fig. 7. Diagram illustrating how one DNA sequence can code for three different amino acid sequences. The dots indicate the positions of triplet codons coding for the amino acids.
`
`
`
Fig. 8. Gene map of ΦX174 DNA.
`
Table 2. Normal coding properties of alanine tRNA's. The first position of the codon pairs with the third position of the anticodon and vice versa, for example:
5' GCU 3' (codon)
3' CGG 5' (anticodon)

              Anticodon
Codon    Wobble    Mitochondria
GCU      GGC       UGC
GCC      GGC       UGC
GCA      UGC       UGC
GCG      UGC       UGC
`
`
tained TGA triplets in the reading frame of the homologous protein. According to the normal genetic code TGA is a termination codon, and if it occurs in the reading frame of a protein the polypeptide chain is terminated at that position. It was noted that in the positions where TGA occurred in the human mtDNA sequence, tryptophan was found in the bovine protein sequence. The only possible conclusion seemed to be that in mitochondria TGA was not a termination codon but was coding for tryptophan. It was similarly concluded that ATA, which normally codes for isoleucine, was coding for methionine. As these studies were based on a comparison of a human DNA with a bovine protein, the possibility that the differences were due to some species variation, although unlikely, could not be completely excluded. For a conclusive determination of the mitochondrial code it was necessary to compare the DNA sequence of a gene with the amino acid sequence of the protein it was coding for. This was done by Young and Anderson (28) who isolated bovine mtDNA, determined the sequence of its COII gene, and confirmed the above differences.
Figure 9 shows the human and bovine mitochondrial genetic code and the frequency of use of the different codons in human mitochondria. All codons are used with the exception of UAA and UAG, which are terminators, and AGA and AGG, which normally code for arginine. This suggests that AGA and AGG are probably also termination codons in mitochondria. Further support for this is that no tRNA recognizing the codons has been found (see below) and that some of the unidentified reading frames found in the DNA sequence appear to end with these codons.
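
The effect of these coding changes can be seen by translating the same DNA under both codes. The Python sketch below starts from the normal code in compact form and overrides the four codons discussed in the text. Treating AGA and AGG as terminators follows the tentative assignment made above, and the example gene is a made-up sequence rather than one taken from mtDNA.

```python
from itertools import product

BASES = "TCAG"
# Normal genetic code, one letter per codon, in TCAG order ("*" = terminator).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
STANDARD_CODE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# Changes described for mammalian mitochondria; AGA and AGG are the
# "probable" terminators, so that part is an assumption of this sketch.
MITO_CODE = dict(STANDARD_CODE, TGA="W", ATA="M", AGA="*", AGG="*")


def translate(dna, code):
    """Translate codon by codon, stopping at the first termination codon."""
    protein = []
    for i in range(0, len(dna) - 2, 3):
        amino_acid = code[dna[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


gene = "ATGATATGACTATAA"  # hypothetical sequence: ATG ATA TGA CTA TAA
print(translate(gene, STANDARD_CODE))
print(translate(gene, MITO_CODE))
```

Under the normal code the chain stops at TGA after two residues, whereas under the mitochondrial assignments it reads through TGA as tryptophan and ends only at TAA.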
In parallel with our studies on mammalian mtDNA, Tzagoloff and his colleagues (29, 30) were studying yeast mtDNA. They also found changes in the genetic code, but surprisingly they are not all the same as those found in mammalian mitochondria (Table 1).
Transfer RNA's. Transfer RNA's have a characteristic base-pairing structure which can be drawn in the form of the "cloverleaf" model. By examining the DNA sequence for cloverleaf structures and using a computer program (31), it was possible to identify genes coding for the mitochondrial tRNA's.
Besides the cloverleaf structure, normal cytoplasmic tRNA's have a number of so-called "invariable" features that are believed to be important to their biological function. Most of the mammalian mitochondrial tRNA's are anomalous in that some of these invariable features are missing. The most bizarre is one of the serine tRNA's in which a complete loop of the cloverleaf structure is missing (32, 33). Nevertheless, it functions as a tRNA.
In normal cytoplasmic systems at least 32 tRNA's are required to code for all the amino acids. This is related to the "wobble" effect. Codon-anticodon relationships in the first and second positions of the codons are defined by the normal base-pairing rules, but in the third position G can pair with U. The result of this is that one tRNA can recognize two codons. There are many cases in the genetic code where all four codons starting with the same two nucleotides code for the same amino acid. These are known as "family boxes." The situation
`
Fig. 9. The human mitochondrial genetic code, showing the coding properties of the tRNA's (boxed codons) and the total number of codons used in the whole genome shown in Fig. 10. (One methionine tRNA has been detected, but since there is some uncertainty about the number present and their coding properties, these codons are not boxed.) [Codon table not reproduced; rows give the first letter, columns the second letter, and entries within each box the third letter.]
`
Fig. 10. Gene map of human mtDNA deduced from the DNA sequence. Boxed regions are the predicted reading frames for the proteins. URF, unidentified reading frame. The tRNA genes are denoted by the one-letter amino acid code and are either L strand coded (►) or H strand coded (◄). Numbers above the genes show the scale in nucleotides and below the predicted number between genes. * Indicates that termination codons are created by polyadenylation of the mRNA. [Map not reproduced; legible labels include the D loop, 12S rRNA, 16S rRNA, the H strand origin, URF 1, URF 2, and cytochrome b.]
`
`
for the alanine family box is shown in Table 2, indicating that with the normal wobble system two tRNA's are required to code for the four alanine codons.
Only 22 tRNA genes could be found in mammalian mtDNA, and for all the family boxes there was only one, which had a T in the position corresponding to the third position of the codon (34). It seems very unlikely that none of the other predicted tRNA's would have been detected, and the most feasible explanation is that in mitochondria one tRNA can recognize all four codons in a family box and that a U in the first position of the anticodon can pair with U, C, A, or G in the third position of the codon. Clearly in boxes in which two of the codons code for one amino acid and two for a different one, there must be two different tRNA's and the wobble effect still applies. Such tRNA's are found, as expected, in the mitochondrial genes. The coding properties of the mitochondrial tRNA's are shown in Fig. 9. Similar conclusions have been reached by Heckman et al. (35) and by Bonitz et al. (36), working respectively on Neurospora and yeast mitochondria.
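
The figures of 32 and 22 tRNA's can be checked with a simple tally. The Python sketch below makes two simplifying assumptions that are not spelled out in the text: under the normal wobble system one tRNA reads a pair of codons ending in U and C, or in A and G, when both code for the same amino acid, and in mitochondria a single tRNA reads all four codons of a family box. The codon tables are the normal code in compact form and the mammalian mitochondrial changes described above, with AGA and AGG taken as terminators.

```python
from itertools import product

BASES = "UCAG"
# Normal genetic code, one letter per codon, in UCAG order ("*" = terminator).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
STANDARD = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}
MITO = dict(STANDARD, UGA="W", AUA="M", AGA="*", AGG="*")


def trnas_needed(code, one_per_family_box):
    """Count tRNAs when one tRNA reads a codon pair ending U/C or A/G.

    With `one_per_family_box` set, a box whose four codons share an amino
    acid is read by a single tRNA, as proposed for mitochondria.
    """
    total = 0
    for first, second in product(BASES, repeat=2):
        box = [code[first + second + third] for third in BASES]
        if one_per_family_box and len(set(box)) == 1 and box[0] != "*":
            total += 1
            continue
        for pair in (box[:2], box[2:]):
            # One tRNA per distinct amino acid in the pair; terminators need none.
            total += len({aa for aa in pair if aa != "*"})
    return total


print(trnas_needed(STANDARD, one_per_family_box=False))
print(trnas_needed(MITO, one_per_family_box=True))
```

Under these assumptions the tally gives 32 for the normal code and 22 for the mitochondrial code, in line with the numbers quoted in the text.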
Distribution of protein genes. Mitochondrial DNA was known to code for three of the subunits of cytochrome oxidase, probably three subunits of the adenosine triphosphatase complex, cytochrome b, and a number of other unidentified proteins. In order to identify the protein-coding genes, the DNA was searched for reading frames; that is, long stretches of DNA containing no termination codons in one of the phases and thus being capable of coding for long polypeptide chains. Such reading frames should start with an initiation codon, which in normal systems is nearly always ATG, and end with a termination codon. Figure 10 summarizes the distribution of the reading frames on the DNA and these are believed to be the genes coding for the proteins. The gene for COII was identified from the amino acid sequence as described above, for subunit I of cytochrome oxidase from amino acid sequence studies on the bovine protein by J. E. Walker (personal communication), and COIII, cytochrome b, and, probably, adenosine triphosphatase 6 were identified by comparison with the DNA sequences of the corresponding genes in yeast mitochondria. Tzagoloff and his colleagues were able to identify these genes in yeast by genetic methods. It has not yet been possible to assign proteins to the other reading frames.
One unusual feature of the mtDNA is that it has a very compact structure. The reading frames coding for the proteins and the rRNA genes appear to be flanked by the tRNA genes with no, or very few, intervening nucleotides. This suggests a relatively simple model for transcription, in which a large RNA is copied from the DNA and the tRNA's are cut out by a processing enzyme, and this same processing leads to the production of the rRNA's and the messenger
`RNA’s (mRNA’s), most of which will b