Determination of Nucleotide Sequences in DNA

Frederick Sanger

SCIENCE, VOL. 214, 11 DECEMBER 1981

Copyright © by the Nobel Foundation.
The author is head of the Division of Protein and Nucleic Acid Chemistry at the MRC Laboratory of Molecular Biology, Hills Road, Cambridge CB2 2QH, England. This article is the lecture he delivered in Stockholm, Sweden, on 8 December 1980, when he received the Nobel Prize in Chemistry, a prize he shared with Walter Gilbert and Paul Berg. Minor corrections and additions have been made by the author. The article is published here with the permission of the Nobel Foundation and will also be included in the complete volume of Les Prix Nobel en 1980 as well as in the series Nobel Lectures (in English) published by the Elsevier Publishing Company, Amsterdam and New York. Dr. Berg’s lecture appeared in the 17 July issue, page 296. Dr. Gilbert’s lecture will be published in a subsequent issue.
0036-8075/81/1211-1205$01.00/0 Copyright © 1981 AAAS

In spite of the important role played by DNA sequences in living matter, it is only relatively recently that general methods for their determination have been developed. This is mainly because of the very large size of DNA molecules, the smallest being those of the simple bacteriophages such as φX174 (which contains about 5000 base pairs). It was therefore difficult to develop methods with such complicated systems. There are, however, some relatively small RNA molecules, notably the transfer RNA’s of about 75 nucleotides, and these were used for the early studies on nucleic acid sequences (1).

Following my work on amino acid sequences in proteins (2) I turned my attention to RNA and, with G. G. Brownlee and B. G. Barrell, developed a relatively rapid small-scale method for the fractionation of 32P-labeled oligonucleotides (3). This became the basis for most subsequent studies of RNA sequences. The general approach used in these studies, and in those on proteins, depended on the principle of partial degradation. The large molecules were broken down, usually by suitable enzymes, to give smaller products which were then separated from each other, and their sequence was determined. When sufficient results had been obtained they were fitted together by a process of deduction to give the complete sequence. This approach was necessarily rather slow and tedious, often involving successive digestions and fractionations, and it was not easy to apply it to the larger DNA molecules. When we first studied DNA some significant sequences of about 50 nucleotides in length were obtained with this method (4, 5), but it seemed that to be able to sequence genetic material a new approach was desirable and we turned our attention to the use of copying procedures.

Copying Procedures

In the RNA field these procedures had been pioneered by C. Weissmann and his colleagues (6) in their studies on the RNA sequence of the bacteriophage Qβ. Phage Qβ contains a replicase that will synthesize a complementary copy of the single-stranded RNA chain, starting from its 3’ end. These workers devised elegant procedures involving pulse-labeling with radioactively labeled nucleotides, from which sequences could be deduced.

For DNA sequences we have used the enzyme DNA polymerase, which copies single-stranded DNA as shown in Fig. 1. The enzyme requires a primer, which is a single-stranded oligonucleotide having a sequence that is complementary to, and therefore able to hybridize with, a region on the DNA being sequenced (the template). Mononucleotide residues are added sequentially to the 3’ end of the primer from the corresponding deoxynucleoside triphosphates, making a complementary copy of the template DNA. By using triphosphates containing 32P in the α position, the newly synthesized DNA can be labeled. In the early experiments synthetic oligonucleotides were used as primers, but after the discovery of restriction enzymes it was more convenient to use fragments resulting from their action as they were much more easily obtained.
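
The copying reaction described above can be illustrated with a short sketch. It assumes an ideal, error-free polymerase that runs to the 5’ end of the template; the function names (`reverse_complement`, `extend_primer`) and the example sequences are hypothetical and serve only as illustration.

```python
# Watson-Crick pairing for DNA bases.
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence written 5'->3'."""
    return "".join(COMPLEMENT[b] for b in reversed(seq))


def extend_primer(template, primer):
    """Extend `primer` along `template` (both written 5'->3').

    The primer must hybridize to the template, i.e. its reverse complement
    must occur in the template. Residues are then added to the primer's
    3' end, one per template base, until the template's 5' end is reached.
    """
    site = reverse_complement(primer)
    start = template.find(site)
    if start < 0:
        raise ValueError("primer does not hybridize to the template")
    product = primer
    # The template is read 3'->5', i.e. leftwards from the priming site.
    for base in reversed(template[:start]):
        product += COMPLEMENT[base]
    return product


# Hypothetical example: the primer TTGAC anneals to GTCAA at the 3' end
# of the template and is extended into a full complementary copy.
copy = extend_primer("ATCGTTGCAGTCAA", "TTGAC")
```

In this idealized model the product is simply the reverse complement of the template from the priming site onward, with the primer forming its 5’ end.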
`The copying procedure was used ini-
`tially to prepare a short, specific region
`of labeled DNA which could then be
subjected to partial digestion procedures.
One of the difficulties of sequencing
DNA was to find specific methods
for breaking it down into small fragments.
No suitable enzymes were known
that would recognize only one nucleotide.
However, Berg, Fancher, and
Chamberlin (7) had shown earlier that
`under certain conditions it was possible
`to incorporate ribonucleotides, in place
`of the normal deoxyribonucleotides, into
`DNA chains with DNA polymerase.
`Thus, for instance, if copying were car-
`ried out with ribo CTP (7a) and the other
`three deoxynucleoside triphosphates, a
`chain could be built up in which the C
`residues were in the ribo form. Bonds
`involving ribonucleotides could be bro-
`ken by alkali under conditions where
`those involving the deoxynucleotides
`were not, so that a specific splitting at C
`residues could be obtained. Using this
method we were able to extend our
sequencing studies to some extent (8).
However, extensive fractionations and
analyses were still required.
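
The specific splitting at ribo-substituted residues can be sketched as follows. This is a simplified model that assumes complete substitution and complete alkaline cleavage on the 3’ side of every ribonucleotide; the function name and the example chain are hypothetical.

```python
def alkali_fragments(chain, ribo_base="C"):
    """Split `chain` (written 5'->3') after every occurrence of `ribo_base`.

    Models a copy in which all residues of one kind are in the ribo form
    and every bond following such a residue is broken by alkali.
    """
    fragments, current = [], ""
    for base in chain:
        current += base
        if base == ribo_base:
            # The bond on the 3' side of the ribonucleotide is cleaved.
            fragments.append(current)
            current = ""
    if current:
        fragments.append(current)
    return fragments


# Hypothetical copy in which every C is a ribonucleotide.
pieces = alkali_fragments("ATCGGCTTACG")
```

Each fragment except possibly the last ends in C, so the products still have to be fractionated and analyzed individually.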
`
`The “Plus and Minus” Method
`
`In the course of these experiments we
`needed to prepare DNA copies of high
`specific radioactivity, and in order to do
`this the highly labeled substrates had to
be present in low concentrations. Thus if
[α-32P]dATP was used for labeling, its
concentration was much lower than that
of the other three triphosphates and frequently
when we analyzed the newly
synthesized DNA chains we found that
they terminated at a position immediately
before that at which an A should have
been incorporated. Consequently a mixture
of products was produced all having
the same 5’ end (the 5’ end of the primer)
and terminating at the 3’ end at the
position of the A residues. If these products
could be fractionated on a system
that separated only on the basis of chain
length, the pattern of their distribution
on fractionation would be proportional
to the distribution of the A’s along the
`DNA chain. And this, together with the
`distribution of the other three mononu-
`cleotides, is the information required for
`sequence determination. Initial experi-
`ments carried out with J. E. Donelson
`suggested that this approach could be the
`basis for a more rapid method, and it was
`found that good fractionations according
`
`»»»»»»rfitfi‘if’rt------
`AAAAAA3.333: ? 3m
`3
`
`------2t"iffy???"
`33.38.13.333...”
`
`I
`
`1. Specificity requirements for DNA
`Fig.
`polymerase.
`
`~OCH,
`
`NR
`
`O HJN\/(')\lCH;
`CHu.ouOCrO1,J\O
`
`OCH,
`
`NR
`
`0
`0
`“Kim,
`eqP—OCH20
`0 N/
`
`etc—.
`
`OH
`
`
`DNA
`polymerase
`
`~Ot:H2
`
`O
`
`NR
`
`0
`
`HN)\ /CH,
`.L,J_.
`OH ~ g p.pC>Ci-¢,D
`
`ddYTP
`
`~0CH
`
`2O
`
`NR
`
`0
`HNA/CH:
`,LNJ
`902p __ OCHQO
`
`Fig. 2. Diagram showing chain termination
`with dideoxythymidine triphosphate (ddTTP).
`The top line shows --the DNA polymerase-
`catalyzed reaction of the normal deoxynu-
`cleoside triphosphate (TTP) with the 3’ termi-
`nal nucleotide of the primer: the bottom line
`the corresponding reaction with ddTTP.
`
`H'imer
`Tet-Mae
`
`_LLLLA_
`3—.,—r,———, TAGCAACT
`DNA
`
`+
`
`ATP
`GTP
`CTP
`TTP MTTP
`
`ATP
`ATP
`GTP+ddGTP
`G'TP
`CTPWP CTP
`TTP
`TTP
`
`ATPs ddATP
`GI'P
`CTP
`TTP
`
`—ATlddC —ATCGTTddG —ATCGTTddA
`—ATCGTddT
`.. ATOGddT
`—AddT
`
`\‘///\—-ATCddG —ddA
`\TCGA/
`
`Fig. 3. Principle of the chain-terminating
`method.
`
`to size could be obtained by ionophore-
`sis on acrylamide gels.
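The relation between chain length and the positions of the A residues can be made concrete with a small sketch. It assumes the idealized case in which synthesis stalls before every A of the new strand; the function names and the example sequence are hypothetical.

```python
def stalled_lengths(new_strand, limiting_base="A"):
    """Lengths (in residues added to the primer) of the stalled chains.

    `new_strand` is the sequence, written 5'->3', that would be synthesized
    beyond the primer. A chain stalls immediately before each position at
    which the limiting nucleotide should have been incorporated.
    """
    return [i for i, base in enumerate(new_strand) if base == limiting_base]


def positions_from_lengths(lengths):
    """Recover the 1-based positions of the limiting base from chain lengths."""
    return [n + 1 for n in lengths]


# Hypothetical newly synthesized sequence with A at positions 2, 5 and 6.
lengths = stalled_lengths("GATCAAGT")
positions = positions_from_lengths(lengths)
```

A fractionation that separates purely by chain length therefore reports where the A’s lie; repeating this for the other three nucleotides would give the full sequence.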
`The method described above met with
`only limited success but we were able to
`develop two modified techniques that
`depended on the same general principle,
`and these provided a simpler and much
`more rapid method of DNA sequence
`determination than anything we had used
`before (9). This, which is known as the
`“plus and minus” technique, was used
to determine the almost complete sequence
of the DNA of bacteriophage
φX174, which contains 5386 nucleotides
(10).
`
`The “Dideoxy” Method
`
More recently we have developed another
similar method that uses specific
chain-terminating analogs of the normal
deoxynucleoside triphosphates (11).
This method is both quicker and more
accurate than the plus and minus technique.
It was used to complete the sequence
of φX174 (12), to determine the
sequence of a related bacteriophage, G4
(13), and has now been applied to mammalian
mitochondrial DNA (mtDNA).
The analogs most widely used are the
dideoxynucleoside triphosphates (Fig.
2). They are the same as the normal
`deoxynucleoside triphosphates but lack
`the 3’ hydroxyl group. They can be in-
`corporated into a growing DNA chain by
`DNA polymerase but act as terminators
`because, once they are incorporated, the
`chain contains no 3’ hydroxyl group and
`so no other nucleotide can be added.
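
The effect of such terminators can be sketched in a few lines. The model below assumes four parallel reactions, one per dideoxy analog, in which termination occurs at least once at every possible position, and a fractionation that resolves chains differing by a single residue; the function names and the example sequence are hypothetical.

```python
def terminated_chains(new_strand, dideoxy_base):
    """Products of one reaction: every chain that ends with the analog.

    `new_strand` is the sequence (5'->3') synthesized beyond the primer.
    Once the analog is incorporated the chain has no 3' hydroxyl group,
    so it stops exactly at that residue.
    """
    return [new_strand[: i + 1]
            for i, base in enumerate(new_strand)
            if base == dideoxy_base]


def read_ladder(new_strand):
    """Run the four reactions and read the sequence back from chain sizes."""
    lanes = {base: terminated_chains(new_strand, base) for base in "ACGT"}
    # One band per chain: (length, lane). Sorting by length orders the
    # bands from the shortest chain to the longest.
    bands = sorted((len(chain), lane)
                   for lane, chains in lanes.items()
                   for chain in chains)
    return "".join(lane for _, lane in bands)


# Hypothetical sequence synthesized beyond the primer.
t_lane = terminated_chains("ATCGTTGA", "T")
sequence = read_ladder("ATCGTTGA")
```

Because every position of the new strand ends exactly one chain, in exactly one of the four reactions, ordering all chains by length recovers the sequence.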
`The principle of the method is summa-
`rized in Fig. 3. Primer and template are
`denatured to separate the two strands of
`the primer, which is usually a restriction
`enzyme fragment, and then annealed to
`form the primer-template complex. The
`mixture is then divided into four sam-
`ples. One (the T sample) is incubated
`with DNA polymerase in the presence of
`a mixture of ddTTP (dideoxythymidine
`triphosphate) and a low concentration of
TTP, together with the other three deoxynucleoside triphosphates (one of which is labeled with 32P) at normal concentration. As the DNA chains are built up on the 3’ end of the primer, the position of the T’s will be filled, in most cases by the normal substrate T, and extended further, but occasionally by ddT and terminated. Thus at the end of incubation there remains a mixture of chains terminating with ddT at their 3’ end but all having the same 5’ end (the 5’ end of the primer). Similar incubations are carried out in the presence of each of the other three dideoxy derivatives, giving mixtures terminating at the positions of C, A, and G, respectively, and the four mixtures are fractionated in parallel by electrophoresis on acrylamide gel under denaturing conditions. This system separates the chains according to size, the small ones moving quickly and the large ones slowly. As all the chains in the T mixture end at T, the relative position of the T’s in the chain will define the relative sizes of the chains, and therefore their relative positions on the gel after fractionation. The actual sequence can then simply be read off from an autoradiograph of the gel (Fig. 4). The method is comparatively rapid and accurate, and sequences of up to about 300 nucleotides from the 3’ end of the primer can usually be determined. Considerably longer sequences have been read off but these are usually less reliable.

Fig. 4. Autoradiograph of a DNA sequencing gel. The origin is at the top and migration of the DNA chains, according to size, is downward. The gel on the left has been run for 2.5 hours and that on the right for 5 hours with the same polymerization mixtures.

One problem with the method is that it requires single-stranded DNA as template. This is the natural form of the DNA in the bacteriophages φX174 and G4, but most DNA is double-stranded and it is frequently difficult to separate the two strands. One way of overcoming this was devised by A. J. H. Smith (14). If the double-stranded linear DNA is treated with exonuclease III (a double-strand specific 3’ exonuclease) each chain is degraded from its 3’ end, as shown in Fig. 5, giving rise to a structure that is largely single-stranded and can be used as template for DNA polymerase with suitable small primers. This method is particularly suitable for use with fragments cloned in plasmid vectors and has been used extensively in the work on human mtDNA.

Cloning in Single-Stranded Bacteriophage

Another method of preparing suitable template DNA that is being more widely used is to clone fragments in a single-stranded bacteriophage vector (15-17). This approach is summarized in Fig. 6. Various vectors have been described. We have used a derivative of bacteriophage M13 developed by Gronenborn and Messing (16), which contains an insert of the β-galactosidase gene with an Eco RI restriction enzyme site in it. The presence of β-galactosidase in a plaque can be readily detected by using a suitable color-forming substrate (X-gal). The presence of an insert in the Eco RI site destroys the β-galactosidase gene, giving rise to a colorless plaque.

Besides being a simple and general method of preparing single-stranded DNA, this approach has other advantages. One is that it is possible to use the same primer on all clones. Heidecker et al. (18) prepared a 96-nucleotide-long restriction fragment derived from a position in the M13 vector flanking the Eco RI site (Fig. 6). This can be used to prime into, and thus determine, a sequence of about 200 nucleotides in the inserted DNA. Smaller synthetic primers have now been prepared (19, 20) which allow longer sequences to be determined. The approach that we have used is to prepare clones at random from restriction enzyme digests and determine the sequence with the flanking primer. Computer programs (21) are then used to store, overlap, and arrange the data.

Another important advantage of the cloning technique is that it is a very efficient and rapid method of fractionating fragments of DNA. In all sequencing techniques, both for proteins and nucleic acids, fractionation has been an important step, and major progress has usually been dependent on the development of new fractionation methods. With the new rapid methods for DNA sequencing, fractionation is still important; and, as the sequencing procedure itself is becoming more rapid, more of the work has involved fractionation of the restriction enzyme fragments by electrophoresis on acrylamide. This becomes increasingly difficult as larger DNA molecules are studied and may involve several successive fractionations before pure fragments are obtained. In the new method these fractionations are replaced by a cloning procedure. The mixture is spread on an agar plate and grown. Each clone represents the progeny of a single molecule and is therefore pure, irrespective of how complex the original mixture was. It is particularly suitable for studying large DNA’s. In fact, there is no theoretical limit to the size of DNA that could be sequenced by this method.

We have applied the method to fragments from mtDNA (22, 23) and to bacteriophage λ DNA. Initially new data can be accumulated very quickly (under ideal conditions at about 500 to 1000 nucleotides a day). However, at later stages much of the data produced will be in regions that have already been sequenced, and progress then appears to be much slower. Nevertheless, we find that most new clones studied give some useful data, either for correcting or confirming old sequences. Thus, in the work with bacteriophage λ DNA, we have about 90 percent of the molecule identified in sequences and most of the new clones we study contribute some new information. In most studies on DNA one is concerned with identifying the reading frames for protein genes, and to do this the sequence must be correct. Errors can readily occur in such extensive sequences and confirmation is always necessary. We usually consider it necessary to determine the sequence of
each region on both strands of the DNA. Although in theory it would be possible to complete a sequence determination solely by the random approach, it is probably better to use a more specific method to determine the final remaining nucleotides in a sequencing study. Various methods are possible (22, 24), but all are slow compared with the random cloning approach.

Fig. 5. Degradation of double-stranded DNA with exonuclease III.

Fig. 6. Diagram illustrating the cloning of DNA fragments in the single-stranded bacteriophage vector M13mp2 (16) and sequencing the insert with a flanking primer.

Bacteriophage φX174 DNA

The first DNA to be completely sequenced by the copying procedures was from bacteriophage φX174 (10, 12), a single-stranded circular DNA, 5386 nucleotides long, which codes for ten genes. The most unexpected finding from this work was the presence of “overlapping” genes. Previous genetic studies had suggested that genes were arranged in a linear order along the DNA chains, each gene being encoded by a unique region of the DNA. The sequencing studies indicated, however, that there were regions of the φX DNA that were coding for two genes. This is made possible by the nature of the genetic code. Since a sequence of three nucleotides (a codon) codes for one amino acid, each region of DNA can theoretically code for three different amino acid sequences, depending on where translation starts (Fig. 7). The reading frame or phase in which translation takes place is defined by the position of the initiating ATG codon, following which nucleotides are simply read off three at a time. In φX there is an initiating ATG within the gene coding for the D protein, but in a different reading frame. Consequently this initiates an entirely different sequence of amino acids, which is that of the E protein. Figure 8 shows the genetic map of φX. The E gene is completely contained within the D gene, and the B gene is within the A.

Further studies (25) on the related bacteriophage G4 revealed the presence of a previously unidentified gene, which was called K. This overlaps both the A and C genes, and there is a sequence of four nucleotides that codes for part of all three proteins, A, C, and K, using all of its three possible reading frames.

It is uncertain whether overlapping genes are a general phenomenon or whether they are confined to viruses, whose survival may depend on their rate of replication and therefore on the size of the DNA: with overlapping genes more genetic information can be concentrated in a given-sized DNA. Further details of the sequence of bacteriophage φX174 DNA have been described (10, 12).

    Gly His Ala Cys Arg Arg
    GGC·CAC·GCA·TGC·AGG·CGG

      Ala Thr His Ala Gly
    G·GCC·ACG·CAT·GCA·GGC·GG

       Pro Arg Met Gln Ala
    GG·CCA·CGC·ATG·CAG·GCG·G

Fig. 7. Diagram illustrating how one DNA sequence can code for three different amino acid sequences. The dots indicate the positions of triplet codons coding for the amino acids.

Fig. 8. Gene map of φX174 DNA.

Mammalian Mitochondrial DNA

Mitochondria contain a small double-stranded DNA (mtDNA) which codes for two ribosomal RNA’s (rRNA’s), 22 or 23 transfer RNA’s (tRNA’s) and about 10 to 13 proteins which appear to be components of the inner mitochondrial membrane and are somewhat hydrophobic. Other proteins of the mitochondria are encoded by the nucleus of the cell and specifically transported into the mitochondria. Using the above methods we have determined the nucleotide sequence of human mtDNA (23) and almost the complete sequence of bovine mtDNA. The sequence revealed a number of unexpected features which indicated that the transcription and translation machinery of mitochondria is rather different from that of other biological systems.

Table 1. Coding changes in mitochondria.

                      Amino acid coded
Codon       Normal    Mammalian mitochondria    Yeast mitochondria
TGA         Term*     Trp                       Trp
AUA         Ile       Met                       Ile
CTN         Leu       Leu                       Thr
AGA, AGG    Arg       Term?                     Arg
CGN         Arg       Arg                       Term?
*Terminating.

Table 2. Normal coding properties of alanine tRNA’s. The first position of the codon pairs with the third position of the anticodon and vice versa, for example:
5’GCU3’ (codon)
3’CGG5’ (anticodon)

            Anticodon
Codon       Wobble    Mitochondria
GCU         GGC       UGC
GCC         GGC       UGC
GCA         UGC       UGC
GCG         UGC       UGC

The genetic code in mitochondria. Hitherto it has been believed that the genetic code was universal. No differences were found in the Escherichia coli, yeast, or mammalian systems that had been studied. Our initial sequence studies were on human mtDNA. No amino acid sequences of the proteins that were encoded by human mtDNA were known. However, Steffens and Buse (26) had determined the sequence of subunit II of cytochrome oxidase (COII) from bovine mitochondria, and Barrell, Bankier, and Drouin (27) found that a region of the human mtDNA that they were studying had a sequence that would code for a protein homologous to this amino acid sequence, indicating that it most probably was the gene for the human COII. Surprisingly the DNA sequence contained TGA triplets in the reading frame
`of the homologous protein. According to
`the normal genetic code TGA is a termi-
`nation codon, and if it occurs in the
`reading frame of a protein the polypep-
`tide chain is terminated at that position.
`It was noted that in the positions where
`TGA occurred in the human mtDNA
`sequence, tryptophan was found in the
`bovine protein sequence. The only possi-
`ble conclusion seemed to be that in mito-
`chondria TGA was not a termination
`codon but was coding for tryptophan. It
`was
`similarly concluded that ATA,
`which normally codes for isoleucine,
`was coding for methionine. As these
`studies were based on a comparison of a
`human DNA with a bovine protein, the
`possibility that the differences were due
to some species variation, although unlikely,
could not be completely excluded.
For a conclusive determination of the
`mitochondrial code it was necessary to
`compare the DNA sequence of a gene
`with the amino acid sequence of the
`protein it was coding for. This was done
`by Young and Anderson (28) who isolat-
`ed bovine mtDNA, determined the se-
`quence of its COII gene, and confirmed
`the above differences.
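
The consequence of these two coding changes can be shown with a small translation sketch. The compact table layout (bases in the order T, C, A, G), the names `STANDARD`, `MITO` and `translate`, and the example sequence are illustrative assumptions; only the two reassignments discussed above (TGA to tryptophan, ATA to methionine) are applied.

```python
BASES = "TCAG"
# Standard genetic code, one letter per codon, codons ordered TTT, TTC,
# TTA, TTG, TCT, ... GGG; "*" marks a termination codon.
STANDARD_AA = ("FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRR"
               "IIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG")
STANDARD = {a + b + c: STANDARD_AA[16 * i + 4 * j + k]
            for i, a in enumerate(BASES)
            for j, b in enumerate(BASES)
            for k, c in enumerate(BASES)}

# Mammalian mitochondrial variant: only the two changes discussed above.
MITO = dict(STANDARD, TGA="W", ATA="M")


def translate(dna, code):
    """Translate `dna` in its first reading frame until a terminator."""
    protein = ""
    for i in range(0, len(dna) - 2, 3):
        amino_acid = code[dna[i:i + 3]]
        if amino_acid == "*":
            break
        protein += amino_acid
    return protein


# Hypothetical gene fragment: ATG ATA TGA GGC.
fragment = "ATGATATGAGGC"
normal_reading = translate(fragment, STANDARD)   # stops at TGA
mito_reading = translate(fragment, MITO)         # reads through TGA
```

Read with the normal code the fragment terminates at TGA, whereas with the mitochondrial assignments the same triplets give a continuous polypeptide, which is the pattern that made the in-frame TGA triplets interpretable.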
`Figure 9 shows the human and bovine
`mitochondrial genetic code and the fre-
`quency of use of the different codons in
`human mitochondria. All codons are
used with the exception of UAA and
UAG, which are terminators, and AGA
`and AGG, which normally code for argi-
`nine. This suggests that AGA and AGG
`are probably also termination codons in
`mitochondria. Further support for this is
`that no tRNA recognizing the codons has
`been found (see below) and that some of
`the unidentified reading frames found in
`the DNA sequence appear to end with
`these codons.
In parallel with our studies on mammalian
mtDNA, Tzagoloff and his colleagues
(29, 30) were studying yeast
`mtDNA. They also found changes in the
`genetic code, but surprisingly they are
`not all the same as those found in mam-
`malian mitochondria (Table 1).
`Transfer RNA’s. Transfer RNA’s
`have a characteristic base-pairing struc-
`ture which can be drawn in the form of
`the “cloverleaf” model. By examining
`the DNA sequence for cloverleaf struc-
`tures and using a computer program (31),
`it was possible to identify genes coding
`for the mitochondrial tRNA’s.
Besides the cloverleaf structure, normal cytoplasmic tRNA’s have a number of so-called “invariable” features that are believed to be important to their biological function. Most of the mammalian mitochondrial tRNA’s are anomalous in that some of these invariable features are missing. The most bizarre is one of the serine tRNA’s in which a complete loop of the cloverleaf structure is missing (32, 33). Nevertheless, it functions as a tRNA.

In normal cytoplasmic systems at least 32 tRNA’s are required to code for all the amino acids. This is related to the “wobble” effect. Codon-anticodon relationships in the first and second positions of the codons are defined by the normal base-pairing rules, but in the third position G can pair with U. The result of this is that one tRNA can recognize two codons. There are many cases in the genetic code where all four codons starting with the same two nucleotides code for the same amino acid. These are known as “family boxes.” The situation for the alanine family box is shown in Table 2, indicating that with the normal wobble system two tRNA’s are required to code for the four alanine codons.

Fig. 9. The human mitochondrial genetic code, showing the coding properties of the tRNA’s (boxed codons) and the total number of codons used in the whole genome shown in Fig. 10. (One methionine tRNA has been detected, but since there is some uncertainty about the number present and their coding properties, these codons are not boxed.)

Fig. 10. Gene map of human mtDNA deduced from the DNA sequence. Boxed regions are the predicted reading frames for the proteins. URF, unidentified reading frame. The tRNA genes are denoted by the one-letter amino acid code and are either L strand coded or H strand coded (distinguished by separate symbols in the figure). Numbers above the genes show the scale in nucleotides and below the predicted number between genes. * Indicates that termination codons are created by polyadenylation of the mRNA.

`Only 22 tRNA genes could be found in
`mammalian mtDNA, and for all the fam-
`ily boxes there was only one, which had
`a T in the position corresponding to the
`third position of the codon (34). It seems
`very unlikely that none of the other
`predicted tRNA’s would have been de-
`tected, and the most feasible explanation
`is that in mitochondria one tRNA can
`recognize all four codons in a family box
`and that a U in the first position of the
`anticodon can pair with U, C, A, or G in
`the third position of the codon. Clearly in
`boxes in which two of the codons code
for one amino acid and two for a different
one, there must be two different tRNA’s
and the wobble effect still applies. Such
tRNA’s are found, as expected, in the
mitochondrial genes. The coding properties
of the mitochondrial tRNA’s are
shown in Fig. 9. Similar conclusions
have been reached by Heckman et al.
(35) and by Bonitz et al. (36), working
respectively on Neurospora and yeast
mitochondria.
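
The difference between the two recognition schemes can be sketched as a pairing table. The rules below are a simplified version of the usual wobble rules (modified bases such as inosine are ignored), and the mitochondrial variant encodes only the proposal made above, that U in the first anticodon position reads all four bases; the names are hypothetical.

```python
# Strict pairing used for codon positions 1 and 2 (RNA alphabet).
WATSON_CRICK = {"A": "U", "U": "A", "G": "C", "C": "G"}

# Codon third-position bases read by each base in the first (5')
# position of the anticodon under the normal wobble rules.
WOBBLE = {"G": "CU", "U": "AG", "C": "G", "A": "U"}

# Proposed mitochondrial behaviour: U reads U, C, A and G.
MITO_WOBBLE = dict(WOBBLE, U="UCAG")


def codons_read(anticodon, first_position_rules):
    """Codons (5'->3') recognized by an anticodon written 5'->3'."""
    a1, a2, a3 = anticodon
    # Anticodon position 3 pairs with codon position 1, and anticodon
    # position 2 with codon position 2, both by the normal rules.
    first_two = WATSON_CRICK[a3] + WATSON_CRICK[a2]
    return sorted(first_two + third for third in first_position_rules[a1])


# The alanine family box of Table 2.
normal_ggc = codons_read("GGC", WOBBLE)
normal_ugc = codons_read("UGC", WOBBLE)
mito_ugc = codons_read("UGC", MITO_WOBBLE)
```

Under the normal rules the anticodons GGC and UGC are both needed to cover the alanine box, while under the mitochondrial rule the single anticodon UGC reads all four codons.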
`Distribution of protein genes. Mito-
`chondrial DNA was known to code for
`three of the subunits of cytochrome oxi-
`dase, probably three subunits of the
`adenosine triphosphatase complex, cyto-
`chrome b, and a number of other uniden-
`tified proteins. In order to identify the
`protein-coding genes,
`the DNA was
`searched for reading frames; that is, long
`stretches of DNA containing no termina-
`tion codons in one of the phases and thus
`being capable of coding for long polypep-
`tide chains. Such reading frames should
`start with an initiation codon, which in
`normal systems is nearly always ATG,
`and end with a termination codon. Fig-
`ure 10 summarizes the distribution of the
`reading frames on the DNA and these
`are believed to be the genes coding for
the proteins. The gene for COII was
identified from the amino acid sequence
as described above, for subunit I of
cytochrome oxidase from amino acid sequence
studies on the bovine protein by
J. E. Walker (personal communication),
and COIII, cytochrome b, and, probably,
adenosine triphosphatase 6 were
identified by comparison with the DNA
sequences of the corresponding genes in
yeast mitochondria. Tzagoloff and his
colleagues were able to identify these
`genes in yeast by genetic methods. It has
`not yet been possible to assign proteins
`to the other reading frames.
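
The search for reading frames described above can be sketched as a scan of the three phases of one strand. The function name, the minimum-length parameter and the example sequence are hypothetical; TGA is left out of the default terminators because it codes for tryptophan in mitochondria, and ATG is taken as the only initiation codon for simplicity.

```python
MITO_STOPS = frozenset({"TAA", "TAG"})  # TGA is not a terminator here


def open_reading_frames(dna, min_codons=3, starts=("ATG",), stops=MITO_STOPS):
    """Return (start, end, phase) for each open frame on one strand.

    `start` is the index of the initiation codon, `end` the index just
    past the termination codon, and `phase` is 0, 1 or 2. A frame is
    reported only if it has at least `min_codons` codons before the
    terminator.
    """
    found = []
    for phase in range(3):
        start = None
        for i in range(phase, len(dna) - 2, 3):
            codon = dna[i:i + 3]
            if start is None and codon in starts:
                start = i
            elif start is not None and codon in stops:
                if (i - start) // 3 >= min_codons:
                    found.append((start, i + 3, phase))
                start = None
    return found


# Hypothetical stretch: CC ATG GCA TGA AAA TAG TT
example = "CCATGGCATGAAAATAGTT"
mito_frames = open_reading_frames(example)
normal_frames = open_reading_frames(example, stops={"TAA", "TAG", "TGA"})
```

With the mitochondrial terminators the example contains one frame of four codons in phase 2; if TGA is treated as a terminator, as in the normal code, the same stretch is interrupted after two codons and no frame is reported.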
`One unusual feature of the mtDNA is
`
`that it has a very compact