`
`
`in a few weeks by a single graduate student
`with access to DNA samples and associated
`phenotypes, an Internet connection to the
`public genome databases, a thermal cycler
`and a DNA-sequencing machine. With the
`recent publication of a draft sequence of
`the mouse genome11,
`identification of
`the mutations underlying a vast number
`of interesting mouse phenotypes has simi-
`larly been greatly simplified. Comparison
`of the human and mouse sequences
`shows that the proportion of the
`mammalian genome under evolu-
`tionary selection is more than twice that
`previously assumed.
`Our ability to explore genome function is
`increasing in specificity as each subsequent
`genome is sequenced. Microarray
`technologies have catapulted many
`laboratories from studying the expres-
`sion of one or two genes in a month
`to studying the expression of tens of
`thousands of genes in a single after-
`noon12. Clinical opportunities
`for gene-based pre-symptomatic
`prediction of illness and adverse
`drug response are emerging at a
`rapid pace, and the therapeutic
`promise of genomics has ushered
`in an exciting phase of expansion
`and exploration in the commercial
`sector13. The investment of the HGP in
`studying the ethical, legal and social
`implications of these scientific advances
`has created a talented cohort of scholars in
`ethics, law, social science, clinical research,
`theology and public policy, and has already
`resulted in substantial increases in public
`awareness and the introduction of significant
`(but still incomplete) protections against
`misuses such as genetic discrimination (see
`www.genome.gov/PolicyEthics).
`These accomplishments fulfil the expan-
`sive vision articulated in the 1988 report of
`the National Research Council, Mapping and
`Sequencing the Human Genome14. The suc-
`cessful completion of the HGP this year thus
`represents an opportunity to look forward
`and offer a blueprint for the future of
`genomics research over the next several years.
`The vision presented here addresses a
`different world from that reflected in earlier
`plans published in 1990, 1993 and 1998 (refs
`15–17). Those documents addressed the
`goals of the 1988 report, defining detailed
`paths towards the development of genome-
`
`
`A vision for the future of
`genomics research
`
`A blueprint for the genomic era.
`
`Francis S. Collins, Eric D. Green,
`Alan E. Guttmacher and Mark S.
`Guyer on behalf of the US National
`Human Genome Research Institute*
`
`The completion of a high-quality,
`comprehensive sequence of the human
`genome, in this fiftieth
`anniversary year of the discovery of the
`double-helical structure of DNA, is a
`landmark event. The genomic era is
`now a reality.
`In contemplating a vision for the
`future of genomics research, it is appropriate
`to consider the remarkable path that
`has brought us here. The rollfold
`(Figure 1) shows a timeline of land-
`mark accomplishments in genetics
`and genomics, beginning with Gregor
`Mendel’s discovery of the laws of heredity1
`and their rediscovery in the early days of the
`twentieth century. Recognition of DNA as the
`hereditary material2, determination of its
`structure3, elucidation of the genetic code4,
`development of recombinant DNA tech-
`nologies5,6, and establishment of increasingly
`automatable methods for DNA sequen-
`cing7–10 set the stage for the Human Genome
`Project (HGP) to begin in 1990 (see also
`www.nature.com/nature/DNA50). Thanks
`to the vision of the original planners, and
`the creativity and determination of a legion
`of talented scientists who decided to make
`this project their overarching focus, all of
`the initial objectives of the HGP have now
`been achieved at least two years ahead of
`expectation, and a revolution in biological
`research has begun.
`The project’s new research strategies and
`experimental technologies have generated a
`steady stream of ever-larger and more com-
`plex genomic data sets that have poured into
`public databases and have transformed the
`study of virtually all life processes. The
`genomic approach of technology develop-
`ment and large-scale generation of commu-
`nity resource data sets has introduced an
`important new dimension into biological and
`biomedical research. Interwoven advances
`in genetics, comparative genomics, high-
`throughput biochemistry and bioinformatics
`*Endorsed by the National Advisory Council for Human Genome
`Research, whose members are Vickie Yates Brown, David R. Burgess,
`Wylie Burke, Ronald W. Davis, William M. Gelbart, Eric T. Juengst,
`Bronya J. Keats, Raju Kucherlapati, Richard P. Lifton, Kim J.
`Nickerson, Maynard V. Olson, Janet D. Rowley, Robert Tepper,
`Robert H. Waterston and Tadataka Yamada.
`
`NATURE| VOL 422| 24 APRIL 2003| www.nature.com/nature
`
`are providing biologists with a
`markedly improved repertoire of research
`tools that will allow the functioning of organ-
`isms in health and disease to be analysed and
`comprehended at an unprecedented level of
`molecular detail. Genome sequences, the
`bounded sets of information that guide bio-
`logical development and function, lie at the
`heart of this revolution. In short, genomics
`has become a central and cohesive discipline
`of biomedical research.
`The practical consequences of the emer-
`gence of this new field are widely apparent.
`Identification of the genes responsible for
`human mendelian diseases, once a herculean
`task requiring large research teams, many
`years of hard work, and an uncertain out-
`come, can now be routinely accomplished
`
`© 2003 Nature Publishing Group
`
`Ariosa Exhibit 1015, pg. 1
`IPR2013-00276
`
`
`
`feature
`
`Figure 2 The future of genomics rests on the foundation of the Human Genome Project.
`
`analysis technologies, the physical and
`genetic mapping of genomes, and the
`sequencing of model organism genomes
`and, ultimately, the human genome. Now,
`with the effective completion of these goals,
`we offer a broader and still more ambitious
`vision, appropriate for the true dawning of
`the genomic era. The challenge is to capital-
`ize on the immense potential of the HGP to
`improve human health and well-being.
`The articulation of a new vision is an
`opportunity to explore transformative new
`approaches to achieve health benefits.
`Although genome-based analysis methods
`are rapidly permeating biomedical research,
`the challenge of establishing robust paths
`
`from genomic information to improved
`human health remains immense. Current
`efforts to meet this challenge are largely
`organized around the study of specific dis-
`eases, as exemplified by the missions of the
`disease-oriented institutes at the US Nation-
`al Institutes of Health (NIH, www.nih.gov)
`and numerous national and international
`governmental and charitable organizations
`that support medical research. The National
`Human Genome Research Institute
`(NHGRI), in budget terms a rather small
`(less than 2%) component of the NIH, will
`work closely with all these organizations in
`exploring and supporting these biomedical
`research capabilities. In addition, we envi-
`
`BOX 1 Resources
`One of the key and distinctive
`objectives of the Human Genome
`Project (HGP) has been the generation
`of large, publicly available,
`comprehensive sets of reagents and
`data (scientific resources or ‘infrastructure’) that,
`along with other new, powerful technologies,
`comprise a toolkit for genomics-based research.
`Genomic maps and sequences are the most
`obvious examples. Others include databases of
`sequence variation, clone libraries and collections
`of anonymous cell lines. The continued generation
`of such resources is critical, in particular:
`u Genome sequences of key mammals,
`vertebrates, chordates, and invertebrates
`u Comprehensive reference sets of coding
`sequences from key species in various formats,
`for example, full-length cDNA sequences and
`corresponding clones, oligonucleotide primers,
`and microarrays
`
`u Comprehensive collections of knockouts and
`knock-downs of all genes in selected animals to
`accelerate the development of models of disease
`u Comprehensive reference sets of proteins from
`key species in various formats, for example in
`expression vectors, with affinity tags and spotted
`onto protein chips
`u Comprehensive sets of protein affinity reagents
`u Databases that integrate sequences with
`curated information and other large data sets, as
`well as tools for effective mining of the data
`u Cohort populations for studies designed to
`identify genetic contributors to health and to
`assess the effect of individual gene variants on
`disease risk, including a ‘healthy’ cohort
`u Large libraries of small molecules, together
`with robotic methods to screen them and
`access to medicinal chemistry for follow-up,
`to provide investigators easy and affordable
`access to these tools
`
`sion a more direct role for both the extra-
`mural and intramural programmes of the
`NHGRI in bringing a genomic approach to
`the translation of genomic sequence infor-
`mation into health benefits.
`The NHGRI brings two unique assets to
`this challenge. First, it has close ties to a scien-
`tific community whose direct role over the
`past 13 years in bringing about the genomic
`revolution provides great familiarity with its
`potential to transform biomedical research.
`Second, the NHGRI’s long-standing mission,
`to investigate the broadest possible implications
`of genomics, allows unique flexibility to
`explore the whole spectrum of human health
`and disease from the fresh perspective of
`genome science. By engaging the energetic
`and interdisciplinary genomics-research
`community more directly in health-related
`research and by exploiting the NHGRI’s abili-
`ty to pursue opportunities across all areas of
`human biology, the institute seeks to partici-
`pate directly in translating the promises of
`the HGP into improved human health.
`To fully achieve this goal, the NHGRI
`must also continue in its vigorous support of
`another of its vital missions — the coupling
`of its scientific research programme with
`research into the social consequences of
`increased availability of new genetic tech-
`nologies and information. Translating the
`success of the HGP into medical advances
`intensifies the need for proactive efforts to
`ensure that benefits are maximized and
`harms minimized in the many dimensions
`of human experience.
`
`A reader’s guide
`The vision for genomics research detailed
`here is the outcome of almost two years of
`intense discussions with hundreds of scientists
`and members of the public, in more than
`a dozen workshops and numerous individ-
`ual consultations (see www.genome.gov/
`About/Planning). The vision is formulated
`into three major themes — genomics to biol-
`ogy, genomics to health, and genomics to
`society — and six crosscutting elements.
`We envisage the themes as three floors
`of a building, firmly resting on the founda-
`tion of the HGP (Figure 2). For each theme,
`we present a series of grand challenges, in the
`spirit of the proposals put forward for math-
`ematics by David Hilbert at the turn of the
`twentieth century18. These grand challenges
`are intended to be bold, ambitious research
`targets for the scientific community. Some
`can be planned on specific timescales, others
`are less amenable to that level of precision.
`We list the grand challenges in an order that
`makes logical sense, not representing priori-
`ty. The challenges are broad in sweep, not
`parochial — some can be led by the NHGRI
`alone, whereas others will be best pursued
`in partnership with other organizations.
`Below, we clarify areas in which the NHGRI
`intends to play a leading role.
`
`
`
`
`The six critically important crosscutting
`elements are relevant to all three thematic
`areas. They are: resources (Box 1); technolo-
`gy development (Box 2); computational
`biology (Box 3); training (Box 4); ethical,
`legal and social implications (ELSI, Box 5);
`and education (Box 6). We also stress the
`critical importance of early, unfettered
`access to genomic data in achieving maxi-
`mum public benefit. Finally, we propose a
`series of ‘quantum leaps’, achievements that
`would lead to substantial advances in
`genomics research and its applications to
`medicine. Some of these may seem overly
`bold, but no laws of physics need to be violated
`to achieve them. Such leaps would have
`profound implications, just as the dreams of
`the mid-1980s about the complete sequence
`of the human genome have been realized in
`the accomplishments now being celebrated.
`I Genomics to biology
`Elucidating the structure
`and function of genomes
`The broadly available genome sequences of
`human and a select set of additional organ-
`isms represent foundational information
`for biology and biomedicine. Embedded
`within this as-yet poorly understood code
`are the genetic instructions for the entire
`repertoire of cellular components, knowl-
`edge of which is needed to unravel the
`complexities of biological systems. Elucidat-
`ing the structure of genomes and identifying
`the function of the myriad encoded elements
`will allow connections to be made between
`genomics and biology and will, in turn,
`accelerate the exploration of all realms of the
`biological sciences.
`For this, new conceptual and technologi-
`cal approaches will be needed to:
`u Develop a comprehensive and com-
`prehensible catalogue of all of the
`components encoded in the human
`genome.
`u Determine how the genome-encoded
`components function in an integrated
`manner to perform cellular and
`organismal functions.
`u Understand how genomes change and
`take on new functional roles.
`
`Grand Challenge I-1 Comprehensively
`identify the structural and functional
`components encoded in the human
`genome
`Although DNA is relatively simple and well
`understood chemically, the human genome’s
`structure is extraordinarily complex and its
`function is poorly understood. Only 1–2% of
`its bases encode proteins7, and the full com-
`plement of protein-coding sequences still
`remains to be established. A roughly equiva-
`lent amount of the non-coding portion of
`the genome is under active selection11, sug-
`gesting that it is also functionally important,
`yet vanishingly little is known about it. It
`
`
`elements, such as protein-coding sequences,
`still cannot be accurately predicted from
`sequence information alone. Other types of
`known functional sequences, such as genetic
`regulatory elements, are even less well
`understood; undoubtedly new types remain
`to be defined, so we must be ready to investi-
`gate novel, perhaps unexpected, ways in
`which DNA sequence can confer function.
`Similarly, a better understanding of epi-
`genetic changes (for example, methylation
`and chromatin remodelling) is needed to
`comprehend the full repertoire of ways in
`which DNA can encode information.
`Comparison of genome sequences from
`evolutionarily diverse species has emerged as
`a powerful tool for identifying functionally
`important genomic elements. Initial analyses
`of available vertebrate genome sequences7,11,19
`have revealed many previously undiscovered
`protein-coding sequences. Mammal-to-mammal
`sequence comparisons have revealed
`large numbers of homologies in non-coding
`regions11, few of which can be defined in
`functional terms. Further comparisons of
`sequences derived from multiple species, especially
`those occupying distinct evolutionary
`positions, will lead to significant refinements
`in our understanding of the functional impor-
`tance of conserved sequences20. Thus, the
`generation of additional genome sequences
`from several well-chosen species is crucial to
`the functional characterization of the human
`genome (Box 1). The generation of such large
`sequence data sets will benefit from further
`advances in sequencing technology that yield
`significant cost reductions (Box 2). The study
`of sequence variation within species will also
`be important in defining the functional nature
`of some sequences (see Grand Challenge I-3).
`
`probably contains the bulk of the regulatory
`information controlling the expression of
`the approximately 30,000 protein-coding
`genes, and myriad other functional ele-
`ments, such as non-protein-coding genes
`and the sequence determinants of chromosome
`dynamics. Even less is known about the
`function of the roughly half of the genome
`that consists of highly repetitive sequences or
`of the remaining non-coding, non-repetitive
`DNA.
`The next phase of genomics is to cata-
`logue, characterize and comprehend the
`entire set of functional elements encoded in
`the human and other genomes. Compiling
`this genome ‘parts list’ will be an immense
`challenge. Well-known classes of functional
`
`BOX 2 Technology development
`The Human Genome Project was
`aided by several ‘breakthrough’
`technological developments, including
`Sanger DNA sequencing and its
`automation, DNA-based genetic
`markers, large-insert cloning systems and the
`polymerase chain reaction. During the project,
`these methods were scaled up and made more
`efficient by ‘evolutionary’ advances, such as
`automation and miniaturization. New
`technologies, including capillary-based
`sequencing and methods for genotyping
`single-nucleotide polymorphisms, have recently been
`introduced, leading to further improvements in
`capacity for genomic analyses. Even newer
`approaches, such as nanotechnology and
`microfluidics, are being developed, and hold great
`promise, but further advances are still needed.
`Some examples are:
`u Sequencing and genotyping technologies to
`reduce costs further and increase access to a
`wider range of investigators
`u Identification and validation of functional
`elements that do not encode protein
`u In vivo, real-time monitoring of gene expression
`and the localization, specificity, modification and
`activity/kinetics of gene products in all relevant
`cell types
`u Modulation of expression of all gene products
`using, for example, large-scale mutagenesis,
`small-molecule inhibitors and knock-down
`approaches (such as RNA-mediated inhibition)
`u Monitoring of the absolute abundance of
`any protein (including membrane proteins,
`proteins at low abundance and all modified
`forms) in any cell
`u Improved imaging methods that allow
`non-invasive molecular phenotyping
`u Correlating genetic variation to human health
`and disease using haplotype information or
`comprehensive variation information
`u Laboratory-based phenotyping, including the
`use of protein affinity reagents, proteomic
`approaches and analysis of gene expression
`u Linking molecular profiles to biology,
`particularly pathway biology to disease
`
`
`Effective identification and analysis of
`functional genomic elements will require
`increasingly powerful computational capa-
`bilities, including new approaches for tack-
`ling ever-growing and increasingly complex
`data sets and a suitably robust computation-
`al infrastructure for housing, accessing and
`analysing those data sets (Box 3). In parallel,
`investigators will need to become increasing-
`ly adept in dealing with this treasure trove of
`new information (Box 4). As a better under-
`standing of genome function is gained,
`refined computational tools for de novo
`prediction of the identity and behaviour of
`functional elements should emerge21.
`Complementing the computational detection
`of functional elements will be
`the generation of additional experimental
`data by high-throughput methodologies.
`One example is the production of full-length
`complementary DNA (cDNA)
`sequences (see, for example, mgc.nci.nih.gov
`and www.fruitfly.org/EST/full.shtml). Major
`challenges inherent in programmes to dis-
`cover genes are the experimental identifica-
`tion and validation of alternate splice forms
`and messenger RNAs expressed in a highly
`restricted fashion. Even more challenging is
`the experimental validation of functional ele-
`ments that do not encode protein (for exam-
`ple, regulatory regions and non-coding RNA
`sequences). High-throughput approaches
`to identify them (Box 2) will be needed to
`generate the experimental data that will be
`necessary to develop, confirm and enhance
`computational methods for detecting func-
`tional elements in genomes.
`Because current technologies cannot
`yet identify all functional elements, there is
`a need for a phased approach in which
`new methodologies are developed, tested
`on a pilot scale and finally applied to the
`
`entire human genome. Along these lines,
`the NHGRI recently launched the Encyclo-
`pedia of DNA Elements (ENCODE)
`Project (www.genome.gov/Pages/Research/
`ENCODE) to identify all the functional
`elements in the human genome. In a pilot
`project, systematic strategies for identifying
`all functionally important genomic ele-
`ments will be developed and tested using a
`selected 1% of the human genome. Parallel
`projects involving well-studied model
`organisms, for example, yeast, nematode and
`fruitfly, are ongoing. The lessons learned will
`serve as the basis for implementing a broader
`programme for the entire human genome.
`
`Grand Challenge I-2 Elucidate the
`organization of genetic networks and
`protein pathways and establish how they
`
`BOX 3 Computational biology
`Computational methods have become
`intrinsic to modern biological research,
`and their importance can only increase
`as large-scale methods for data
`generation become more prominent, as
`the amount and complexity of the data increase,
`and as the questions being addressed become
`more sophisticated. All future biomedical research
`will integrate computational and experimental
`components. New computational capabilities will
`enable the generation of hypotheses and stimulate
`the development of experimental approaches to
`test them. The resulting experimental data will, in
`turn, be used to generate more refined models that
`will improve overall understanding and increase
`opportunities for application to disease. The areas
`of computational biology critical to the future of
`genomics research include:
`u New approaches to solving problems, such as
`the identification of different features in a DNA
`sequence, the analysis of gene expression and
`regulation, the elucidation of protein structure and
`protein–protein interactions, the determination of
`the relationship between genotype and
`phenotype, and the identification of the patterns
`of genetic variation in populations and the
`processes that produced those patterns
`u Reusable software modules to facilitate
`interoperability
`u Methods to elucidate the effects of
`environmental (non-genetic) factors and of
`gene–environment interactions on health and
`disease
`u New ontologies to describe different data types
`u Improved database technologies to facilitate
`the integration and visualization of different data
`types, for example, information about pathways,
`protein structure, gene variation, chemical
`inhibition and clinical information/phenotypes
`u Improved knowledge management systems
`and the standardization of data sets to allow the
`coalescence of knowledge across disciplines
`
`contribute to cellular and organismal
`phenotypes
`Genes and gene products do not function
`independently, but participate in complex,
`interconnected pathways, networks and
`molecular systems that, taken together, give
`rise to the workings of cells, tissues, organs
`and organisms. Defining these systems and
`determining their properties and inter-
`actions is crucial to understanding how
`biological systems function. Yet these
`systems are far more complex than any
`problem that molecular biology, genetics or
`genomics has yet approached. On the basis of
`previous experience, one effective path will
`begin with the study of relatively simple
`model organisms22, such as bacteria and
`yeast, and then extend the early findings to
`more complex organisms, such as mouse
`and human. Alternatively, focusing on a few
`well-characterized systems in mammals will
`be a useful test of the approach (see, for
`example, www.signaling-gateway.org).
`Understanding biological pathways, net-
`works and molecular systems will require
`information from several levels. At the genetic
`level, the architecture of regulatory inter-
`actions will need to be identified in different
`cell types, requiring, among other things,
`methods for simultaneously monitoring the
`expression of all genes in a cell12. At the gene-
`product level, similar techniques that allow
`in vivo, real-time measurement of protein
`expression, localization, modification and
`activity/kinetics will be needed (Box 2). It
`will be important to develop, refine and scale
`up techniques that modulate gene expression,
`such as conventional gene-knockout meth-
`ods23, newer knock-down approaches24 and
`small-molecule inhibitors25 to establish the
`temporal and cellular expression pattern of
`individual proteins and to determine the
`functions of those proteins. This is a key first
`step towards assigning all genes and their
`products to functional pathways.
`The ability to monitor all proteins in a cell
`simultaneously would profoundly improve
`our ability to understand protein pathways
`and systems biology. A critical step towards
`gaining a complete understanding of sys-
`tems biology will be to take an accurate
`census of the proteins present in particular
`cell types under different physiological con-
`ditions. This is becoming possible in some
`model systems, such as microorganisms26.
`It will be a major challenge to catalogue
`proteins present in low abundance or in
`membranes. Determining the absolute
`abundance of each protein, including all
`modified forms, will be an important next
`step. A complete interaction map of the proteins
`in a cell, and their cellular locations, will
`serve as an atlas for the biological and med-
`ical explorations of cellular metabolism27
`(see www.nrcam.uchc.edu, for example).
`These and other related areas constitute the
`developing field of proteomics.
`
`
`
`
`Establishing a true understanding of how
`organized molecular pathways and networks
`give rise to normal and pathological cellular
`and organismal phenotypes will require
`more than large, experimentally derived data
`sets. Once again, computational investiga-
`tion will be essential (Box 3), and there will
`be a greatly increased need for the collection,
`storage and display of the data in robust
`databases. By modelling specific pathways
`and networks, predicting how they affect
`phenotype, testing hypotheses derived from
`these models and refining the models based
`on new experimental data, it should be
`possible to understand more completely the
`difference between a ‘bag of molecules’ and a
`functioning biological system.
`
`Grand Challenge I-3 Develop a detailed
`understanding of the heritable variation in
`the human genome
`Genetics seeks to correlate variation in DNA
`sequence with phenotypic differences
`(traits). The greatest advances in human
`genetics have been made for traits associated
`with variation in a single gene. But most
`phenotypes, including common diseases
`and variable responses to pharmacological
`agents, have a more complex origin, involv-
`ing the interplay between multiple genetic
`factors (genes and their products) and non-
`genetic factors (environmental influences).
`Unravelling such complexity will require
`both a complete description of the genetic
`variation in the human genome and the
`development of analytical tools for using
`that information to understand the genetic
`basis of disease.
`Establishing a catalogue of all common
`variants in the human population, including
`single-nucleotide polymorphisms (SNPs),
`small deletions and insertions, and other
`structural differences, began in earnest
`several years ago. Many SNPs have been
`identified28, and most are publicly available
`(www.ncbi.nlm.nih.gov/SNP). A public
`collaboration, the International HapMap
`Project (www.genome.gov/Pages/Research/
`HapMap), was formed in 2002 to character-
`ize the patterns of linkage disequilibrium
`and haplotypes across the human genome
`and to identify subsets of SNPs that capture
`most of the information about these patterns
`of genetic variation to enable large-scale
`genetic association studies. To reach fruition,
`such studies need more robust experimental
`(Box 2) and computational (Box 3) methods
`that use this new knowledge of human
`haplotype structure29.
`A comprehensive understanding of genetic
`variation, both in humans and in model
`organisms, would facilitate studies to establish
`relationships between genotype and biologi-
`cal function. The study of particular variants
`and how they affect the functioning of specific
`proteins and protein pathways will yield
`important new insights about physiological
`
`
`sphere of animal, plant and microbial species.
`A complete elucidation of genome function
`requires a parallel understanding of the
`sequence differences across species and the
`fundamental processes that have sculpted
`their genomes into the modern-day forms.
`The study of inter-species sequence com-
`parisons is important for identifying func-
`tional elements in the genome (see Grand
`Challenge I-1). Beyond this illuminating
`role, determining the sequence differences
`between species will provide insight into
`the distinct anatomical, physiological and
`developmental features of different organ-
`isms, will help to define the genetic basis for
`speciation and will facilitate the characteri-
`zation of mutational processes. This last
`point deserves particular attention, because
`mutation both drives long-term evolution-
`ary change and is the underlying cause of
`inherited disease. The recent finding that
`mutation rates vary widely across the mam-
`malian genome11 raises numerous questions
`about the molecular basis for these evolutionary
`changes. At present, our understanding
`of DNA mutation and repair, including
`the important role of environmental factors,
`is limited.
`Genomics will provide the ability to sub-
`stantively advance insight into evolutionary
`variation, which will, in turn, yield new
`insights into the dynamic nature of genomes
`in a broader evolutionary framework.
`
`Grand Challenge I-5 Develop policy
`options that facilitate the widespread use
`of genome information in both research
`and clinical settings
`Realization of the opportunities provided by
`genomics depends on effective access to the
`
`their specific research efforts), at a collaborative
`level (researchers will need to be able to
`participate effectively in interdisciplinary research
`collaborations that bring biology together with
`many other disciplines) and at the disciplinary
`level (new disciplines will need to emerge at the
`interfaces between the traditional disciplines).
`u Different perspectives Individuals from
`minority or disadvantaged populations are
`significantly under-represented as both
`researchers and participants in genomics
`research. This regrettable circumstance deprives
`the field of the best and brightest from all
`backgrounds, narrows the field of questions
`asked, can lessen sensitivity to cultural concerns
`in implementing research protocols, and
`compromises the overall effectiveness of the
`research. Genomics can learn from successful
`efforts in training individuals from under-
`represented populations in other areas of science
`and health (see, for example,
`www.genome.gov/Pages/Grants/Policies/
`ActionPlanGuide).
`
`processes in normal and disease states. An
`enhanced ability to incorporate information
`about genetic variation into human genetic
`studies would usher in a new era for investigat-
`ing the genetic bases of human disease and
`drug response (see Grand Challenge II-1).
`
`Grand Challenge I-4 Understand
`evolutionary variation across species and
`the mechanisms underlying it
`The genome is a dynamic structure, continu-
`ally subjected to modification by the forces
`of evolution. The genomic variation seen
`in humans represents only a small glimpse
`through the larger window of evolution,
`where hundreds of millions of years of trial-
`and-error efforts have created today’s bio-
`
`BOX 4 Training
`Meeting the scientific, medical and
`social/ethical challenges now facing
`genomics will require scientists,
`clinicians and scholars with the skills
`to understand biological systems and
`to use that information effectively for the benefit
`of humankind. Adequate training capacity will be
`required to address the following needs:
`u Computational skills As biomedical research
`is becoming increasingly data intensive,
`computational capability is increasingly becoming
`a critical skill.
`u Interdisciplinary skills Although a good start
`has been made, expanded interactions will be
`required between the sciences (biology, computer
`science, physics, mathematics, statistics,
`chemistry and engineering), between the basic
`and the clinical sci