Gene 205 (1997) 73-94

GENE
AN INTERNATIONAL JOURNAL ON GENES AND GENOMES

Locus control regions of mammalian β-globin gene clusters: combining phylogenetic analyses and experimental results to gain functional insights

Review

Ross Hardison a,b,*, Jerry L. Slightom c, Deborah L. Gumucio d, Morris Goodman e, Nikola Stojanovic f, Webb Miller b,f

a Department of Biochemistry and Molecular Biology, The Pennsylvania State University, University Park, PA 16802, USA
b Center for Gene Regulation, The Pennsylvania State University, University Park, PA 16802, USA
c Molecular Biology Unit 7242, Pharmacia and Upjohn, Inc., Kalamazoo, MI 49007, USA
d Department of Anatomy and Cell Biology, University of Michigan Medical School, Ann Arbor, MI 48109-0616, USA
e Department of Anatomy and Cell Biology, Wayne State School of Medicine, Detroit, MI 48201, USA
f Department of Computer Science and Engineering, The Pennsylvania State University, University Park, PA 16802, USA

Accepted 22 July 1997
`
`Abstract
`
Locus control regions (LCRs) are cis-acting DNA segments needed for activation of an entire locus or gene cluster. They are operationally defined as DNA sequences needed to achieve a high level of gene expression regardless of the position of integration in transgenic mice or stably transfected cells. This review brings together the large amount of DNA sequence data from the β-globin LCR with the vast amount of functional data obtained through the use of biochemical, cellular and transgenic experimental systems. Alignment of orthologous LCR sequences from five mammalian species locates numerous conserved regions, including previously identified cis-acting elements within the cores of nuclease hypersensitive sites (HSs) as well as conserved regions located between the HS cores. The distribution of these conserved sequences, combined with the effects of LCR fragments utilized in expression studies, shows that important sites are more widely distributed in the LCR than previously anticipated, especially in and around HS2 and HS3. We propose that the HS cores plus HS flanking DNAs comprise a 'unit' to which proteins bind and form an optimally functional structure. Multiple HS units (at least three: HS2, HS3 and HS4 cores plus flanking DNAs) together establish a chromatin structure that allows the proper developmental regulation of genes within the cluster. © 1997 Elsevier Science B.V.
`
`
`
`Keywords: Hemoglobin; Sequence conservation; Enhancement; Chromatin; Domain opening; DNA-binding proteins
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
1. Expression patterns of mammalian hemoglobin gene clusters

The genes that encode the polypeptides of the α2β2 tetramer of hemoglobin are located in two separate clusters in birds and mammals. In humans, the β-like globin genes (including pseudogenes denoted by the prefix ψ) are clustered in the array 5'-ε-Gγ-Aγ-ψη-δ-β-3' that covers about 75 kb on chromosome 11p15.4, and the α-like globin genes are in a 40-kb cluster, 5'-ζ2-ψζ1-ψα2-ψα1-α2-α1-θ-3', very close to the telomere of the short arm of chromosome 16. Expression of the α- and β-like globin genes is limited to erythroid cells and is balanced so that equal amounts of the two polypeptides are available to assemble the hemoglobin heterotetramer. Expression of genes within the clusters is developmentally controlled, so that different forms of hemoglobin are produced in embryonic, fetal and adult life (reviewed in Stamatoyannopoulos and Nienhuis, 1994).

* Corresponding author. Present address: Department of Biochemistry and Molecular Biology, The Pennsylvania State University, 206 Althouse Laboratory, University Park, PA 16802, USA. Tel.: +1 814 8630113; Fax: +1 814 8637024; e-mail: rch8@psu.edu

Abbreviations: LCR, locus control region; HS, hypersensitive site; HIC, highest information content; DPF, differential phylogenetic footprint; CACBPs, proteins that bind to the CACC motif; MAR, matrix attachment region; bHLH, basic helix-loop-helix; MEL, murine erythroleukemia.

0378-1119/97/$17.00 © 1997 Elsevier Science B.V. All rights reserved.
PII S0378-1119(97)00474-5
`
`SKI Exhibit 2015
`Page 1 of 22
`
`
`
`74
`
`
`
`
`
`R.Hardison et al. / Gene 205 ( 1997) 73-94
`
This process of hemoglobin switching is an excellent model system for increasing our understanding of the molecular mechanisms of differential gene expression during development. These developmental switches also offer new approaches to therapy for inherited anemias. For example, continued expression of the normally fetal HbF (α2γ2) in adults will reduce the severity of symptoms of patients producing an abnormal β-globin in sickle cell disease and possibly also in patients lacking sufficient β-globin (β-thalassemia). An understanding of the molecular basis of globin gene switching will facilitate development of new therapeutic strategies (pharmacological and/or DNA transfer) that continue γ-globin gene expression in adults.

In addition to biochemical and genetic approaches to studying regulation of globin genes, phylogenetic approaches are also highly informative. The detailed study of globin gene clusters in many mammalian species has provided a rich resource of information from which to glean further insight into not only the evolution of the gene clusters but also their regulation. The β-globin gene clusters have been extensively studied in human, the prosimian galago, the lagomorph rabbit, the artiodactyls goat and cow, and the rodent mouse. Maps of these gene clusters are shown in Fig. 1, and aspects of their evolution and regulation have been reviewed (Collins and Weissman, 1984; Goodman et al., 1987; Hardison and Miller, 1993). The ε-globin gene is at the 5' end of all the mammalian globin gene clusters and is expressed only in embryonic red cells in all cases. In most eutherian mammals, expression of the γ-globin gene is also limited to embryonic red cells, but in anthropoid primates, its expression continues and predominates in fetal red cells. The appearance of this new pattern of fetal expression of the γ-globin genes coincides roughly with the duplication of the genes in primate evolution, which leads to the hypothesis that the duplication allowed the changes that caused the fetal recruitment (Hayasaka et al., 1993). The β-globin gene is expressed after birth in all mammals, but in galago, mouse and rabbit, its expression initiates and predominates in the fetal liver (arguing that fetal expression of the β-globin gene is the ancestral state). The recruitment of γ-globin genes for fetal expression in anthropoid primates is accompanied by a corresponding delay in expression of the β-globin gene.

Comparisons of DNA sequences among mammalian β-globin gene clusters can reveal candidates for sequences involved in shared regulatory functions; these will be detected as conserved sequence blocks, or phylogenetic footprints, found in all mammals (Gumucio et al., 1992; Hardison et al., 1993). Notable similarities are found in alignments of the proximal 5' flanking regions of the orthologous β-like globin genes, consistent with their roles as promoters and other regulators of expression. In addition, striking and extensive sequence matches are found at the far 5' end of the gene clusters, in the region that we now recognize as the locus control region (LCR), which is the dominant, distal control sequence for these gene clusters. Sequence comparisons can be used also to identify candidates for regulatory elements that lead to differences in expression patterns. In this case, one searches for sequences conserved in the set of mammals that show a particular phenotype but
[Figure 1: maps of the β-globin gene clusters of human, galago, rabbit, goat and mouse on a 0-140 kb scale, branching from the inferred ancestral eutherian gene cluster; the duplication of γ and its fetal recruitment are marked on the lineage leading to human.]

Fig. 1. Evolution of β-globin gene clusters in eutherian mammals. The inferred ancestral gene cluster and the branching pathways to contemporary gene clusters are shown. The time of expression during development is indicated beneath the box representing each gene; E, embryonic; F, fetal; A, adult. The boxes for orthologous genes have the same shading.
`
`
which differ in the species with a different pattern of expression. For instance, such differential phylogenetic footprints (Gumucio et al., 1994) led to the discovery of a sequence implicated in fetal-specific expression of the γ-globin genes in higher primates (Jane et al., 1992) and a sequence that binds several proteins implicated in fetal silencing of the γ-globin gene (Gumucio et al., 1994). In this review, we summarize the results of sequence comparisons for both types of regulatory element in the LCR.

2. General features of mammalian β-globin LCRs

2.1. DNase hypersensitive sites 5' to the β-globin gene cluster

The β-globin LCR was initially discovered as a set of DNase hypersensitive sites located 5' to the ε-globin gene (Tuan et al., 1985; Forrester et al., 1986, 1987). At least 5 DNase HSs, called HS1-HS5 (Fig. 2), have been characterized within the region that provided the original gain-of-function effects described below (Grosveld et al., 1987), and we will refer to this region with all five HSs as the 'full LCR.' The presence of DNase HSs is indicative of an altered chromatin structure associated with important cis-regulatory regions (Gross and Garrard, 1988). Some of these sites, especially HS3, appear preferentially in erythroid nuclei (Dhar et al., 1990), but in contrast to the DNase hypersensitive sites at promoters, all are developmentally stable, i.e., present in embryonic, fetal and adult red cells (Forrester et al., 1986). Thus, the LCR marks an open chromatin domain for the β-like globin gene cluster in erythroid cells from all developmental stages, and functional assays implicate the LCR in generating this open domain, as described in the next section.

2.2. Position-independent expression and enhancement

As illustrated in Fig. 2, the β-globin LCR will confer high-level, position-independent expression on globin gene constructs in transgenic mice (reviewed in Townes and Behringer, 1990; Grosveld et al., 1993). In the absence of the LCR, the human β- or γ-globin gene is expressed in only about half of the lines of transgenic mice carrying the integrated gene, and expression levels are low relative to those of the endogenous mouse globin genes. The lack of expression in many lines of transgenic mice is presumed to result from negative position effects generated by adjacent sequences at the site of integration, which prevent expression of the transgene in erythroid cells. However, when a large DNA fragment containing the full LCR is linked to the β-globin gene, all resulting transgenic mouse lines express the gene, and at a level comparable to that of the endogenous globin genes (Grosveld et al., 1987). Hence, the negative position effects are no longer observed, indicating that either a strong domain-opening activity (that overrides the negative effects of adjacent sequences), or an insulator that blocks the effects of adjacent sequences, or both, are present in the LCR. The high level of expression of the transgene indicates the presence of enhancers in the LCR as well. Both enhancers and LCRs increase
`
[Figure 2: maps on a 0-80 kb scale of the human β-globin gene cluster, the Hispanic (γδβ)-thalassemia deletion, and β-globin transgene constructs in transgenic mice, annotated with DNase HSs, chromatin state (open or closed), expression in erythroid cells and developmental regulation.]

Fig. 2. Summary of the major effects of the β-globin locus control region.
`
`
the probability that a locus will be in a transcriptionally competent state without affecting the transcription rate in a cell actively expressing that locus (Walters et al., 1995, 1996; Wijgerde et al., 1996). This further argues that one of the major functions of the LCR is to open a chromatin domain around the locus in erythroid cells. In fact, deletion of most of the LCR but not the β-globin genes, e.g., as occurs in Hispanic (γδβ)-thalassemia, leaves the gene cluster in a chromatin conformation that is inaccessible to DNase I, and the globin genes are not expressed (Forrester et al., 1990). Thus, this loss-of-function analysis also shows that the LCR is necessary for the establishment and maintenance of an open chromatin domain within which the globin genes are expressed (Fig. 2).

Minimal DNA sequences that confer position-independent expression of a linked β-globin gene in transgenic mice have been determined in regions around the sites of strong DNase cleavage (reviewed in Grosveld et al., 1993). These regions are referred to as the 'hypersensitive site cores' for HS1, HS2, HS3 and HS4.

2.3. Copy-number dependent expression

Transgene constructs that confer full protection from position effects should not be affected by any adjacent sequences. Thus, when the construct is integrated in multiple copies, as is frequently the case in transgenic mouse lines and in stably transfected cultured cells, each copy should be expressed independently of other copies, resulting in a level of expression that increases linearly with the number of copies. This 'copy-number-dependent' expression has been observed in some cases with particular fragments of the β-globin LCR (Talbot et al., 1989), as well as with the chicken β/ε-globin enhancer (Reitman and Felsenfeld, 1990). Other experiments with fragments of the β-globin LCR do not show a clear dependence on copy number (Ryan et al., 1989), and occasional studies show inverse relationships between copy number and level of expression (Morley et al., 1992; TomHon et al., 1997). Although the minimal sequences that will achieve full dependence on copy number are not yet known, this property appears to require sequences from both the LCR and the gene proximal region (Lloyd et al., 1992; Fraser et al., 1993; Li and Stamatoyannopoulos, 1994b). For the γ-globin gene, copy-number dependence requires both sequences 3' to the γ-globin gene and one or more elements in the HS cores (Stamatoyannopoulos et al., 1997).

2.4. Replication of the locus

In addition to the strong effects of the β-globin LCR on chromatin opening and enhancement of expression, the LCR also has a dominant effect on the regulation of replication in the locus. The Hispanic (γδβ)-thalassemia deletion, which removes HS2 through HS5 (Fig. 2), not only leaves the locus in a closed chromatin conformation but also delays the time of replication from early to late in S phase in erythroid cells (Forrester et al., 1990). Replication of the β-globin gene locus normally initiates just 5' to the β-globin gene (Kitsberg et al., 1993), which is 50 kb 3' to the LCR. Surprisingly, chromosomes with the Hispanic thalassemia deletion no longer use the normal replication origin, even though it is intact, but instead use an origin located 3' to the β-globin locus (Aladjem et al., 1995).

2.5. Developmental regulation

The effects of the LCR, if any, on developmental regulation are more complicated to analyze. Several lines of evidence show that sequences proximal to the genes are sufficient to specify expression at a given developmental stage. In the absence of an LCR, human β-like globin genes are expressed at the 'correct' developmental stage in transgenic mice, i.e., mimicking the expression pattern of the orthologous endogenous mouse genes (summarized in Trudel and Costantini, 1987). In fact, developmental switching can occur between human γ- and β-globin genes in transgenic mice in the absence of an LCR (Starck et al., 1994), demonstrating that the LCR is not essential for switching. Point mutations in the promoter of the human γ-globin genes are associated with prolonged expression in the adult stage, i.e., hereditary persistence of fetal hemoglobin (reviewed in Stamatoyannopoulos et al., 1994). Detailed studies of the human ε- and γ-globin genes in constructs also containing LCR fragments have revealed sequences extending up to about 0.8 kb away from the gene that have both positive and negative effects on developmental control (Stamatoyannopoulos et al., 1993; Trepicchio et al., 1993; Li and Stamatoyannopoulos, 1994b; Trepicchio et al., 1994). Recent studies in transgenic mice show that the human γ-globin gene is expressed fetally, whereas the orthologous galago γ-globin gene is expressed embryonically, in the context of an otherwise identical transgene construct (TomHon et al., 1997). This recapitulation of developmental specificity shows that the dominant determinants of developmental timing are encoded by nucleotide differences within the 4.0-kb fragment containing the γ-globin gene.

Although developmental switches in expression can occur in the absence of the LCR, it is still possible that, when present, the LCR participates directly in developmental regulation (e.g., Stamatoyannopoulos, 1991; Wijgerde et al., 1996). Addition of the LCR to a single human β- or γ-globin gene will alter developmental control (Fig. 2), leading to precocious expression of the β-globin gene in embryonic red cells and expression of the γ-globin gene in fetal and adult stages (Enver et al., 1989; Behringer et al., 1990). Inclusion of both γ- and
`
`
β-globin genes will improve the developmental switching, leading to a model of competition between promoters for the LCR (Enver et al., 1990). The order of multiple globin genes in LCR-containing constructs also influences their regulation (Hanscombe et al., 1991; Peterson and Stamatoyannopoulos, 1993). Although these data can be explained by a competition model, the apparent loss of developmental control seen in the presence of an LCR could result from the increased sensitivity of the assays, and the effects of additional genes in the construct can be explained by gene order effects (such as transcriptional interference from the upstream gene) as opposed to proximity to the LCR (Martin et al., 1996).

The effects that led to models of competition in developmental regulation are seen primarily for the regulation of the human β-globin gene. The ε-globin (Raich et al., 1990; Shih et al., 1990), α-globin and ζ-globin (Pondel et al., 1992; Liebhaber et al., 1996) genes are autonomously regulated during development in the presence of LCR-like elements, and constructs containing larger LCR fragments with the γ-globin gene also show autonomous regulation (Dillon and Grosveld, 1991).

2.6. Models for LCR action

Many studies are consistent with the hypothesis that several DNase HSs in the LCR work together in a holocomplex to generate the several effects enumerated above. One explicit model stating that each HS has a predominant effect on only one specific gene in the cluster (Engel, 1993) can be excluded since deletions of single HSs in the context of entire gene clusters either have little effect or affect expression of all the genes in the locus (reviewed below). Indeed, removal of any single HS makes the entire human β-globin gene cluster more sensitive to position effects in transgenic mice (Milot et al., 1996), arguing that this defining property of the LCR requires all of the HSs. This result contrasts with the implications of reports on the ability of individual HSs to provide position-independent, copy-number dependent expression (e.g., Fraser et al., 1993), and the molecular basis for this apparent discrepancy is not clear. Functional interactions between the HSs have been demonstrated, but require DNA sequences inside and outside the core HSs (reviewed below). Thus, although several individual HSs do exhibit substantial function alone, it is most likely that they normally interact in a holocomplex (Ellis et al., 1996) that encompasses a substantial amount of DNA.

The ability of the LCR to open a chromosomal domain suggests that it recruits chromatin-remodeling activities such as SWI/SNF (Cote et al., 1994; Peterson and Tamkun, 1995) and/or histone acetyl transferases (Brownell et al., 1996) to this locus, but only in erythroid cells. This could occur indirectly, with recognition of specific sequences in the LCR by trans-activator proteins such as members of the AP1 family of proteins and recruitment of chromatin remodeling and/or histone modifying activities by specific interaction between these enzymes and the trans-activator. For instance, the co-activator proteins CBP and P300 are histone acetyl transferases and also interact with AP1 (Ogrysko et al., 1996). In addition, some DNA sequences in the LCR could recruit chromatin remodeling and modifying activities directly.

Several other issues remain unresolved. For instance, the LCR could influence all or several of the genes in the locus at once (Bresnick and Felsenfeld, 1994; Martin et al., 1996) or it could serve to activate expression of one gene at a time (Wijgerde et al., 1995). If the LCR does influence predominantly one gene at a time, it could do so by interaction directly with the target gene with looping out of DNA between this distal regulator and the proximal regulatory elements (Grosveld et al., 1993) or the positive effect of the LCR could 'track' along the DNA to the target gene (Tuan et al., 1992). Neither the molecular targets of the direct interactions (in the former model) nor the molecular basis of the tracking effects (in the latter model) are known. For instance, 'tracking' could involve movement of transcription factors along the DNA, or it could result from spreading of the active chromatin domain down the locus.

3. Sequence analysis of mammalian β-globin LCRs

DNA sequences of much of the β-globin LCR are now available from several mammalian species, including human (Li et al., 1985; Yu et al., 1994), galago (Slightom et al., 1997), rabbit (Hardison et al., 1993; Slightom et al., 1997), goat (Li et al., 1991) and mouse (Moon and Ley, 1990; Hug et al., 1992; Jimenez et al., 1992). The remainder of this review will discuss insights into the regions required for LCR function based on patterns of conservation revealed by a simultaneous alignment of these DNA sequences (Slightom et al., 1997). Key features of the LCRs from the different mammals are mapped in Fig. 3.

3.1. Conservation of number and order of HSs

All of the known mammalian β-globin LCRs have segments homologous to HS1, HS2 and HS3 (Fig. 3). HS4 is likely present in all these species as well, although the currently available goat sequence does not include the region corresponding to HS4. Homologs to human HS5 are found in galago (Slightom et al., 1997) and mouse (A. Reik, M. Bender and M. Groudine, pers. commun.), suggesting a wide distribution of HS5 as well. If HS5 is present in rabbit, it does not occur in the same place as in human or galago. Thus the presence
`
`
`
`
[Figure 3: maps of the β-globin LCRs of human, galago, rabbit, goat and mouse on a 0-22000 bp scale, marking HS5, HS4, HS3, HS2 and HS1 and the repeats in each species (including L1Md in mouse); the HS4 region is labeled as not sequenced in goat, and a distance on the mouse map is given as 6.5 kb in the a allele (BALB/c) and 5.0 kb in the b allele (C57BL/6J).]

Fig. 3. Mammalian β-globin LCRs. Maps of the β-globin LCRs of human, galago, rabbit, goat and mouse show the positions of HS cores in humans, their homologs in other species, positions and identities of repeats, and the new regions sequenced in rabbit (double-arrowed lines under the rabbit β-globin LCR map). The HS cores are shown as boxes with distinctive fills, long interspersed repeats (L1s) are open arrowed boxes, and short interspersed repeats are triangles (in the latter two cases, the icon points in the direction that the repeat is oriented). Short repeats are Alu repeats in humans, both type I and type II Alu repeats in galago, C repeats in rabbits, Nia and D repeats in goats, and B1 repeats in mouse. An insertion between positions 14 419 and 14 599 of galago does not match any known short or long repeats, and it may represent a newly discovered repeat. An insertion of 81 bp that begins at position 3614 of galago is a novel short insertion sequence.
`
`
`
of four (HS1-4) and possibly all five major HSs is conserved in these eutherian mammals. This conservation extends even further back in evolutionary time, with at least LCR HS2, HS3, HS4, and possibly HS1 being found in Australian marsupials and monotremes (R. Baird, J. Kuliwaba, R. Hope, M. Goodman et al., personal communication).

3.2. Conserved sequences within the LCR

We used the program yama2 (Chao et al., 1994) to compute a simultaneous alignment of the available mammalian β-globin LCR sequences. We then used three different approaches to search for conserved sequences using a variety of criteria (Slightom et al., 1997; Stojanovic et al., 1997). The first method computes the information content of each column (Schneider et al., 1986); the positions of the 10 and 30 blocks with the highest information content (HIC) are displayed in Fig. 4. The information content reflects both the amount of variability in a column in a multiple alignment as well as the base composition for the sequences being aligned, and provides a finely graded function for measuring conservation. The second method simply finds runs of exact matches; Fig. 4 plots positions with seven or more consecutive invariant columns such that sequences from some minimal number of species align (four in one case, three in the other). A third method (Stojanovic et al., 1997) was devised to better reflect matches found at protein binding sites. Specifically, Fig. 4 identifies all runs of six or more columns possessing a plausible consensus sequence, i.e., each row in that region can have at most one mismatch with the (a priori unspecified) consensus. This requirement mimics the documented ability of some proteins to bind equally well to similar but not identical sequences. For instance, GATA1 binds to AGATAA or to TGATAG, which each differ in only one position from AGATAG.
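The first two conservation measures described in Section 3.2 can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the implementation used to produce Fig. 4. The function names `column_information` and `invariant_runs` are hypothetical; the alignment is assumed to be a list of equal-length strings with `-` marking gaps; the information content is computed as relative entropy against a background base composition (uniform by default), with no correction for the small number of sequences; and an 'invariant' column is taken to mean one in which every aligned base is identical and at least `min_species` sequences are present.

```python
import math
from collections import Counter


def column_information(column, background=None):
    """Information content (bits) of one alignment column.

    `column` holds one character per species; anything other than
    A, C, G or T (e.g. '-') is ignored. `background` maps each base to
    its expected frequency (assumed uniform when not supplied).
    """
    bases = [b for b in column.upper() if b in "ACGT"]
    if not bases:
        return 0.0
    if background is None:
        background = {b: 0.25 for b in "ACGT"}
    n = len(bases)
    counts = Counter(bases)
    # Relative entropy of observed frequencies versus the background.
    return sum((c / n) * math.log2((c / n) / background[b])
               for b, c in counts.items())


def invariant_runs(rows, min_len=7, min_species=4):
    """Half-open (start, end) intervals of consecutive invariant columns.

    A column counts as invariant when at least `min_species` rows carry
    a base there and all of those bases are identical.
    """
    ncols = len(rows[0])
    runs, start = [], None
    for i in range(ncols + 1):  # one extra step to close a trailing run
        ok = False
        if i < ncols:
            bases = [r[i].upper() for r in rows if r[i].upper() in "ACGT"]
            ok = len(bases) >= min_species and len(set(bases)) == 1
        if ok and start is None:
            start = i
        elif not ok and start is not None:
            if i - start >= min_len:
                runs.append((start, i))
            start = None
    return runs
```

Calling `invariant_runs(rows, min_len=7, min_species=4)` and then again with `min_species=3` would correspond to the two exact-match settings mentioned in the text, under the interpretation assumed here.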
`
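The third criterion in Section 3.2, a run of columns in which every row differs from some unspecified consensus in at most one position, can be checked by brute force for a fixed window. The sketch below is an illustration under stated assumptions rather than the algorithm of Stojanovic et al. (1997). The names `consensus_within_one` and `consensus_windows` are hypothetical, rows are assumed to be ungapped and of equal length, and only windows of a fixed width are tested instead of extending each hit to a maximal run. It relies on the observation that any consensus within one mismatch of every row must itself lie within one mismatch of the first row, so only that row and its single-base substitutions need to be tried.

```python
def consensus_within_one(segment_rows):
    """Return a string with at most one mismatch to every row, or None."""
    first = segment_rows[0]
    # Candidates: the first row itself plus all its single-base variants.
    candidates = [first]
    for i, base in enumerate(first):
        for alt in "ACGT":
            if alt != base:
                candidates.append(first[:i] + alt + first[i + 1:])
    for cand in candidates:
        if all(sum(a != b for a, b in zip(cand, row)) <= 1
               for row in segment_rows):
            return cand
    return None


def consensus_windows(rows, width=6):
    """List (start, consensus) for every window of `width` columns
    that admits a consensus within one mismatch of each row."""
    hits = []
    for start in range(len(rows[0]) - width + 1):
        segment = [r[start:start + width].upper() for r in rows]
        cons = consensus_within_one(segment)
        if cons is not None:
            hits.append((start, cons))
    return hits
```

For the GATA1 example in the text, `consensus_within_one(["AGATAA", "TGATAG"])` finds a consensus, since both sites are one mismatch away from AGATAG; the function may return a different but equally valid consensus.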
`
`
[Figure 4: tracks plotted against alignment position (4000-16000), labeled from top to bottom: HS cores (HS4, HS3, HS2, HS1); HSs; 10 HIC; 30 HIC; l=7, n=4; l=7, n=3; 1 mismatch; DPF; AP1/NFE2; CACBP; GATA; TATTT/ATTTA.]

Fig. 4. Positions of selected features revealed by the multiple sequence alignment. The positions of the HS cores are shown on the top line. Reported positions of DNase HSs are on the second line. The next five lines show the positions of conserved sequences, as detected by three different methods. Differential phylogenetic footprints (DPFs) are on line 8. Conserved matches to consensus binding sites for the indicated proteins are shown on lines 9-11. The last line shows positions of matches between the one mismatch unspecified consensus (line 7) and the motifs TATTT or ATTTA. GenBank entry HUMHBB begins at 2688 in the current human sequence file.

These various methods for finding conserved segments produce generally congruent results, with substantial overlap in the blocks detected by each of the methods. This indicates that the combination of the various methods for finding conserved sequences is quite robust. As expected, all three methods find strongly conserved blocks within the HS cores, as well as juxtaposed to them (in particular, a phylogenetic footprint located just 3' to the HS4 core and an AP1 binding site immediately 5' to the HS3 core). In addition, some, but not all, of the regions between the cores are conserved, with some phylogenetic footprints as strongly conserved as those in the HS cores. Notable conserved regions are as much as 1000 bp 5' to and 3' to the HS3 core and also 5' to the HS2 core. Interestingly, a conserved sequence is located between HS2 and HS1 as well.

The pattern of many conserved blocks in certain

clusters that some single-copy regions do not align (Hardison et al., 1994; Hardison and Miller, 1993). One notable example is the intergenic region between the δ- and β-globin genes in mouse vs. human comparisons (Hardison et al., 1997). This shows that in the time since the ancestors to rodents and primates separated, some sequences in this locus (presumably those not necessary for function) have diverged extensively. Thus, the phylogenetic footprints in the LCR are indeed candidates for functional sequences.

3.3. Correspondence between DNase HSs and conserved blocks

The positions of DNase HSs are also plotted in Fig. 4. Several HSs are reported around each of the cores. Although some of this heterogeneity simply reflects multiple reports of the same HS, some of it results from a wide distribution of cleavage. For instance, the regions around HS3 and HS2 have DNase cleavage sites outside the minimal cores (Philipsen et al., 1990; Talbot et al., 1990). DNase HSs have been mapped at approximate positions 6200 (Stamatoyannopoulos et al., 1995) and 6500 (Tuan et al., 1985), which are about 1000 bp 5' to the HS3 core. This is the same region that displays multiple conserved sequence blocks, showing a good congruence between HS mapping and conserved sequences not only in the cores but far outside them as well.

3.4. Repeated, conserved sequence motifs in the LCR and proteins that bind to them

The distribution of conserved binding sites for some prominent proteins involved in globin gene regulation was determined by searching for matches between the consensus sequence for the protein binding sites and the 'unspecified consensus' computed allowing one mismatch (see Section 3.2. above). As shown in Fig. 4, three conserved segments matching the consensus AP1 binding site (