AN INTERNATIONAL JOURNAL ON GENES AND GENOMES

ELSEVIER

Gene 205 (1997) 73-94

Review

Locus control regions of mammalian β-globin gene clusters: combining phylogenetic analyses and experimental results to gain functional insights

Ross Hardison a,b,*, Jerry L. Slightom c, Deborah L. Gumucio d, Morris Goodman e, Nikola Stojanovic f, Webb Miller b,f

a Department of Biochemistry and Molecular Biology, The Pennsylvania State University, University Park, PA 16802, USA
b Center for Gene Regulation, The Pennsylvania State University, University Park, PA 16802, USA
c Molecular Biology Unit 7242, Pharmacia and Upjohn, Inc., Kalamazoo, MI 49007, USA
d Department of Anatomy and Cell Biology, University of Michigan Medical School, Ann Arbor, MI 48109-0616, USA
e Department of Anatomy and Cell Biology, Wayne State School of Medicine, Detroit, MI 48201, USA
f Department of Computer Science and Engineering, The Pennsylvania State University, University Park, PA 16802, USA

Accepted 22 July 1997

Abstract

Locus control regions (LCRs) are cis-acting DNA segments needed for activation of an entire locus or gene cluster. They are operationally defined as DNA sequences needed to achieve a high level of gene expression regardless of the position of integration in transgenic mice or stably transfected cells. This review brings together the large amount of DNA sequence data from the β-globin LCR with the vast amount of functional data obtained through the use of biochemical, cellular and transgenic experimental systems. Alignment of orthologous LCR sequences from five mammalian species locates numerous conserved regions, including previously identified cis-acting elements within the cores of nuclease hypersensitive sites (HSs) as well as conserved regions located between the HS cores. The distribution of these conserved sequences, combined with the effects of LCR fragments utilized in expression studies, shows that important sites are more widely distributed in the LCR than previously anticipated, especially in and around HS2 and HS3. We propose that the HS cores plus HS flanking DNAs comprise a 'unit' to which proteins bind and form an optimally functional structure. Multiple HS units (at least three: HS2, HS3 and HS4 cores plus flanking DNAs) together establish a chromatin structure that allows the proper developmental regulation of genes within the cluster. © 1997 Elsevier Science B.V.

Keywords: Hemoglobin; Sequence conservation; Enhancement; Chromatin; Domain opening; DNA-binding proteins

1. Expression patterns of mammalian hemoglobin gene clusters

The genes that encode the polypeptides of the α2β2 tetramer of hemoglobin are encoded in two separate clusters in birds and mammals. In humans, the β-like globin genes (including pseudogenes denoted by the prefix ψ) are clustered in the array 5'-ε-Gγ-Aγ-ψη-δ-β-3' that covers about 75 kb on chromosome 11p15.4, and the α-like globin genes are in a 40-kb cluster, 5'-ζ2-ψζ1-ψα2-ψα1-α2-α1-θ-3', very close to the telomere of the short arm of chromosome 16. Expression of the α- and β-like globin genes is limited to erythroid cells and is balanced so that equal amounts of the two polypeptides are available to assemble the hemoglobin heterotetramer. Expression of genes within the clusters is developmentally controlled, so that different forms of hemoglobin are produced in embryonic, fetal and adult life (reviewed in Stamatoyannopoulos and Nienhuis, 1994).

*Corresponding author. Present address: Department of Biochemistry and Molecular Biology, The Pennsylvania State University, 206 Althouse Laboratory, University Park, PA 16802, USA. Tel.: +1 814 8630113; Fax: +1 814 8637024; e-mail: rch8@psu.edu

Abbreviations: LCR, locus control region; HS, hypersensitive site; HIC, highest information content; DPF, differential phylogenetic footprint; CACBPs, proteins that bind to the CACC motif; MAR, matrix attachment region; bHLH, basic helix-loop-helix; MEL, murine erythroleukemia.

0378-1119/97/$17.00 © 1997 Elsevier Science B.V. All rights reserved.
PII S0378-1119(97)00474-5

SKI Exhibit 2042
Page 1 of 23

This process of hemoglobin switching is an excellent model system for increasing our understanding of the molecular mechanisms of differential gene expression during development. These developmental switches also offer new approaches to therapy for inherited anemias. For example, continued expression of the normally fetal HbF (α2γ2) in adults will reduce the severity of symptoms of patients producing an abnormal β-globin in sickle cell disease and possibly also in patients lacking sufficient β-globin (β-thalassemia). An understanding of the molecular basis of globin gene switching will facilitate development of new therapeutic strategies (pharmacological and/or DNA transfer) that continue γ-globin gene expression in adults.

In addition to biochemical and genetic approaches to studying regulation of globin genes, phylogenetic approaches are also highly informative. The detailed study of globin gene clusters in many mammalian species has provided a rich resource of information from which to glean further insight into not only the evolution of the gene clusters but also their regulation. The β-globin gene clusters have been extensively studied in human, the prosimian galago, the lagomorph rabbit, the artiodactyls goat and cow, and the rodent mouse. Maps of these gene clusters are shown in Fig. 1, and aspects of their evolution and regulation have been reviewed (Collins and Weissman, 1984; Goodman et al., 1987; Hardison and Miller, 1993). The ε-globin gene is at the 5' end of all the mammalian globin gene clusters and is expressed only in embryonic red cells in all cases. In most eutherian mammals, expression of the γ-globin gene is also limited to embryonic red cells, but in anthropoid primates, its expression continues and predominates in fetal red cells. The appearance of this new pattern of fetal expression of the γ-globin genes coincides roughly with the duplication of the genes in primate evolution, which leads to the hypothesis that the duplication allowed the changes that caused the fetal recruitment (Hayasaka et al., 1993). The β-globin gene is expressed after birth in all mammals, but in galago, mouse and rabbit, its expression initiates and predominates in the fetal liver (arguing that fetal expression of the β-globin gene is the ancestral state). The recruitment of γ-globin genes for fetal expression in anthropoid primates is accompanied by a corresponding delay in expression of the β-globin gene.

[Figure 1: maps, on a 0-140 kb scale, of the inferred ancestral eutherian β-globin gene cluster and of the human, galago, rabbit, goat and mouse clusters, with E, F and A beneath each gene; one branch is labeled with the duplication of γ and fetal recruitment.]

Fig. 1. Evolution of β-globin gene clusters in eutherian mammals. The inferred ancestral gene cluster and the branching pathways to contemporary gene clusters are shown. The time of expression during development is indicated beneath the box representing each gene; E, embryonic; F, fetal; A, adult. The boxes for orthologous genes have the same shading.

Comparisons of DNA sequences among mammalian β-globin gene clusters can reveal candidates for sequences involved in shared regulatory functions; these will be detected as conserved sequence blocks, or phylogenetic footprints, found in all mammals (Gumucio et al., 1992; Hardison et al., 1993). Notable similarities are found in alignments of the proximal 5' flanking regions of the orthologous β-like globin genes, consistent with their roles as promoters and other regulators of expression. In addition, striking and extensive sequence matches are found at the far 5' end of the gene clusters, in the region that we now recognize as the locus control region (LCR), which is the dominant, distal control sequence for these gene clusters. Sequence comparisons can be used also to identify candidates for regulatory elements that lead to differences in expression patterns. In this case, one searches for sequences conserved in the set of mammals that show a particular phenotype but
which differ in the species with a different pattern of expression. For instance, such differential phylogenetic footprints (Gumucio et al., 1994) led to the discovery of a sequence implicated in fetal-specific expression of the γ-globin genes in higher primates (Jane et al., 1992) and a sequence that binds several proteins implicated in fetal silencing of the γ-globin gene (Gumucio et al., 1994). In this review, we summarize the results of sequence comparisons for both types of regulatory element in the LCR.

2. General features of mammalian β-globin LCRs

2.1. DNase hypersensitive sites 5' to the β-globin gene cluster

The β-globin LCR was initially discovered as a set of DNase hypersensitive sites located 5' to the ε-globin gene (Tuan et al., 1985; Forrester et al., 1986, 1987). At least 5 DNase HSs, called HS1-HS5 (Fig. 2), have been characterized within the region that provided the original gain-of-function effects described below (Grosveld et al., 1987), and we will refer to this region with all five HSs as the 'full LCR.' The presence of DNase HSs is indicative of an altered chromatin structure associated with important cis-regulatory regions (Gross and Garrard, 1988). Some of these sites, especially HS3, appear preferentially in erythroid nuclei (Dhar et al., 1990), but in contrast to the DNase hypersensitive sites at promoters, all are developmentally stable, i.e., present in embryonic, fetal and adult red cells (Forrester et al., 1986). Thus, the LCR marks an open chromatin domain for the β-like globin gene cluster in erythroid cells from all developmental stages, and functional assays implicate the LCR in generating this open domain, as described in the next section.

[Figure 2: schematic, on a 0-80 kb scale with DNase HSs marked, of the human β-globin gene cluster, the Hispanic (γδβ)-thalassemia deletion, and globin gene constructs in transgenic mice; for each, the figure tabulates erythroid expression, position effects, developmental regulation and chromatin state (open or closed).]

Fig. 2. Summary of the major effects of the β-globin locus control region.

2.2. Position-independent expression and enhancement

As illustrated in Fig. 2, the β-globin LCR will confer high-level, position-independent expression on globin gene constructs in transgenic mice (reviewed in Townes and Behringer, 1990; Grosveld et al., 1993). In the absence of the LCR, the human β- or γ-globin gene is expressed in only about half of the lines of transgenic mice carrying the integrated gene, and expression levels are low relative to those of the endogenous mouse globin genes. The lack of expression in many lines of transgenic mice is presumed to result from negative position effects generated by adjacent sequences at the site of integration, which prevent expression of the transgene in erythroid cells. However, when a large DNA fragment containing the full LCR is linked to the β-globin gene, all resulting transgenic mouse lines express the gene, and at a level comparable to that of the endogenous globin genes (Grosveld et al., 1987). Hence, the negative position effects are no longer observed, indicating that either a strong domain-opening activity (that overrides the negative effects of adjacent sequences), or an insulator that blocks the effects of adjacent sequences, or both, are present in the LCR. The high level of expression of the transgene indicates the presence of enhancers in the LCR as well. Both enhancers and LCRs increase
the probability that a locus will be in a transcriptionally competent state without affecting the transcription rate in a cell actively expressing that locus (Walters et al., 1995, 1996; Wijgerde et al., 1996). This further argues that one of the major functions of the LCR is to open a chromatin domain around the locus in erythroid cells. In fact, deletion of most of the LCR but not the β-globin genes, e.g., as occurs in Hispanic (γδβ)-thalassemia, leaves the gene cluster in a chromatin conformation that is inaccessible to DNase I, and the globin genes are not expressed (Forrester et al., 1990). Thus, this loss-of-function analysis also shows that the LCR is necessary for the establishment and maintenance of an open chromatin domain within which the globin genes are expressed (Fig. 2).

Minimal DNA sequences that confer position-independent expression of a linked β-globin gene in transgenic mice have been determined in regions around the sites of strong DNase cleavage (reviewed in Grosveld et al., 1993). These regions are referred to as the 'hypersensitive site cores' for HS1, HS2, HS3 and HS4.

2.3. Copy-number dependent expression

Transgene constructs that confer full protection from position effects should not be affected by any adjacent sequences. Thus, when the construct is integrated in multiple copies, as is frequently the case in transgenic mouse lines and in stably transfected cultured cells, each copy should be expressed independently of other copies, resulting in a level of expression that increases linearly with the number of copies. This 'copy-number-dependent' expression has been observed in some cases with particular fragments of the β-globin LCR (Talbot et al., 1989), as well as with the chicken β/ε-globin enhancer (Reitman and Felsenfeld, 1990). Other experiments with fragments of the β-globin LCR do not show a clear dependence on copy number (Ryan et al., 1989), and occasional studies show inverse relationships between copy number and level of expression (Morley et al., 1992; TomHon et al., 1997). Although the minimal sequences that will achieve full dependence on copy number are not yet known, this property appears to require sequences from both the LCR and the gene proximal region (Lloyd et al., 1992; Fraser et al., 1993; Li and Stamatoyannopoulos, 1994b). For the γ-globin gene, copy-number dependence requires both sequences 3' to the γ-globin gene and one or more elements in the HS cores (Stamatoyannopoulos et al., 1997).

2.4. Replication of the locus

In addition to the strong effects of the β-globin LCR on chromatin opening and enhancement of expression, the LCR also has a dominant effect on the regulation of replication in the locus. The Hispanic (γδβ)-thalassemia deletion, which removes HS2 through HS5 (Fig. 2), not only leaves the locus in a closed chromatin conformation but also delays the time of replication from early to late in S phase in erythroid cells (Forrester et al., 1990). Replication of the β-globin gene locus normally initiates just 5' to the β-globin gene (Kitsberg et al., 1993), which is 50 kb 3' to the LCR. Surprisingly, chromosomes with the Hispanic thalassemia deletion no longer use the normal replication origin, even though it is intact, but instead use an origin located 3' to the β-globin locus (Aladjem et al., 1995).

2.5. Developmental regulation

The effects of the LCR, if any, on developmental regulation are more complicated to analyze. Several lines of evidence show that sequences proximal to the genes are sufficient to specify expression at a given developmental stage. In the absence of an LCR, human β-like globin genes are expressed at the 'correct' developmental stage in transgenic mice, i.e., mimicking the expression pattern of the orthologous endogenous mouse genes (summarized in Trudel and Costantini, 1987). In fact, developmental switching can occur between human γ- and β-globin genes in transgenic mice in the absence of an LCR (Starck et al., 1994), demonstrating that the LCR is not essential for switching. Point mutations in the promoter of the human γ-globin genes are associated with prolonged expression in the adult stage, i.e., hereditary persistence of fetal hemoglobin (reviewed in Stamatoyannopoulos et al., 1994). Detailed studies of the human ε- and γ-globin genes in constructs also containing LCR fragments have revealed sequences extending up to about 0.8 kb away from the gene that have both positive and negative effects on developmental control (Stamatoyannopoulos et al., 1993; Trepicchio et al., 1993; Li and Stamatoyannopoulos, 1994b; Trepicchio et al., 1994). Recent studies in transgenic mice show that the human γ-globin gene is expressed fetally, whereas the orthologous galago γ-globin gene is expressed embryonically, in the context of an otherwise identical transgene construct (TomHon et al., 1997). This recapitulation of developmental specificity shows that the dominant determinants of developmental timing are encoded by nucleotide differences within the 4.0-kb fragment containing the γ-globin gene.

Although developmental switches in expression can occur in the absence of the LCR, it is still possible that, when present, the LCR participates directly in developmental regulation (e.g., Stamatoyannopoulos, 1991; Wijgerde et al., 1996). Addition of the LCR to a single human β- or γ-globin gene will alter developmental control (Fig. 2), leading to precocious expression of the β-globin gene in embryonic red cells and expression of the γ-globin gene in fetal and adult stages (Enver et al., 1989; Behringer et al., 1990). Inclusion of both γ- and
β-globin genes will improve the developmental switching, leading to a model of competition between promoters for the LCR (Enver et al., 1990). The order of multiple globin genes in LCR-containing constructs also influences their regulation (Hanscombe et al., 1991; Peterson and Stamatoyannopoulos, 1993). Although these data can be explained by a competition model, the apparent loss of developmental control seen in the presence of an LCR could result from the increased sensitivity of the assays, and the effects of additional genes in the construct can be explained by gene order effects (such as transcriptional interference from the upstream gene) as opposed to proximity to the LCR (Martin et al., 1996).

The effects that led to models of competition in developmental regulation are seen primarily for the regulation of the human β-globin gene. The ε-globin (Raich et al., 1990; Shih et al., 1990), α-globin and ζ-globin (Pondel et al., 1992; Liebhaber et al., 1996) genes are autonomously regulated during development in the presence of LCR-like elements, and constructs containing larger LCR fragments with the γ-globin gene also show autonomous regulation (Dillon and Grosveld, 1991).

2.6. Models for LCR action

Many studies are consistent with the hypothesis that several DNase HSs in the LCR work together in a holocomplex to generate the several effects enumerated above. One explicit model stating that each HS has a predominant effect on only one specific gene in the cluster (Engel, 1993) can be excluded since deletions of single HSs in the context of entire gene clusters either have little effect or affect expression of all the genes in the locus (reviewed below). Indeed, removal of any single HS makes the entire human β-globin gene cluster more sensitive to position effects in transgenic mice (Milot et al., 1996), arguing that this defining property of the LCR requires all of the HSs. This result contrasts with the implications of reports on the ability of individual HSs to provide position-independent, copy-number dependent expression (e.g., Fraser et al., 1993), and the molecular basis for this apparent discrepancy is not clear. Functional interactions between the HSs have been demonstrated, but require DNA sequences inside and outside the core HSs (reviewed below). Thus, although several individual HSs do exhibit substantial function alone, it is most likely that they normally interact in a holocomplex (Ellis et al., 1996) that encompasses a substantial amount of DNA.

The ability of the LCR to open a chromosomal domain suggests that it recruits chromatin-remodeling activities such as SWI/SNF (Cote et al., 1994; Peterson and Tamkun, 1995) and/or histone acetyl transferases (Brownell et al., 1996) to this locus, but only in erythroid cells. This could occur indirectly, with recognition of specific sequences in the LCR by trans-activator proteins such as members of the AP1 family of proteins and recruitment of chromatin remodeling and/or histone modifying activities by specific interaction between these enzymes and the trans-activator. For instance, the co-activator proteins CBP and P300 are histone acetyl transferases and also interact with AP1 (Ogrysko et al., 1996). In addition, some DNA sequences in the LCR could recruit chromatin remodeling and modifying activities directly.

Several other issues remain unresolved. For instance, the LCR could influence all or several of the genes in the locus at once (Bresnick and Felsenfeld, 1994; Martin et al., 1996) or it could serve to activate expression of one gene at a time (Wijgerde et al., 1995). If the LCR does influence predominantly one gene at a time, it could do so by interaction directly with the target gene with looping out of DNA between this distal regulator and the proximal regulatory elements (Grosveld et al., 1993) or the positive effect of the LCR could 'track' along the DNA to the target gene (Tuan et al., 1992). Neither the molecular targets of the direct interactions (in the former model) nor the molecular basis of the tracking effects (in the latter model) are known. For instance, 'tracking' could involve movement of transcription factors along the DNA, or it could result from spreading of the active chromatin domain down the locus.

3. Sequence analysis of mammalian β-globin LCRs

DNA sequences of much of the β-globin LCR are now available from several mammalian species, including human (Li et al., 1985; Yu et al., 1994), galago (Slightom et al., 1997), rabbit (Hardison et al., 1993; Slightom et al., 1997), goat (Li et al., 1991) and mouse (Moon and Ley, 1990; Hug et al., 1992; Jimenez et al., 1992). The remainder of this review will discuss insights into the regions required for LCR function based on patterns of conservation revealed by a simultaneous alignment of these DNA sequences (Slightom et al., 1997). Key features of the LCRs from the different mammals are mapped in Fig. 3.

3.1. Conservation of number and order of HSs

All of the known mammalian β-globin LCRs have segments homologous to HS1, HS2 and HS3 (Fig. 3). HS4 is likely present in all these species as well, although the currently available goat sequence does not include the region corresponding to HS4. Homologs to human HS5 are found in galago (Slightom et al., 1997) and mouse (A. Reik, M. Bender and M. Groudine, pers. commun.), suggesting a wide distribution of HS5 as well. If HS5 is present in rabbit, it does not occur in the same place as in human or galago. Thus the presence
of four (HS1-4) and possibly all five major HSs is conserved in these eutherian mammals. This conservation extends even further back in evolutionary time, with at least LCR HS2, HS3, HS4, and possibly HS1 being found in Australian marsupials and monotremes (R. Baird, J. Kuliwaba, R. Hope, M. Goodman et al., personal communication).

[Figure 3: maps of the β-globin LCRs of human, galago, rabbit, goat and mouse, drawn on base-pair scales, showing HS1-HS5, repeats (including L1Md in mouse), the ε-globin gene, and a region not sequenced in goat. In mouse the labeled distance is 6.5 kb in the a allele (BALB/c) and 5.0 kb in the b allele (C57BL/6J).]

Fig. 3. Mammalian β-globin LCRs. Maps of the β-globin LCRs of human, galago, rabbit, goat and mouse show the positions of HS cores in humans, their homologs in other species, positions and identities of repeats, and the new regions sequenced in rabbit (double-arrowed lines under the rabbit β-globin LCR map). The HS cores are shown as boxes with distinctive fills, long interspersed repeats (L1s) are open arrowed boxes, and short interspersed repeats are triangles (in the latter two cases, the icon points in the direction that the repeat is oriented). Short repeats are Alu repeats in humans, both type I and type II Alu repeats in galago, C repeats in rabbits, Nla and D repeats in goats, and B1 repeats in mouse. An insertion between positions 14419 and 14599 of galago does not match any known short or long repeats, and it may represent a newly discovered repeat. An insertion of 81 bp that begins at position 3614 of galago is a novel short insertion sequence.

3.2. Conserved sequences within the LCR

We used the program yama2 (Chao et al., 1994) to compute a simultaneous alignment of the available mammalian β-globin LCR sequences. We then used three different approaches to search for conserved sequences using a variety of criteria (Slightom et al., 1997; Stojanovic et al., 1997). The first method computes the information content of each column (Schneider et al., 1986); the positions of the 10 and 30 blocks with the highest information content (HIC) are displayed in Fig. 4. The information content reflects both the amount of variability in a column in a multiple alignment as well as the base composition for the sequences being aligned, and provides a finely graded function for measuring conservation. The second method simply finds runs of exact matches; Fig. 4 plots positions with seven or more consecutive invariant columns such that sequences from some minimal number of species align (four in one case, three in the other). A third method (Stojanovic et al., 1997) was devised to better reflect matches found at protein binding sites. Specifically, Fig. 4 identifies all runs of six or more columns possessing a plausible consensus sequence, i.e., each row in that region can have at most one mismatch with the (a priori unspecified) consensus. This requirement mimics the documented ability of some proteins to bind equally well to similar but not identical sequences. For instance, GATA1 binds to AGATAA or to TGATAG, which each differ in only one position from AGATAG.
`
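The first method of Section 3.2, scoring each alignment column by its information content, can be illustrated with a minimal sketch. It is an assumption-laden illustration rather than the procedure of Schneider et al. (1986) or of the authors: here the score is the relative entropy of the column against the base composition pooled over the whole alignment, gaps are ignored, and the function names (`background_frequencies`, `column_information`, `information_profile`) are hypothetical.

```python
import math
from collections import Counter

BASES = "ACGT"

def background_frequencies(alignment):
    """Base composition pooled over all rows; gaps and other symbols are ignored."""
    counts = Counter(b for row in alignment for b in row if b in BASES)
    total = sum(counts.values())
    return {b: counts[b] / total for b in BASES}

def column_information(column, background):
    """Relative entropy (in bits) of one column versus the background composition.

    Assumption: this stands in for the information content used in the review;
    no small-sample correction is applied.
    """
    bases = [b for b in column if b in BASES]
    if not bases:
        return 0.0
    counts = Counter(bases)
    n = len(bases)
    return sum((c / n) * math.log2((c / n) / background[b])
               for b, c in counts.items())

def information_profile(alignment):
    """Information content of every column of an alignment (equal-length rows)."""
    background = background_frequencies(alignment)
    return [column_information(col, background) for col in zip(*alignment)]
```

An invariant column scores highest when its base is rare in the alignment as a whole, which is how a measure of this kind reflects base composition as well as variability.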
`
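The second method of Section 3.2, finding runs of seven or more consecutive invariant columns in which a minimal number of species align, might be sketched as follows. The treatment of gaps is an assumption (a column qualifies when at least `min_species` rows carry a base and all of those bases agree), and the function names are hypothetical.

```python
def is_invariant(column, min_species):
    """True if at least min_species rows are aligned (non-gap) and all agree."""
    bases = [b for b in column if b != "-"]
    return len(bases) >= min_species and len(set(bases)) == 1

def invariant_runs(alignment, min_length=7, min_species=4):
    """Return half-open (start, end) column ranges of qualifying invariant runs."""
    columns = list(zip(*alignment))
    runs = []
    start = None  # first column of the run currently being extended
    for i, column in enumerate(columns):
        if is_invariant(column, min_species):
            if start is None:
                start = i
        else:
            if start is not None and i - start >= min_length:
                runs.append((start, i))
            start = None
    # Close a run that extends to the end of the alignment.
    if start is not None and len(columns) - start >= min_length:
        runs.append((start, len(columns)))
    return runs
```

Calling `invariant_runs(alignment, 7, 4)` and `invariant_runs(alignment, 7, 3)` would correspond to the two settings plotted in Fig. 4 (four species in one case, three in the other).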
`
`
`
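The third criterion of Section 3.2, a run of columns in which every row differs from an a priori unspecified consensus in at most one position, can be checked by brute force. This sketch is not the algorithm of Stojanovic et al. (1997); it assumes fixed windows of six columns, skips windows containing gaps, and uses hypothetical function names. It relies on the fact that any valid consensus must lie within one substitution of the first row, so only those candidates need to be tried.

```python
def hamming(a, b):
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))

def candidates(seed):
    """The seed itself, then every single-base substitution of it."""
    yield seed
    for i, original in enumerate(seed):
        for base in "ACGT":
            if base != original:
                yield seed[:i] + base + seed[i + 1:]

def one_mismatch_consensus(segments):
    """Return a string within one mismatch of every segment, or None."""
    if any("-" in s for s in segments):
        return None  # assumption: windows containing gaps are skipped
    for cand in candidates(segments[0]):
        if all(hamming(cand, s) <= 1 for s in segments):
            return cand
    return None

def consensus_windows(alignment, length=6):
    """List (start, consensus) for each fixed-length window that has a consensus."""
    hits = []
    width = len(alignment[0])
    for start in range(width - length + 1):
        segments = [row[start:start + length] for row in alignment]
        consensus = one_mismatch_consensus(segments)
        if consensus is not None:
            hits.append((start, consensus))
    return hits
```

For the GATA1 example in the text, the rows AGATAA, TGATAG and AGATAG admit the consensus AGATAG. Merging overlapping windows into maximal runs of six or more columns is left out of this sketch.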
`
`
`
[Figure 4: tracks plotted against alignment position (4000-16000), labeled HS cores, HSs, 10 HIC, 30 HIC, l=7 n=4, l=7 n=3, 1 mismatch, DPF, AP1/NFE2, GATA, CACBP and TATTT/ATTTA.]

Fig. 4. Positions of selected features revealed by the multiple sequence alignment. The positions of the HS cores are shown on the top line. Reported positions of DNase HSs are on the second line. The next five lines show the positions of conserved sequences, as detected by three different methods. Differential phylogenetic footprints (DPFs) are on line 8. Conserved matches to consensus binding sites for the indicated proteins are shown on lines 9-11. The last line shows positions of matches between the one mismatch unspecified consensus (line 7) and the motifs TATTT or ATTTA. GenBank entry HUMHBB begins at 2688 in the current human sequence file.

These various methods for finding conserved segments produce generally congruent results, with substantial overlap in the blocks detected by each of the methods. This indicates that the combination of the various methods for finding conserved sequences is quite robust. As expected, all three methods find strongly conserved blocks within the HS cores, as well as juxtaposed to them (in particular, a phylogenetic footprint located just 3' to the HS4 core and an AP1 binding site immediately 5' to the HS3 core). In addition, some, but not all, of the regions between the cores are conserved, with some phylogenetic footprints as strongly conserved as those in the HS cores. Notable conserved regions are as much as 1000 bp 5' to and 3' to the HS3 core and also 5' to the HS2 core. Interestingly, a conserved sequence is located between HS2 and HS1 as well.

clusters that some single-copy regions do not align (Hardison et al., 1994; Hardison and Miller, 1993). One notable example is the intergenic region between the δ- and β-globin genes in mouse vs. human comparisons (Hardison et al., 1997). This shows that in the time since the ancestors to rodents and primates separated, some sequences in this locus (presumably those not necessary for function) have diverged extensively. Thus, the phylogenetic footprints in the LCR are indeed candidates for functional sequences.

3.3. Correspondence between DNase HSs and conserved blocks

The positions of DNase HSs are also plotted in Fig. 4. Several HSs are reported around each of the cores. Although some of this heterogeneity simply reflects multiple reports of the same HS, some of it results from a wide distribution of cleavage. For instance, the regions around HS3 and HS2 have DNase cleavage sites outside the minimal cores (Philipsen et al., 1990; Talbot et al., 1990). DNase HSs have been mapped at approximate positions 6200 (Stamatoyannopoulos et al., 1995) and 6500 (Tuan et al., 1985), which are about 1000 bp 5' to the HS3 core. This is the same region that displays multiple conserved sequence blocks, showing a good congruence between HS mapping and conserved sequences not only in the cores but far outside them as well.

3.4. Repeated, conserved sequence motifs in the LCR and proteins that bind to them

The distribution of conserved binding sites for some prominent proteins involved in globin gene regulation was determined by searching for matches between the consensus sequence for the protein binding sites and the 'unspecified consensus' computed allowing one mismatch (see Section 3.2. above). As shown in Fig. 4, three conserved segments matching the consensus AP1 binding site (TGASTCA) are found close to or within the
`The pattern of many conserved blocks in c