Gene 205 (1997) 73-94 (Elsevier)
GENE: An International Journal on Genes and Genomes
Review
Locus control regions of mammalian β-globin gene clusters: combining phylogenetic analyses and experimental results to gain functional insights
Ross Hardison a,b,*, Jerry L. Slightom c, Deborah L. Gumucio d, Morris Goodman e, Nikola Stojanovic f, Webb Miller b,f
a Department of Biochemistry and Molecular Biology, The Pennsylvania State University, University Park, PA 16802, USA
b Center for Gene Regulation, The Pennsylvania State University, University Park, PA 16802, USA
c Molecular Biology Unit 7242, Pharmacia and Upjohn, Inc., Kalamazoo, MI 49007, USA
d Department of Anatomy and Cell Biology, University of Michigan Medical School, Ann Arbor, MI 48109-0616, USA
e Department of Anatomy and Cell Biology, Wayne State School of Medicine, Detroit, MI 48201, USA
f Department of Computer Science and Engineering, The Pennsylvania State University, University Park, PA 16802, USA
Accepted 22 July 1997
Abstract
Locus control regions (LCRs) are cis-acting DNA segments needed for activation of an entire locus or gene cluster. They are operationally defined as DNA sequences needed to achieve a high level of gene expression regardless of the position of integration in transgenic mice or stably transfected cells. This review brings together the large amount of DNA sequence data from the β-globin LCR with the vast amount of functional data obtained through the use of biochemical, cellular and transgenic experimental systems. Alignment of orthologous LCR sequences from five mammalian species locates numerous conserved regions, including previously identified cis-acting elements within the cores of nuclease hypersensitive sites (HSs) as well as conserved regions located between the HS cores.
The distribution of these conserved sequences, combined with the effects of LCR fragments utilized in expression studies, shows that important sites are more widely distributed in the LCR than previously anticipated, especially in and around HS2 and HS3. We propose that the HS cores plus HS flanking DNAs comprise a 'unit' to which proteins bind and form an optimally functional structure. Multiple HS units (at least three: HS2, HS3 and HS4 cores plus flanking DNAs) together establish a chromatin structure that allows the proper developmental regulation of genes within the cluster. © 1997 Elsevier Science B.V.
Keywords: Hemoglobin; Sequence conservation; Enhancement; Chromatin; Domain opening; DNA-binding proteins
* Corresponding author. Present address: Department of Biochemistry and Molecular Biology, The Pennsylvania State University, 206 Althouse Laboratory, University Park, PA 16802, USA. Tel.: +1 814 8630113; Fax: +1 814 8637024; e-mail: rch8@psu.edu
Abbreviations: LCR, locus control region; HS, hypersensitive site; HIC, highest information content; DPF, differential phylogenetic footprint; CACBPs, proteins that bind to the CACC motif; MAR, matrix attachment region; bHLH, basic helix-loop-helix; MEL, murine erythroleukemia.
0378-1119/97/$17.00 © 1997 Elsevier Science B.V. All rights reserved. PII S0378-1119(97)00474-5
1. Expression patterns of mammalian hemoglobin gene clusters
The genes encoding the polypeptides of the α2β2 tetramer of hemoglobin are located in two separate clusters in birds and mammals. In humans, the β-like globin genes (including pseudogenes denoted by the prefix ψ) are clustered in the array 5'-ε-Gγ-Aγ-ψη-δ-β-3', which covers about 75 kb on chromosome 11p15.4. The α-like globin genes are in a 40-kb cluster, 5'-ζ2-ψζ1-ψα2-ψα1-α2-α1-θ-3', very close to the telomere of the short arm of chromosome 16.
Expression of the α- and β-like globin genes is limited to erythroid cells and is balanced so that equal amounts of the two polypeptides are available to assemble the hemoglobin heterotetramer. Expression of genes within the clusters is developmentally controlled, so that different forms of hemoglobin are produced in embryonic, fetal and adult life (reviewed in Stamatoyannopoulos and Nienhuis, 1994).
`BLUEBIRD EXHIBIT 1012
This process of hemoglobin switching is an excellent model system for increasing our understanding of the molecular mechanisms of differential gene expression during development. These developmental switches also offer new approaches to therapy for inherited anemias. For example, continued expression of the normally fetal HbF (α2γ2) in adults will reduce the severity of symptoms of patients producing an abnormal β-globin in sickle cell disease, and possibly also in patients lacking sufficient β-globin (β-thalassemia). An understanding of the molecular basis of globin gene switching will facilitate development of new therapeutic strategies (pharmacological and/or DNA transfer) that continue γ-globin gene expression in adults. In addition to biochemical and genetic approaches to studying regulation of globin genes, phylogenetic approaches are also highly informative. The detailed study of globin gene clusters in many mammalian species has provided a rich resource of information. From it we can glean further insight into not only the evolution of the gene clusters but also their regulation. The β-globin gene clusters have been extensively studied in human, the prosimian galago, the lagomorph rabbit, the artiodactyls goat and cow, and the rodent mouse. Maps of these gene clusters are shown in Fig. 1, and aspects of their evolution and regulation have been reviewed (Collins and Weissman, 1984; Goodman et al., 1987; Hardison and Miller, 1993). The ε-globin gene is at the 5' end of all the mammalian globin gene clusters and is expressed only in embryonic red cells in all cases. In most eutherian mammals, expression of the γ-globin gene is also limited to embryonic red cells, but in anthropoid primates, its expression continues and predominates in fetal red cells.
The appearance of this new pattern of fetal expression of the γ-globin genes coincides roughly with the duplication of the genes in primate evolution. This has led to the hypothesis that the duplication allowed the changes that caused the fetal recruitment (Hayasaka et al., 1993). The β-globin gene is expressed after birth in all mammals, but in galago, mouse and rabbit, its expression initiates and predominates in the fetal liver (arguing that fetal expression of the β-globin gene is the ancestral state). The recruitment of γ-globin genes for fetal expression in anthropoid primates is accompanied by a corresponding delay in expression of the β-globin gene.
Comparisons of DNA sequences among mammalian β-globin gene clusters can reveal candidates for sequences involved in shared regulatory functions; these will be detected as conserved sequence blocks, or phylogenetic footprints, found in all mammals (Gumucio et al., 1992; Hardison et al., 1993). Notable similarities are found in alignments of the proximal 5' flanking regions of the orthologous β-like globin genes, consistent with their roles as promoters and other regulators of expression. In addition, striking and extensive sequence matches are found at the far 5' end of the gene clusters, in the region that we now recognize as the locus control region (LCR), which is the dominant, distal control sequence for these gene clusters. Sequence comparisons can also be used to identify candidates for regulatory elements that lead to differences in expression patterns. In this case, one searches for sequences that are conserved in the set of mammals showing a particular phenotype but that differ in the species with a different pattern of expression. For instance, such differential phylogenetic footprints (Gumucio et al., 1994) led to the discovery of a sequence implicated in fetal-specific expression of the γ-globin genes in higher primates (Jane et al., 1992), and of a sequence that binds several proteins implicated in fetal silencing of the γ-globin gene (Gumucio et al., 1994). In this review, we summarize the results of sequence comparisons for both types of regulatory element in the LCR.
[Figure: β-globin gene-cluster maps on a 0-140 kb scale for the species discussed in the text and for the inferred ancestral eutherian mammal; the human lineage is annotated 'duplication of γ / fetal recruitment'.]
Fig. 1. Evolution of β-globin gene clusters in eutherian mammals. The inferred ancestral gene cluster and the branching pathways to contemporary gene clusters are shown. The time of expression during development is indicated beneath the box representing each gene; E, embryonic; F, fetal; A, adult. The boxes for orthologous genes have the same shading.
2. General features of mammalian β-globin LCRs
2.1. DNase hypersensitive sites 5' to the β-globin gene cluster
The β-globin LCR was initially discovered as a set of DNase hypersensitive sites located 5' to the ε-globin gene (Tuan et al., 1985; Forrester et al., 1986, 1987). At least five DNase HSs, called HS1-HS5 (Fig. 2), have been characterized within the region that provided the original gain-of-function effects described below (Grosveld et al., 1987). We will refer to this region, with all five HSs, as the 'full LCR.' The presence of DNase HSs is indicative of an altered chromatin structure associated with important cis-regulatory regions (Gross and Garrard, 1988). Some of these sites, especially HS3, appear preferentially in erythroid nuclei (Dhar et al., 1990). In contrast to the DNase hypersensitive sites at promoters, however, all are developmentally stable, i.e., present in embryonic, fetal and adult red cells (Forrester et al., 1986). Thus, the LCR marks an open chromatin domain for the β-like globin gene cluster in erythroid cells from all developmental stages, and functional assays implicate the LCR in generating this open domain, as described in the next section.
2.2. Position-independent expression and enhancement
As illustrated in Fig. 2, the β-globin LCR will confer high-level, position-independent expression on globin gene constructs in transgenic mice (reviewed in Townes and Behringer, 1990; Grosveld et al., 1993). In the absence of the LCR, the human β- or γ-globin gene is expressed in only about half of the lines of transgenic mice carrying the integrated gene, and expression levels are low relative to those of the endogenous mouse globin genes. The lack of expression in many lines of transgenic mice is presumed to result from negative position effects generated by adjacent sequences at the site of integration, which prevent expression of the transgene in erythroid cells. However, when a large DNA fragment containing the full LCR is linked to the β-globin gene, all resulting transgenic mouse lines express the gene, and at a level comparable to that of the endogenous globin genes (Grosveld et al., 1987). Hence, the negative position effects are no longer observed. This indicates that the LCR contains a strong domain-opening activity (that overrides the negative effects of adjacent sequences), an insulator that blocks the effects of adjacent sequences, or both. The high level of expression of the transgene indicates the presence of enhancers in the LCR as well. Both enhancers and LCRs increase the probability that a locus will be in a transcriptionally competent state without affecting the transcription rate in a cell actively expressing that locus (Walters et al., 1995, 1996; Wijgerde et al., 1996). This further argues that one of the major functions of the LCR is to open a chromatin domain around the locus in erythroid cells. In fact, deletion of most of the LCR but not the β-globin genes, as occurs in Hispanic (γδβ)-thalassemia, leaves the gene cluster in a chromatin conformation that is inaccessible to DNase I, and the globin genes are not expressed (Forrester et al., 1990). Thus, this loss-of-function analysis also shows that the LCR is necessary for the establishment and maintenance of an open chromatin domain within which the globin genes are expressed (Fig. 2).
Minimal DNA sequences that confer position-independent expression of a linked β-globin gene in transgenic mice have been determined in regions around the sites of strong DNase cleavage (reviewed in Grosveld et al., 1993). These regions are referred to as the 'hypersensitive site cores' for HS1, HS2, HS3 and HS4.
[Figure: map of the human β-globin gene cluster (0-80 kb) showing the DNase HSs and the Hispanic (γδβ)-thalassemia deletion, followed by transgenic constructs. Each is scored for expression in erythroid red cells, developmental regulation, position effects and chromatin state (open or closed).]
Fig. 2. Summary of the major effects of the β-globin locus control region.
2.3. Copy-number dependent expression
Transgene constructs that confer full protection from position effects should not be affected by any adjacent sequences. The construct is frequently integrated in multiple copies, both in transgenic mouse lines and in stably transfected cultured cells. In that case, each copy should be expressed independently of the other copies, so that the level of expression increases linearly with the number of copies. This 'copy-number-dependent' expression has been observed in some cases with particular fragments of the β-globin LCR (Talbot et al., 1989), as well as with the chicken β/ε-globin enhancer (Reitman and Felsenfeld, 1990).
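The linear expectation described above can be made concrete with a small Python sketch. Suppose each copy contributes the same amount of expression, e. Then a line carrying n copies should express about n times e. The least-squares slope of expression against copy number, fitted through the origin, then estimates e, and per-copy expression should be nearly constant across lines. The function names are hypothetical, and summarizing the spread with a coefficient of variation is an illustrative assumption. It is not a criterion used in the studies cited.

```python
from statistics import mean, pstdev


def slope_through_origin(copies, expression):
    """Least-squares slope b for the no-intercept model expression = b * copies.

    Under full copy-number dependence, b estimates the output of a single copy.
    """
    sxx = sum(n * n for n in copies)
    sxy = sum(n * e for n, e in zip(copies, expression))
    return sxy / sxx


def per_copy_levels(copies, expression):
    """Expression divided by copy number, one value per transgenic line."""
    return [e / n for n, e in zip(copies, expression)]


def per_copy_cv(copies, expression):
    """Coefficient of variation of per-copy expression across lines.

    Close to 0 when every copy contributes the same amount (linear dependence);
    larger when expression varies between lines or falls as copy number rises.
    """
    levels = per_copy_levels(copies, expression)
    return pstdev(levels) / mean(levels)


# Hypothetical lines carrying 1, 2 and 4 copies (arbitrary expression units).
print(slope_through_origin([1, 2, 4], [10, 20, 40]))  # 10.0
print(per_copy_cv([1, 2, 4], [10, 20, 40]))           # 0.0
```

Consider a hypothetical set of lines in which expression falls as copy number rises, such as 30, 20 and 10 units for 1, 2 and 4 copies. This gives a coefficient of variation of about 0.8 instead of 0. An inverse relationship of this kind therefore stands out immediately.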
Other experiments with fragments of the β-globin LCR do not show a clear dependence on copy number (Ryan et al., 1989), and occasional studies show inverse relationships between copy number and level of expression (Morley et al., 1992; TomHon et al., 1997). The minimal sequences that will achieve full dependence on copy number are not yet known. This property does, however, appear to require sequences from both the LCR and the gene-proximal region (Lloyd et al., 1992; Fraser et al., 1993; Li and Stamatoyannopoulos, 1994b). For the γ-globin gene, copy-number dependence requires both sequences 3' to the γ-globin gene and one or more elements in the HS cores (Stamatoyannopoulos et al., 1997).
2.4. Replication of the locus
In addition to the strong effects of the β-globin LCR on chromatin opening and enhancement of expression, the LCR also has a dominant effect on the regulation of replication in the locus. The Hispanic (γδβ)-thalassemia deletion removes HS2 through HS5 (Fig. 2). It not only leaves the locus in a closed chromatin conformation but also delays the time of replication from early to late in S phase in erythroid cells (Forrester et al., 1990). Replication of the β-globin gene locus normally initiates just 5' to the β-globin gene (Kitsberg et al., 1993), which is 50 kb 3' to the LCR. Surprisingly, chromosomes with the Hispanic thalassemia deletion no longer use the normal replication origin, even though it is intact, but instead use an origin located 3' to the β-globin locus (Aladjem et al., 1995).
2.5. Developmental regulation
The effects of the LCR, if any, on developmental regulation are more complicated to analyze. Several lines of evidence show that sequences proximal to the genes are sufficient to specify expression at a given developmental stage.
In the absence of an LCR, human β-like globin genes are expressed at the 'correct' developmental stage in transgenic mice, i.e., mimicking the expression pattern of the orthologous endogenous mouse genes (summarized in Trudel and Costantini, 1987). In fact, developmental switching can occur between human γ- and β-globin genes in transgenic mice in the absence of an LCR (Starck et al., 1994), demonstrating that the LCR is not essential for switching. Point mutations in the promoter of the human γ-globin genes are associated with prolonged expression in the adult stage, i.e., hereditary persistence of fetal hemoglobin (reviewed in Stamatoyannopoulos et al., 1994). Detailed studies of the human ε- and γ-globin genes in constructs also containing LCR fragments have revealed sequences extending up to about 0.8 kb away from the gene. These sequences have both positive and negative effects on developmental control (Stamatoyannopoulos et al., 1993; Trepicchio et al., 1993; Li and Stamatoyannopoulos, 1994b; Trepicchio et al., 1994). Recent studies in transgenic mice used an otherwise identical transgene construct for two orthologous genes. In that setting, the human γ-globin gene is expressed fetally, whereas the galago γ-globin gene is expressed embryonically (TomHon et al., 1997). This recapitulation of developmental specificity shows that the dominant determinants of developmental timing are encoded by nucleotide differences within the 4.0-kb fragment containing the γ-globin gene. Although developmental switches in expression can occur in the absence of the LCR, it is still possible that, when present, the LCR participates directly in developmental regulation (e.g., Stamatoyannopoulos, 1991; Wijgerde et al., 1996). Addition of the LCR to a single human β- or γ-globin gene will alter developmental control (Fig. 2). It leads to precocious expression of the β-globin gene in embryonic red cells and to expression of the γ-globin gene in fetal and adult stages (Enver et al., 1989; Behringer et al., 1990). Inclusion of both γ- and β-globin genes will improve the developmental switching, leading to a model of competition between promoters for the LCR (Enver et al., 1990). The order of multiple globin genes in LCR-containing constructs also influences their regulation (Hanscombe et al., 1991; Peterson and Stamatoyannopoulos, 1993). These data can be explained by a competition model, but other explanations are possible. The apparent loss of developmental control seen in the presence of an LCR could result from the increased sensitivity of the assays. The effects of additional genes in the construct can be explained by gene order effects (such as transcriptional interference from the upstream gene) as opposed to proximity to the LCR (Martin et al., 1996). The effects that led to models of competition in developmental regulation are seen primarily for the regulation of the human β-globin gene. The ε-globin (Raich et al., 1990; Shih et al., 1990), α-globin and ζ-globin (Pondel et al., 1992; Liebhaber et al., 1996) genes are autonomously regulated during development in the presence of LCR-like elements. Constructs containing larger LCR fragments with the γ-globin gene also show autonomous regulation (Dillon and Grosveld, 1991).
2.6. Models for LCR action
Many studies are consistent with the hypothesis that several DNase HSs in the LCR work together in a holocomplex to generate the several effects enumerated above. One explicit model states that each HS has a predominant effect on only one specific gene in the cluster (Engel, 1993). This model can be excluded, since deletions of single HSs in the context of entire gene clusters either have little effect or affect expression of all the genes in the locus (reviewed below). Indeed, removal of any single HS makes the entire human β-globin gene cluster more sensitive to position effects in transgenic mice (Milot et al., 1996), arguing that this defining property of the LCR requires all of the HSs.
This result contrasts with reports that individual HSs can provide position-independent, copy-number dependent expression (e.g., Fraser et al., 1993), and the molecular basis for this apparent discrepancy is not clear. Functional interactions between the HSs have been demonstrated, but they require DNA sequences both inside and outside the core HSs (reviewed below). Several individual HSs do exhibit substantial function alone. Even so, it is most likely that they normally interact in a holocomplex (Ellis et al., 1996) that encompasses a substantial amount of DNA. The ability of the LCR to open a chromosomal domain suggests that it recruits chromatin-remodeling activities, such as SWI/SNF (Cote et al., 1994; Peterson and Tamkun, 1995) and/or histone acetyltransferases (Brownell et al., 1996), to this locus, but only in erythroid cells. This recruitment could occur indirectly. Trans-activator proteins, such as members of the AP1 family, could recognize specific sequences in the LCR and then recruit chromatin-remodeling and/or histone-modifying enzymes through specific interactions with those enzymes. For instance, the co-activator proteins CBP and p300 are histone acetyltransferases and also interact with AP1 (Ogryzko et al., 1996). In addition, some DNA sequences in the LCR could recruit chromatin-remodeling and modifying activities directly. Several other issues remain unresolved. For instance, the LCR could influence all or several of the genes in the locus at once (Bresnick and Felsenfeld, 1994; Martin et al., 1996), or it could serve to activate expression of one gene at a time (Wijgerde et al., 1995).
If the LCR does influence predominantly one gene at a time, there are two ways it could do so. It could interact directly with the target gene, looping out the DNA between this distal regulator and the proximal regulatory elements (Grosveld et al., 1993). Alternatively, the positive effect of the LCR could 'track' along the DNA to the target gene (Tuan et al., 1992). Neither the molecular targets of the direct interactions (in the former model) nor the molecular basis of the tracking effects (in the latter model) are known. For instance, 'tracking' could involve movement of transcription factors along the DNA, or it could result from spreading of the active chromatin domain down the locus.
3. Sequence analysis of mammalian β-globin LCRs
DNA sequences of much of the β-globin LCR are now available from several mammalian species, including human (Li et al., 1985; Yu et al., 1994), galago (Slightom et al., 1997), rabbit (Hardison et al., 1993; Slightom et al., 1997), goat (Li et al., 1991) and mouse (Moon and Ley, 1990; Hug et al., 1992; Jimenez et al., 1992). The remainder of this review will discuss insights into the regions required for LCR function. These insights are based on patterns of conservation revealed by a simultaneous alignment of these DNA sequences (Slightom et al., 1997). Key features of the LCRs from the different mammals are mapped in Fig. 3.
3.1. Conservation of number and order of HSs
All of the known mammalian β-globin LCRs have segments homologous to HS1, HS2 and HS3 (Fig. 3). HS4 is likely present in all these species as well, although the currently available goat sequence does not include the region corresponding to HS4. Homologs to human HS5 are found in galago (Slightom et al., 1997) and mouse (A. Reik, M. Bender and M. Groudine, pers. commun.), suggesting a wide distribution of HS5 as well. If HS5 is present in rabbit, it does not occur in the same place as in human or galago. Thus the presence of four (HS1-4) and possibly all five major HSs is conserved in these eutherian mammals.
[Figure: LCR maps for human, galago, rabbit, goat and mouse on a common scale in bp (0-22000), marking HS1-HS5 and their homologs. Part of the goat region is marked 'not sequenced'. The mouse map notes a distance of 6.5 kb in the a allele (BALB/c) and 5.0 kb in the b allele (C57BL/6J).]
Fig. 3. Mammalian β-globin LCRs. Maps of the β-globin LCRs of human, galago, rabbit, goat and mouse show the positions of HS cores in humans, their homologs in other species, the positions and identities of repeats, and the new regions sequenced in rabbit (double-arrowed lines under the rabbit β-globin LCR map). The HS cores are shown as boxes with distinctive fills, long interspersed repeats (L1s) are open arrowed boxes, and short interspersed repeats are triangles. In the latter two cases, the icon points in the direction in which the repeat is oriented. Short repeats are Alu repeats in humans, both type I and type II Alu repeats in galago, C repeats in rabbits, Nla and D repeats in goats, and B1 repeats in mouse. An insertion between positions 14419 and 14599 of galago does not match any known short or long repeats, and it may represent a newly discovered repeat. An insertion of 81 bp that begins at position 3614 of galago is a novel short insertion sequence.
This conservation extends even further back in evolutionary time: at least LCR HS2, HS3, HS4, and possibly HS1 have been found in Australian marsupials and monotremes (R. Baird, J. Kuliwaba, R. Hope, M. Goodman et al., personal communication).
3.2. Conserved sequences within the LCR
We used the program yama2 (Chao et al., 1994) to compute a simultaneous alignment of the available mammalian β-globin LCR sequences. We then used three different approaches to search for conserved sequences, each applied at a variety of criteria (Slightom et al., 1997; Stojanovic et al., 1997). The first method computes the information content of each column (Schneider et al., 1986); the positions of the 10 and 30 blocks with the highest information content (HIC) are displayed in Fig. 4. The information content reflects both the amount of variability in a column of a multiple alignment and the base composition of the sequences being aligned, and it provides a finely graded function for measuring conservation. The second method simply finds runs of exact matches. Fig. 4 plots positions with seven or more consecutive invariant columns in which sequences from some minimal number of species align (four in one case, three in the other). A third method (Stojanovic et al., 1997) was devised to better reflect matches found at protein binding sites. Specifically, Fig. 4 identifies all runs of six or more columns possessing a plausible consensus sequence, i.e., each row in that region can have at most one mismatch with the (a priori unspecified) consensus. This requirement mimics the documented ability of some proteins to bind equally well to similar but not identical sequences. For instance, GATA1 binds to AGATAA or to TGATAG, which each differ in only one position from AGATAG.
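The first two criteria above can be sketched in Python. This is a minimal illustration under stated assumptions, not the yama2 pipeline or the exact scoring used by Slightom et al. (1997). Information content is taken here as the entropy of the pooled base composition minus the entropy of the column, in bits. Gaps ('-') are ignored, and no small-sample correction is applied. A column is treated as invariant when at least min_species rows carry a base and all of those bases agree. Selecting the 10 or 30 top-scoring blocks is omitted. All function names are hypothetical.

```python
import math
from collections import Counter

BASES = "ACGT"


def entropy(freqs):
    """Shannon entropy, in bits, of a base-frequency dictionary."""
    return -sum(p * math.log2(p) for p in freqs.values() if p > 0)


def background_freqs(rows):
    """Base composition pooled over all aligned sequences, ignoring gaps."""
    counts = Counter(ch for row in rows for ch in row.upper() if ch in BASES)
    total = sum(counts.values())
    return {b: counts[b] / total for b in BASES}


def column_information(rows):
    """Per-column information content, H(background) - H(column), in bits.

    Rows must be equal-length aligned strings. Gap characters are ignored and
    no small-sample correction is applied.
    """
    h_bg = entropy(background_freqs(rows))
    scores = []
    for column in zip(*rows):
        bases = [ch for ch in (c.upper() for c in column) if ch in BASES]
        if not bases:
            scores.append(0.0)
            continue
        counts = Counter(bases)
        scores.append(h_bg - entropy({b: counts[b] / len(bases) for b in BASES}))
    return scores


def invariant_runs(rows, min_len=7, min_species=4):
    """Maximal runs of >= min_len columns in which at least min_species rows
    carry a base and all of those bases are identical.

    Returns 0-based, half-open (start, end) column intervals.
    """
    runs, start = [], None
    ncols = len(rows[0])
    for j in range(ncols + 1):
        ok = False
        if j < ncols:
            present = [r[j].upper() for r in rows if r[j].upper() in BASES]
            ok = len(present) >= min_species and len(set(present)) == 1
        if ok and start is None:
            start = j
        elif not ok and start is not None:
            if j - start >= min_len:
                runs.append((start, j))
            start = None
    return runs


# Hypothetical toy alignment: eight shared columns, then two variable ones.
toy = ["ACGTACGTAA", "ACGTACGTCC", "ACGTACGTGG", "ACGTACGTTT"]
print(invariant_runs(toy))          # [(0, 8)]
print(column_information(toy)[:1])  # [2.0]
```

With a uniform background composition, an invariant column scores the maximum of 2 bits and a column containing all four bases scores 0. A skewed background lowers the maximum, which is how base composition enters the score.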
[Figure: tracks plotted against alignment position (about 4000-16000): HS cores (HS4, HS3, HS2, HS1); HSs; 10 HIC; 30 HIC; l = 7, n = 4; l = 7, n = 3; 1 mismatch; DPF; AP1/NFE2; CACBP; GATA; TATTT/ATTTA.]
Fig. 4. Positions of selected features revealed by the multiple sequence alignment. The positions of the HS cores are shown on the top line. Reported positions of DNase HSs are on the second line. The next five lines show the positions of conserved sequences, as detected by three different methods. Differential phylogenetic footprints (DPFs) are on line 8. Conserved matches to consensus binding sites for the indicated proteins are shown on lines 9-11. The last line shows positions of matches between the one-mismatch unspecified consensus (line 7) and the motifs TATTT or ATTTA. GenBank entry HUMHBB begins at 2688 in the current human sequence file.
These various methods for finding conserved segments produce generally congruent results, with substantial overlap in the blocks detected by each of the methods. This agreement indicates that the set of conserved blocks found by combining the methods is quite robust. As expected, all three methods find strongly conserved blocks within the HS cores, as well as blocks juxtaposed to them. In particular, they find a phylogenetic footprint located just 3' to the HS4 core and an AP1 binding site immediately 5' to the HS3 core. In addition, some, but not all, of the regions between the cores are conserved, and some of these phylogenetic footprints are as strongly conserved as those in the HS cores. Notable conserved regions lie as much as 1000 bp 5' and 3' to the HS3 core, and also 5' to the HS2 core. Interestingly, a conserved sequence is located between HS2 and HS1 as well. The pattern of many conserved blocks in certain broad regions is a sign of a distributed regulatory function.
In particular, it implicates some of the regions outside the HS cores in the function of the LCR. Direct tests of the involvement of sequences outside the HS cores, using progressively larger DNA fragments containing the HS cores (reviewed below), demonstrate that these sequences do contribute significantly to the regulatory functions of the LCR. The large number and wide distribution of conserved sequence blocks in the LCR raise a question. This pattern might be characteristic of the entire gene cluster and hence reflect the common ancestry of the gene clusters, rather than revealing sequences resistant to change over evolutionary time. However, pairwise and multiple alignments of the entire gene clusters clearly show that some single-copy regions do not align (Hardison et al., 1994; Hardison and Miller, 1993). One notable example is the intergenic region between the g- and β-globin genes in mouse vs. human comparisons (Hardison et al., 1997). This shows that since the ancestors of rodents and primates separated, some sequences in this locus (presumably those not necessary for function) have diverged extensively. Thus, the phylogenetic footprints in the LCR are indeed candidates for functional sequences.
3.3. Correspondence between DNase HSs and conserved blocks
The positions of DNase HSs are also plotted in Fig. 4. Several HSs are reported around each of the cores. Some of this heterogeneity simply reflects multiple reports of the same HS, but some of it results from a wide distribution of cleavage. For instance, the regions around HS3 and HS2 have DNase cleavage sites outside the minimal cores (Philipsen et al., 1990; Talbot et al., 1990). DNase HSs have been mapped at approximate positions 6200 (Stamatoyannopoulos et al., 1995) and 6500 (Tuan et al., 1985), which are about 1000 bp 5' to the HS3 core.
This is the same region that displays multiple conserved sequence blocks. HS mapping and conserved sequences therefore agree well, not only in the cores but also far outside them.
3.4. Repeated, conserved sequence motifs in the LCR and proteins that bind to them
The distribution of conserved binding sites for some prominent proteins involved in globin gene regulation was determined by searching for matches between the consensus sequences for the protein binding sites and the 'unspecified consensus' computed allowing one mismatch (see Section 3.2 above). As shown in Fig. 4, three conserved segments matching the consensus AP1 binding site (TGASTCA) are found close to or within the cores for HS3 and HS2 (HS2 has tandem AP1 binding sites). Characteristics of proteins interacting with the LCR DNA have been reviewed recently (Orkin, 1995; Baron, 1997) and are summarized in Table 1. Krüppel-like Zn finger proteins, such as Sp1 and EKLF, bind to sequences containing a CACC motif; hence, we use the generic designation CACBP for such binding proteins. Conserved CACC motifs are found at eight locations in the β-globin LCR, including two in the HS3 core and one in the HS2 core. Three of the remaining conserved CACC motifs are found between HS3 and HS2. Conserved matches to the binding sites for GATA transcription factors (WGATAR) are present at nine positions within the LCR, including all four HS cores as well as several outside the HS cores.
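The one-mismatch criterion of Section 3.2 and the consensus search described here can be sketched together in Python. Several details are assumptions rather than the procedure of Stojanovic et al. (1997). Gaps count as mismatches, and the consensus must be gap-free. Motifs written in IUPAC codes (such as TGASTCA or WGATAR) are searched on both strands. The function names are hypothetical. One observation keeps the search exact and cheap. Any valid consensus lies within one substitution of the first row, so a block of L columns needs only 3L + 1 candidate strings tested.

```python
BASES = "ACGT"
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "R": "AG", "Y": "CT",
         "S": "CG", "W": "AT", "K": "GT", "M": "AC", "B": "CGT", "D": "AGT",
         "H": "ACT", "V": "ACG", "N": "ACGT"}
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A", "R": "Y", "Y": "R",
              "S": "S", "W": "W", "K": "M", "M": "K", "B": "V", "V": "B",
              "D": "H", "H": "D", "N": "N"}


def one_mismatch_consensus(rows, start, end):
    """Gap-free string C such that every row differs from C in at most one of
    the columns [start, end), or None if no such C exists.

    Any valid C is at most one substitution away from the first row, so only
    that row and its 3 * (end - start) single-base variants are tried.
    """
    segs = [r[start:end].upper() for r in rows]
    ref = segs[0]
    candidates = [ref] + [ref[:k] + base + ref[k + 1:]
                          for k in range(len(ref)) for base in BASES if base != ref[k]]
    for cand in candidates:
        if all(ch in BASES for ch in cand) and all(
                sum(x != y for x, y in zip(cand, s)) <= 1 for s in segs):
            return cand
    return None


def consensus_blocks(rows, min_len=6):
    """Maximal intervals (start, end, consensus) of >= min_len columns that have
    a one-mismatch consensus. Every sub-interval of a valid interval is also
    valid, so the furthest valid end never moves left as start advances."""
    ncols, blocks, end, last_end = len(rows[0]), [], 0, -1
    for start in range(ncols):
        end = max(end, start)
        while end < ncols and one_mismatch_consensus(rows, start, end + 1) is not None:
            end += 1
        if end - start >= min_len and end > last_end:
            blocks.append((start, end, one_mismatch_consensus(rows, start, end)))
            last_end = end
    return blocks


def reverse_complement(motif):
    """Reverse complement of an IUPAC motif."""
    return "".join(COMPLEMENT[ch] for ch in reversed(motif.upper()))


def motif_hits(seq, motif):
    """0-based offsets where an IUPAC motif, or its reverse complement, matches seq."""
    hits = set()
    for m in {motif.upper(), reverse_complement(motif)}:
        for i in range(len(seq) - len(m) + 1):
            if all(seq[i + k] in IUPAC[m[k]] for k in range(len(m))):
                hits.add(i)
    return sorted(hits)


# Hypothetical three-row alignment; each row is within one mismatch of row 1.
rows = ["AGATAAGCTT", "AGATAGGCTT", "TGATAAGCTT"]
for start, end, cons in consensus_blocks(rows):
    print(start, end, cons, motif_hits(cons, "WGATAR"))  # 0 10 AGATAAGCTT [0]
```

The consensus is not necessarily unique. For AGATAA and TGATAG, for example, both AGATAG and TGATAA are valid. The sketch returns the first valid candidate, which is enough for deciding whether a block exists and for testing it against a binding-site motif.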
Table 1
Transcription factors implicated in LCR function and globin gene regulation
Protein       Consensus binding site
NFE2          YGCTGASTCAY
LCRF1/Nrf1    YGCTGASTCAY
GATA1         WGATAR
EKLF          CCNCNCCCN
BKLF/TEF2     CCNCNCCCN
Sp1           GGGCGG
YY1           (a) VDCCATNWY; (b) GACATNTT
SSP           GGGGCCGGCGGCTGGCTAGGG
USF           CACGTG
TAL1/SCL      AACAGATGGT
Composition: 45-kDa and 18-kDa (maf) subunits; 49-kDa plus other subunit; 50-kDa monomer; 38-kDa monomer; 95-105-kDa monomer; 23-kDa monomer; 66-kDa CP2 and 40-45-kDa subunit; 50-kDa homodimer; 40-kDa plus E2A or other partners
