Downloaded from http://jvi.asm.org/ on March 13, 2017 by guest

JOURNAL OF VIROLOGY, May 1998, p. 4005–4014, Vol. 72, No. 5
0022-538X/98/$04.00+0
Copyright © 1998, American Society for Microbiology

`
`Chromosome Structure and Human Immunodeficiency Virus
`Type 1 cDNA Integration: Centromeric Alphoid
`Repeats Are a Disfavored Target
`SANDRINE CARTEAU, CHRISTOPHER HOFFMANN, AND FREDERIC BUSHMAN*
`Infectious Disease Laboratory, The Salk Institute for Biological Studies, La Jolla, California 92037
`
`Received 22 August 1997/Accepted 19 January 1998
`
`Integration of retroviral cDNA into host chromosomal DNA is an essential and distinctive step in viral
`replication. Despite considerable study, the host determinants of sites for integration have not been fully
`clarified. To investigate integration site selection in vivo, we used two approaches. (i) We have analyzed the host
`sequences flanking 61 human immunodeficiency virus type 1 (HIV-1) integration sites made by experimental
`infection and compared them to a library of 104 control sequences. (ii) We have also analyzed HIV-1
`integration frequencies near several human repeated-sequence DNA families, using a repeat-specific PCR-
`based assay. At odds with previous reports from smaller-scale studies, we found no strong biases either for or
`against integration near repetitive sequences such as Alu or LINE-1 elements. We also did not find a clear bias
`for integration in transcription units as proposed previously, although transcription units were found some-
`what more frequently near integration sites than near controls. However, we did find that centromeric alphoid
`repeats were selectively absent at integration sites. The repeat-specific PCR-based assay also indicated that
`alphoid repeats were disfavored for integration in vivo but not as naked DNA in vitro. Evidently the distinctive
`DNA organization at centromeres disfavors cDNA integration. We also found a weak consensus sequence for
`host DNA at integration sites, and assays of integration in vitro indicated that this sequence is favored as
`naked DNA, revealing in addition an influence of target primary sequence.
`
`To replicate, a retrovirus must integrate a cDNA copy of its
`RNA genome into a chromosome of the host. The host inte-
`gration acceptor sites are not expected to be present as naked
`DNA but rather associated with histones and other DNA-
`binding proteins in chromatin. DNA packaging in vivo is ex-
`pected to influence integration site selection, and the choice of
`integration site may have profound effects on both the virus
`and the host (13, 57). The determinants of integration effi-
`ciency in vivo remain incompletely defined, despite their im-
`portance.
`Previous surveys of in vivo integration sites have led to
`several proposals for factors influencing site selection. Studies
`of Moloney murine leukemia virus have supported a model in
`which open chromatin regions at transcription units were fa-
`vored, since associated features such as DNase I-hypersensitive
`sites (45, 58) or CpG islands (47) were apparently enriched
`near integration sites. Another study proposed that unusual
`host DNA structures were common near integration sites (34).
`A recent study of avian leukosis virus integration frequencies
`at several chromosomal sites failed to show any major differ-
`ences among the regions studied (62), contrary to an earlier
`report (50). For human immunodeficiency virus type 1 (HIV-
`1), it has been proposed that integration may be favored near
`repetitive elements (including LINE-1 elements [54] or Alu
`islands [55]) or topoisomerase cleavage sites (24).
`Assays of integration in vitro have revealed several effects of
`proteins bound to target DNA. Simple DNA-binding proteins
`can block access of integration complexes to target DNA, cre-
`ating regions refractory for integration (3, 9, 44). In contrast,
wrapping DNA on nucleosomes can create hot spots for integration at sites of probable DNA distortion
(40–42, 44). Distortion of DNA in several other protein-DNA complexes can also favor integration
(3, 35), consistent with the possibility that DNA distortion is involved in the integrase mechanism
(11, 48).

* Corresponding author. Mailing address: Infectious Disease Laboratory, The Salk Institute for
Biological Studies, 10010 N. Torrey Pines Rd., La Jolla, CA 92037. Phone: (619) 453-4100, ext. 1630.
Fax: (619) 554-0341. E-mail: rick_bushman@qm.salk.edu.

`Here we present two experiments designed to address some
`of the questions surrounding integration site selection in vivo.
`We have (i) sequenced 61 integration junctions made after
`experimental infection of cultured human T cells and com-
`pared them with 104 control DNA fragments from uninfected
`human cells and (ii) used a region-specific PCR assay to assess
`the frequency of integration near several repeated-sequence
`families. In addition, we have identified a weakly conserved
`sequence at in vivo integration sites and determined that it is
`favored for integration when tested in vitro.
`
`MATERIALS AND METHODS
`
`DNA manipulation. Plasmids containing synthetic integration target sites were
`prepared by annealing pairs of oligonucleotides (CH10-1–CH10-2, CH11-1–
`CH11-2, and CH13-1–CH13-2) (Table 1) and ligating them with pUC19 DNA
`that had been cleaved with EcoRI and HindIII. The standard cloning methods
`used were as described previously (46). Integration target DNAs were prepared
`by cleaving the plasmids mentioned above with PvuII, which releases the oligo-
`nucleotide insert together with flanking plasmid DNA.
`The oligonucleotides used in this study are shown in Table 1.
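
As an illustrative check that is not part of the original methods, the following Python sketch tests whether each oligonucleotide pair anneals into a duplex carrying a 5′ AATT overhang (compatible with EcoRI-cleaved DNA) and a 5′ AGCT overhang (compatible with HindIII-cleaved DNA), as would be needed for ligation into EcoRI/HindIII-cleaved pUC19. The sequences are copied from Table 1; the function names and the dictionary keys are hypothetical labels chosen for this example.

```python
# Oligonucleotide pairs copied from Table 1, written 5' to 3'.
# The dictionary keys are labels for this sketch only.
OLIGO_PAIRS = {
    "CH10": ("AATTCTTCTCGAGTAGGTTACCTATGATCAA",
             "AGCTTTGATCATAGGTAACCTACTCGAGAAG"),
    "CH11": ("AATTCTTCTCGAGTAGTTTAACTATGATCAA",
             "AGCTTTGATCATAGTTAAACTACTCGAGAAG"),
    "CH13": ("AATTCGTGTTAACTCGGTGACCGAAGGCCTA",
             "AGCTTAGGCCTTCGGTCACCGAGTTAACACG"),
}

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an unambiguous DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def anneals_with_overhangs(top, bottom, top_overhang="AATT", bottom_overhang="AGCT"):
    """True if the two strands pair perfectly outside their 4-nt 5' overhangs."""
    if not (top.startswith(top_overhang) and bottom.startswith(bottom_overhang)):
        return False
    top_core = top[len(top_overhang):]
    bottom_core = bottom[len(bottom_overhang):]
    return top_core == reverse_complement(bottom_core)


for name, (top, bottom) in OLIGO_PAIRS.items():
    print(name, anneals_with_overhangs(top, bottom))
```

The check only confirms base pairing and overhang identity; it says nothing about ligation efficiency or about which restriction sites are regenerated after cloning.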
`Construction of DNA libraries. To generate a large pool of independent
integration events, SupT1 cells (2 × 10^7 cells) were infected with the HXB2 or
`R9 (56) (referred to as R8 in reference 22) HIV-1 strain. Viral stocks were
`assayed by measuring the concentration of p24, and the infectivity was scored by
`the MAGI assay (28). Cells were infected at a multiplicity of 1 to 10 and
`harvested 12 to 14 h later. The cellular genomic DNA was depleted of low-
`molecular-weight DNA prior to cloning as described previously (39).
`For construction of library 1 (Fig. 1, method 1), DNA from infected cells was
`cleaved with HindIII and circularized by ligation (31). Sixty-six nanograms of
`DNA was used as the template for PCR. HUA and HUB, divergently oriented
`primers complementary to the HIV long terminal repeats (LTRs), were used for
`the first amplification. Amplification was carried out for 35 cycles of 94°C for 1
`min, 58°C for 1 min, and 72°C for 3 min. The products were purified by using the
`Qiaquick PCR purification kit (Qiagen, Santa Clarita, Calif.). One microliter
`
`MYR1025
`Myriad Genetics, Inc. et al. (Petitioners) v. The Johns Hopkins University (Patent Owner)
`IPR For USPN 8,859,206
`
`CARTEAU ET AL.
`
`J. VIROL.
`
TABLE 1. Oligonucleotides used in this study

Oligonucleotide | Sequence | Comments
HUA | 5′-CTTTTTGCCTGTACTGGGTCTC-3′ | HIV U3 primer for inverse PCR
HUB | 5′-GATCAAGGATATCTTGTCTTCGT-3′ | HIV U3 primer for inverse PCR
IP3 | 5′-TCTTGTCTTCGTTGGGAGTGA | HIV U3 primer for inverse PCR
det3b | 5′-GAACCCACTGCTTAAGCCTC-3′ | HIV U3 primer for inverse PCR
det3a | 5′-CTTCGTTGGGAGTGAATTAG-3′ | Primer for detection of circle junctions
sc8 | 5′-CTTCAAGTAGTGTGTGCCCG-3′ | Primer for detection of circle junctions
sc10 | 5′-GGGTTTTCCAGTCACACCTCAGG-3′ | Primer for detection of the HIV internal fragment
TA6 | 5′-CATCAAGCTTGGTACCGAGC-3′ | Primer for sequencing from pTA vector
TA7 | 5′-TAATACGACTCACTATAGGG-3′ | Primer for sequencing from pTA vector
SC24 | 5′-TGGCGCAATCTCGGCTCAC-3′ | Primer for amplifying Alu1 sequences
CH12 | 5′-CTCCGCTTCCCGGGTTC-3′ | Primer for amplifying Alu1 sequences
CH5 | 5′-CTTCCAGTTTTTGCCCATTCAGT-3′ | Primer for amplifying LINE-1 sequences
CH6 | 5′-AGTATGATATTGGCTGTGGGTTTGTC-3′ | Primer for amplifying LINE-1 sequences
SC21 | 5′-GCAAGGGGATATGTGGACC-3′ | Primer for amplifying alphoid repeats
SC23 | 5′-ACCACCGTAGGCCTGAAAGCAGTC-3′ | Primer for amplifying alphoid repeats
CH15 | 5′-CCTGAGGCCTCCCTCAGCCAT-3′ | Primer for amplifying THE 1 repeats
CH16 | 5′-GCCATGATTGTAAGTTTCCTGAGG-3′ | Primer for amplifying THE 1 repeats
NEB-40 | 5′-GTTTTCCCAGTCACGAC-3′ | Primer for amplifying integration products in pUC19
FB652 | 5′-TGTGGAAAATCTCTAGCA-3′ | Primer for amplifying HIV U5 sequences
CH 11 | 5′-CTCCGCTTCCCGGGTTC-3′ | Primer for amplifying integration products in pUC19
FB66 | 5′-GCCTAGATCCGTGTGGAAAATC-3′ | Primer for amplifying products made with purified integrase
FB64 | 5′-ACTGCTAGAGATTTTCCACACGGATCCTAGGC-3′ | Substrate for purified integrase (annealed to FB65-2)
FB65-2 | 5′-GCCTAGGATCCGTGTGGAAAATCTCTCTCTAGCA-3′ | Substrate for purified integrase (annealed to FB64)
AP1 | 5′-CCATCCTAATACGACTCACTATAGGGC-3′ | Adaptor primer 1
AP2 | 5′-ACTCACTATAGGCTCGAGCGGC-3′ | Adaptor primer 2
ADAPT1 | 5′-CTAATACGACTCACTATAGGGCTCGAGCGGCCGCCCGGGCAGGT-3′ | Vectorette adaptor primer (top strand)
ADAPT2 | 5′-ACCTGCCC-NH2-3′ | Vectorette adaptor primer (bottom strand)
CH10-1 | 5′-AATTCTTCTCGAGTAGGTTACCTATGATCAA-3′ | Insert for pCH10 (top strand)
CH10-2 | 5′-AGCTTTGATCATAGGTAACCTACTCGAGAAG-3′ | Insert for pCH10 (bottom strand)
CH11-1 | 5′-AATTCTTCTCGAGTAGTTTAACTATGATCAA-3′ | Insert for pCH11 (top strand)
CH11-2 | 5′-AGCTTTGATCATAGTTAAACTACTCGAGAAG-3′ | Insert for pCH11 (bottom strand)
CH13-1 | 5′-AATTCGTGTTAACTCGGTGACCGAAGGCCTA-3′ | Insert for pCH12 (top strand)
CH13-2 | 5′-AGCTTAGGCCTTCGGTCACCGAGTTAACACG-3′ | Insert for pCH12 (bottom strand)
`
from the 50-µl column eluate was used as the template for the second-round
`PCR (20 cycles; program as described above) with nested primers det3b and IP3.
`For construction of library 2 (Fig. 1, method 2) DNA fragments sheared by
`sonication (average length, about 1.5 kb) were made blunt-ended by treatment
`with Bal 31 followed by T4 DNA polymerase and deoxynucleoside triphosphates.
`Ligation of adapters, amplification, and cloning were carried out as described
`previously (51), except that primers HUB and IP3 were used as viral end primers
`for the first and second amplifications, respectively. PCR products were cloned
`by using the pCR II TA cloning vector from Invitrogen (San Diego, Calif.).
`The products of PCRs contained two contaminants in addition to the desired
`integration junctions, one derived from a circular form of the viral DNA (2-LTR
circle) and the second from the 3′ internal part of the viral DNA (for a discussion,
see reference 31). Colonies containing host-virus junctions were distinguished
from colonies containing contaminating sequences by PCR. Bacterial
`colonies containing plasmids were resuspended in PCR buffer and amplified with
`Taq polymerase for 20 cycles of 1 min at 94°C, 30 s at 60°C, and 1 min at 72°C.
`The circle junctions were detected using primers det3a and sc8. The internal
`fragment was detected using primers sc10 and IP3. The inserts were sequenced
`by using primers TA6 and TA7, which are complementary to the vector (pCR II;
`Invitrogen). Sequences of integration junctions and controls were determined by
`the dideoxy sequencing method.
`Each sequence was determined at least twice. For each integration site clone,
`the sequence of 34 bases of viral DNA at the LTR tip was determined, in
`addition to the flanking host DNA. For most integration site clones (59 of 61),
`all of the cloned human DNA adjacent to the proviral DNA was sequenced.
`A control experiment was carried out to exclude a possible artifact. Since DNA
`samples were treated with DNA ligase, free HIV genomes might have become
`joined to host DNA fragments by DNA ligase instead of integration. This is
unlikely in the case of library 1, however, since the blunt-ended or 3′ cleaved
forms of the HIV cDNA would not be expected to become ligated to the
protruding 5′ ends generated by cleavage with HindIII. However, to document
`this expectation, a control experiment was performed in which purified uninte-
`grated HIV cDNA was incubated in the presence of DNA ligase with
`HindIII-cleaved sequences and possible ligation was assayed by PCR across the
`ligation junction (one primer complementary to the HIV DNA and the other
`complementary to the HindIII-cleaved test DNA). No ligation was detected
`(data not shown). In the case of library 2, hypothetical ligation of unintegrated
`
`HIV cDNA should have yielded predominantly the vectorette linker joined
`directly to HIV cDNA, since DNA ends from the linkers were present in vast
`excess over ends from viral or human DNA. However, no such forms were
`detected (data not shown). Internal evidence also argues against this class of
`artifacts. For example, the 5-bp consensus host sequence flanking integration
`sites identified here closely resembles that found in a previous study employing
`conventional cloning and sequencing (55), an observation that helps validate
`each study.
`DNA sequence analysis. Sequences were analyzed by comparison to the non-
`redundant human sequence (nr) database, the human cDNA (dbEST) database,
`and the MONTH (November 1997) database by using BLASTN with Search
`Launcher and Repeat Masker. Default parameters were used. For comparisons
`between integration sites and control libraries, only a subset of the available
`sequence was considered (see Table 2), with either an average length of 144 bp
`or a length of exactly 50 bp (see Table 3). A total of 8,809 bp of human DNA
`flanking 61 integration sites was sequenced and analyzed for the integration site
`libraries (see Tables 2 and 3). The lengths of flanking human DNA sequences
`analyzed ranged from 37 to 430 bp. For the control human DNA fragments, a
`total of 14,989 bp in a total of 104 DNA clones were sequenced. Lengths of
`sequences analyzed ranged from 51 to 264 bp. Links to integration site and
`control sequences can be found at http://www.salk.edu/faculty/bushman.html.
Similarities to repeated sequences were ranked in accordance with the Smith-Waterman
parameter (SW) generated by Repeat Masker (see A. F. A. Smit and
P. Green, RepeatMasker at http://ftp.genome.washington.edu/RM/RepeatMasker.html)
or by the probability of matching by chance generated by BLASTN (1)
(P value) (see http://www.ncbi.nlm.nih.gov/cgi-bin/BLAST/nph-blast?Jform=0).
Minimum similarities for each sequence class considered to be significant
matches are as follows: cDNA, P = 4.6 × 10^-6; LINE-1, SW = 217; Alu repeat,
SW = 195; alphoid repeat, SW = 218; other repeats, SW = 190. Most regions of
sequence similarity extended over at least 50 bp, although in the case of the
lowest scoring cDNA, a 31-bp perfect match was judged to be significant.
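
The scoring rule just described can be sketched in a few lines of Python. This is a hedged illustration only: the text reports these values as the minimum similarities that were scored as matches, and treating them as hard cutoffs is an assumption of this example. The names `MIN_SW`, `MAX_CDNA_P`, and `is_significant` are hypothetical.

```python
# Lowest Smith-Waterman (SW) scores reported as matches for each repeat class,
# and the largest BLASTN P value reported as a cDNA match (values from the text).
# Using them as strict cutoffs is an assumption made for this sketch.
MIN_SW = {"LINE-1": 217, "Alu": 195, "alphoid": 218, "other": 190}
MAX_CDNA_P = 4.6e-6


def is_significant(seq_class, sw=None, p=None):
    """Return True if a database hit meets the minimum similarity for its class."""
    if seq_class == "cDNA":
        # For cDNAs, a smaller P value means a better match.
        return p is not None and p <= MAX_CDNA_P
    # For repeats, a larger SW score means a better match.
    return sw is not None and sw >= MIN_SW[seq_class]


print(is_significant("Alu", sw=195))    # at the reported minimum
print(is_significant("cDNA", p=1e-3))   # weaker than the reported minimum
```

Keeping the two score types separate reflects that SW scores and P values run in opposite directions: higher is better for SW, lower is better for P.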
`Integration in vitro. Preintegration complexes (PICs) were extracted from a
`6-h coculture of SupT1 cells grown in RPMI 1640 medium containing 10% fetal
`calf serum and chronically infected MoltIIIB cells stimulated with phorbol 12-
`myristate 13-acetate as previously described by Farnet and Haseltine (19). In
vitro integration was achieved by incubating 400 µl of PIC extract with 1.2 µg of
`DNA from uninfected SupT1 cells for 45 min. The integration product was
`
`VOL. 72, 1998
`
`CHROMOSOME STRUCTURE AND HIV-1 cDNA INTEGRATION
`
`FIG. 1. Cloning strategies for constructing integration site libraries. See the text for details and Table 1 for the sequences of oligonucleotides used.
`
`recovered by incubating it with proteinase K in 0.5% sodium dodecyl sulfate
`followed by extraction with phenol-chloroform. The same procedure was fol-
`lowed for the inactive PICs after first incubating the concentrated PICs in 15 mM
`EDTA for 5 min prior to adding target DNA. Integration assays with recombi-
`nant HIV-1 integrase were carried out essentially as described previously (4, 10).
`Region-specific analysis of integration acceptor sites. Integration junctions
`were amplified essentially as described previously (9, 30, 44). Cellular DNA
`templates were prepared from infected and uninfected samples as described
`above. Integration products were visualized by nested PCR. Products were first
`amplified with viral primer HUB and a repeat primer. Products were then
`reamplified with the viral primer IP3 which had been end labeled by treatment
with [γ-32P]ATP and kinase and a nested repeat primer. The primers for repeated
sequences were designed by aligning multiple repeat copies and identifying
conserved regions. Primers for amplifying repeated sequences were as
follows (see Table 1 for sequences; in each case, the second primer listed is the
nested primer): Alu1, SC24 and CH12 (27); LINE-1, CH5 and CH6 (64); alphoid
repeat, SC21 and SC23 (61); and THE 1, CH15 and CH16 (52). The amounts of
`integration products generated in vivo and in vitro that were used as templates
`for PCR were adjusted to provide equal numbers of proviruses in each case. The
`first round of PCR was carried out for 30 cycles of 94°C for 30 s, 55°C for 30 s,
and 72°C for 1 min. For the second round of PCR, 2 µl from the initial PCR was
added to a 25-µl reaction mixture and the mixture was amplified for 20 cycles of
`94°C for 30 s, 60°C for 30 s, and 72°C for 30 s. TaqStart antibody (Clontech, Palo
`Alto, Calif.) was used in both amplifications (hot-start PCR) in accordance with
`the manufacturer’s recommendations.
`Assays of integration into cloned target DNAs were carried out as described
`previously (for PICs [4, 8] and for purified integrase [3, 33]). PICs were concen-
`trated and partially purified by pelleting through 20% sucrose as described
`before (4). Integration targets were (i) a purified PvuII fragment containing the
`sequence of interest (PICs) or (ii) uncleaved plasmid DNA (purified integrase).
`Similar results were also obtained with PICs when uncleaved plasmid DNAs
`were used as the target. Primers for amplifying integration products were as
`follows: PIC reactions, top strand, NEB-40 and FB 652 (4); PIC reactions,
`bottom strand, CH 11 and FB 652; purified integrase reactions, top strand, FB 66
`(4) and NEB-40; purified integrase reactions, bottom strand, FB 66 and CH 11.
`
`RESULTS
`
`Construction of integration site libraries. DNA for library
`construction was obtained from a human T-cell line (SupT1)
`acutely infected with cell-free stocks of HIV-1. Cellular DNA
`was harvested 12 to 14 h after initiation of infection, allowing
`
`initial integration to be studied separately from selection dur-
`ing subsequent growth of cells.
`Libraries were constructed by two different methods in an
`effort to control for possible biases introduced in the DNA
`cloning steps (Fig. 1). For library 1, genomic DNA from in-
`fected cells was digested with HindIII, which cleaved the pop-
`ulation of proviruses near the viral DNA ends and at numerous
`positions in flanking host DNA. HindIII-cleaved DNA was
`then circularized by treatment with DNA ligase, and virus-host
`DNA junctions were amplified with divergent primers comple-
`mentary to viral end sequences (inverse PCR) (31, 49). For
`library 2, DNA fragments were made blunt ended by treatment
`with Bal 31 nuclease and T4 DNA polymerase and ligated to
`short linkers. DNA fragments were amplified with primers
`complementary to the linker and the HIV cDNA end (vector-
`ette PCR) (51). PCR fragments were then cloned and se-
`quenced. Sixty-one integration sites were analyzed by this
`means.
`To aid in interpretation of the data, control libraries were
`constructed from uninfected SupT1 cell DNA by methods par-
`allel to those used for cloning integration sites. SupT1 DNA
`fragments were generated by cleavage with HindIII (control
`library 1) or sonication and end repair (control library 2),
`cloned into plasmid vectors, and sequenced. One hundred four
`control clones from uninfected human DNA were character-
`ized by this means.
`Analysis of integration site libraries. Analysis of the se-
`quencing data presented several challenges. Our raw sequence
`data contained different numbers of base pairs determined for
`each DNA clone analyzed. To compare the integration site and
`control data sets in a meaningful fashion, it was necessary to
`compare matching numbers of base pairs in each DNA clone
`and then compare the frequencies of appearance of different
`types of sequences in each data set. The average length of host
`DNA flanking integration sites was 144 bp, so sequences in the
`
`control library, which were slightly longer, were each truncated
`to yield test sequences with an average length of 144 bp (fur-
`ther parameters describing the data sets are presented in Ma-
`terials and Methods).
`Some copies of the human repeated DNA sequences are
`quite divergent from the family consensus sequence, present-
`ing a challenge for identification. Repeated sequences were
`identified here by a two-step process. The program Repeat
`Masker, which compares unknown sequences to a set of con-
`sensus sequences derived from human repeat sequences (52),
`was used first. In a second step, all sequences were compared
`to the nr, dbEST, and MONTH (November 1997) databases by
`using BLASTN with default settings. In some cases, highly
`repeated sequences missed by Repeat Masker were identified
`by BLASTN and further analysis allowed them to be grouped
`into known sequence classes. The minimum degrees of simi-
`larity scored as matches are given in Materials and Methods.
`Analysis of cDNA matches presented another challenge.
`New sequences are being added to the dbEST database at a
`high rate, and even during the course of this work many anon-
`ymous sequences were found in later searches to match new
`cDNAs. The data presented here represent the number of
`matches to cDNAs as of November 1997, but new additions to
`the database will likely increase the number of matches in the
`future. For cDNAs, there was a natural partitioning of se-
`quences into plausible and unlikely matches, since integration
`into a transcribed region should yield a near-perfect match
`over a discrete region.
Integration sites sequenced and the matches to known sequences
are summarized in Tables 2 and 3. Sequences were
`classified as transcription units, Alu elements, LINE elements,
`alphoid repeats, other repeats, or anonymous. Transcription
`units were identified in database searches either as cDNAs or
`as sequences within the transcribed regions of known genes.
`Alu elements and LINE elements are the familiar interspersed
`nuclear repeats characteristic of human DNA. Alphoid repeats
`comprise the alpha satellite DNA, tandem arrays of 171-bp
`repeats associated with centromeric heterochromatin (38, 61).
`The “other repeat” class included several types, namely, SINE
`elements apart from Alu elements, low-complexity repeats, and
`retrovirus-related sequences such as THE 1 elements (36) and
`MLT1 sequences (14, 52) (for a recent summary of nomencla-
`ture, see reference 52). Anonymous sequences were defined as
`sequences contained in none of the classes.
`For the control libraries, Alu sequences were identified in
`10% of clones. Previous studies suggest that Alu elements
`comprise 8 to 15% of the human genome (53). LINE-1 ele-
`ments comprised 13% of the control sequences; 5 to 18% was
`expected (16, 25, 53). Information available on transcription
`units, alphoid repeats, and the other repeats was insufficient to
`allow their abundance to be predicted with confidence. Anal-
`ysis of the %GC of DNA in control library clones and in
`human DNA flanking integration sites revealed no obvious
`differences from that of bulk human DNA (data not shown).
`Thus, in those cases that could be checked, sequences in our
`control libraries had compositions close to those expected for
`randomly selected human genomic DNA fragments.
`Comparison of the integration site and control libraries re-
`vealed that centromeric alphoid repeats were absent among
`integration sites but that six alphoid repeats were present in
`the control libraries (Tables 2 and 3). Alphoid repeats were
`also absent among previously characterized HIV-1 integration
`sites (37, 59).
`Other types of sequences were differentially distributed be-
`tween integration site sequences and control sequences,
`although none showed the all-or-nothing partitioning charac-
`
`teristic of alphoid repeats. Transcription units were more
`abundant in the integration sites (18%) than in controls (8%).
The other repeats were also differentially distributed (7% in
integration sites versus 23% in controls), although in this case
`many different sequence types contributed to the totals. Alu
`elements and LINE elements were not obviously differentially
`distributed.
`As a test of the robustness of our conclusions, integration
`site sequences were reanalyzed after truncation so that only 50
`bp of host DNA remained at the junction between viral and
`host sequences for all clones. The control data was similarly
`truncated to 50 bp in each sequence, arbitrarily starting from
`one junction with the DNA vector used for cloning. Sequence
`similarities were identified in the 50-bp data set by using the
`criteria described above (Table 3). Fewer matches were de-
`tected, as expected, since the sequences were shorter. How-
`ever, in this case also, alphoid repeats were detected in the
`control library and not the integration site library.
`A weak consensus sequence at integration sites. Figure 2
`presents an analysis of the 5 bp of host DNA at the junction
`between virus and host sequences expected to be duplicated
`upon integration. A weak consensus sequence can be derived
from this data [5′ GT(A/T)AC 3′]. Only one end was sequenced
for each integrant, so the duplicated nature of this
sequence is inferred. The consensus sequence is rotationally
symmetric, as expected, since each end of the HIV cDNA is
joined to the 5′ end of each strand of this sequence (Fig. 2). A
closely related sequence was derived from a previous study of
HIV integration sites by Stevens and Griffith [5′ GTA(A/T)(T/C) 3′] (55).
In that study, DNA from HIV-infected cells
was cloned in lambda vectors, followed by isolation of provirus-containing
clones by hybridization and sequencing of 29 proviral
integration sites. The observation that our methods and
those of Stevens and Griffith yielded similar integration site
consensus sequences strongly validates each study.
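
The rotational symmetry of the consensus can be checked directly: a rotationally symmetric site reads the same on both strands, so it equals its own reverse complement. The short Python sketch below illustrates this, writing (A/T) as the IUPAC ambiguity code W; the helper names are hypothetical and the example is not part of the original analysis.

```python
# Complement table including the IUPAC code W (A or T), which is its own complement.
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A", "W": "W"}
# Bases allowed at each consensus symbol.
ALLOWED = {"A": "A", "C": "C", "G": "G", "T": "T", "W": "AT"}

CONSENSUS = "GTWAC"  # 5' GT(A/T)AC 3' from the text


def reverse_complement(seq):
    """Reverse complement of a sequence that may contain W."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def matches_consensus(site, consensus=CONSENSUS):
    """True if every base of a 5-bp site is allowed by the consensus."""
    return len(site) == len(consensus) and all(
        base in ALLOWED[symbol] for base, symbol in zip(site, consensus)
    )


# A rotationally symmetric consensus equals its own reverse complement.
print(reverse_complement(CONSENSUS) == CONSENSUS)
print(matches_consensus("GTAAC"), matches_consensus("GTCAC"))
```

Because the consensus is weak, most individual duplicated sequences would be expected to match it at only some positions, so an all-or-nothing test like `matches_consensus` is a simplification.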
`Region-specific assays of integration target sites. Several
`features of the sequencing data complicated interpretation. (i)
`The number of matching sequences detected was determined
`in part by the choice of parameters in the similarity search. (ii)
`In some clones the integration junctions were within the iden-
`tified cDNA or repeated sequence, while in others the junc-
`tions were near but not within the identified sequence. In
`Tables 2 and 3, these were considered together. (iii) Although
`this study of HIV-1 integration site sequences is the largest yet
`reported, the differences between integration sites and controls
`were generally not clearly significant, as evaluated by the chi-
`square or Fisher’s exact test. No finding was clearly significant
`in the analysis of both the 144-bp flanking sequences and the
`50-bp sequence data. For these reasons, it was important to
`test some of the hypotheses generated by the sequence analysis
`by an independent method.
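
To make the statistical point concrete, the sketch below applies Fisher's exact test to the alphoid repeat counts, assuming 0 of 61 integration sites and 6 of 104 control clones (the counts given in the text for the 144-bp analysis). It uses only the Python standard library; the function names are hypothetical, and the choice of a two-sided rule that sums all tables no more probable than the observed one is an assumption, not necessarily the procedure the authors used.

```python
from math import comb


def hypergeom_pmf(k, total, successes, draws):
    """P(exactly k successes) when drawing `draws` items without replacement."""
    return comb(successes, k) * comb(total - successes, draws - k) / comb(total, draws)


def fisher_exact(a, b, c, d):
    """Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Returns (one_sided, two_sided): the probability of a count in the top-left
    cell no larger than `a`, and the summed probability of all tables that are
    no more likely than the observed one.
    """
    total = a + b + c + d
    successes = a + c          # first-column total (e.g. alphoid clones)
    draws = a + b              # first-row total (e.g. integration sites)
    lowest = max(0, draws - (total - successes))
    highest = min(draws, successes)
    probs = {k: hypergeom_pmf(k, total, successes, draws)
             for k in range(lowest, highest + 1)}
    observed = probs[a]
    one_sided = sum(p for k, p in probs.items() if k <= a)
    two_sided = sum(p for p in probs.values() if p <= observed * (1 + 1e-9))
    return one_sided, two_sided


# Rows: integration sites, controls. Columns: alphoid, non-alphoid.
one_sided, two_sided = fisher_exact(0, 61, 6, 98)
print(f"one-sided P = {one_sided:.3f}, two-sided P = {two_sided:.3f}")
```

Under these assumed counts both P values fall between 0.05 and 0.10, which is consistent with the statement that the differences were not clearly significant and motivates the independent PCR-based test.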
`To this end, integration near repeated sequences was stud-
`ied by using an assay based on PCR amplification of host-virus
`DNA junctions. In each reaction, one primer was complemen-
`tary to an HIV-1 LTR end and the second primer was com-
`plementary to a repeated sequence (alphoid, Alu, LINE-1, or
`THE 1 repeats) (Fig. 3) (30, 44, 62). The first PCR amplifica-
`tion was followed by a second PCR with nested primers. The
LTR primer in the second amplification was labeled at the 5′
end with 32P. Amplification products were separated on DNA
`sequencing-type gels and analyzed by autoradiography. An in-
`tegration event in or near the repeated sequence studied gave
`rise to a labeled band by amplification. Amplification of many
`such integration events gave rise to a ladder of labeled bands
`on the final autoradiogram.
`The importance of the in vivo setting was assessed by com-
`
`TABLE 2. Integration sites analyzed and their similarities to known sequences
`
`Sequence
`namea
`
`Length
`(bp)b
`
`Dup seqc
`
`Identified similaritiesd
`
`Identified similarities truncated to 50 bpe
`
`*f
`106 ATGTC
`*
`60 CAAGC
`LINE-1 [2–153, SW 5 508]
`156 TCTTC
`*
`132 GCTAC
`91 GGAAA *
`139 GTGGT
`*
`140 TATAT
`*
`114 ATCCC
`*
`230 GCATG *
`82 CTATA
`*
`LINE-1 [2–107, SW 5 251]
`212 TACAC
`Alu [15–110, SW 5 716]
`166 CATGC
`89 GTTGG *
`Transcription unit (cDNA) [5–62, P 5 1.6 3 10216]
`63 CTCAC
`111 GTCAC
`*
`164 TATGG LINE-1 [2–107, SW 5 400]
`66 AACAG *
`54 CTCAC
`*
`159 GTTGT
`*
`Alu [3–125, SW 5 956]
`342 GTTTC
`173 CATAT
`*
`38 CACAC
`*
`258 CATTC
`*
`110 GTAAT
`*
`37 CTTTT
`*
`160 CCATT
`*
`Transcription unit (cDNA) [1–93, P 5 3.7 3 10233]
`93 AATAC
`143 GCCCA
`*
`188 ATATT
`*
`Transcription unit (cDNA) [59–157, P 5 5.9 3 10234]
`157 GTTGA
`Transcription unit (VACH1 gene) [1–50, P 5 6 3 10213]
`50 CTTCA
`50 AGTTG *
`Transcription unit (cDNA) [52–143, P 5 2.8 3 10225];
`420 TTAAC
`LINE-2 [223–274, SW 5 252]
`
`MolH 1
`MolH 2
`SupH 1
`SupH 2
`SupH 3
`SupH 4
`SupH 5
`SupH 6
`SupH 7
`SupH 9
`SupH 10
`SupH 11
`SupH 12
`SupH 13
`SupH 14
`SupH 15
`SupH 16
`SupH 17
`SupH 18
`SupH 20
`SupH 21
`SupH 22
SupH 23
SupH 24
SupH 25
SupH 27
SupH 28
SupH 29
SupH 31
SupH 32
SupH 33
SupH 34
SupH 35

SupH 36
SupH 37
SupH 38
SupH 39
SupH 41
SupH 42
SupH 43

SupH 44
SupH 46
SupH 47
SupH 48
SupH 49
SupS 1
SupS 2
SupS 3
SupS 4
SupS 5
SupS 7
SupS 8
SupS 9
SupS 10
SupS 11
SupS 12
SupS 13
SupS 14
SupS 15
SupS 16
SupS 17
Total bp
Avg

*
*
*
*
*
*
*
*
*
*
*
Alu [SW = 304]
*
Transcription unit (cDNA) [P = 1.9 × 10^-12]
*
*
*
*
*
Alu [SW = 373]
*
Excluded
*
*
Excluded
*
Transcription unit (cDNA) [P = 1.5 × 10^-13]
*
*
*
Transcription unit (VACH1 gene) [P = 6 × 10^-13]
*
*

*
Alu [SW = 371]
*
*
Excluded
LINE-1 [SW = 264]
Transcription unit (cDNA) [P = 3.8 × 10^-13]

*
237 CTTGT
Alu [1–69, SW = 471]
69 CACAC
*
68 GTTAT
89 CAAAA *
41 ATGGC
*
LINE-1 [1–437, SW = 2684]
437 AAAAC
Transcription unit (cDNA) [1–179, P = 9.4 × 10^-65];
179 ATAGT
other repeat (LTR element) [98–152, SW = 198]
337 GAAAC Other repeat (MIR, SINE) [191–315, SW = 493]
*
Transcription unit (cDNA) [P = 4.6 × 10^-6]
81 GGGAG Transcription unit (cDNA) [1–33, P = 3.9 × 10^-6]
Transcription unit (cDNA) [P = 2.2 × 10^-9]
Transcription unit (cDNA) [1–57, P = 2.1 × 10^-13]
111 AAAAC
Other repeat (MIR, SINE) [SW = 245]
Other repeat (MIR, SINE) [1–123, SW = 474]
125 CTGTG
Alu [SW = 300]
Alu [1–128, SW = 698]
260 TTTTG
Transcription unit (cDNA) [P = 5.4 × 10^-13]
176 GCAGG Transcription unit (CD27 gene) [1–176, P = 2.7 × 10^-62]
*
113 GTTCT
*
Alu [SW = 195]
Alu [4–115, SW = 540]
125 ATACC
Other repeat (MER74, LTR element) [1–213, SW = 599] Other repeat (MER74, LTR element) [SW = 277]
215 CCCTC
147 CAGCA *
*
171 GAGTC
*
*
Transcription unit (cDNA) [1–81, P = 3.2 × 10^-26]
Transcription unit (cDNA) [P = 3.6 × 10^-13]
85 TGAGT
86 GTACC
*
*
52 AAAGC Alu [2–59, SW = 356]
Alu [SW = 310]
147 CTAAC
*
*
131 GTTTC
*
*
94 ATGTG Transcription unit (cDNA) [1–94, P = 5.1 × 10^-28]
Transcription unit (cDNA) [P = 3.4 × 10^-12]
184 GAGAC *
*
120 AAATG *
*
161 CTCTG
*
*
215 GTATG *
*
8,809
2,900
144
50

a Laboratory designation for each DNA clone.
b Number of human DNA base pairs sequenced adjacent to the HIV cDNA terminus.
c Nucleotide sequence of the 5 bp of human DNA at the junction with viral DNA expected to be duplicated upon integration.
d Sequence similarities found by comparison to sequence databases (the first designation is the sequence class given in Table 3, the name in parentheses is a more detailed designation, and the numbers in brackets represent the location of the sequence match [e.g., 1 = the first cDNA-proximal base pair in host DNA] and the degree of similarity).
e Similarities identified in the 50-bp sequence data set. For explanation of bracketed data, see footnote d.
f *, anonymous.

TABLE 3. Sequence composition of libraries of integration sites and control DNA fragments

                      Analysis of 144-bp sequences     Reanalysis of 50-bp
                      (avg length)a                    sequencesb
                      -----------------------------    -----------------------------
Sequence class        Integration     Genomic          Integration     Genomic
                      sites (%)       DNA (%)          sites (%)       DNA (%)

Anonymous             61              43               69              71
Alu element           10               9               10               6
LINE element           8              13                2               6
Alphoid repeat         0               6                0               3
Other repeats          7              22                3              10
Transcription unit    18               8               16               4

a For data from sequences of 144-bp average length, 61 integration sites and 104 control sequences were considered.
b For the reanalysis of integration site sequences considering only the proximal 50 bp of human DNA sequence, 58 integration sites and 104 control sequences were considered.
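
As a quick consistency check on the figures above, the totals at the foot of Table 2 (8,809 bp and 2,900 bp) can be divided by the numbers of integration sites given in the Table 3 footnotes (61 and 58). The sketch below is illustrative only: the variable and function names are ours, and the pairing of each total with each site count is our reading of the tables rather than a calculation described by the authors.

```python
# Values transcribed from the Table 2 totals and the Table 3 footnotes.
# All names here are hypothetical labels chosen for this sketch.
TOTAL_BP_FULL = 8809   # total human bp sequenced, full-length data set
TOTAL_BP_50 = 2900     # total bp in the 50-bp reanalysis
N_SITES_FULL = 61      # integration sites in the full-length analysis
N_SITES_50 = 58        # integration sites in the 50-bp reanalysis


def average_length(total_bp: int, n_sites: int) -> float:
    """Mean number of sequenced base pairs per integration site."""
    return total_bp / n_sites


avg_full = average_length(TOTAL_BP_FULL, N_SITES_FULL)
avg_50 = average_length(TOTAL_BP_50, N_SITES_50)
excluded = N_SITES_FULL - N_SITES_50

print(f"full-length average: {avg_full:.1f} bp")
print(f"50-bp reanalysis average: {avg_50:.1f} bp")
print(f"sites absent from the 50-bp set: {excluded}")
```

The averages round to the 144 bp and 50 bp reported in Table 2, and the difference of three sites agrees with the three Table 2 entries marked "Excluded".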
`
`paring integration sites from infected cel
