`
`http://jvi.asm.org/
`
` on March 13, 2017 by guest
`
`JOURNAL OF VIROLOGY, May 1998, p. 4005–4014, Vol. 72, No. 5
`0022-538X/98/$04.00+0
`Copyright © 1998, American Society for Microbiology
`
`Chromosome Structure and Human Immunodeficiency Virus
`Type 1 cDNA Integration: Centromeric Alphoid
`Repeats Are a Disfavored Target
`SANDRINE CARTEAU, CHRISTOPHER HOFFMANN, AND FREDERIC BUSHMAN*
`Infectious Disease Laboratory, The Salk Institute for Biological Studies, La Jolla, California 92037
`
`Received 22 August 1997/Accepted 19 January 1998
`
`Integration of retroviral cDNA into host chromosomal DNA is an essential and distinctive step in viral
`replication. Despite considerable study, the host determinants of sites for integration have not been fully
`clarified. To investigate integration site selection in vivo, we used two approaches. (i) We have analyzed the host
`sequences flanking 61 human immunodeficiency virus type 1 (HIV-1) integration sites made by experimental
`infection and compared them to a library of 104 control sequences. (ii) We have also analyzed HIV-1
`integration frequencies near several human repeated-sequence DNA families, using a repeat-specific PCR-
`based assay. At odds with previous reports from smaller-scale studies, we found no strong biases either for or
`against integration near repetitive sequences such as Alu or LINE-1 elements. We also did not find a clear bias
`for integration in transcription units as proposed previously, although transcription units were found some-
`what more frequently near integration sites than near controls. However, we did find that centromeric alphoid
`repeats were selectively absent at integration sites. The repeat-specific PCR-based assay also indicated that
`alphoid repeats were disfavored for integration in vivo but not as naked DNA in vitro. Evidently the distinctive
`DNA organization at centromeres disfavors cDNA integration. We also found a weak consensus sequence for
`host DNA at integration sites, and assays of integration in vitro indicated that this sequence is favored as
`naked DNA, revealing in addition an influence of target primary sequence.
`
`To replicate, a retrovirus must integrate a cDNA copy of its
`RNA genome into a chromosome of the host. The host inte-
`gration acceptor sites are not expected to be present as naked
`DNA but rather associated with histones and other DNA-
`binding proteins in chromatin. DNA packaging in vivo is ex-
`pected to influence integration site selection, and the choice of
`integration site may have profound effects on both the virus
`and the host (13, 57). The determinants of integration effi-
`ciency in vivo remain incompletely defined, despite their im-
`portance.
`Previous surveys of in vivo integration sites have led to
`several proposals for factors influencing site selection. Studies
`of Moloney murine leukemia virus have supported a model in
`which open chromatin regions at transcription units were fa-
`vored, since associated features such as DNase I-hypersensitive
`sites (45, 58) or CpG islands (47) were apparently enriched
`near integration sites. Another study proposed that unusual
`host DNA structures were common near integration sites (34).
`A recent study of avian leukosis virus integration frequencies
`at several chromosomal sites failed to show any major differ-
`ences among the regions studied (62), contrary to an earlier
`report (50). For human immunodeficiency virus type 1 (HIV-
`1), it has been proposed that integration may be favored near
`repetitive elements (including LINE-1 elements [54] or Alu
`islands [55]) or topoisomerase cleavage sites (24).
`Assays of integration in vitro have revealed several effects of
`proteins bound to target DNA. Simple DNA-binding proteins
`can block access of integration complexes to target DNA, cre-
`ating regions refractory for integration (3, 9, 44). In contrast,
`wrapping DNA on nucleosomes can create hot spots for inte-
`
`* Corresponding author. Mailing address: Infectious Disease Labo-
`ratory, The Salk Institute for Biological Studies, 10010 N. Torrey Pines
`Rd., La Jolla, CA 92037. Phone: (619) 453-4100, ext. 1630. Fax: (619)
`554-0341. E-mail: rick_bushman@qm.salk.edu.
`
`gration at sites of probable DNA distortion (40–42, 44). Dis-
`tortion of DNA in several other protein-DNA complexes can
`also favor integration (3, 35), consistent with the possibility
`that DNA distortion is involved in the integrase mechanism
`(11, 48).
`Here we present two experiments designed to address some
`of the questions surrounding integration site selection in vivo.
`We have (i) sequenced 61 integration junctions made after
`experimental infection of cultured human T cells and com-
`pared them with 104 control DNA fragments from uninfected
`human cells and (ii) used a region-specific PCR assay to assess
`the frequency of integration near several repeated-sequence
`families. In addition, we have identified a weakly conserved
`sequence at in vivo integration sites and determined that it is
`favored for integration when tested in vitro.
`
`MATERIALS AND METHODS
`
`DNA manipulation. Plasmids containing synthetic integration target sites were
`prepared by annealing pairs of oligonucleotides (CH10-1–CH10-2, CH11-1–
`CH11-2, and CH13-1–CH13-2) (Table 1) and ligating them with pUC19 DNA
`that had been cleaved with EcoRI and HindIII. The standard cloning methods
`used were as described previously (46). Integration target DNAs were prepared
`by cleaving the plasmids mentioned above with PvuII, which releases the oligo-
`nucleotide insert together with flanking plasmid DNA.
`The oligonucleotides used in this study are shown in Table 1.
`Construction of DNA libraries. To generate a large pool of independent
`integration events, SupT1 cells (2 × 10^7 cells) were infected with the HXB2 or
`R9 (56) (referred to as R8 in reference 22) HIV-1 strain. Viral stocks were
`assayed by measuring the concentration of p24, and the infectivity was scored by
`the MAGI assay (28). Cells were infected at a multiplicity of 1 to 10 and
`harvested 12 to 14 h later. The cellular genomic DNA was depleted of low-
`molecular-weight DNA prior to cloning as described previously (39).
`For construction of library 1 (Fig. 1, method 1), DNA from infected cells was
`cleaved with HindIII and circularized by ligation (31). Sixty-six nanograms of
`DNA was used as the template for PCR. HUA and HUB, divergently oriented
`primers complementary to the HIV long terminal repeats (LTRs), were used for
`the first amplification. Amplification was carried out for 35 cycles of 94°C for 1
`min, 58°C for 1 min, and 72°C for 3 min. The products were purified by using the
`Qiaquick PCR purification kit (Qiagen, Santa Clarita, Calif.). One microliter
`
`MYR1025
`Myriad Genetics, Inc. et al. (Petitioners) v. The Johns Hopkins University (Patent Owner)
`IPR For USPN 8,859,206
`
`TABLE 1. Oligonucleotides used in this study
`
`Oligonucleotide | Sequence | Comments
`HUA | 5′-CTTTTTGCCTGTACTGGGTCTC-3′ | HIV U3 primer for inverse PCR
`HUB | 5′-GATCAAGGATATCTTGTCTTCGT-3′ | HIV U3 primer for inverse PCR
`IP3 | 5′-TCTTGTCTTCGTTGGGAGTGA-3′ | HIV U3 primer for inverse PCR
`det3b | 5′-GAACCCACTGCTTAAGCCTC-3′ | HIV U3 primer for inverse PCR
`det3a | 5′-CTTCGTTGGGAGTGAATTAG-3′ | Primer for detection of circle junctions
`sc8 | 5′-CTTCAAGTAGTGTGTGCCCG-3′ | Primer for detection of circle junctions
`sc10 | 5′-GGGTTTTCCAGTCACACCTCAGG-3′ | Primer for detection of the HIV internal fragment
`TA6 | 5′-CATCAAGCTTGGTACCGAGC-3′ | Primer for sequencing from pTA vector
`TA7 | 5′-TAATACGACTCACTATAGGG-3′ | Primer for sequencing from pTA vector
`SC24 | 5′-TGGCGCAATCTCGGCTCAC-3′ | Primer for amplifying Alu1 sequences
`CH12 | 5′-CTCCGCTTCCCGGGTTC-3′ | Primer for amplifying Alu1 sequences
`CH5 | 5′-CTTCCAGTTTTTGCCCATTCAGT-3′ | Primer for amplifying LINE-1 sequences
`CH6 | 5′-AGTATGATATTGGCTGTGGGTTTGTC-3′ | Primer for amplifying LINE-1 sequences
`SC21 | 5′-GCAAGGGGATATGTGGACC-3′ | Primer for amplifying alphoid repeats
`SC23 | 5′-ACCACCGTAGGCCTGAAAGCAGTC-3′ | Primer for amplifying alphoid repeats
`CH15 | 5′-CCTGAGGCCTCCCTCAGCCAT-3′ | Primer for amplifying THE 1 repeats
`CH16 | 5′-GCCATGATTGTAAGTTTCCTGAGG-3′ | Primer for amplifying THE 1 repeats
`NEB-40 | 5′-GTTTTCCCAGTCACGAC-3′ | Primer for amplifying integration products in pUC19
`FB652 | 5′-TGTGGAAAATCTCTAGCA-3′ | Primer for amplifying HIV U5 sequences
`CH 11 | 5′-CTCCGCTTCCCGGGTTC-3′ | Primer for amplifying integration products in pUC19
`FB66 | 5′-GCCTAGATCCGTGTGGAAAATC-3′ | Primer for amplifying products made with purified integrase
`FB64 | 5′-ACTGCTAGAGATTTTCCACACGGATCCTAGGC-3′ | Substrate for purified integrase (annealed to FB65-2)
`FB65-2 | 5′-GCCTAGGATCCGTGTGGAAAATCTCTCTCTAGCA-3′ | Substrate for purified integrase (annealed to FB64)
`AP1 | 5′-CCATCCTAATACGACTCACTATAGGGC-3′ | Adaptor primer 1
`AP2 | 5′-ACTCACTATAGGCTCGAGCGGC-3′ | Adaptor primer 2
`ADAPT1 | 5′-CTAATACGACTCACTATAGGGCTCGAGCGGCCGCCCGGGCAGGT-3′ | Vectorette adaptor primer (top strand)
`ADAPT2 | 5′-ACCTGCCC-NH2-3′ | Vectorette adaptor primer (bottom strand)
`CH10-1 | 5′-AATTCTTCTCGAGTAGGTTACCTATGATCAA-3′ | Insert for pCH10 (top strand)
`CH10-2 | 5′-AGCTTTGATCATAGGTAACCTACTCGAGAAG-3′ | Insert for pCH10 (bottom strand)
`CH11-1 | 5′-AATTCTTCTCGAGTAGTTTAACTATGATCAA-3′ | Insert for pCH11 (top strand)
`CH11-2 | 5′-AGCTTTGATCATAGTTAAACTACTCGAGAAG-3′ | Insert for pCH11 (bottom strand)
`CH13-1 | 5′-AATTCGTGTTAACTCGGTGACCGAAGGCCTA-3′ | Insert for pCH12 (top strand)
`CH13-2 | 5′-AGCTTAGGCCTTCGGTCACCGAGTTAACACG-3′ | Insert for pCH12 (bottom strand)
`
`from the 50-µl column eluate was used as the template for the second-round
`PCR (20 cycles; program as described above) with nested primers det3b and IP3.
`For construction of library 2 (Fig. 1, method 2) DNA fragments sheared by
`sonication (average length, about 1.5 kb) were made blunt-ended by treatment
`with Bal 31 followed by T4 DNA polymerase and deoxynucleoside triphosphates.
`Ligation of adapters, amplification, and cloning were carried out as described
`previously (51), except that primers HUB and IP3 were used as viral end primers
`for the first and second amplifications, respectively. PCR products were cloned
`by using the pCR II TA cloning vector from Invitrogen (San Diego, Calif.).
`The products of PCRs contained two contaminants in addition to the desired
`integration junctions, one derived from a circular form of the viral DNA (2-LTR
`circle) and the second from the 3′ internal part of the viral DNA (for a discussion,
`see reference 31). Colonies containing host-virus junctions were distinguished
`from colonies containing contaminating sequences by PCR. Bacterial
`colonies containing plasmids were resuspended in PCR buffer and amplified with
`Taq polymerase for 20 cycles of 1 min at 94°C, 30 s at 60°C, and 1 min at 72°C.
`The circle junctions were detected using primers det3a and sc8. The internal
`fragment was detected using primers sc10 and IP3. The inserts were sequenced
`by using primers TA6 and TA7, which are complementary to the vector (pCR II;
`Invitrogen). Sequences of integration junctions and controls were determined by
`the dideoxy sequencing method.
`Each sequence was determined at least twice. For each integration site clone,
`the sequence of 34 bases of viral DNA at the LTR tip was determined, in
`addition to the flanking host DNA. For most integration site clones (59 of 61),
`all of the cloned human DNA adjacent to the proviral DNA was sequenced.
`A control experiment was carried out to exclude a possible artifact. Since DNA
`samples were treated with DNA ligase, free HIV genomes might have become
`joined to host DNA fragments by DNA ligase instead of integration. This is
`unlikely in the case of library 1, however, since the blunt-ended or 3′ cleaved
`forms of the HIV cDNA would not be expected to become ligated to the
`protruding 5′ ends generated by cleavage with HindIII. However, to document
`this expectation, a control experiment was performed in which purified uninte-
`grated HIV cDNA was incubated in the presence of DNA ligase with
`HindIII-cleaved sequences and possible ligation was assayed by PCR across the
`ligation junction (one primer complementary to the HIV DNA and the other
`complementary to the HindIII-cleaved test DNA). No ligation was detected
`(data not shown). In the case of library 2, hypothetical ligation of unintegrated
`
`HIV cDNA should have yielded predominantly the vectorette linker joined
`directly to HIV cDNA, since DNA ends from the linkers were present in vast
`excess over ends from viral or human DNA. However, no such forms were
`detected (data not shown). Internal evidence also argues against this class of
`artifacts. For example, the 5-bp consensus host sequence flanking integration
`sites identified here closely resembles that found in a previous study employing
`conventional cloning and sequencing (55), an observation that helps validate
`each study.
`DNA sequence analysis. Sequences were analyzed by comparison to the non-
`redundant human sequence (nr) database, the human cDNA (dbEST) database,
`and the MONTH (November 1997) database by using BLASTN with Search
`Launcher and Repeat Masker. Default parameters were used. For comparisons
`between integration sites and control libraries, only a subset of the available
`sequence was considered (see Table 2), with either an average length of 144 bp
`or a length of exactly 50 bp (see Table 3). A total of 8,809 bp of human DNA
`flanking 61 integration sites was sequenced and analyzed for the integration site
`libraries (see Tables 2 and 3). The lengths of flanking human DNA sequences
`analyzed ranged from 37 to 430 bp. For the control human DNA fragments, a
`total of 14,989 bp in a total of 104 DNA clones were sequenced. Lengths of
`sequences analyzed ranged from 51 to 264 bp. Links to integration site and
`control sequences can be found at http://www.salk.edu/faculty/bushman.html.
`Similarities to repeated sequences were ranked in accordance with the Smith-
`Waterman parameter (SW) generated by Repeat Masker (see A. F. A. Smit and
`P. Green, RepeatMasker at http://ftp.genome.washington.edu/RM/RepeatMasker.html)
`or by the probability of matching by chance generated by BLASTN (1)
`(P value) (see http://www.ncbi.nlm.nih.gov/cgi-bin/BLAST/nph-blast?Jform=0).
`Minimum similarities for each sequence class considered to be significant
`matches are as follows: cDNA, P = 4.6 × 10^-6; LINE-1, SW = 217; Alu repeat,
`SW = 195; alphoid repeat, SW = 218; other repeats, SW = 190. Most regions of
`sequence similarity extended over at least 50 bp, although in the case of the
`lowest scoring cDNA, a 31-bp perfect match was judged to be significant.
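The minimum similarities listed above can be applied as simple cutoffs. The following Python sketch is illustrative only and is not the authors' procedure: it assumes that the reported minimum values act as thresholds, and the names `MIN_SW_SCORE`, `MAX_CDNA_P`, and `is_significant` are hypothetical.

```python
# Weakest scores reported in the text as significant matches.
# Treating them as hard cutoffs is an assumption made for this sketch.
MIN_SW_SCORE = {
    "LINE-1": 217,
    "Alu": 195,
    "alphoid": 218,
    "other": 190,
}
MAX_CDNA_P = 4.6e-6  # largest BLASTN P value accepted for a cDNA match


def is_significant(seq_class, score):
    """Return True if a hit passes the cutoff for its sequence class.

    For "cDNA", `score` is a BLASTN P value (smaller is better).
    For repeat classes, `score` is a Repeat Masker SW score (larger is better).
    """
    if seq_class == "cDNA":
        return score <= MAX_CDNA_P
    return score >= MIN_SW_SCORE[seq_class]


# Examples using scores that appear in Table 2.
print(is_significant("Alu", 716))       # True
print(is_significant("cDNA", 1.6e-16))  # True
print(is_significant("other", 150))     # False (hypothetical weak hit)
```

Note that the two score types run in opposite directions, which is why the cDNA class is handled separately.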
`Integration in vitro. Preintegration complexes (PICs) were extracted from a
`6-h coculture of SupT1 cells grown in RPMI 1640 medium containing 10% fetal
`calf serum and chronically infected MoltIIIB cells stimulated with phorbol 12-
`myristate 13-acetate as previously described by Farnet and Haseltine (19). In
`vitro integration was achieved by incubating 400 µl of PIC extract with 1.2 µg of
`DNA from uninfected SupT1 cells for 45 min. The integration product was
`
`FIG. 1. Cloning strategies for constructing integration site libraries. See the text for details and Table 1 for the sequences of oligonucleotides used.
`
`recovered by incubating it with proteinase K in 0.5% sodium dodecyl sulfate
`followed by extraction with phenol-chloroform. The same procedure was fol-
`lowed for the inactive PICs after first incubating the concentrated PICs in 15 mM
`EDTA for 5 min prior to adding target DNA. Integration assays with recombi-
`nant HIV-1 integrase were carried out essentially as described previously (4, 10).
`Region-specific analysis of integration acceptor sites. Integration junctions
`were amplified essentially as described previously (9, 30, 44). Cellular DNA
`templates were prepared from infected and uninfected samples as described
`above. Integration products were visualized by nested PCR. Products were first
`amplified with viral primer HUB and a repeat primer. Products were then
`reamplified with the viral primer IP3 which had been end labeled by treatment
`with [γ-32P]ATP and kinase and a nested repeat primer. The primers for repeated
`sequences were designed by aligning multiple repeat copies and identifying
`conserved regions. Primers for amplifying repeated sequences were as
`follows (see Table 1 for sequences; in each case, the second primer is the nested
`second primer): Alu1, SC24 and CH12 (27); LINE-1, CH5 and CH6 (64); alphoid
`repeat, SC21 and SC23 (61); and THE 1, CH15 and CH16 (52). The amounts of
`integration products generated in vivo and in vitro that were used as templates
`for PCR were adjusted to provide equal numbers of proviruses in each case. The
`first round of PCR was carried out for 30 cycles of 94°C for 30 s, 55°C for 30 s,
`and 72°C for 1 min. For the second round of PCR, 2 µl from the initial PCR was
`added to a 25-µl reaction mixture and the mixture was amplified for 20 cycles of
`94°C for 30 s, 60°C for 30 s, and 72°C for 30 s. TaqStart antibody (Clontech, Palo
`Alto, Calif.) was used in both amplifications (hot-start PCR) in accordance with
`the manufacturer’s recommendations.
`Assays of integration into cloned target DNAs were carried out as described
`previously (for PICs [4, 8] and for purified integrase [3, 33]). PICs were concen-
`trated and partially purified by pelleting through 20% sucrose as described
`before (4). Integration targets were (i) a purified PvuII fragment containing the
`sequence of interest (PICs) or (ii) uncleaved plasmid DNA (purified integrase).
`Similar results were also obtained with PICs when uncleaved plasmid DNAs
`were used as the target. Primers for amplifying integration products were as
`follows: PIC reactions, top strand, NEB-40 and FB 652 (4); PIC reactions,
`bottom strand, CH 11 and FB 652; purified integrase reactions, top strand, FB 66
`(4) and NEB-40; purified integrase reactions, bottom strand, FB 66 and CH 11.
`
`RESULTS
`
`Construction of integration site libraries. DNA for library
`construction was obtained from a human T-cell line (SupT1)
`acutely infected with cell-free stocks of HIV-1. Cellular DNA
`was harvested 12 to 14 h after initiation of infection, allowing
`
`initial integration to be studied separately from selection dur-
`ing subsequent growth of cells.
`Libraries were constructed by two different methods in an
`effort to control for possible biases introduced in the DNA
`cloning steps (Fig. 1). For library 1, genomic DNA from in-
`fected cells was digested with HindIII, which cleaved the pop-
`ulation of proviruses near the viral DNA ends and at numerous
`positions in flanking host DNA. HindIII-cleaved DNA was
`then circularized by treatment with DNA ligase, and virus-host
`DNA junctions were amplified with divergent primers comple-
`mentary to viral end sequences (inverse PCR) (31, 49). For
`library 2, DNA fragments were made blunt ended by treatment
`with Bal 31 nuclease and T4 DNA polymerase and ligated to
`short linkers. DNA fragments were amplified with primers
`complementary to the linker and the HIV cDNA end (vector-
`ette PCR) (51). PCR fragments were then cloned and se-
`quenced. Sixty-one integration sites were analyzed by this
`means.
`To aid in interpretation of the data, control libraries were
`constructed from uninfected SupT1 cell DNA by methods par-
`allel to those used for cloning integration sites. SupT1 DNA
`fragments were generated by cleavage with HindIII (control
`library 1) or sonication and end repair (control library 2),
`cloned into plasmid vectors, and sequenced. One hundred four
`control clones from uninfected human DNA were character-
`ized by this means.
`Analysis of integration site libraries. Analysis of the se-
`quencing data presented several challenges. Our raw sequence
`data contained different numbers of base pairs determined for
`each DNA clone analyzed. To compare the integration site and
`control data sets in a meaningful fashion, it was necessary to
`compare matching numbers of base pairs in each DNA clone
`and then compare the frequencies of appearance of different
`types of sequences in each data set. The average length of host
`DNA flanking integration sites was 144 bp, so sequences in the
`control library, which were slightly longer, were each truncated
`to yield test sequences with an average length of 144 bp (fur-
`ther parameters describing the data sets are presented in Ma-
`terials and Methods).
`Some copies of the human repeated DNA sequences are
`quite divergent from the family consensus sequence, present-
`ing a challenge for identification. Repeated sequences were
`identified here by a two-step process. The program Repeat
`Masker, which compares unknown sequences to a set of con-
`sensus sequences derived from human repeat sequences (52),
`was used first. In a second step, all sequences were compared
`to the nr, dbEST, and MONTH (November 1997) databases by
`using BLASTN with default settings. In some cases, highly
`repeated sequences missed by Repeat Masker were identified
`by BLASTN and further analysis allowed them to be grouped
`into known sequence classes. The minimum degrees of simi-
`larity scored as matches are given in Materials and Methods.
`Analysis of cDNA matches presented another challenge.
`New sequences are being added to the dbEST database at a
`high rate, and even during the course of this work many anon-
`ymous sequences were found in later searches to match new
`cDNAs. The data presented here represent the number of
`matches to cDNAs as of November 1997, but new additions to
`the database will likely increase the number of matches in the
`future. For cDNAs, there was a natural partitioning of se-
`quences into plausible and unlikely matches, since integration
`into a transcribed region should yield a near-perfect match
`over a discrete region.
`Integration sites sequenced and the matches to known sequences
`are summarized in Tables 2 and 3. Sequences were
`classified as transcription units, Alu elements, LINE elements,
`alphoid repeats, other repeats, or anonymous. Transcription
`units were identified in database searches either as cDNAs or
`as sequences within the transcribed regions of known genes.
`Alu elements and LINE elements are the familiar interspersed
`nuclear repeats characteristic of human DNA. Alphoid repeats
`comprise the alpha satellite DNA, tandem arrays of 171-bp
`repeats associated with centromeric heterochromatin (38, 61).
`The “other repeat” class included several types, namely, SINE
`elements apart from Alu elements, low-complexity repeats, and
`retrovirus-related sequences such as THE 1 elements (36) and
`MLT1 sequences (14, 52) (for a recent summary of nomencla-
`ture, see reference 52). Anonymous sequences were defined as
`sequences contained in none of the classes.
`For the control libraries, Alu sequences were identified in
`10% of clones. Previous studies suggest that Alu elements
`comprise 8 to 15% of the human genome (53). LINE-1 ele-
`ments comprised 13% of the control sequences; 5 to 18% was
`expected (16, 25, 53). Information available on transcription
`units, alphoid repeats, and the other repeats was insufficient to
`allow their abundance to be predicted with confidence. Anal-
`ysis of the %GC of DNA in control library clones and in
`human DNA flanking integration sites revealed no obvious
`differences from that of bulk human DNA (data not shown).
`Thus, in those cases that could be checked, sequences in our
`control libraries had compositions close to those expected for
`randomly selected human genomic DNA fragments.
`Comparison of the integration site and control libraries re-
`vealed that centromeric alphoid repeats were absent among
`integration sites but that six alphoid repeats were present in
`the control libraries (Tables 2 and 3). Alphoid repeats were
`also absent among previously characterized HIV-1 integration
`sites (37, 59).
`Other types of sequences were differentially distributed be-
`tween integration site sequences and control sequences,
`although none showed the all-or-nothing partitioning characteristic
`of alphoid repeats. Transcription units were more
`abundant in the integration sites (18%) than in controls (8%).
`The other repeats were also differentially distributed (7% in
`integration sites versus 23% in controls), although in this case
`many different sequence types contributed to the totals. Alu
`elements and LINE elements were not obviously differentially
`distributed.
`As a test of the robustness of our conclusions, integration
`site sequences were reanalyzed after truncation so that only 50
`bp of host DNA remained at the junction between viral and
`host sequences for all clones. The control data were similarly
`truncated to 50 bp in each sequence, arbitrarily starting from
`one junction with the DNA vector used for cloning. Sequence
`similarities were identified in the 50-bp data set by using the
`criteria described above (Table 3). Fewer matches were de-
`tected, as expected, since the sequences were shorter. How-
`ever, in this case also, alphoid repeats were detected in the
`control library and not the integration site library.
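The 50-bp reanalysis can be sketched in a few lines of Python. The flanking lengths below are those listed in Table 2; the function name is hypothetical, and the rule that clones with fewer than 50 bp of host DNA are dropped is inferred from the "Excluded" entries in Table 2, not stated as an algorithm by the authors.

```python
# Lengths (bp) of human DNA flanking each of the 61 integration sites,
# in the order the clones are listed in Table 2.
FLANK_LENGTHS = [
    106, 60, 156, 132, 91, 139, 140, 114, 230, 82, 212, 166, 89, 63, 111,
    164, 66, 54, 159, 342, 173, 38, 258, 110, 37, 160, 93, 143, 188, 157,
    50, 50, 420, 237, 69, 68, 89, 41, 437, 179, 337, 81, 111, 125, 260,
    176, 113, 125, 215, 147, 171, 85, 86, 52, 147, 131, 94, 184, 120, 161,
    215,
]


def clones_for_reanalysis(lengths, window=50):
    """Keep clones with at least `window` bp of flanking host DNA.

    Each retained clone contributes exactly `window` bp to the reanalysis.
    """
    return [n for n in lengths if n >= window]


kept = clones_for_reanalysis(FLANK_LENGTHS)
print(len(FLANK_LENGTHS), sum(FLANK_LENGTHS))  # 61 8809
print(len(kept), len(kept) * 50)               # 58 2900
```

The totals agree with the "Total bp" row of Table 2 and with the 58 integration sites noted for the 50-bp reanalysis in Table 3.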
`A weak consensus sequence at integration sites. Figure 2
`presents an analysis of the 5 bp of host DNA at the junction
`between virus and host sequences expected to be duplicated
`upon integration. A weak consensus sequence can be derived
`from these data [5′ GT(A/T)AC 3′]. Only one end was sequenced
`for each integrant, so the duplicated nature of this
`sequence is inferred. The consensus sequence is rotationally
`symmetric, as expected, since each end of the HIV cDNA is
`joined to the 5′ end of each strand of this sequence (Fig. 2). A
`closely related sequence was derived from a previous study of
`HIV integration sites by Stevens and Griffith [5′ GTA(A/T)(T/C) 3′]
`(55). In that study, DNA from HIV-infected cells
`was cloned in lambda vectors, followed by isolation of provirus-containing
`clones by hybridization and sequencing of 29 proviral
`integration sites. The observation that our methods and
`those of Stevens and Griffith yielded similar integration site
`consensus sequences strongly validates each study.
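Rotational symmetry here means that the consensus reads the same on both strands, i.e., it equals its own reverse complement. The Python sketch below checks this for GT(A/T)AC, written with the IUPAC code W for (A/T), and counts how many positions of a 5-bp site agree with the consensus. The function names are hypothetical and the scoring is a simple illustration, not the analysis used for Fig. 2.

```python
# IUPAC code W stands for A or T and is its own complement.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "W": "AT"}
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G", "W": "W"}


def reverse_complement(seq):
    """Return the sequence of the opposite strand, read 5' to 3'."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def is_rotationally_symmetric(seq):
    """True if the sequence equals its own reverse complement."""
    return seq == reverse_complement(seq)


def positions_matching(consensus, site):
    """Count positions of `site` that are allowed by `consensus`."""
    return sum(base in IUPAC[code] for code, base in zip(consensus, site))


CONSENSUS = "GTWAC"  # 5' GT(A/T)AC 3'
print(is_rotationally_symmetric(CONSENSUS))    # True
print(positions_matching(CONSENSUS, "GTCAC"))  # 4 (clone SupH 14, Table 2)
print(positions_matching(CONSENSUS, "GTAAT"))  # 4 (clone SupH 24, Table 2)
```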
`Region-specific assays of integration target sites. Several
`features of the sequencing data complicated interpretation. (i)
`The number of matching sequences detected was determined
`in part by the choice of parameters in the similarity search. (ii)
`In some clones the integration junctions were within the iden-
`tified cDNA or repeated sequence, while in others the junc-
`tions were near but not within the identified sequence. In
`Tables 2 and 3, these were considered together. (iii) Although
`this study of HIV-1 integration site sequences is the largest yet
`reported, the differences between integration sites and controls
`were generally not clearly significant, as evaluated by the chi-
`square or Fisher’s exact test. No finding was clearly significant
`in the analysis of both the 144-bp flanking sequences and the
`50-bp sequence data. For these reasons, it was important to
`test some of the hypotheses generated by the sequence analysis
`by an independent method.
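To make the statistical comparison concrete, the sketch below implements a two-sided Fisher's exact test with only the Python standard library and applies it to the alphoid counts reported above (0 of 61 integration sites versus 6 of 104 controls, assuming that the six alphoid repeats lie in six separate control clones). It illustrates how such a test could be set up and is not the authors' calculation; the function names are hypothetical, and no P value is asserted here.

```python
from math import comb


def hypergeom_pmf(a, row1, row2, col1):
    """Probability of `a` in the top-left cell of a 2x2 table with fixed margins."""
    return comb(row1, a) * comb(row2, col1 - a) / comb(row1 + row2, col1)


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the table [[a, b], [c, d]].

    Sums the probabilities of all tables with the same margins that are
    no more likely than the observed table.
    """
    row1, row2, col1 = a + b, c + d, a + c
    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    p_observed = hypergeom_pmf(a, row1, row2, col1)
    tolerance = 1 + 1e-12  # guards against floating-point ties
    return sum(
        p
        for p in (hypergeom_pmf(k, row1, row2, col1)
                  for k in range(lowest, highest + 1))
        if p <= p_observed * tolerance
    )


# Rows: integration sites, controls. Columns: alphoid, not alphoid.
p_alphoid = fisher_exact_two_sided(0, 61, 6, 98)
print(f"two-sided P for alphoid repeats: {p_alphoid:.3f}")
```

With only six alphoid clones among the controls, the test has limited power, which is consistent with the need for the independent PCR-based assay.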
`To this end, integration near repeated sequences was stud-
`ied by using an assay based on PCR amplification of host-virus
`DNA junctions. In each reaction, one primer was complemen-
`tary to an HIV-1 LTR end and the second primer was com-
`plementary to a repeated sequence (alphoid, Alu, LINE-1, or
`THE 1 repeats) (Fig. 3) (30, 44, 62). The first PCR amplifica-
`tion was followed by a second PCR with nested primers. The
`LTR primer in the second amplification was labeled at the 5′
`end with 32P. Amplification products were separated on DNA
`sequencing-type gels and analyzed by autoradiography. An in-
`tegration event in or near the repeated sequence studied gave
`rise to a labeled band by amplification. Amplification of many
`such integration events gave rise to a ladder of labeled bands
`on the final autoradiogram.
`The importance of the in vivo setting was assessed by com-
`
`TABLE 2. Integration sites analyzed and their similarities to known sequences
`
`Sequence name^a | Length (bp)^b | Dup seq^c | Identified similarities^d | Identified similarities truncated to 50 bp^e
`MolH 1 | 106 | ATGTC | *^f | *
`MolH 2 | 60 | CAAGC | * | *
`SupH 1 | 156 | TCTTC | LINE-1 [2–153, SW = 508] | *
`SupH 2 | 132 | GCTAC | * | *
`SupH 3 | 91 | GGAAA | * | *
`SupH 4 | 139 | GTGGT | * | *
`SupH 5 | 140 | TATAT | * | *
`SupH 6 | 114 | ATCCC | * | *
`SupH 7 | 230 | GCATG | * | *
`SupH 9 | 82 | CTATA | * | *
`SupH 10 | 212 | TACAC | LINE-1 [2–107, SW = 251] | *
`SupH 11 | 166 | CATGC | Alu [15–110, SW = 716] | Alu [SW = 304]
`SupH 12 | 89 | GTTGG | * | *
`SupH 13 | 63 | CTCAC | Transcription unit (cDNA) [5–62, P = 1.6 × 10^-16] | Transcription unit (cDNA) [P = 1.9 × 10^-12]
`SupH 14 | 111 | GTCAC | * | *
`SupH 15 | 164 | TATGG | LINE-1 [2–107, SW = 400] | *
`SupH 16 | 66 | AACAG | * | *
`SupH 17 | 54 | CTCAC | * | *
`SupH 18 | 159 | GTTGT | * | *
`SupH 20 | 342 | GTTTC | Alu [3–125, SW = 956] | Alu [SW = 373]
`SupH 21 | 173 | CATAT | * | *
`SupH 22 | 38 | CACAC | * | Excluded
`SupH 23 | 258 | CATTC | * | *
`SupH 24 | 110 | GTAAT | * | *
`SupH 25 | 37 | CTTTT | * | Excluded
`SupH 27 | 160 | CCATT | * | *
`SupH 28 | 93 | AATAC | Transcription unit (cDNA) [1–93, P = 3.7 × 10^-33] | Transcription unit (cDNA) [P = 1.5 × 10^-13]
`SupH 29 | 143 | GCCCA | * | *
`SupH 31 | 188 | ATATT | * | *
`SupH 32 | 157 | GTTGA | Transcription unit (cDNA) [59–157, P = 5.9 × 10^-34] | *
`SupH 33 | 50 | CTTCA | Transcription unit (VACH1 gene) [1–50, P = 6 × 10^-13] | Transcription unit (VACH1 gene) [P = 6 × 10^-13]
`SupH 34 | 50 | AGTTG | * | *
`SupH 35 | 420 | TTAAC | Transcription unit (cDNA) [52–143, P = 2.8 × 10^-25]; LINE-2 [223–274, SW = 252] | *
`SupH 36 | 237 | CTTGT | * | *
`SupH 37 | 69 | CACAC | Alu [1–69, SW = 471] | Alu [SW = 371]
`SupH 38 | 68 | GTTAT | * | *
`SupH 39 | 89 | CAAAA | * | *
`SupH 41 | 41 | ATGGC | * | Excluded
`SupH 42 | 437 | AAAAC | LINE-1 [1–437, SW = 2684] | LINE-1 [SW = 264]
`SupH 43 | 179 | ATAGT | Transcription unit (cDNA) [1–179, P = 9.4 × 10^-65]; other repeat (LTR element) [98–152, SW = 198] | Transcription unit (cDNA) [P = 3.8 × 10^-13]
`SupH 44 | 337 | GAAAC | Other repeat (MIR, SINE) [191–315, SW = 493] | *
`SupH 46 | 81 | GGGAG | Transcription unit (cDNA) [1–33, P = 3.9 × 10^-6] | Transcription unit (cDNA) [P = 4.6 × 10^-6]
`SupH 47 | 111 | AAAAC | Transcription unit (cDNA) [1–57, P = 2.1 × 10^-13] | Transcription unit (cDNA) [P = 2.2 × 10^-9]
`SupH 48 | 125 | CTGTG | Other repeat (MIR, SINE) [1–123, SW = 474] | Other repeat (MIR, SINE) [SW = 245]
`SupH 49 | 260 | TTTTG | Alu [1–128, SW = 698] | Alu [SW = 300]
`SupS 1 | 176 | GCAGG | Transcription unit (CD27 gene) [1–176, P = 2.7 × 10^-62] | Transcription unit (cDNA) [P = 5.4 × 10^-13]
`SupS 2 | 113 | GTTCT | * | *
`SupS 3 | 125 | ATACC | Alu [4–115, SW = 540] | Alu [SW = 195]
`SupS 4 | 215 | CCCTC | Other repeat (MER74, LTR element) [1–213, SW = 599] | Other repeat (MER74, LTR element) [SW = 277]
`SupS 5 | 147 | CAGCA | * | *
`SupS 7 | 171 | GAGTC | * | *
`SupS 8 | 85 | TGAGT | Transcription unit (cDNA) [1–81, P = 3.2 × 10^-26] | Transcription unit (cDNA) [P = 3.6 × 10^-13]
`SupS 9 | 86 | GTACC | * | *
`SupS 10 | 52 | AAAGC | Alu [2–59, SW = 356] | Alu [SW = 310]
`SupS 11 | 147 | CTAAC | * | *
`SupS 12 | 131 | GTTTC | * | *
`SupS 13 | 94 | ATGTG | Transcription unit (cDNA) [1–94, P = 5.1 × 10^-28] | Transcription unit (cDNA) [P = 3.4 × 10^-12]
`SupS 14 | 184 | GAGAC | * | *
`SupS 15 | 120 | AAATG | * | *
`SupS 16 | 161 | CTCTG | * | *
`SupS 17 | 215 | GTATG | * | *
`Total bp | 8,809 | | | 2,900
`Avg | 144 | | | 50
`
`a Laboratory designation for each DNA clone.
`b Number of human DNA base pairs sequenced adjacent to the HIV cDNA terminus.
`c Nucleotide sequence of the 5 bp of human DNA at the junction with viral DNA expected to be duplicated upon integration.
`d Sequence similarities found by comparison to sequence databases (the first designation is the sequence class given in Table 3, the name in parentheses is a more
`detailed designation, and the numbers in brackets represent the location of the sequence match [e.g., 1 = the first cDNA-proximal base pair in host DNA] and the
`degree of similarity).
`e Similarities identified in the 50-bp sequence data set. For explanation of bracketed data, see footnote d.
`f *, anonymous.
`
`TABLE 3. Sequence composition of libraries of integration sites and control DNA fragments
`
`Sequence class | 144-bp sequences (avg length)^a: integration sites (%) | 144-bp sequences (avg length)^a: genomic DNA (%) | 50-bp sequences^b: integration sites (%) | 50-bp sequences^b: genomic DNA (%)
`Anonymous | 61 | 43 | 69 | 71
`Alu element | 10 | 9 | 10 | 6
`LINE element | 8 | 13 | 2 | 6
`Alphoid repeat | 0 | 6 | 0 | 3
`Other repeats | 7 | 22 | 3 | 10
`Transcription unit | 18 | 8 | 16 | 4
`
`a For data from sequences of 144-bp average length, 61 integration sites and
`104 control sequences were considered.
`b For the reanalysis of integration site sequences considering only the proximal
`50 bp of human DNA sequence, 58 integration sites and 104 control sequences
`were considered.
`
`paring integration sites from infected cel