`of inclusion bodies
`
`RENAUD VINCENTELLI,1 STÉPHANE CANAAN,1 VALÉRIE CAMPANACCI,
`CHRISTEL VALENCIA,2 DAMIEN MAURIN, FRÉDÉRIC FRASSINETTI,
`LORÉNA SCAPPUCINI-CALVO, YVES BOURNE, CHRISTIAN CAMBILLAU,
`AND CHRISTOPHE BIGNON
`Architecture et Fonction des Macromolécules Biologiques, Unité Mixte de Recherche (UMR) 6098, Centre National
`de la Recherche Scientifique (CNRS) et Universités d’Aix-Marseille I et II, 13402 Marseille Cedex 20, France
`(RECEIVED April 9, 2004; FINAL REVISION June 25, 2004; ACCEPTED June 30, 2004)
`
`Abstract
`
`One of the main stumbling blocks encountered when attempting to express foreign proteins in Escherichia
`coli is the occurrence of amorphous aggregates of misfolded proteins, called inclusion bodies (IB). Devel-
`oping efficient protein native structure recovery procedures based on IB refolding is therefore an important
`challenge. Unfortunately, there is no “universal” refolding buffer: Experience shows that refolding buffer
`composition varies from one protein to another. In addition, the methods developed so far for finding a
`suitable refolding buffer suffer from a number of weaknesses. These include the small number of refolding
`formulations, which often leads to negative results, solubility assays incompatible with high-throughput, and
`experiment formatting not suitable for automation. The present study set out to address some of
`these limitations. This resulted in the first completely automated IB
`refolding screening procedure to be developed using a 96-well format. The 96 refolding buffers were
`obtained using a fractional factorial approach. The screening procedure is potentially applicable to any
`nonmembrane protein, and was validated with 24 proteins in the framework of two Structural Genomics
`projects. The tests used for this purpose included the use of quality control methods such as circular
`dichroism, dynamic light scattering, and crystallogenesis. Out of the 24 proteins, 17 remained soluble in at
`least one of the 96 refolding buffers, 15 passed large-scale purification tests, and five gave crystals.
`Keywords: screening; refolding; solubility; inclusion bodies; automation; high-throughput
`
`Reprint requests to: Stéphane Canaan or Christophe Bignon, Architecture
`et Fonction des Macromolécules Biologiques, UMR 6098, CNRS et
`Universités d’Aix-Marseille I et II, 31 chemin Joseph Aiguier, 13402 Marseille
`Cedex 20, France; e-mail: stephane.canaan@afmb.cnrs-mrs.fr or
`bignon@afmb.cnrs-mrs.fr; fax: +00-334-91-16-45-36.
`1These authors contributed equally to this work.
`2Present address: Institut Gilbert Laustriat, IFR85, 74 route du Rhin, BP
`60024, F-67401 Illkirch Cedex, France.
`Abbreviations: BAC, bacterial artificial chromosome; β-MSH, β-mercaptoethanol;
`BSA, bovine serum albumin; CD, circular dichroism; DLS,
`dynamic light scattering; DsbA, disulfide oxidoreductase; GSH, reduced
`glutathione; GSSG, oxidized glutathione; IB, inclusion bodies; IP, isoelec-
`tric point; MT, Mycobacterium tuberculosis; OD, optical density; PEG,
`polyethylene glycol; SEC, size exclusion chromatography; SG, structural
`genomics; SPINE, Structural Proteomics In Europe.
`Article and publication are at http://www.proteinscience.org/cgi/doi/10.1110/ps.04806004.
`
`In the context of Structural Genomics (SG) projects in-
`volving targets from Escherichia coli (ASG), Mycobacte-
`rium tuberculosis (MT), and viruses (SPINE), we have
`performed expression assays on ∼600 genes (Sulzenbacher
`et al. 2002; Vincentelli et al. 2003). One of the main ob-
`stacles we and other authors have encountered when
`expressing recombinant proteins in E. coli is the relatively
`low soluble protein yield obtained with many of the source
`organisms used. In the case of eukaryotes, viruses, and
`Mycobacterium tuberculosis, most of the genes were
`expressed in the form of insoluble aggregates called “inclu-
`sion bodies” (IB). This obstacle to obtaining suitable
`targets for performing structural studies was particularly
`severe in the case of MT, with which 93% of our 182 targets
`yielded IB when proteins were expressed fused to an
`N-terminal His tag.
`
`Protein Science (2004), 13:2782–2792. Published by Cold Spring Harbor Laboratory Press. Copyright © 2004 The Protein Society
`
`IBs are assumed to result from illegitimate interactions
`between hydrophobic residues located in the core of differ-
`ent molecules. This process is auto-catalyzed and therefore
`rapidly results in the precipitation of all the recombinant
`proteins produced in the cell (Mukhopadhyay 1997). Meth-
`ods have been designed to recover correctly folded proteins
`from these amorphous aggregates. These include the “dilu-
`tion,” “dialysis,” and “solid phase” methods (De Bernardez-
`Clark 1998), all of which involve an initial IB solubilization
`step using highly concentrated solutions of chaotropic
`agents such as guanidinium chloride and urea. The subse-
`quent step in all these methods consists of removing the
`denaturing agent and restoring the protein to its native shape
`from the unfolded soluble state. The pathway used to re-
`move the chaotropic agent differs between the three meth-
`ods, however, although the same result is reached in each
`case. With the dilution method, refolding is assumed to
`occur immediately upon diluting the protein in a large vol-
`ume of nondenaturing buffer (“refolding buffer”), which
`has to be sufficiently large to both cancel out the solubiliz-
`ing effect of the chaotropic agent and reduce the probability
`that protein interactions will occur. The dialysis method
`involves the use of the same initial and final buffer compo-
`sitions as the dilution method, but in this case, there is no
`dilution to decrease the protein–protein contacts (Rudolph
`and Lilie 1996; Mukhopadhyay 1997). Finally, it was es-
`tablished that physically separating molecules from each
`other during the renaturation process (solid phase refolding)
`greatly improved the refolding yield (Stempfer et al. 1996).
`Whatever the method used to replace denaturing by nondenaturing
`buffer (dilution, dialysis, or solid phase
`method), it would be easier to use a single refolding buffer.
`Unfortunately, experience has shown that the composition
`of the refolding buffer is strongly protein dependent and that
`simply maintaining a difference between the pH of the re-
`folding buffer and the isoelectric point (IP) of the protein
`does not usually suffice to keep the protein soluble.
`Hence the idea of testing several refolding buffers simul-
`taneously. For instance, Perbio has addressed this issue with
`Pro-Matrix, a refolding kit consisting of nine basic buffers,
`which can be supplemented with additives (Qoronfleh
`2004). Using a fractional factorial approach, Armstrong et
`al. (1999), Chen and Gouaux (1997), and Hampton Re-
`search (FoldIt) have each developed separate procedures
`using 16 refolding conditions.
`Despite these improvements, some difficulties were still
`encountered in the protein solubility assays performed to
`monitor the refolding process. Because no solubility assay
`was provided with the Pro-Matrix kit, this assay had to be
`set up by the customer, and the methods suggested for a
`solubility assay in the case of the FoldIt kit (size exclusion
`chromatography [SEC]), as well as those used by Armstrong
`et al. (1999) and Chen and Gouaux (1997) (dialysis
`and centrifugation), were not compatible with a high-
`throughput or with automation, which are two of the most
`crucial features in SG studies.
`To solve the problems associated with the above limita-
`tions, a protein solubility test based on light scattering has
`been devised (Trésaugues et al. 2004). In practice, the turbidity
`of the solution is assessed by measuring the optical
`density (OD) at 390 nm, before and after adding the protein.
`If the protein remains soluble, the absorbance remains un-
`changed. In the opposite case, the OD increases proportion-
`ally to the amount of precipitate produced. This procedure is
`much faster than SEC and can be easily automated, but the
`number of conditions was still limited to 12, and the pro-
`teins often precipitated in all of them. This clearly suggested
`that the number of conditions needed to be further increased.
`A way of making this quantitative jump has
`been tested in microtiter plate format, using 203
`refolding conditions (Sijwali et al. 2001). However, the
`latter study was only designed for screening different
`GSH:GSSG ratios.
`It is worth noting that although increasing the number of
`refolding conditions increases the probability that a protein
`will meet a buffer composition favoring its solubility, it also
`increases the number of samples to be handled. One pos-
`sible solution to this problem consists of automating the
`screening process. In addition, automation is required to
`obtain sufficiently large SG throughputs. A partially auto-
`mated refolding screening procedure was recently described
`(Scheich et al. 2004). With this procedure, however, the
`automation did not include any test for assessing the solu-
`bility and only 30 refolding conditions were used.
`We therefore designed a refolding strategy involving the
`use of 96 different buffers in microtiter plate format, based
`on the above mentioned idea that the probability of a protein
`encountering a buffer composition favoring correct folding
`was likely to increase with the number of buffers tested. The
`solubility assay used in our screening procedure is basically
`the same as that described by Trésaugues et al. (2004),
`which accounts for protein solubility, and not for protein
`folding. After the preparatory refolding stage, circular di-
`chroism (CD), dynamic light scattering (DLS), and crystal-
`logenesis quality control procedures were added to respec-
`tively assess the folding, aggregation state, and homogene-
`ity of the protein solution. These methods were chosen
`because they can be applied in theory to any protein, which
`is a prerequisite in the field of post-Genomics, which deals
`mainly with proteins having an unknown function. Finally,
`the availability of a pipetting robot made it possible to au-
`tomate the whole process in a 96-well plate format.
`To the best of our knowledge, this is the first completely
`automated “wide spectrum” 96-well IB refolding screening
`procedure to be developed based on a factorial approach.
`The present article describes the setup involved and confirms
`the validity of the method, based on tests carried out
`with proteins originating from two SG projects.
`
`Results
`
`Optimization of the solubility assay
`
`The recently described solubility test, in which the turbidity
`of the solution is measured in terms of the light absorbance
`at 390 nm, involves light scattering by a protein precipitate
`(Trésaugues et al. 2004). As no proof was available that this
`wavelength was the most suitable one, we first addressed
`this point.
`For this purpose, the absorbance of a bovine serum albu-
`min (BSA) precipitate was scanned between 230 and 600
`nm. As shown in Figure 1 (curve A), the absorbance de-
`creased continuously from 230 to 600 nm. In addition to this
`regular decay, a small shoulder was present in the 280 nm
`region. To determine whether this feature was due to any
`remaining soluble proteins, the precipitate was spun down
`and the scanning performed again on the supernatant. Sur-
`prisingly, in this case, OD230–600 was indistinguishable from
`the baseline, which means that the protein content had been
`entirely converted into insoluble species. These results in-
`dicate that the absorbance pattern of the protein precipitate,
`which is shown in Figure 1 (curve A), was entirely ac-
`counted for in terms of light scattering and not even par-
`tially in terms of the absorbance of soluble proteins.
`
`Figure 1. Absorbance spectra of precipitated and soluble forms of a protein.
`Twenty microliters of a 20 mg/mL BSA solution were diluted in 500
`µL of either 100% isopropanol or 8 M guanidinium chloride. A chaotropic
`solution was used to ensure that the entire protein content was soluble. The
`absorbance of the resulting protein suspension (in isopropanol) or solution
`(in guanidinium chloride) was recorded from 230 to 600 nm, using a
`Varian Cary Scan 50 spectrophotometer. After subtracting the baseline (the
`absorbance of each solvent in the absence of protein), the absorbance
`intensities were plotted vs. the wavelengths. (Curve A) Precipitated protein
`in isopropanol. (Curve B) Soluble protein in guanidinium chloride. From
`left to right, three vertical arrows indicate the position of 280, 340/350, and
`390 nm wavelengths, respectively.
`
`Because the solubility assay was expected to distinguish
`between the absorbance due to precipitated and soluble pro-
`teins, the same experiment was performed under conditions
`where the proteins remained 100% soluble. In this case (Fig.
`1, curve B), the absorbance profile was that of a typical
`protein solution, peaking at 280 nm (aromatic side chains)
`and at 200 nm (peptide bonds). Note that only the beginning
`of the peptide bonds’ absorbance peak (λmax 200 nm;
`Stoscheck 1990) was visible between 230 and 240 nm.
`In conclusion, the wavelength to be used in the solubility
`test should satisfy the following contradictory criteria: (1) It
`should be high enough above 280 nm to prevent any risk of
`obtaining false negative results due to the absorbance of
`(partially or totally) soluble proteins, at values of 280 nm
`and below, but (2) it should be as small as possible to
`provide the highest signal-to-noise ratio, according to curve
`A, and hence the most sensitive assay. In practice, 340-
`(manual procedure) and 350-nm (automated procedure)
`wavelengths were selected because they fulfilled these two
`criteria and provided better results than 390 nm.
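
The turbidity readout described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the 0.05 cutoff is taken from the legend of Figure 3B, applying it to the OD increase (after minus before adding the protein) is an assumption, and the names `WellReading`, `turbidity`, and `soluble_buffers` as well as the example readings are hypothetical.

```python
from dataclasses import dataclass

# Assumed cutoff: the legend of Figure 3B flags wells as soluble below an OD of 0.05.
OD_THRESHOLD = 0.05


@dataclass
class WellReading:
    buffer_number: int  # 1..96, numbered as in Figure 2
    od_before: float    # refolding buffer alone, read at 340 or 350 nm
    od_after: float     # same well after adding the protein


def turbidity(reading: WellReading) -> float:
    """Increase in OD caused by adding the protein (light scattering)."""
    return reading.od_after - reading.od_before


def soluble_buffers(readings, threshold=OD_THRESHOLD):
    """Return the buffer numbers whose OD increase stays below the cutoff."""
    return [r.buffer_number for r in readings if turbidity(r) < threshold]


# Hypothetical readings, for illustration only (not measured data).
example = [
    WellReading(1, 0.041, 0.048),
    WellReading(2, 0.040, 0.310),
    WellReading(3, 0.043, 0.050),
]
print(soluble_buffers(example))  # prints [1, 3]
```

Subtracting the buffer-only reading is one way to keep additives that absorb slightly on their own from being scored as precipitate.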
`
`Selection of 96 refolding conditions
`
`The chemicals listed in Table 1, which were used to prepare
`the refolding mixes presented in Figure 2, were selected on
`the basis of the following criteria:
`
`1. A pH range of 4 to 9 was chosen because the proteins
`to be screened had various IPs and were likely to dena-
`ture below or above these values.
`
`2. Various ionic strengths (none; 100 mM NaCl or KCl;
`and 200 mM NaCl) were used because the solubility can
`increase (salting in) or decrease (salting out) with the salt
`concentration from one protein to another.
`
`3. With the dilution method used, refolding was allowed to
`proceed for a very short time. Amphiphilic components
`(glycerol, PEG) were introduced to prevent the hydro-
`phobic residues of different molecules still accessible at
`intermediate refolding stages from interacting with each
`other. In addition, glycerol and PEG were already provided
`in other refolding kits (Trésaugues et al. 2004) and
`were compatible with crystallogenesis. Glucose and arginine
`were used for the same reason, although Arg had
`to be removed before the crystallogenesis trials (see be-
`low).
`
`4. Solubilizing reagents in the NDSB series were selected
`because they have been successfully used in protein crys-
`tallogenesis (Karaveg et al. 2003) and refolding experi-
`ments (Vuillard et al. 1998; Expert-Bezancon et al.
`2003).
`
`5. Proteins bearing odd numbers of cysteine residues can form
`unnatural intermolecular disulfide bonds, which is a possible
`cause of precipitation during the refolding process.
`Ten millimolar β-MSH was introduced to prevent
`this mispairing.
`
`6. The “cocktail” contained potential cofactors that might
`be required during the refolding process in the case of
`some proteins, whereas some other proteins tend to precipitate
`in the presence of divalent cations, hence the
`presence of EDTA.
`
`7. The chaotropes (urea and guanidinium chloride) present
`in the commercial kits were discarded because they were
`liable to damage the robot’s pipetting valves.
`
`Table 1. Chemicals used to make the first 80 refolding buffers
`
`Buffer        Ionic         Amphiphilic            Detergent   Reducing agent  Additive
`(50 mM)       strength                             (100 mM)    (10 mM)
`
`NaAc, pH 4    NaCl 100 mM   Glycerol 20% (v/v)     NDSB 195    β-MSH           Arginine 800 mM
`MES, pH 5     NaCl 200 mM   PEG 4000 0.05% (w/v)   NDSB 201                    Glucose 500 mM
`MES, pH 6     KCl 100 mM    PEG 400 0.05% (w/v)    NDSB 256                    Cocktail(a)
`TRIS, pH 7                                                                     EDTA 1 mM
`TRIS, pH 8
`CHES, pH 9
`
`The concentrations indicated are those used before adding the protein.
`(a) Consisted of 50 µM of each of the following: NADH, thiamine HCl, biotin, CaCl2, MgCl2, CuSO4, ZnCl2,
`CoSO4, ADP, and NiCl2.
`
`It was necessary to use a fractional factorial approach
`on the first 80 wells, because the combination of 20 chemi-
`cals would have resulted in too many experimental points
`(the full factorial design would have been 2560 combina-
`tions).
`In the 16 remaining microplate wells, mini chaperones (a
`soluble form of GroEL; Altamirano et al. 1997) and redox
`components (GSH, GSSG, DsbA) were combined, because
`the disulfide bond formation/reduction during the
`folding process itself has been found to be crucial (Wei et al.
`1999). Details of each of the refolding conditions are given
`in Figure 2.
`
`Figure 2. Detailed composition of each well in the refolding plate. (*) Tris (pH 8), NaCl 150 mM, EDTA. For details, see Table 1.
`
`Testing of 96 refolding conditions
`The 96-well screening procedure was tested on a panel of 24
`proteins from two SG projects: MT (18 targets) and SPINE
`(6 targets). The results obtained are given in Table 2. Eleven
`out of the 18 MT targets (61%) and all the SPINE targets
`subjected to screening remained soluble under at least one
`of the 96 refolding conditions. In addition, except for MT
`target Rv1373 (buffer 57), all the responsive targets re-
`mained soluble in many buffers, which made it possible to
`choose the most suitable one(s) for the downstream steps
`such as crystallogenesis. In addition, the pH was not found
`to be a decisive parameter, because most of the targets
`remained soluble in a wide pH range, except Rv1523,
`Rv1515c, Rv0323c, and Rv2045c, which remained soluble
`only at pH 4. Generally speaking, no particular buffer composition
`(pH, ionic strength, etc.) stood out from the
`others, which suggests that the optimal buffer was always protein
`specific. The solubility yield at the production stage also
`appeared to be very high: 10 out of the 11 responsive MT
`targets (91%), and five out of the six responsive SPINE
`targets (83%) succeeded in passing the large-scale refolding
`and the first concentration steps. Only one SPINE (63) and
`two MT (Rv0323c and Rv1515c) targets were lost during
`the second concentration step following the gel filtration. In
`these particular cases, CD was nonetheless performed, but
`on protein solutions with concentrations too low for crys-
`tallogenesis.
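
The percentages quoted in this paragraph follow directly from the counts summarized in Table 2B. As a quick arithmetic check, a minimal sketch is given below; the dictionary keys and the helper `pct` are illustrative names, not part of the original procedure.

```python
# Counts taken from Table 2B (targets at each step).
funnel = {
    "MT": {"screened": 18, "responsive": 11, "purified": 10, "crystals": 3},
    "SPINE": {"screened": 6, "responsive": 6, "purified": 5, "crystals": 2},
}


def pct(part: int, whole: int) -> int:
    """Percentage rounded to the nearest integer."""
    return round(100 * part / whole)


for project, n in funnel.items():
    responsive = pct(n["responsive"], n["screened"])
    purified = pct(n["purified"], n["responsive"])
    print(f"{project}: {responsive}% responsive, {purified}% of those purified at large scale")
```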
`
`Validity of the refolding screening procedure
`
`Protein solubility and correct folding largely coincide, but
`the overlap is not always complete. We therefore tried to assess
`the overlap in the case of proteins that were quantitatively
`refolded. In post-Genomics, one is often dealing with genes
`encoding proteins with an unknown function, and functional
`tests for each of the targets are frequently lacking. Therefore,
`depending on the targets, generic and/or specific methods
`can be used to assess the folding.
`
`Table 2. (A) MT and SPINE targets remaining soluble in at least one refolding buffer and (B) summary of positive targets at each step
`
`A
`
`Target        MW  Organism  Purification  IP    pH  CD  DLS  Crystal  Soluble in buffer(a)
`
`Rv2391        66  MT        57 (−Arg)     6.31  8   nd  nd   No       39, 54, 57
`Rv2392        30  MT        59            5.87  8   Ok  nd   Yes      39, 49, 55, 56, 59, 61, 63, 64, 66
`Rv1399c       36  MT        41            4.38  7   Ok  M    Yes      41, 44, 48, 49, 56, 59, 65, 66
`Rv1208        37  MT        74            4.75  9   Ok  A    Yes      41, 43, 48, 54, 56, 59, 63, 65, 66, 68, 69, 70, 74, 80
`Rv1373        40  MT        57 (−Arg)     6.36  8   nd  A    No       57
`Rv1564c       84  MT        41            4.95  7   Ok  D    No       41, 43, 44, 49, 56, 57, 59, 63, 66
`Rv1523        40  MT        4 (−glyc)     8.06  4   Ok  nd   No       4, 7, 10, 11, 12
`Rv1515c       36  MT        4 (−glyc)     6.79  4   Ok  nd   No(b)    4, 5, 7, 10, 11, 12
`Rv0323c       27  MT        4 (−glyc)     5.81  4   Ok  nd   No(b)    2, 3, 4, 5, 9, 10, 11, 12
`Rv2045c       59  MT        4             7.67  4   Ok  nd   No       3, 4, 5, 6, 7, 10, 11, 12
`Rv3487c       29  MT        nd            8.85      nd  nd   nd       2, 3, 4, 5, 6, 7, 9, 10, 11, 12, 13, 14, 15, 18, 19, 20, 21, 22, 23, 24, 29, 45, 47, 49, 54, 57, 75
`SPINE 5       23  Sendai    69            5.06  9   Ok  T    Yes      10, 58, 59, 67, 73, 76
`SPINE 10      23  Measles   6             8.99  4   Ok  A    No       1, 2, 3, 4, 5, 6, 9, 10, 11, 12, 16, 17, 22, 24, 26, 32, 49, 54, 75, 78, 79
`SPINE 21      52  SFV       4, 6          8.80  4   nd  A    No       1, 2, 3, 4, 6, 7, 8, 9, 11, 12, 13, 21, 29, 31, 45, 49, 75, 78
`SPINE 22      53  SFV       nd            9.03      nd  nd   nd       2, 3, 4, 5, 6, 7, 9, 10, 11, 12, 31, 45, 49, 57, 78, 79
`SPINE 23(c)   23  Human     33            8.68  6   Ok  D    Yes      All except 5, 6, 16, 17, 26, 42, 53, 61, 65, 76
`SPINE 63      23  HIV       19            9.9   5   Ok  H    nd(b)    1, 2, 3, 4, 5, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 25, 26, 27, 28, 29, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 47, 48, 49, 51, 52, 54, 55, 57, 58, 66, 78
`
`B
`
`         Target number  Responsive targets  Large-scale purification  CD OK  DLS OK  Crystal
`
`MT       18             11                  10                        8      2       3
`SPINE    6              6                   5                         4      3       2
`
`(Target) The Rv nomenclature used was that of the MT genome (Cole et al. 1998; Camus et al. 2002). (MW) Theoretical molecular weight (kDa).
`(IP) Isoelectric point (taking into account the His tag when present). (pH) pH of the mix used for large-scale purification. (CD) Ok, the protein fulfilled the
`criteria defined in Materials and Methods. (DLS) Only the main (>95%) population (M, D, etc.) was included in the table. (M) Monomeric; (D) dimeric;
`(T) tetrameric; (H) hexameric; (A) aggregates (see Materials and Methods for details). (nd) Not determined.
`(a) The numbers refer to the buffers listed in Fig. 2 (1 = 1A, 2 = 1B … 9 = 2A, etc.). (−Arg), (−glyc) Protein purification was performed using the buffer
`indicated devoid of arginine or glycerol, respectively.
`(b) Lost during gel filtration or after the last concentration step.
`(c) This target was not refolded from IB, but from a Ni eluate that precipitated just after elution.
`(Target number) Number of targets subjected to refolding screening. (Responsive targets) Number of targets subjected to refolding screening that remained
`soluble in at least one refolding buffer. (DLS OK) DLS was taken to be satisfactory when the criteria defined in Materials and Methods were fulfilled.
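
The buffer numbers listed in Table 2 map onto plate wells as indicated in its footnote (1 = 1A, 2 = 1B, 9 = 2A). A minimal sketch of that mapping follows; it assumes that the numbering continues column by column over all 12 columns of the plate, which the footnote implies but does not spell out, and the function names are hypothetical.

```python
ROWS = "ABCDEFGH"  # the 8 rows of a standard 96-well plate


def buffer_to_well(n: int) -> str:
    """Convert a buffer number (1..96) to a well label such as '2A'."""
    if not 1 <= n <= 96:
        raise ValueError("buffer number must be between 1 and 96")
    column, row = divmod(n - 1, len(ROWS))
    return f"{column + 1}{ROWS[row]}"


def well_to_buffer(well: str) -> int:
    """Inverse mapping: '2A' -> 9."""
    column = int(well[:-1])
    row = ROWS.index(well[-1].upper())
    return (column - 1) * len(ROWS) + row + 1


print(buffer_to_well(57))  # buffer used for Rv2391 and Rv1373; prints 8A
```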
`
`Generic methods
`Circular dichroism (protein folding), dynamic light scat-
`tering (protein aggregation), and crystallogenesis (protein
`folding and dispersion homogeneity) were used for this pur-
`pose. Note that out of the 17 targets that reached the large-
`scale refolding stage, five could not be subjected to CD
`analysis either because of the presence of NDSB in the
`refolding buffer or because the amount of protein available
`was not sufficient. Crystallogenesis was also taken to be a
`valid folding criterion, because only properly folded pro-
`teins with an even aggregation state yield well-ordered crys-
`tals.
`The results obtained with these three methods, which are
`summarized in Table 2, indicated upon CD analysis that all
`the targets that produced crystals also displayed folding fea-
`tures. This was so in the case of both MT (Rv2392,
`Rv1399c, Rv1208) and SPINE (targets 5 and 23). However,
`the converse was not true: CD-positive MT targets Rv1564c,
`Rv1523, Rv1515c, Rv0323c, and Rv2045c and SPINE target
`10 did not produce crystals. Therefore, although the sole
`presence of secondary structures (β-sheet and/or α-helix)
`did not necessarily lead to successful crystallogenesis, its
`absence could be said to suggest a poor prognosis in terms
`of crystallogenesis, at least with this particular protein
`sample. By contrast, protein aggregation detected by DLS
`analysis seems to have a lower predictive value, because
`MT target Rv1208 produced crystals despite its aggregated
`state. Finally, the crystallization yield obtained with this
`procedure (five targets [36%]) was outstandingly high.
`
`Specific method
`Although the presence of secondary structures (CD), the
`lack of aggregates (DLS), and crystal growth argue in favor
`of correct folding, it is necessary to carry out more specific
`tests whenever possible. This was the case with Rv1399c.
`Because this target had been annotated as a putative lipase,
`a specific enzymatic assay was set up (Canaan et al. 2004).
`As illustrated in Table 3, the enzymatic activity could be
`measured after the refolding step, which provides evidence
`that our refolding screening procedure yields functional proteins,
`and not only soluble proteins. Two additional points
`are worth noting in Table 3: First, the refolding yield could
`be assessed, and turned out to be particularly high (50%).
`Second, 24 h after the refolding process, the total enzymatic
`activity was six times higher, which reflects the occurrence
`of a slow refolding process.
`
`Scale up: Criteria for the choice of refolding buffer
`
`Isoelectric point
`
`As can be seen from Table 2, whenever possible, we
`chose conditions giving the largest difference in pH with the
`isoelectric point (IP) of the protein. Although we do not
`know how many proteins would remain soluble if a mixture
`with a pH near the IP was used, our choice actually resulted
`in 100% of the targets being successfully purified.
`
`Compatibility with downstream steps
`
`High concentrations of arginine sometimes artificially
`maintained proteins in the soluble state. Consequently, the
`removal of arginine often resulted in protein precipitation
`(not illustrated). In addition, due to its “anti-aggregation”
`effects (Umetsu et al. 2003), 800 mM Arg would have hampered
`crystallogenesis. We therefore tested the solubility of
`Rv2391 and Rv1373 in buffers with decreasing concentrations
`of Arg. Because these proteins remained soluble without
`any Arg, we decided to purify them in Arg-free buffer 57.
`Pipetting a solution containing both a high protein concentration
`and 20% glycerol would lead to poor performance
`of the Cartesian crystallization robot in the ∼100 nL
`range. The same dilution technique was therefore used with
`glycerol as that described above in the case of Arg, with
`similar results and effects on large-scale purification.
`It can therefore be said that although Arg and glycerol
`were helpful during the refolding step, they were no longer
`required subsequently to maintain the solubility of the protein,
`at least with these particular targets.
`
`Table 3. Rv1399c refolding in the preparatory stage
`
`Step                                        Protein  Total activity  Active protein  Specific activity  Refolding
`                                            (mg)     (U)             (mg)            (U/mg)             yield (%)
`
`Ni2+ affinity column and concentration      160      0               0               0                  0
`Dilution in refolding buffer                160      12,880          9.5             80.5               5.9
`Dilution in refolding buffer (24 h later)   160      77,760          57.6            486                36
`Freezing/thawing (before centrifugation)    92       96,600          71.5            1050               44.7
`Freezing/thawing (after centrifugation)     80       108,000         80              1350               50
`
`The enzymatic activity was measured as described (Canaan et al. 2004), one unit (U) of activity being defined as the hydrolysis of one micromole of
`substrate per minute. The amount of active protein was calculated by dividing the total activity recorded at each step by the maximum specific activity (1350
`U/mg). The refolding yield was calculated by dividing the amount of active protein obtained in each step by the amount of starting material (160 mg eluted
`from the Ni affinity column).
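
The derived columns of Table 3 (specific activity, active protein, and refolding yield) can be recomputed from the measured protein amounts and total activities using the definitions given in the table legend. The sketch below is only a worked check of that arithmetic; the function name `derive` and the data layout are illustrative, and the results agree with the published values to within rounding.

```python
MAX_SPECIFIC_ACTIVITY = 1350.0  # U/mg, maximum specific activity (Table 3 legend)
STARTING_MATERIAL_MG = 160.0    # mg eluted from the Ni affinity column

# (step, protein in mg, total activity in U), as listed in Table 3
steps = [
    ("Ni affinity column and concentration", 160, 0),
    ("Dilution in refolding buffer", 160, 12_880),
    ("Dilution in refolding buffer (24 h later)", 160, 77_760),
    ("Freezing/thawing (before centrifugation)", 92, 96_600),
    ("Freezing/thawing (after centrifugation)", 80, 108_000),
]


def derive(protein_mg: float, total_activity_u: float):
    """Return (specific activity U/mg, active protein mg, refolding yield %)."""
    specific = total_activity_u / protein_mg
    active_mg = total_activity_u / MAX_SPECIFIC_ACTIVITY
    yield_pct = 100 * active_mg / STARTING_MATERIAL_MG
    return specific, active_mg, yield_pct


for name, protein_mg, activity_u in steps:
    specific, active_mg, yield_pct = derive(protein_mg, activity_u)
    print(f"{name}: {specific:.1f} U/mg, {active_mg:.1f} mg active, {yield_pct:.1f}% yield")
```

The sixfold increase in total activity after 24 h mentioned in the text corresponds to 77,760 U divided by 12,880 U.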
`
`Choosing between manual and automated procedures
`
`If a small number of proteins have to be screened, the
`manual procedure is preferable, whereas a large number of
`targets (tens to hundreds) requires the use of an automated
`procedure. In this case, screening one plate takes only 5
`min, and in its present form, the robot can process 27 plates
`in 2 h 30 min without any human intervention. Thanks to
`the color code, the automated procedure, in addition to saving
`time, made it possible to display the results in a form
`that was easier to analyze than the manual procedure
`(Fig. 3B).
`
`Discussion
`
`IB refolding versus soluble expression in SG
`
`To manage our SG programs, we have developed a general
`strategy based on several “screening rounds” of increasing
`complexity (Vincentelli et al. 2003). In the first round, tar-
`gets are expressed using a single vector encoding an N-
`terminal His-tag fusion and a single E. coli strain. In the
`second round, eight E. coli strains are transformed by the
`same vector as in round 1, and used to express the recombinant
`proteins at different temperatures. In the third round,
`the coding sequences are fused with maltose-binding protein,
`thioredoxin, glutathione S-transferase, and NusA. In
`the fourth round, the same experimental conditions are used
`as in round 1, except that the proteins are refolded from IB.
`
`Figure 3. (A) Robot used in the automated procedure. The tools required for the refolding screening procedure are indicated by arrows.
`(B) Results of Rv2392 refolding screening. At the end of the experiment, the results (in Excel format) were displayed using a color
`code: Green and red indicate the wells containing soluble (OD < 0.05) and precipitated (OD > 0.05) proteins, respectively.
`
`Comparisons between rounds 3 and 4
`In the MT program, screening round 3 seems to be the
`most fruitful procedure so far, as it yielded 56 soluble pro-
`teins after proteolytic cleavage of the fusion (S. Canaan, R.
`Vincentelli, D. Maurin, F. Frassinetti, L. Scappucini-Calvo,
`Y. Bourne, C. Cambillau, and C. Bignon, unpubl.). How-
`ever, its cost (in terms of the time required to prepare fusion
`constructs and to process the fusion vectors, the price of the
`endopeptidase, etc.) could easily be prohibitive. Conversely,
`IB refolding at preparative scale yielded 10 MT soluble
`proteins at a much lower cost, starting with only a fraction
`(27%) of the insoluble MT targets. Although no SPINE
`target was processed in round 3, it is worth noting that five
`out of six targets (83%) yielded soluble proteins in the pre-
`paratory stages of IB refolding, starting with only 3% of
`SPINE insoluble proteins.
`
`Comparisons between rounds 1 and 4
`In addition, the success rate (defined as the percentage of
`the proteins that succeeded in passing the scale-up step)
`obtained in round 4 with 18 MT and six SPINE targets (61%
`and 83%, respectively) was much more satisfactory than
`that obtained in round 1: Out of 182 MT and 244 SPINE
`target genes, only 14 (7.7%) MT targets and 80 (33%)
`SPINE targets were directly recovered in the form of
`soluble proteins after E. coli cell lysis. This means that at
`least in some cases, the IB chemical refolding procedure
`produces soluble species more efficiently than living bac-
`teria. Therefore, we propose to adopt IB refolding in the
`initial stages of SG projects dealing with highly insoluble
`proteins, such as the MT project. The validity of this ap-
`proach has been established in the case of small (<18 kDa)
`proteins intended for NMR structural analysis (Maxwell et
`al. 2003). Because 58% of the proteins were found to be
`properly refolded when a single renaturation buffer was
`used, one can expect to obtain a much higher refolding yield
`if an upstream refolding screening procedure is carried out
`in addition (Maxwell et al. 2003).
`
`Limitations of the screening procedure
`
`The 96-well plate refolding screening procedure is not suit-
`able for use with either high pressure (St. John et al. 1999)
`or reverse micelle (Vinogradov et al. 2003) approaches, for
`physical reasons. Nor can this method be used to study
`refolding processes using time-dependent techniques such
`
`High-throughput refolding screening
`
`as stepwise dialysis with additives (Umetsu et al. 2003) or
`air oxidation techniques (Menzella et al. 2002). Other limi-
`tations of our method are due to the OD340 detection method
`used:
`
`1. IB redissolved in chaotrope must be free of contaminants,
`otherwise these might promote precipitation, yielding
`false negative results. In this respect, the nickel affinity
`purification step is of particular importance.
`
`2. If the protein concentration is too low in the chaotropic
`agent, there may be no detectable precipitate after dilut-
`ing the protein in refolding buffer, even if the buffer is
`not favorable to maintaining the solubility.
`
`3. We have observed that the first OD340/350 reading was
`sometimes misleading: Some positive spots became
`negative due to the slow protein precipitation with time.
`The opposite also occurred, presumably due to the pres-
`ence of proteins with slow refolding kinetics, such as
`Rv1399c (see Table 3). This prompted us t