`factorial screen: A study of reagent effects and
`interactions
`
`MELISSA SWOPE WILLIS, JAMES K. HOGAN, PRAKASH PRABHAKAR,
`XUN LIU, KUENHI TSAI, YUNYI WEI, AND TED FOX
`Vertex Pharmaceuticals, Cambridge, Massachusetts 02139, USA
`
`(RECEIVED February 25, 2005; FINAL REVISION February 25, 2005; ACCEPTED March 28, 2005)
`
`Abstract
`
`A recurring obstacle for structural genomics is the expression of insoluble, aggregated proteins. In these
`cases, the use of alternative salvage strategies, like in vitro refolding, is hindered by the lack of a universal
`refolding method. To overcome this obstacle, fractional factorial screens have been introduced as a
`systematic and rapid method to identify refolding conditions. However, methodical analyses of the
`effectiveness of refolding reagents on large sets of proteins remain limited. In this study, we address this
`void by designing a fractional factorial screen to rapidly explore the effect of 14 different reagents on the
refolding of 33 structurally and functionally diverse proteins. The refolding data were analyzed using
statistical methods to determine the effect of each refolding additive. The screen has been miniaturized
for automation, resulting in reduced protein requirements and increased throughput. Our results show that
the choice of pH and reducing agent had the largest impact on protein refolding. Bis-mercaptoacetamide
cyclohexane (BMC) and tris(2-carboxyethyl)phosphine (TCEP) were superior reductants when compared
to others in the screen. BMC was particularly effective in refolding disulfide-containing proteins, while
TCEP was better for nondisulfide-containing proteins. From the screen, we successfully identified a positive
synergistic interaction between nondetergent sulfobetaine 201 (NDSB 201) and BMC on Cdc25A refolding.
The soluble protein resulting from this interaction crystallized and yielded a 2.2 Å structure. Our
method, which combines a fractional factorial screen with statistical analysis of the data, provides a
powerful approach for the identification of optimal refolding reagents in a general refolding screen.
Keywords: protein folding; fractional factorial screen; crystal structure; inclusion bodies;
high-throughput refolding; structural genomics
`
`Reprint requests to: Ted Fox, Vertex Pharmaceuticals, 130 Waverly
`Street, Cambridge, MA 02139, USA; e-mail: ted_fox@vrtx.com; fax:
`(617) 444-6820.
Abbreviations: AMP-PNP, 5′-adenylylimidodiphosphate; βME, β-mercaptoethanol;
BMC, bis-mercaptoacetamide cyclohexane; Cdc25A, cell
division cycle 25A phosphatase; DDM, n-dodecyl-β-D-maltopyranoside;
DTT, dithiothreitol; DYRK3, dual specificity Yak1-related kinase 3;
GdnHCl, guanidine hydrochloride; GSH, reduced glutathione; GSSG,
oxidized glutathione; ICE, interleukin-1β converting enzyme; IMPDH,
inosine 5′-monophosphate dehydrogenase; LDH, lactate dehydrogenase;
MAPKAP-K5, mitogen-activated protein kinase-activated protein kinase
5; MES, 2-[N-morpholino] ethanesulfonic acid; NADH, β-nicotinamide-adenine-dinucleotide,
reduced; NADP+, β-nicotinamide-adenine-dinucleotide
phosphate; NDSB, nondetergent sulfobetaine; PEG 3350,
polyethylene glycol 3350 Da; pNPP, p-nitrophenylphosphate; RNase A,
ribonuclease A; TCEP, tris(2-carboxyethyl)phosphine; Tris-HCl, tris
(hydroxymethyl)aminomethane hydrochloride; Tween 80, polyoxyethylene
(80) sorbitan monolaurate.
`Article published online ahead of print. Article and publication date
`are at http://www.proteinscience.org/cgi/doi/10.1110/ps.051433205.
`
`The identification of 20,000–25,000 genes from the human
`genome project has resulted in a wealth of potential targets
`for structural biology investigation and pharmaceutical
`design (International Human Genome Sequencing Consor-
`tium 2004). Since the completion of the project, expecta-
`tions have been high that the number of protein crystal
structures would dramatically increase but, in reality,
`there has only been a moderate rise in the number of crystal
`structures, due largely to a lack of sufficient quantities of
`protein suitable for structural studies (Service 2002).
`Although the technology responsible for expressing recom-
`binant proteins is highly developed (Chambers et al. 2004),
`it is still difficult to produce enough soluble protein for
`these structural studies. The ultimate goal of determining
`crystal structures on a genome-wide scale requires methods
`designed to improve the yield of functional protein.
`
Protein Science (2005), 14:1818–1826. Published by Cold Spring Harbor Laboratory Press. Copyright © 2005 The Protein Society
`
`Historically, optimization of soluble protein expres-
`sion has been the first strategy when trying to obtain
`protein for structural studies. In contrast, refolding
`insoluble protein has often been a strategy of last resort
`due to the unpredictable and time-consuming nature of
`the refolding process. However, the literature shows that
`numerous proteins can be refolded into their active
`forms, and that certain additives can assist in the refold-
`ing process. The combination of these additives dictates
`the efficiency of refolding as well as the utility of this
`method to gain soluble protein. Some of the more effec-
`tive additives include reducing agents, thiol shuffling
`enzymes, polar and nonpolar reagents, various deter-
`gents, and chaperonins; numerous excellent reviews
`have previously discussed these and other refolding
`additives in more detail (Rudolph and Lilie 1996; De
`Bernardez Clark 1998; Lilie et al. 1998; Voziyan et al.
`2000; Clark 2001; Middelberg 2002). Due to the unpre-
`dictable nature of the refolding process, the develop-
`ment of a systematic method for identifying useful
refolding conditions is needed. Fractional factorial
refolding screens have emerged as a way to compensate
for this unpredictability. Fractional factorial screens
`contain a representative subset of reagent combinations
`contained in full factorial screens and are designed to
`maximize the number of refolding variables explored
`while minimizing the amount of data collection (Hof-
`mann et al. 1995; Chen and Gouaux 1997; Armstrong
`et al. 1999; Tobbell et al. 2002). These screens have been
`used successfully to refold proteins, but the choice of
`refolding additives included in these screens is based on
`historical precedent and does not take into account
`novel reagents shown to improve protein renaturation.
`More recently, Vincentelli et al. (2004) designed an
`automated, 96-well refolding strategy that incorporated
`a fractional factorial buffer design utilizing both the
`traditional refolding additives used in previous refolding
`screens as well as a newer class of refolding agents
`known as NDSBs.
`Although prior refolding screens identify useful condi-
`tions for protein refolding, they stop short of using statis-
`tical methods to determine the utility of each reagent when
`used in a general screen on a diverse protein data set. In
`this study, we investigate the effects of additives on the
`refolding of 33 proteins using a fractional factorial refold-
`ing screen. We include reagents such as the reductants
`BMC and TCEP, and the detergent-mimic NDSB 201 in
`our matrix as a way of assessing their utility in refolding a
`variety of proteins. These reagents have been shown to be
`beneficial to protein refolding, extraction, and stability
`(Vuillard et al. 1995a,b; Woycechowsky et al. 1999;
`Chong and Chen 2000; English et al. 2002). The screen
`has been miniaturized for automation, resulting in
`reduced protein requirements, increased throughput, and
`enhanced reproducibility. To assess the applicability of
`the screen to a wide spectrum of proteins, we refolded
`multiple members from five gene families, as well as single
`members from additional families. The data gathered
`from refolding 33 proteins were analyzed using statistical
`methods to identify individual reagents, and reagent inter-
`actions having a significant effect on protein refolding.
`Every buffer condition successfully refolded at least one
`protein, and of the 14 reagents tested, 12 reagents signifi-
`cantly improved protein refolding. Finally, this screen was
`used successfully to identify a positive synergistic interac-
`tion between reagents that resulted in the production of
`soluble, functional protein leading to diffraction quality
`crystals and the solution of a protein structure. The results
`obtained support the use of a fractional factorial screen in
`combination with statistical analysis to identify suitable
`reagents to be included in a general refolding screen and
`provide a systematic method for optimizing the refolding
`process.
`
`Results
`
`Refolding screen design and the use of automation
`
`Additives such as the reducing agents BMC and TCEP,
`the detergent Tween 80, and the detergent-mimic NDSB
`201, were identified from the literature as useful refold-
`ing agents and evaluated for their suitability in a refold-
`ing screen (Vuillard et al. 1995a; Goldberg et al. 1996;
`Woycechowsky et al. 1999; Arakawa and Kita 2000;
English et al. 2002). A fractional factorial design was
used to sample multiple components in 32 buffers that
included seven factors assessed at two levels (salt, PEG,
GdnHCl, divalent metal ions, sucrose, arginine, and
ligand) and three factors assessed at four levels (pH,
detergent, and reductant) (Table 1). Miniaturization of
the refolding and assay reactions to a 96-well plate
format reduced the protein requirements to <500 µg
of unfolded protein per triplicate screen and allowed the
introduction of automatic pipetting systems at multiple
steps in the refolding process, resulting in improved data
quality and throughput.
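
To put the size of this design in perspective, the number of runs in the corresponding full factorial can be counted directly from the factor levels listed above. The sketch below is illustrative arithmetic only: the dictionary keys are labels chosen here, and it assumes that every combination of levels would count as one buffer.

```python
from math import prod

# Levels per factor as described in the text: three four-level factors
# and seven two-level factors. The key names are labels for illustration.
levels = {
    "pH": 4, "detergent": 4, "reductant": 4,
    "salt": 2, "PEG": 2, "GdnHCl": 2, "divalent ions": 2,
    "sucrose": 2, "arginine": 2, "ligand": 2,
}

full_factorial_runs = prod(levels.values())  # 4**3 * 2**7
screen_runs = 32                             # buffers in the screen
fraction = screen_runs / full_factorial_runs

print(full_factorial_runs)  # 8192
print(fraction)             # 0.00390625, i.e. 1/256 of the full design
```

Under these assumptions the 32-buffer screen samples 1/256 of the full factorial, which is why the design must be chosen carefully if main effects are to remain estimable.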
`
`Protein target selection
`
To ensure this screen met the criteria of broad applicability,
we investigated the refolding of 33 proteins
from different families of varying molecular weights
(14–80 kDa), pIs (5.3–9.4), and disulfide content. The
protein set in this study comprised 11 kinases,
9 proteases, 5 dehydrogenases, 4 phosphatases, and 4
single proteins (hepatitis C virus NS3 helicase, lysozyme,
RNase A, and sRNase A), representing three
other gene families. In addition to simple monomeric
proteins, our data set also includes examples with complex
quaternary structures such as a homodimer (steroid
dehydrogenase), homotetramers (inosine 5′-monophosphate
dehydrogenase [IMPDH] and lactate dehydrogenase
[LDH]), and an α2β2 heterodimer (interleukin-1β
converting enzyme [ICE]). Our initial focus was on
soluble, active proteins with well-characterized activities
that were then unfolded in a high concentration of
denaturant. To ensure that the screen’s utility extended
to insoluble proteins, we also included three proteins
purified from inclusion bodies (ICE, pro-memapsin 2,
and Cdc25A).

Table 1. Thirty-two condition fractional factorial screen

No.  Buffer(a)   Detergent(b)  Reductant(c)  Salt(d)  PEG 3350(e)  GdnHCl(f)  Cation(g)  Sucrose(h)  Arginine(i)  Ligand(j)
1    MES 5.5     0             BMC           0        0            0          0          0           0            0
2    MES 6.5     DDM           BMC           1        1            0          0          1           1            1
3    BORATE 9.5  DDM           GSH:GSSG      1        0            0          1          1           0            0
4    MES 6.5     T80           BMC           1        0            1          1          0           1            1
5    TRIS 8.2    NDSB          DTT           1        1            0          1          0           0            1
6    TRIS 8.2    T80           BMC           1        1            0          0          1           1            0
7    MES 5.5     DDM           DTT           0        0            0          1          1           1            1
8    MES 5.5     T80           DTT           0        1            1          0          0           1            1
9    MES 5.5     NDSB          TCEP          1        0            0          0          0           1            0
10   TRIS 8.2    T80           TCEP          0        0            1          1          0           0            0
11   MES 6.5     NDSB          GSH:GSSG      0        1            0          1          0           1            0
12   MES 5.5     DDM           GSH:GSSG      1        1            1          0          0           0            1
13   TRIS 8.2    DDM           TCEP          0        1            0          0          1           0            0
14   MES 6.5     NDSB          DTT           1        0            1          0          1           0            0
15   BORATE 9.5  NDSB          TCEP          1        1            1          1          1           1            1
16   BORATE 9.5  T80           DTT           0        0            0          1          1           1            0
17   TRIS 8.2    0             DTT           1        0            1          0          1           0            1
18   TRIS 8.2    DDM           BMC           1        0            1          1          0           1            0
19   TRIS 8.2    0             GSH:GSSG      0        1            0          1          0           1            1
20   BORATE 9.5  0             BMC           0        1            1          1          1           0            1
21   TRIS 8.2    NDSB          GSH:GSSG      0        0            1          0          1           1            1
22   MES 6.5     0             DTT           1        1            0          1          0           0            0
23   MES 5.5     T80           GSH:GSSG      1        0            0          1          1           0            1
24   MES 6.5     DDM           TCEP          0        0            1          1          0           0            1
25   MES 5.5     NDSB          BMC           0        1            1          1          1           0            0
26   MES 6.5     0             GSH:GSSG      0        0            1          0          1           1            0
27   BORATE 9.5  0             TCEP          1        0            0          0          0           1            1
28   BORATE 9.5  T80           GSH:GSSG      1        1            1          0          0           0            0
29   BORATE 9.5  NDSB          BMC           0        0            0          0          0           0            1
30   MES 6.5     T80           TCEP          0        1            0          0          1           0            1
31   MES 5.5     0             TCEP          1        1            1          1          1           1            0
32   BORATE 9.5  DDM           DTT           0        1            1          0          0           1            0

(a) 50 mM buffer.
(b) DDM, 0.3 mM; Tween 80 (T80), 0.5 mM; NDSB 201, 1 M; 0 = no detergent.
(c) BMC, TCEP, and DTT, 5 mM; GSH:GSSG, 1 mM GSH:0.1 mM GSSG.
(d) 0 = 10.56 mM NaCl, 0.44 mM KCl; 1 = 264 mM NaCl, 11 mM KCl.
(e) 0 = no PEG 3350; 1 = 0.06% PEG 3350 w/v.
(f) 0 = no GdnHCl; 1 = 550 mM GdnHCl.
(g) 0 = 1.1 mM EDTA; 1 = 2 mM MgCl2, 2 mM CaCl2.
(h) 0 = no sucrose; 1 = 440 mM sucrose.
(i) 0 = no arginine; 1 = 550 mM arginine.
(j) 0 = no ligand; 1 = presence of ligand (target: ligand; kinases: 100 mM AMP-PNP; phosphatases: 100 mM o-phospho-L-tyrosine; proteases, RNase A, sRNase A, helicase: 1–10 mM assay substrate; dehydrogenases: 20 mM NADH or NADP; lysozyme: 10 mg/mL Micrococcus lysodeikticus).
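
One property worth checking in a design such as Table 1 is balance, that is, whether every level of every factor appears in the same number of buffers. The sketch below transcribes Table 1 and counts level occurrences. The compact encoding is ours, not the authors': each row gives the pH, the detergent ("0" meaning none), the reductant ("GSH" standing for GSH:GSSG), and the seven 0/1 columns in table order.

```python
from collections import Counter

# One row per buffer 1-32: pH, detergent, reductant, then seven 0/1 flags in
# Table 1 order (salt, PEG, GdnHCl, cation, sucrose, arginine, ligand).
TABLE_1 = """
5.5 0 BMC 0000000
6.5 DDM BMC 1100111
9.5 DDM GSH 1001100
6.5 T80 BMC 1011011
8.2 NDSB DTT 1101001
8.2 T80 BMC 1100110
5.5 DDM DTT 0001111
5.5 T80 DTT 0110011
5.5 NDSB TCEP 1000010
8.2 T80 TCEP 0011000
6.5 NDSB GSH 0101010
5.5 DDM GSH 1110001
8.2 DDM TCEP 0100100
6.5 NDSB DTT 1010100
9.5 NDSB TCEP 1111111
9.5 T80 DTT 0001110
8.2 0 DTT 1010101
8.2 DDM BMC 1011010
8.2 0 GSH 0101011
9.5 0 BMC 0111101
8.2 NDSB GSH 0010111
6.5 0 DTT 1101000
5.5 T80 GSH 1001101
6.5 DDM TCEP 0011001
5.5 NDSB BMC 0111100
6.5 0 GSH 0010110
9.5 0 TCEP 1000011
9.5 T80 GSH 1110000
9.5 NDSB BMC 0000001
6.5 T80 TCEP 0100101
5.5 0 TCEP 1111110
9.5 DDM DTT 0110010
"""

rows = [line.split() for line in TABLE_1.strip().splitlines()]
TWO_LEVEL = ["salt", "PEG", "GdnHCl", "cation", "sucrose", "arginine", "ligand"]

# Occurrences of each level of the three four-level factors.
ph_counts = Counter(r[0] for r in rows)
detergent_counts = Counter(r[1] for r in rows)
reductant_counts = Counter(r[2] for r in rows)

# Number of buffers in which each two-level reagent is at level 1.
present_counts = {
    name: sum(r[3][i] == "1" for r in rows) for i, name in enumerate(TWO_LEVEL)
}

print(ph_counts, detergent_counts, reductant_counts, present_counts, sep="\n")
```

In this transcription each four-level factor level occurs in 8 buffers and each two-level reagent is at level 1 in 16 buffers, so main-effect comparisons are made between equally sized groups.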
`
`In vitro folding of 33 proteins
`
The range of significant refolding across the 33 protein
set varied dramatically from 0.1% to 65% of the total
protein being refolded. The feasibility and practicality of
scaling-up refolding reactions yielding <1% activity has
not been tested. However, we have previously shown
that ICE, which typically refolds with 1% efficiency,
can generate sufficient protein for solving high resolution
crystal structures (Wilson et al. 1994; Wei et al.
2000). When all buffers in the primary screen are considered,
the refolding data indicate that every buffer
resulted in the successful refolding of at least one protein
(Fig. 1A). The four buffers refolding the largest
number of proteins were Buffer 22 > Buffer 6 > Buffer
29 > Buffer 1. These buffers cover the entire pH range;
three of the four contain BMC, and all lack GdnHCl.
Grouping the refolding data by protein family indicates
that the best refolding conditions are specific for each
family (Fig. 1B). The unpredictability of the best refolding
conditions (shown in black) suggests that there is no
universal refolding buffer in this screen and provides
one of the strongest reasons why a fractional factorial
screen is so useful. Given that many protein refolding
studies in the literature use an iterative approach to
identify optimal refolding conditions, the data from
this set of proteins support the utility of using a
broad screen to more efficiently explore conditions
resulting in refolding.

Figure 1. Protein refolding from the primary screen. (A) The effect of
each buffer on the in vitro refolding of 33 proteins. The upper and
lower activity quartiles for each protein were determined, and each
buffer was scored positive for activity in the upper quartile and negative
for activity in the lower quartile for all proteins. (B) In vitro
refolding of protein families. The activity for each refolded protein
was normalized, sorted into quartiles, scored, and a weighted average
calculated for each buffer. Black represents high refolding (0.76–1.0);
gray represents intermediate refolding (0.26–0.75); and white represents
low refolding (0–0.25).
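
The quartile scoring described in the caption of Figure 1A can be sketched as follows. This is a minimal illustration, not the authors' procedure: the quartile definition (Python's default "exclusive" method), the handling of ties, and the summing of scores across proteins are assumptions made here, and the activity values are invented.

```python
from statistics import quantiles

def quartile_scores(activities):
    """Score one protein across buffers: +1 upper quartile, -1 lower, else 0."""
    q1, _, q3 = quantiles(activities, n=4)  # three cut points: Q1, median, Q3
    return [1 if a >= q3 else -1 if a <= q1 else 0 for a in activities]

def buffer_totals(activity_by_protein):
    """Sum the per-protein scores buffer by buffer (assumed aggregation)."""
    per_protein = [quartile_scores(acts) for acts in activity_by_protein.values()]
    return [sum(column) for column in zip(*per_protein)]

# Hypothetical activities for two proteins in eight buffers.
example = {
    "protein_A": [1, 2, 3, 4, 5, 6, 7, 8],
    "protein_B": [2, 1, 4, 3, 6, 5, 8, 7],
}
totals = buffer_totals(example)
print(totals)  # [-2, -2, 0, 0, 0, 0, 2, 2]
```

Scoring within each protein before combining keeps a highly active protein from dominating the comparison between buffers.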
`
`Identification of reagent effects on refolding
`
`Triplicate refolding data sets were subjected to a rank
`transformation and the significance of each protein/
`reagent combination (p < 0.05) was determined by anal-
`ysis of variance followed by pair-wise comparisons for
`four level factors. Refolding factors shown to have a
`significant positive effect on refolding this set of proteins
`include dodecyl maltoside, NDSB 201, Tween 80, argi-
`nine, ligand, PEG 3350, salt, and sucrose. The presence
`of divalent metal ions and GdnHCl had a negative effect
`on refolding (Fig. 2). The design of the screen with four
`levels for both pH and reductant required that the base-
`line be assigned to the activity set with the lowest
`amount of refolding. This analysis indicates an optimal
`pH of 8.2 for refolding this set of proteins when com-
`pared to all other pH levels. In addition, expansion of
`the pH range resulted in the refolding of three proteins
`that did not refold at pH 6.5 or pH 8.2. Likewise, BMC
`and TCEP significantly enhanced the refolding of pro-
`teins when compared to the more commonly used
`dithiothreitol (DTT) and reduced:oxidized glutathione
`(GSH:GSSG). A comparison of both compounds was
`conducted on refolding proteins in the data set whose
`crystal structures have been published. The results of
`this comparison demonstrate that BMC significantly
`aids in refolding 64% of the proteins with disulfide
`bonds, while TCEP significantly aids in refolding 75%
`of the proteins lacking disulfide bonds.
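
The rank transformation followed by analysis of variance can be illustrated for a single protein and a single two-level reagent. The sketch below is not the SAS analysis used in the study: it computes only the one-way F statistic in plain Python, the activity values are hypothetical, and the p-value (which would come from the F distribution in a statistics package) and the pairwise comparisons for four-level factors are left out.

```python
def rank_transform(values):
    """Replace each value by its rank (1 = smallest); ties get the average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = average
        i = j + 1
    return ranks

def one_way_f(groups):
    """F statistic of a one-way ANOVA (assumes some within-group variation)."""
    values = [v for g in groups for v in g]
    n, k = len(values), len(groups)
    grand_mean = sum(values) / n
    means = [sum(g) / len(g) for g in groups]
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_within = sum((v - m) ** 2 for g, m in zip(groups, means) for v in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))

# Hypothetical activities of one protein without and with a reagent.
absent = [0.10, 0.12, 0.08]
present = [0.30, 0.25, 0.40]

ranks = rank_transform(absent + present)
f_stat = one_way_f([ranks[:len(absent)], ranks[len(absent):]])
print(ranks)   # [2.0, 3.0, 1.0, 5.0, 4.0, 6.0]
print(f_stat)  # 13.5
```

Working on ranks rather than raw activities makes the comparison insensitive to the very different activity scales of the individual assays.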
`
`Secondary screens for reagent optimization
`
`Reagents shown to have a significant positive effect on
`protein refolding from the primary screen analysis were
`chosen for secondary screens. Five proteins (Cdc25A,
`IMPDH, DYRK3, MAPKAP-K5, and lysozyme) were
`selected to confirm the observed effects of BMC, Tween
`80, dodecyl maltoside, pH, TCEP, and GSH:GSSG,
`respectively, on protein refolding. These experiments
`confirmed all six of the initial observations made from
`the primary screen in which a reagent had a positive
`effect, thus illustrating the power of this approach. The
`optimal reagent concentrations determined from these
`secondary screens were Tween 80, 0.5 mM; dodecyl
`maltoside, 0.3 mM; pH, 7.0; TCEP, 5–10 mM; and
`GSH:GSSG, 2.5–10 mM GSH: 0.25–1 mM GSSG.
`
Figure 2. The effect of each reagent on protein refolding. The total
number of significant effects (p < 0.05) for each reagent from a pairwise
comparison of the rank transformed refolding activity was determined.

Secondary screen of interacting reagents resulting
in a high resolution crystal structure

The fractional factorial used in the primary screen was
designed to identify main effects of the refolding additives.
In addition, the resolution of the screen is sufficient
to identify some interactions between reagents.
The entire data set from the primary screen was examined
for interactions between two reagents resulting in
enhanced refolding. A positive synergistic interaction
between NDSB 201 and BMC on the refolding of
Cdc25A was identified, and a secondary screen was designed
to vary both reagents simultaneously (Fig. 3A). When
either NDSB 201 or BMC was present individually, the
maximum refolding was less than twofold over the baseline
condition that lacked both reagents. However, the
combination of both reagents resulted in up to a 36-fold
increase in refolded protein, confirming the positive
interaction observed in the primary screen. The best
condition for refolding Cdc25A resulted from 0.6 M
NDSB 201 and 5 mM BMC, and this was used to refold
the protein on a larger scale. The final yield of soluble,
active protein suitable for crystallization after refolding
and purification was 1.5%. The protein crystallized
under conditions similar to those reported previously
and the resultant 2.2 Å crystal structure of refolded
Cdc25A is identical to that published for soluble
Cdc25A (Fig. 3B,C) (Fauman et al. 1998).

Figure 3. Reagent interaction effects on Cdc25A refolding and crystallization.
(A) The interaction between NDSB 201 and BMC on Cdc25A
refolding identified from the primary screen was tested by varying
NDSB 201 (0–1 M) and BMC (0–5 mM) concentrations to identify
the optimal reagent combination. Activity was measured as described
and a 36-fold increase over baseline lacking both reagents was
observed. (B) Cdc25A crystals. (C) Ribbon diagram for the 2.2 Å
crystal structure of refolded Cdc25A.

Discussion

A significant barrier facing structural genomic projects
is the generation of soluble, functional eukaryotic
protein for structural studies. Meeting this demand has
proven to be a challenge, given the low success rate for
expressing soluble eukaryotic proteins compared to prokaryotic
proteins (Yee et al. 2002; Chambers et al.
2004). An alternative approach for generating sufficient
quantities of soluble protein is refolding the insoluble
protein expressed in the inclusion bodies of Escherichia
coli. In theory, refolding these proteins should be a
straightforward process given that the refolding literature
is replete with the effects of individual reagents on
the refolding of single proteins. In practice, however,
there is no universal method or buffer for reliably
refolding a given protein of interest and identification
of initial refolding conditions remains a major hurdle.
One way to overcome this obstacle is by the
introduction of refolding screens to rapidly identify
initial conditions that result in folded protein (Hofmann
et al. 1995; Chen and Gouaux 1997; Armstrong et al.
1999; Tobbell et al. 2002; Maxwell et al. 2003; Scheich
et al. 2004; Tresaugues et al. 2004; Vincentelli et al.
2004). These screens were designed to test a variety of
refolding additives in a minimal number of experiments.
Although these screens have been successful in refolding
multiple proteins, comprehensive statistical analysis
of the importance of the reagents for generalized protein
refolding has been minimal. Our method uses a fractional
factorial design combined with statistical analysis to
directly compare the effects of both well-known and
lesser-known refolding reagents on a large and diverse
set of proteins. The data gathered from this study were
used to determine the general utility of each reagent for
the better design of future refolding screens.
`Based on our analysis, pH and reductants had the larg-
`est impact on refolding our set of 33 proteins. The effect
`of pH on protein refolding has been well documented on a
`protein-specific basis, but previous analysis regarding the
`optimal pH for protein refolding has been limited. Our
data provide a direct comparison of four pH levels
and include examples where pH extremes are crucial for
`protein refolding. Likewise, the data from a refolding
`screen designed by Vincentelli et al. (2004) showed that a
`broad pH range was important for protein solubility,
`underscoring the importance of exploring pH when
`designing a generalized refolding screen. Reducing agents
`also play an important role in refolding proteins; how-
`ever, the use of compounds for protein refolding beyond
`the more traditional reductants (DTT, GSH:GSSG,
and βME) remains protein-specific. BMC is a dithiol
`that improves protein refolding both in vitro and in
`vivo, and is thought to mimic protein disulfide isomerase
`(PDI) by catalyzing native disulfide bond formation
`(Woycechowsky and Raines 2000). TCEP is a nonthiol-
`containing molecule and is a stronger reductant than DTT
`at pH values below 8 (Getz et al. 1999). The results from
`this protein data set strongly support the inclusion of BMC
`and TCEP in a refolding screen. Proteins containing di-
`sulfide bonds were more effectively refolded using BMC
`than its well-studied counterpart, GSH:GSSG. In contrast,
`proteins lacking disulfide bonds were more effectively
`refolded using TCEP than DTT. The utility of alternative
`reductants, such as 4-mercaptobenzeneacetate (4-MPA)
`shown in the literature to aid protein folding (Gough
`et al. 2002), suggests that other compounds may also be
`useful, and could be explored in future refolding screens.
`Although important, pH and reductants are not the
`only variables to consider when designing a refolding
`screen. Studies have shown that a single protein can refold
`under markedly different conditions (Hofmann et al.
`1995; Armstrong et al. 1999). Our data set contained
`two phosphatases with 65% sequence identity and nearly
`identical structural folds. Even with such a high level of
`identity, one of the proteins refolded productively in twice
`as many buffer conditions as the other. One way to over-
`come the unpredictable nature of protein refolding is to
`include an array of reagents known to improve refolding
`as a way to maximize the opportunity to recover func-
`tional protein. As such, our screen also includes all the
`reagents originally described in a fractional factorial
screen by Chen and Gouaux (1997) as well as
`the detergent Tween 80 and the detergent-mimic NDSB
`201. The latter two were added because they inhibit aggre-
`gation during the refolding process resulting in increased
`yields of soluble protein (Goldberg et al. 1996; Arakawa
`and Kita 2000; Chong and Chen 2000). NDSBs lack the
`hydrophobic tail of detergents, thereby preventing micelle
`formation and have been shown to be especially helpful in
refolding at higher protein concentrations (Expert-Bezancon
et al. 2003). Vincentelli et al. (2004) included
`NDSBs 195, 201, and 256 in their refolding screen and
`found them to be useful refolding additives. The remain-
`ing reagents in our screen improved the refolding of at
`least one protein with the exception of GdnHCl and diva-
`lent metal ions. The results from our analysis suggest that
`inclusion of all the reagents discussed, aside from GdnHCl
`and divalent metal ions, will increase the chance of suc-
`cessfully applying a broad refolding screen. The inclusion
`of alternative refolding agents like cyclodextrins, which
`have been used successfully in prior refolding studies
`(Machida et al. 2000; Scheich et al. 2004), could be
`explored in future fractional factorial screens.
`While the effects of reagent interactions on refolding
`have been touched upon previously (Tobbell et al.
`2002), the optimization of a positive reagent interaction
`for generating crystallization quality protein is unique.
`Reagent interactions can be identified depending on the
`resolution of the fractional factorial screen. The impor-
`tance of using appropriate experimental designs and
`statistical methods to analyze the refolding data is par-
`ticularly relevant when looking beyond the main effects
`for these interactions. SAmBA, a software program
`used previously to design a refolding matrix (Vincentelli
`et al. 2004), is good for setting up the experimental
`design but lacks the complementary statistical methods
`needed to analyze the data. The reagent interactions in
`our screen were not immediately discernable, and could
`only be identified using statistical analysis. Using this
`method, we were able to identify potential interactions,
`and interestingly, a third of these interactions were
`between pH and the various reductants. The interaction
`between NDSB 201 and BMC on the refolding of
`Cdc25A was selected for follow-up due to the novelty
`of the reagents. In addition, the low refolding efficiency
`of the protein made it a more challenging example to
`pursue. The resultant crystal structure of Cdc25A sup-
`ports the literature in promoting the utility of refolding
`for generating soluble protein for structural genomics
`programs (Maxwell et al. 2003).
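
The synergy between NDSB 201 and BMC reported in Results (each reagent alone giving less than twofold over baseline, the combination up to 36-fold) can be expressed as a two-factor interaction contrast. The sketch below is illustrative only: the single-reagent fold values 1.8 and 1.5 are hypothetical placeholders consistent with "less than twofold", and the function names are ours.

```python
def additive_expectation(y00, y10, y01):
    """Response expected with both reagents if their effects simply added."""
    return y00 + (y10 - y00) + (y01 - y00)

def interaction_contrast(y00, y10, y01, y11):
    """Departure of the combined response from additivity (0 = no interaction)."""
    return (y11 - y01) - (y10 - y00)

# Fold activity relative to the baseline lacking both reagents.
baseline = 1.0    # neither reagent
ndsb_only = 1.8   # hypothetical value below twofold
bmc_only = 1.5    # hypothetical value below twofold
both = 36.0       # up to 36-fold, as reported in the text

expected = additive_expectation(baseline, ndsb_only, bmc_only)
contrast = interaction_contrast(baseline, ndsb_only, bmc_only, both)
print(round(expected, 2), round(contrast, 2))
```

With these placeholder values the additive expectation is about 2.3-fold, so nearly all of the observed increase is attributable to the interaction term rather than to either main effect.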
`The matrix described here allowed the rapid exploration
`of 14 different reagents on the refolding of 33 proteins
`representing significant diversity in structure and function.
`Moreover, this screen incorporated recently described
`reagents shown to improve the refolding process while
`decreasing the total number of conditions from >8000
`data points in a full factorial to a mere 32 data points.
`While other refolding screens have used light scattering as
`a measurement of refolding (Tresaugues et al. 2004;
`Vincentelli et al. 2004), protein activity provides a useful
`alternative method to measure refolding, and has low
protein requirements of <500 µg of unfolded protein per
`triplicate primary screen. In addition, the small reaction
`volumes allow future screening designs to include more
`difficult to obtain refolding reagents such as chaperonins.
`The identification of important new reagent effects and
`interactions that enhance refolding highlights the need to
`identify optimal buffer conditions for refolding proteins in
`a methodical, fast, and economical way. In this regard, the
`combination of automation, fractional factorial screens,
and a thorough analysis of the data using statistical software
provides a powerful tool to expand on existing refolding
methodology. The data presented here demonstrate
`the strength of this strategy as a way to overcome the
`bottleneck of obtaining soluble, functional protein for
`structural genomics programs.
`
`Materials and methods
`
`Protein expression and purification
`
`The proteins expressed and purified in this study have been pub-
`lished elsewhere: kinases (Takahashi et al. 1989; Lindberg and
`Hunter 1990; McTigue et al. 1999; Chambers et al. 2004), phos-
`phatases (Cool et al. 1989; Fauman et al. 1998; Andersen et al.
`2000; Austen et al. 2004), proteases (Thompson et al. 1995; Hong
`et al. 2000; Wei et al. 2000; Austen et al. 2004), IMPDH (Fleming
`et al. 1996), and helicase (Kim et al. 1998). Recombinant proteins
`were expressed in E. coli or insect cells using a multisystem ex-
`pression vector (Chambers 2002). Proteins were flanked with a
`cleavable (His)6 tag, allowing purification by metal affinity
`chromatography, followed by size-exclusion and ion-exchange
`chromatography when necessary. Commercial enzymes were pu-
`rified by size-exclusion chromatography. Protein concentrations
`were determined from the A280 using calculated extinction coeffi-
`cients (Gill and Von Hippel 1989).
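
Determining concentration from A280 with a calculated extinction coefficient amounts to applying the Beer-Lambert law. The sketch below is a hedged illustration rather than the authors' calculation: the per-residue coefficients (5690 for Trp, 1280 for Tyr, 120 for cystine, in M^-1 cm^-1) are the values commonly attributed to Gill and von Hippel (1989), and the residue counts, absorbance, and molecular weight are hypothetical.

```python
def extinction_coefficient(n_trp, n_tyr, n_cystine=0):
    """Molar extinction coefficient at 280 nm (M^-1 cm^-1) from composition.

    Per-residue values are assumed from Gill and von Hippel (1989).
    """
    return 5690 * n_trp + 1280 * n_tyr + 120 * n_cystine

def concentration_mg_per_ml(a280, epsilon, mol_weight, path_cm=1.0):
    """Beer-Lambert law: c = A / (epsilon * l), converted from mol/L to mg/mL."""
    molar = a280 / (epsilon * path_cm)  # mol/L
    return molar * mol_weight           # g/L, numerically equal to mg/mL

# Hypothetical protein: 2 Trp, 5 Tyr, 1 disulfide, 20 kDa, A280 = 0.895.
epsilon = extinction_coefficient(n_trp=2, n_tyr=5, n_cystine=1)
conc = concentration_mg_per_ml(a280=0.895, epsilon=epsilon, mol_weight=20000)
print(epsilon)         # 17900
print(round(conc, 3))  # 1.0
```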
`
Refolding matrix design

A fractional factorial design was constructed using the Design
of Experiments (DOE) function within the JMP statistical
analysis software package (JMP v. 4, SAS Institute). Three
sets of reagents (buffer pH, detergent, and reductant) were
grouped and considered as single factors at four levels by the
previously reported strategy of combining a pair of two-level
factors (Montgomery 1991). The four levels for each reagent
are pH (5.5, 6.5, 8.2, and 9.5), reducing agents (GSH:GSSG,
TCEP, BMC, and DTT), and detergent (dodecyl maltoside,
Tween 80, NDSB 201, and no detergent). The remaining
reagents (ligand, divalent metal ions, arginine, GdnHCl,
NaCl/KCl, PEG 3350, and sucrose) were either present or
absent (two levels), giving the fractional factorial design
shown in Table 1. All refolding buffers were made and stored
in deep, 96-well blocks and frozen at −80°C.

Refolding protocol

Proteins were unfolded overnight in 6 M GdnHCl and 5 mM
βME at 25°C and then concentrated to 1 mg/mL. Prior to use,
deep, 96-well blocks housing enough refolding buffer to perform
each primary screen in triplicate were thawed and ligands
added to the appropriate wells. Daughter plates (round-bottom
polypropylene) of refolding buffers were made using an
Apricot Designs pipetting station (Perkin-Elmer). The plates
were cooled to 4°C, unfolded protein added to a final concentration
of 50 µg/mL, and incubated overnight with rocking at
4°C.

Activity measurements

Refolded proteins (5–40 µL) were assayed and data collected
on a Spectramax (absorbance) or an Fmax (fluorescence) plate
reader using the Spectramax Pro software for data analysis
(Molecular Devices Corp.). Negative control plates containing
everything but refolded protein were subtracted from experimental
data. A coupled assay using the appropriate peptide
phosphoacceptor substrates and measuring NADH conversion
at 340 nm was used to detect kinase activity (Fox et al. 1998).
Phosphatase activity was measured by monitoring pNPP
hydrolysis at 405 nm (Dunphy and Kumagai 1991). Protease
activity was measured by monitoring cleavage of the appropriate
peptide substrates (Nakajima et al. 1979). Dehydrogenase
activity was measured by monitoring 340 nm as described
(Fleming et al. 1996; Prabhakar et al. 1998). The activities of
lysozyme, RNase A, and helicase were measured as described
(Goldberg et al. 1991; Kim et al. 1998; Schultz et al. 1998).

Analysis of primary screen data

The raw refolding data were subjected to a rank transformation
(Conover and Iman 1981) and significance (p < 0.05)
for each reagent/protein combination was determined by
analysis of variance using SAS Institute statistical software.
Dunnett’s test was applied to the four-level factors for
pairwise comparisons of the individual levels (Dunnett
1955). For reductant and pH, the reagent level with the
poorest refolding was set as the baseline. Interactions were
obtained from a stepwise regression model using the rank
transformed data sets.
`
`Secondary screen of main effects
`
`Reagents that were shown to have a significant positive effect on
`refolding were subsequently investigated individually. Significance
`was determined after a rank transformation of the raw data and
`analysis of variance of the protein/reagent combinations. The
`effec