www.elsevier.com/locate/yprep

REFOLD: An analytical database of protein refolding methods

Michelle K.M. Chow a, Abdullah A. Amin a,b, Kate F. Fulton a, James C. Whisstock a,b,c,
Ashley M. Buckle a,b,*, Stephen P. Bottomley a,*

a Department of Biochemistry and Molecular Biology, P.O. Box 13D, Monash University,1 Vic. 3800, Australia
b Victorian Bioinformatics Consortium,2 P.O. Box 53, Monash University, Clayton, Vic. 3800, Australia
c ARC Centre for Structure and Functional Microbial Genomics, Monash University,1 Clayton, Vic. 3800, Australia

Received 14 June 2005, and in revised form 19 July 2005
Available online 15 August 2005

Abstract

The expression and harvesting of proteins from insoluble inclusion bodies by solubilization and
refolding is a technique commonly used in the production of recombinant proteins. To bring clarity
to the large and widespread quantity of published protein refolding data, we have recently
established the REFOLD database (http://refold.med.monash.edu.au), which is a freely available,
open repository for protocols describing the refolding and purification of recombinant proteins.
Refolding methods are currently published in many different formats and resources; REFOLD provides
a standardized system for the structured reporting and presentation of these data. Furthermore,
data in REFOLD are readily accessible using a simple search function, and the database also
enables analyses which identify and highlight particular trends between suitable refolding and
purification conditions and specific protein properties. This information may in turn serve to
facilitate the rational design and development of new refolding protocols for novel proteins.
There are approximately 200 proteins currently listed in REFOLD, and it is anticipated that with
the continued contribution of data by researchers this number will grow significantly, thus
strengthening the emerging trends and patterns and making this database a valuable tool for the
scientific community.
© 2005 Elsevier Inc. All rights reserved.

Keywords: Inclusion bodies; Protein refolding; Renaturation; Database

* Corresponding authors. Fax: +61 3 99053703 (S.P. Bottomley).
E-mail addresses: ashley.buckle@med.monash.edu.au (A.M. Buckle), steve.bottomley@med.monash.edu.au (S.P. Bottomley).
1 Web: www.med.monash.edu.au.
2 Web: www.vicbioinformatics.com.
M.K.M. Chow et al. / Protein Expression and Purification 46 (2006) 166–171
1046-5928/$ - see front matter © 2005 Elsevier Inc. All rights reserved.
doi:10.1016/j.pep.2005.07.022

Biomedical and biotechnical research often involves the need to purify recombinant proteins in
the simplest and most efficient manner possible, whilst maximizing both the yield and quality of
protein purified. The use of recombinant techniques and bacterial systems facilitates the
expression of proteins on a large scale; however, a key limitation of such systems is often the
insolubility of the target protein, which may be expressed largely as non-functional aggregates
in inclusion bodies [1,2]. Despite the development of various growth conditions, bacterial
strains, expression systems, and solubilizing fusion partners to increase and maximize protein
solubility [1,3–5], for some proteins these strategies still prove to be ineffective or highly
inefficient. On the other hand, the overexpression of insoluble proteins can be exploited,
because proteins produced in inclusion bodies are often very pure. As such, the solubilization
and unfolding of aggregated proteins, followed by refolding and a simple one-step purification,
either sequentially or concurrently, in many cases proves to be the most direct and effective
method of producing highly purified protein.

In the realm of biochemical research there is a plethora of documented and anecdotal data
regarding the techniques of refolding proteins in vitro. The various procedures and methods
involved have been extensively reviewed [6–11]; however, until recently there has been no
central repository for the actual experimental data, nor any logical process by which optimal
conditions may be gleaned for proteins with specific characteristics. Thus, for a researcher
working with a novel protein, finding the most suitable conditions for expression,
solubilization, and refolding of proteins a priori can be a relatively random process. To
facilitate this process, we have recently developed the REFOLD database
(http://refold.med.monash.edu.au), which has been designed with a view to adding structure to
the deposition and retrieval of refolding data [12]. REFOLD is intended to provide a valuable
resource for researchers in developing new protocols for the purification of proteins. To date
(May 2005), we have collated the details of approximately 200 published protocols, involving the
overexpression, solubilization, and refolding of recombinant proteins. We have also annotated
entries with data relating to the properties of the protein, such as structural data,
isoelectric point, molecular weight, oligomeric state, and the presence of disulfide bonds, as
well as references and links to other knowledge databases.

REFOLD provides a wealth of information in different ways. At its most basic level, it provides
a detailed catalogue of successful refolding and purification methods for a wide range of
proteins in a readily accessible and easy-to-read format. Beyond this, detailed annotation
allows the relationships between protein characteristics and refolding protocols to be
delineated. Despite the youth of this resource, certain trends are already becoming evident. As
REFOLD continues to grow, these emerging patterns will become statistically stronger and may be
useful in the rational design of new protocols.

We would like to advocate a standardized data entry and reporting system for refolding data as
demonstrated in REFOLD, such that this information may be readily accessible and available for
all researchers in a streamlined format. Here, we describe the implementation and details of
this system and examine some of the early trends emerging from the data in REFOLD.

Data entry

REFOLD is freely available (http://refold.med.monash.edu.au), and free registration on the
website allows users to enter their own protein refolding protocols into the database. Data are
entered using a simple 1-page form (Fig. 1), entailing details about the protein of interest as
well as the refolding and purification procedures. This form is logically structured, such that
properties of the protein are entered first, followed by details of expression, and finally the
refolding methodology. This allows for a standardized format, and thus provides a streamlined
reference catalogue.

Upon entry of a new protein into the database, basic details regarding properties of the
PROTEIN, such as chain length, pI, molecularity, disulfide bonds, molecular weight, and species
are recorded. This part of the form also provides a cross-reference to the UniProt [13] and
SCOP [14] databases and the SCOP family to which the protein belongs (if known). The entry of
protein traits is then followed by details of the paper in which the protocol was originally
published, with the journal name, paper title, publication details, and PubMed cross-reference.

Details of protein EXPRESSION comprise information such as the cell type (bacterial, yeast,
insect, etc.) and strain in which the protein is expressed, as well as the expression vector
used to encode the protein. The cell density at which protein expression is induced, as measured
by optical density at 600 nm (OD600), is also entered into the database, as well as the time and
temperature of expression.

The entry of data concerning REFOLDING procedures is one of the central aspects of this
database. The form provides for details of the refolding method, that is, the technique used to
refold the protein, as well as the various buffer conditions used in the protocol. These include
the solubilization buffer in which the protein is unfolded, the wash buffer which is used to
wash inclusion bodies and remove cellular debris and loosely bound proteins, and finally the
refolding buffer in which the protein is refolded. Details regarding refolding conditions
including time, temperature, pH, redox reagents, and chaperones (if used) are also specified, as
well as other variables such as pre-purification steps prior to refolding, refolding yield, and
purity. There is also an entry point for a comprehensive description of the protocol for
expression, refolding, and purification as would be detailed in a paper.
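
The three-part layout of the entry form described above (protein, expression, refolding) can be
pictured as a nested record. The following is only an illustrative sketch: the class names, field
names, and units are assumptions chosen to mirror the prose, not the actual REFOLD schema, and the
example values are placeholders rather than data from the database.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Protein:
    # Field names are hypothetical; they follow the properties listed in the text.
    name: str
    chain_length: int                    # number of residues
    pI: float
    molecularity: str                    # e.g. "monomer"
    disulfide_bonds: int
    molecular_weight_da: float
    species: str
    uniprot_id: Optional[str] = None     # cross-reference, if available
    scop_family: Optional[str] = None    # recorded only if known


@dataclass
class Expression:
    cell_type: str                       # bacterial, yeast, insect, ...
    strain: str
    vector: str
    induction_od600: float               # cell density at induction
    time_h: float
    temperature_c: float


@dataclass
class Refolding:
    method: str                          # e.g. "dilution", "dialysis"
    solubilization_buffer: str
    wash_buffer: str
    refolding_buffer: str
    time_h: float
    temperature_c: float
    ph: float
    redox_reagents: List[str] = field(default_factory=list)
    chaperones: List[str] = field(default_factory=list)
    yield_percent: Optional[float] = None
    purity_percent: Optional[float] = None


@dataclass
class RefoldRecord:
    protein: Protein
    expression: Expression
    refolding: Refolding
    pubmed_id: Optional[str] = None
    protocol_text: str = ""              # free-text description of the protocol


# Placeholder values for illustration only; not a real database entry.
record = RefoldRecord(
    protein=Protein("example protein", 300, 6.5, "monomer", 2, 33000.0, "Homo sapiens"),
    expression=Expression("bacterial", "BL21", "example vector", 0.6, 4.0, 37.0),
    refolding=Refolding("dilution", "solubilization buffer", "wash buffer",
                        "refolding buffer", 24.0, 4.0, 8.0),
)
```

Grouping the fields in the same order as the form keeps protein properties, expression
conditions, and refolding conditions separable, which is what later makes comparisons across
records straightforward.
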

Data retrieval and analysis

Users can easily access data in the REFOLD database by executing a simple search on any chosen
term, or alternatively, an advanced search can be performed according to more specific
parameters. Search results can be represented either in a table format or in a drop-down
tree-view mode with the records sorted according to structural classification. The tabulated
results provide details of a number of sortable parameters, such as various protein properties,
SCOP family and refolding method and conditions. Specified links provide access to full
refolding records, whilst selecting the name of a protein leads to more detailed information
about the protein itself. Additionally, following links from the search results page to other
parameters will lead to refolding records sharing that property. A PubMed cross-reference also
allows users access to the original article.
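
As a rough illustration of the retrieval modes described above, the sketch below runs a simple
term search, a parameter-based search, a sortable table view, and a tree view grouped by SCOP
family over a small in-memory list. The function names, dictionary keys, and sample records are
all hypothetical; the real database is queried through its web interface, and nothing here
reflects its actual implementation.

```python
from typing import Dict, List

Record = Dict[str, object]

# Hypothetical records; protein names and pH values are placeholders.
records: List[Record] = [
    {"protein": "example protein A", "scop_family": "Serpins", "method": "Dilution", "ph": 7.5},
    {"protein": "example protein B", "scop_family": "Kringle modules", "method": "Dilution", "ph": 8.0},
    {"protein": "example protein C", "scop_family": "Serpins", "method": "Dialysis", "ph": 6.0},
]


def simple_search(recs: List[Record], term: str) -> List[Record]:
    """Case-insensitive match of one term against every field of each record."""
    needle = term.lower()
    return [r for r in recs if any(needle in str(v).lower() for v in r.values())]


def advanced_search(recs: List[Record], **criteria: object) -> List[Record]:
    """Keep records whose named fields equal the requested values exactly."""
    return [r for r in recs if all(r.get(k) == v for k, v in criteria.items())]


def as_table(recs: List[Record], sort_by: str) -> List[Record]:
    """Table view: the same records, sorted on one chosen parameter."""
    return sorted(recs, key=lambda r: r[sort_by])


def as_tree(recs: List[Record]) -> Dict[str, List[str]]:
    """Tree view: protein names grouped under their structural family."""
    tree: Dict[str, List[str]] = {}
    for r in recs:
        tree.setdefault(str(r["scop_family"]), []).append(str(r["protein"]))
    return tree
```

For example, `simple_search(records, "serpin")` returns the two Serpin records, and
`as_table(records, "ph")` lists the records from lowest to highest refolding pH.
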

Fig. 1. REFOLD data entry form; an example of an existing record is shown. Entry of information
into the database takes place via this 1-page form, detailing information about the protein,
expression conditions, and the refolding protocol. When previously unentered data are added,
drop-down fields allow for the inclusion of pertinent information, as shown under “Protein”
details in this example.

With the collation of many protocols in REFOLD, the database provides an excellent opportunity
to examine the assembled data and delineate any trends that may be instructive to the design of
new protocols. Although REFOLD is a relatively new resource, some early patterns can already be
observed in both the expression and refolding data.

To date, all of the proteins entered into the database have been expressed in bacterial
Escherichia coli cells. This is not surprising, given that bacterial expression is generally
considered to be the simplest and most economical method of producing recombinant protein.
Furthermore, when purifying proteins from inclusion bodies, optimization of protein solubility
becomes an irrelevant factor, thus reducing the need to adopt more complex expression systems.
The most commonly employed E. coli strain for protein expression is BL21, which has been used in
~68% of the protocols entered so far; this, again, is consistent with the view of the BL21
strain being a robust standard expression system [2]. Expression of most proteins is induced in
the mid-log phase with OD600 ranging between 0.4 and 0.8, which is in accord with standard
expression protocols, although in some large-scale fermentations the OD600 at induction has been
considerably greater than this, even reaching up to 50 in one case [15].

As was evident early in the life of REFOLD [12], by far the most frequent method of refolding
proteins from solubilized inclusion bodies is by simple dilution, which accounts for 40% of
entries in the database (Fig. 2A). The second-most common technique is dialysis and, taken
together, these methods account for three-quarters of the protocols recorded. This suggests that
in most cases the simplest methods may be sufficient to yield adequate quantities of protein
without the need to complicate the protocol further. Beyond this, column-assisted refolding
using a nickel-chelating resin or gel filtration chromatography are the next most commonly used
methods. It is also interesting to note that the majority (68%) of proteins are expressed
without a fusion tag (Fig. 2B). Of the proteins which are expressed with fusion tags, the most
common tag is a C- or N-terminal hexahistidine (his6) tag, consistent with the fact that
column-assisted refolding on a nickel-chelating resin is the third-most common refolding method.
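
A percentage breakdown of this kind is a simple tally over the refolding method recorded for
each entry. The sketch below is an assumption about how such a summary could be computed, not a
description of the REFOLD website's own code; the function name is hypothetical, and the input
uses the method counts reported in Table 1 for the 12 V-set domain entries rather than the full
database.

```python
from collections import Counter
from typing import Dict, Iterable


def method_breakdown(methods: Iterable[str]) -> Dict[str, float]:
    """Return each method's share of the entries, as a percentage."""
    counts = Counter(methods)
    total = sum(counts.values())
    return {method: 100.0 * n / total for method, n in counts.most_common()}


# Method counts for the V-set domain family, as listed in Table 1.
v_set_methods = ["Dilution"] * 6 + ["Dialysis"] * 3 + ["Dilution/dialysis"] * 3
breakdown = method_breakdown(v_set_methods)
# breakdown == {"Dilution": 50.0, "Dialysis": 25.0, "Dilution/dialysis": 25.0}
```

The same tally applied to the fusion tag field would give the second breakdown discussed in the
text.
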

A number of proteins are refolded in the presence of additives such as arginine (39 entries) and
glycerol (27 entries), which are both compounds commonly used to aid the folding of proteins
[6,7,11,16]. In cases where arginine is used, it is generally present in the refolding buffer at
concentrations ranging from 0.25 to 1.0 M, and the buffer pH is always above 7.4. In contrast,
glycerol, which is generally used at concentrations between 1 and 30% (v/v), has been included
in buffers with pH values as low as 5.5. Other additives that are known to assist protein
refolding have been used in a few instances, for example, ethylene glycol (50% v/v), magnesium
chloride (5–200 mM), and glycine (0.5–1.0 M). Molecular chaperones have been employed in only a
few cases, and in these situations GroEL or its apical domain (GroEL minichaperone) has been
used.

Fig. 2. Analysis of data in REFOLD. Pie charts showing percentage breakdown of proteins entered
in the database by (A) refolding technique and (B) fusion tag construct.

It is in the provision of data about protein properties in conjunction with detailed refolding
procedures that REFOLD has its greatest potential as a predictive knowledge database. The
combination of structural and technical information allows for the delineation of relationships
between successful refolding methodologies and particular protein traits. Such trends may then
be instructive to the design of new protocols for proteins of similar structure. For example, a
number of proteins in the database cluster in structural families according to their SCOP
classification (Table 1). It can be seen that the refolding of E-set domains of sugar-utilizing
enzymes has been undertaken within a relatively narrow range of pH values, as is the case for
eukaryotic proteases and transforming growth factor-β proteins. Therefore, when developing
refolding procedures for other proteins belonging to these families, it would be logical to be
guided by the respective pH ranges for suitable buffer conditions. Similarly, when considering
the refolding method, all of the Kringle module proteins entered so far have been refolded by
dilution, while a number of other protein families are refolded by either dilution or dialysis
only. Most compellingly, of the 12 V-set domain proteins, all have been successfully refolded by
dilution and/or dialysis (Table 1). Therefore, for these groups of proteins, these simple
methods would be appropriate starting points for the design of refolding protocols.

Table 1
Refolding data for SCOP family clusters in REFOLD

SCOP family                                    No. entries  Refolding pH  Refolding method (a)
Caspase-like proteases                         3            7.8–10.0      Dialysis (2); Dilution/dialysis (1)
E-set domains of sugar-utilizing enzymes       3            7.8–8.5       Dialysis (2); Dilution (1)
Eukaryotic proteases                           4            8.5–8.8       Dilution (2); Diafiltration (1); Size exclusion chromatography (1)
Kringle modules                                3            7.6–9.0       Dilution (3)
Long-chain cytokines                           7            5.0–8.8       Dialysis (3); Dilution (1); Diafiltration (1); High pressure (1); Size exclusion chromatography (1)
MHC antigen-recognition domain                 3            6.6–8.0       Dilution (1); Oxidative chromatography (2)
Papain-like cysteine proteinases               5            7.0–10.7      Dialysis (3); Dilution (2)
Pepsin-like proteases                          3            7.0–10.5      Dialysis (1); Dilution (2)
Ribonuclease A-like                            5            7.5–8.6       Dialysis (2); Dilution (2); Size exclusion chromatography (1)
Serpins                                        5            5.6–7.8       Dilution (2); Dilution/dialysis (2); Size exclusion chromatography (1)
Serum albumin-like                             4            8.5–10        Dialysis (3); Dilution (1)
Short chain cytokines                          6            7.5–9.5       Dialysis (3); Dilution/dialysis (2); Dilution/column chromatography (1)
Transforming growth factor (TGF)-β             3            8.0–8.5       Dilution (2); Nickel chelating chromatography (1)
V-set domains (antibody variable domain-like)  12           7.5–9.5       Dilution (6); Dialysis (3); Dilution/dialysis (3)

(a) Number in brackets indicates the number of protein entries for which the specified refolding
method has been used.
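
The reasoning above, in which the family-level data guide the choice of starting conditions,
amounts to a lookup keyed on SCOP family. The sketch below illustrates that idea with four rows
transcribed from Table 1; the dictionary layout and the function name are assumptions made for
this example, and the output is a starting point to test, not a prediction.

```python
from typing import Dict, List, Optional, Tuple

# Four rows transcribed from Table 1: pH range and per-method entry counts.
TABLE_1: Dict[str, dict] = {
    "Kringle modules": {
        "ph": (7.6, 9.0),
        "methods": {"Dilution": 3},
    },
    "E-set domains of sugar-utilizing enzymes": {
        "ph": (7.8, 8.5),
        "methods": {"Dialysis": 2, "Dilution": 1},
    },
    "Transforming growth factor (TGF)-beta": {
        "ph": (8.0, 8.5),
        "methods": {"Dilution": 2, "Nickel chelating chromatography": 1},
    },
    "V-set domains (antibody variable domain-like)": {
        "ph": (7.5, 9.5),
        "methods": {"Dilution": 6, "Dialysis": 3, "Dilution/dialysis": 3},
    },
}


def suggest_starting_point(
    family: str,
) -> Optional[Tuple[Tuple[float, float], List[Tuple[str, int]]]]:
    """Return the reported pH range and the methods ranked by how often they were used."""
    row = TABLE_1.get(family)
    if row is None:
        return None  # family not covered by the transcribed rows
    ranked = sorted(row["methods"].items(), key=lambda kv: -kv[1])
    return row["ph"], ranked


ph_range, ranked_methods = suggest_starting_point(
    "V-set domains (antibody variable domain-like)"
)
# ph_range == (7.5, 9.5); ranked_methods[0] == ("Dilution", 6)
```

A family with only three or four entries gives a narrow evidence base, so any suggestion drawn
from it should be treated as a first condition to try.
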

Thus, even with the moderate amount of data currently available in the database, some early
patterns can already be identified. As the amount of data entered into REFOLD increases, the
number and statistical robustness of such trends will continue to grow, thus strengthening the
ability to define and predict appropriate conditions for new protocols.

Additional features in REFOLD

Aside from the data entry and retrieval functions, the REFOLD website contains extra features
which enhance its value as a resource. For example, the provision of a graphical breakdown of
various parameters for refolding techniques and protein properties allows particular trends and
patterns to be observed at a glance, providing information about the most common features and
methods used.

REFOLD also offers an extra opportunity for user input by providing a space for commentary on
existing data at the end of each record. This presents an opportunity for scientific discourse
and exchange between researchers, as users can comment on particular protocols or proteins which
they may have employed themselves based on the data supplied in REFOLD. Such remarks could
address issues such as the usefulness of a given protocol, its application and/or adaptation to
a different protein, comments on the protein itself, or other related topics. As such, users can
contribute feedback on existing data and add further information which may be useful to other
researchers. Hence, through this commentary capacity, REFOLD offers an extra level of data
annotation, as well as providing an open forum for discussion and dialogue between scientists.

Future directions

It is anticipated that REFOLD will become an invaluable resource for protein researchers. In
order for the database to flourish and maximize its usefulness, the entry of new data is
required. We strongly encourage and welcome the deposition of published data by all researchers
in the standardized format as described in this paper. With the contribution of more records to
the database, it is envisaged that the accumulated knowledge in the database will combine to
produce a comprehensive picture of the most appropriate refolding techniques for different
proteins, and definitive methods and conditions will emerge as being appropriate for
polypeptides with specific properties. This will enable a strong predictive capacity for REFOLD
to evolve, whereby the rational design of refolding protocols may be informed and facilitated by
the knowledge of a protein’s characteristics and suitable methodologies appropriate for those
properties.

The benefits of entering protocols into REFOLD are multi-fold, from a number of different
perspectives. For a researcher developing a refolding procedure for their protein, the database
can impart valuable and relevant information which is easily accessible, not only based on
protein properties but also through direct links to already proven and published protocols. For
scientists depositing records in the database, REFOLD provides an opportunity to disseminate
one’s published work to a wider audience. For REFOLD itself, the contribution of more records
increases the amount of data available and thus strengthens the analytical capacity of the
database. And finally, for the research community in general, the continued expansion of REFOLD
will see it become an invaluable tool, providing a vast repository of collective knowledge with
a standardized format in a centralized and interactive resource.

Acknowledgments

The authors acknowledge the contribution of all the researchers whose published data have been
entered into REFOLD. This work was supported by grants from the National Health and Medical
Research Council, the Victorian State Government, and the Victorian Partnership for Advanced
Computing. S.P.B. is a Monash University Senior Logan Fellow and NHMRC R.D. Wright Fellow.
J.C.W. is a Monash University Logan Fellow and NHMRC Senior Research Fellow. K.F.F. is an NHMRC
Peter Doherty Fellow.

References

[1] H.P. Sorensen, K.K. Mortensen, Soluble expression of recombinant proteins in the cytoplasm
    of Escherichia coli, Microb. Cell Fact. 4 (2005) 1–8.
[2] H.P. Sorensen, K.K. Mortensen, Advanced genetic strategies for recombinant protein
    expression in Escherichia coli, J. Biotechnol. 115 (2005) 113–128.
[3] V. De Marco, G. Stier, S. Blandin, A. de Marco, The solubility and stability of recombinant
    proteins are increased by their fusion to NusA, Biochem. Biophys. Res. Commun. 322 (2004)
    766–771.
[4] M.R. Dyson, S.P. Shadbolt, K.J. Vincent, R.L. Perera, J. McCafferty, Production of soluble
    mammalian proteins in Escherichia coli: identification of protein features that correlate
    with successful expression, BMC Biotechnol. 4 (2004) 32–48.
[5] R.B. Kapust, D.S. Waugh, Escherichia coli maltose-binding protein is uncommonly effective at
    promoting the solubility of polypeptides to which it is fused, Protein Sci. 8 (1999)
    1668–1674.
[6] L.D. Cabrita, S.P. Bottomley, Protein expression and refolding – A practical guide to
    getting the most out of inclusion bodies, Biotechnol. Annu. Rev. 10 (2004) 31–50.
[7] B. Fahnert, H. Lilie, P. Neubauer, Inclusion bodies: formation and utilisation, Adv.
    Biochem. Eng. Biotechnol. 89 (2004) 93–142.
[8] A.P. Middelberg, Preparative protein refolding, Trends Biotechnol. 20 (2002) 437–443.
[9] A.K. Panda, Bioprocessing of therapeutic proteins from the inclusion bodies of Escherichia
    coli, Adv. Biochem. Eng. Biotechnol. 85 (2003) 43–93.
[10] K. Tsumoto, D. Ejima, I. Kumagai, T. Arakawa, Practical considerations in refolding
    proteins from inclusion bodies, Protein Expr. Purif. 28 (2003) 1–8.
[11] E.D.B. Clark, Refolding of recombinant proteins, Curr. Opin. Biotechnol. 9 (1998) 157–163.
[12] A.M. Buckle, G.L. Devlin, R.A. Jodun, K.F. Fulton, N. Faux, J.C. Whisstock, S.P. Bottomley,
    The matrix refolded, Nat. Methods 2 (2005) 3.
[13] R. Apweiler, A. Bairoch, C.H. Wu, W.C. Barker, B. Boeckmann, S. Ferro, E. Gasteiger,
    H. Huang, R. Lopez, M. Magrane, M.J. Martin, D.A. Natale, C. O’Donovan, N. Redaschi,
    L.S. Yeh, UniProt: the Universal Protein knowledgebase, Nucleic Acids Res. 32 (2004)
    D115–D119.
[14] A.G. Murzin, S.E. Brenner, T. Hubbard, C. Chothia, SCOP: a structural classification of
    proteins database for the investigation of sequences and structures, J. Mol. Biol. 247
    (1995) 536–540.
[15] A. Bazarsuren, U. Grauschopf, M. Wozny, D. Reusch, E. Hoffmann, W. Schaefer, S. Panzner,
    R. Rudolph, In vitro folding, functional characterization, and disulfide pattern of the
    extracellular domain of human GLP-1 receptor, Biophys. Chem. 96 (2002) 305–318.
[16] K. Tsumoto, M. Umetsu, I. Kumagai, D. Ejima, J.S. Philo, T. Arakawa, Role of arginine in
    protein refolding, solubilization, and purification, Biotechnol. Prog. 20 (2004) 1301–1308.