It is often challenging to isolate sufficient quantities of a target protein in a soluble and folded form for molecular analysis. This issue is particularly acute for multi-domain proteins, which may not express or fold correctly in recombinant expression systems. To help overcome this obstacle, we are developing a high-throughput method, called ‘Domain Seeking’, in which protein fragment libraries are screened for solubility and the results are used as the basis for further fragment design and optimisation.
The Domain Seeking procedure has three major steps. First, a random gene fragment library is generated for the protein [1]. Second, the library is expressed in E. coli and a split green fluorescent protein (GFP) assay is used to report on in vivo protein fragment solubility [2]. Finally, a form of cluster analysis combines the data from all the hit fragments to identify the likely number and boundaries of structural domains within the protein. This analysis allows for the fact that the library screened may not contain a fragment with both the optimal start and end points for a given structural domain; those points are instead inferred from the fragment data as a whole.
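The third step can be illustrated with a minimal sketch. It is not the cluster analysis used in this work, whose details are not given here; it is a simple greedy grouping, assumed for illustration only, that pools soluble fragments with similar start and end residues and reports the median start and end of each group as a consensus boundary. The function names, the tolerance `tol`, the threshold `min_hits`, and the fragment coordinates are all hypothetical.

```python
from statistics import median


def cluster_fragments(fragments, tol=15):
    """Greedily group (start, end) fragments.

    A fragment joins the first cluster whose current median start and
    median end are both within `tol` residues of its own start and end;
    otherwise it opens a new cluster. `tol` is an illustrative value.
    """
    clusters = []
    for start, end in sorted(fragments):
        for cluster in clusters:
            mid_start = median(s for s, _ in cluster)
            mid_end = median(e for _, e in cluster)
            if abs(start - mid_start) <= tol and abs(end - mid_end) <= tol:
                cluster.append((start, end))
                break
        else:
            clusters.append([(start, end)])
    return clusters


def consensus_boundaries(clusters, min_hits=2):
    """Return (start, end, n_hits) for clusters with at least `min_hits` fragments."""
    return [
        (round(median(s for s, _ in c)), round(median(e for _, e in c)), len(c))
        for c in clusters
        if len(c) >= min_hits
    ]


# Made-up residue coordinates of soluble hit fragments (not p85α data).
hits = [(1, 82), (3, 85), (5, 80), (110, 300), (115, 298), (112, 305), (400, 420)]
domains = consensus_boundaries(cluster_fragments(hits))
print(domains)  # [(3, 82, 3), (112, 300, 3)]
```

In this toy input the first consensus boundary, residues 3 to 82, does not correspond to any single fragment in the list, which mirrors the point that boundaries can be inferred from the pooled data even when no fragment with both optimal end points was screened. The isolated hit (400, 420) is dropped because it is supported by only one fragment.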
Here we present the application of the Domain Seeking procedure to a multi-domain human protein, p85α. This structurally characterised protein is often used as a test case for domain mapping methods and therefore provides a benchmark for our approach. Our cluster analysis of around one hundred soluble p85α fragments, identified with the in vivo solubility assay, defines the boundaries of four of the five known structural domains of this protein.
1. Reich, S. et al. Protein Science 15, 2356-2365 (2006).
2. Cabantous, S. & Waldo, G. S. Nature Methods 3, 845-854 (2006).