Monash University

Restricted Access

Reason: Access restricted by the author. A copy can be requested for private research and study by contacting your institution's library service. This copy cannot be republished.

Development of bioinformatics approach for soluble periplasmic expression in Escherichia coli

posted on 2017-01-10, 00:13 authored by Catherine Chang Ching Han
Periplasmic expression of soluble recombinant proteins in Escherichia coli not only offers a much simplified downstream purification process, but also increases the chances of attaining correctly folded and biologically active proteins. For different combinations of signal peptide and target gene, the soluble periplasmic yield can range from negligible to several grams per litre. The selection of a signal peptide generally depends on knowledge derived from past studies or on a trial-and-error approach involving in vivo experiments. A systematic approach for the rational selection of promising signal peptide and target gene pairings is needed to improve the efficiency of recombinant protein production by reducing the time, effort and resources required for trial-and-error in vivo expression studies. In addition, the search space for the optimal signal peptide and target gene combination will grow exponentially as the rate of protein discovery continues to increase.
   In this research work, a bioinformatics approach was employed to improve the efficiency and success rate of soluble recombinant protein production in the periplasm of E. coli. Experimental data from the literature were curated and used to build models that predict soluble periplasmic yield in E. coli from amino acid sequence. Protein solubility and protein folding rate prediction tools were analysed comprehensively so that the most suitable tools could be selected rationally for subsequent use. Standardized independent test datasets were generated to assess the predictive performance of each tool fairly. PROSO II and SeqRate outperformed the other prediction tools and were selected to provide reliable inputs for the subsequent development of prediction models.
   Using a sequence-redundancy-reduced dataset, a web-based prediction tool with a two-stage architecture, named Periscope, was developed to compute quantitative predictions of soluble recombinant protein yield in the periplasm of E. coli. In cross-validation, Periscope recorded a prediction accuracy of 78%. The predictor was further validated by experimentally cloning and expressing 21 different combinations of signal peptide and target gene. These in vivo expression studies yielded a prediction accuracy of 86%.
   Owing to limitations of the developed predictor, the effect of factors independent of amino acid sequence on soluble periplasmic yield is not reflected in Periscope’s predictions. To address these sequence-independent factors, a detailed analysis of the quantitative and qualitative features derivable from mRNA secondary structure was conducted. This information would complement the predictions from Periscope and improve the rational selection of promising signal peptide and target gene combinations. Other findings on sequence-independent factors from the in vivo expression studies would also be useful complements to Periscope’s predictions. Overall, the developed prediction tool provides a valuable means of improving the efficiency of soluble recombinant protein production in the field of protein expression and its extended applications.
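   The abstract reports a cross-validated prediction accuracy for Periscope. As a rough illustration of how such a figure is generally obtained, the sketch below runs a plain k-fold cross-validation in Python. It is not the thesis's method: the function names, the toy majority-class "model", the yield-class labels ("high", "low") and the placeholder sequences are all assumptions made for illustration, and the record does not state which validation scheme or model Periscope uses.

```python
import random
from collections import Counter


def k_fold_indices(n_samples, k, seed=0):
    """Shuffle the sample indices and deal them into k nearly equal folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def fit_majority_class(train_samples, train_labels):
    """Hypothetical stand-in model: always predict the most common training label."""
    most_common = Counter(train_labels).most_common(1)[0][0]
    return lambda sample: most_common


def cross_validated_accuracy(samples, labels, fit, k=5, seed=0):
    """Hold out each fold in turn, train on the rest, and pool the correct predictions."""
    folds = k_fold_indices(len(samples), k, seed)
    correct = 0
    for held_out in folds:
        held = set(held_out)
        train = [i for i in range(len(samples)) if i not in held]
        predict = fit([samples[i] for i in train], [labels[i] for i in train])
        correct += sum(predict(samples[i]) == labels[i] for i in held_out)
    return correct / len(samples)


# Placeholder data (assumption): ten made-up sequence strings with yield classes.
sequences = ["SEQ%d" % i for i in range(10)]
yield_classes = ["high"] * 8 + ["low"] * 2
accuracy = cross_validated_accuracy(sequences, yield_classes, fit_majority_class, k=5)
print(f"cross-validated accuracy: {accuracy:.2f}")
```

   Any real predictor could be passed in place of `fit_majority_class`, provided it takes training samples and labels and returns a function that maps one sample to a predicted label. Pooling correct predictions over all held-out folds means every sample is scored exactly once by a model that never saw it during training.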


Campus location


Principal supervisor

Ramakrishnan Nagasundara Ramanan

Additional supervisor 1

Tey Beng Ti

Additional supervisor 2

Lakshminarasimhan Krishnaswamy

Year of Award


Department, School or Centre

School of Engineering (Monash University Malaysia)


Doctor of Philosophy

Degree Type



Faculty of Engineering