Development of bioinformatics approach for soluble periplasmic expression in Escherichia coli
thesis
posted on 2017-01-10, 00:13 authored by Catherine Chang Ching Han
Periplasmic expression of soluble recombinant proteins in Escherichia coli not only offers
a much simplified downstream purification process, but also increases the
chances of attaining correctly folded and biologically active proteins. For
different combinations of signal peptide and target gene, the soluble
periplasmic yield can range from negligible to several grams per litre. The
selection of a signal peptide generally depends on knowledge derived from past
studies or on a trial-and-error approach involving in vivo experiments. A systematic
approach for the rational selection of promising signal peptide and target gene
pairings is needed to improve the efficiency of recombinant protein production
by reducing the time, effort and resources required to conduct in
vivo expression studies by trial and error. In addition, the search
space for the optimal signal peptide and target gene combination will grow
exponentially as the rate of protein discovery continues to increase.
In this research work, a bioinformatics approach was employed
to improve the efficiency and success rate of soluble recombinant protein
production in the periplasm of E. coli. Experimental data from past literature
were curated and used to build models for predicting soluble
periplasmic yield in E. coli from the amino acid sequence. A comprehensive
analysis of protein solubility and protein folding rate prediction tools
was conducted to rationally select the most suitable tools for subsequent
use. Standardized independent test datasets were generated to fairly
assess the predictive performance of the respective tools. PROSO II and SeqRate
outperformed the other prediction tools and were selected to provide reliable
inputs for the subsequent development of prediction models.
Using a sequence-redundancy-reduced dataset, a web-based
prediction tool with a two-stage architecture, named Periscope, was developed to
compute quantitative predictions of soluble recombinant protein yield in the
periplasm of E. coli. In cross-validation, Periscope recorded a prediction
accuracy of 78%. The predictor was further validated by
experimentally cloning and expressing 21 different combinations of signal
peptide and target gene. These in vivo expression studies yielded a
prediction accuracy of 86%.
Owing to limitations of the developed predictor, the effect of
amino acid sequence-independent factors on soluble periplasmic yield is not
reflected in Periscope’s predictions. To address these
sequence-independent factors, a detailed analysis of the quantitative and
qualitative features derivable from mRNA secondary structure was conducted.
This information would be valuable in complementing the predictions from Periscope
for improved rational selection of promising signal peptide and target gene
combinations. Other findings on amino acid sequence-independent
factors, derived from the in vivo expression studies, would also be useful in complementing
the predictions from Periscope. Overall, the developed prediction tool would
serve as a valuable means to improve the efficiency of soluble recombinant
protein production in the field of protein expression and its extended
applications.
Campus location
Malaysia
Principal supervisor
Ramakrishnan Nagasundara Ramanan
Additional supervisor 1
Tey Beng Ti
Additional supervisor 2
Lakshminarasimhan Krishnaswamy
Year of Award
2017
Department, School or Centre
School of Engineering (Monash University Malaysia)