Hybrid computational models for protein sequence analysis and secondary structure prediction
thesis
posted on 2017-01-09, 03:36 authored by Bidargaddi, Niranjan
The aim of the thesis is to develop novel hybrid computational models for
protein sequence analysis and secondary structure prediction. The research
work specifically deals with (a) protein sequence alignment and family identification,
(b) prediction of secondary structures and (c) prediction of contact
maps and contact numbers.
Protein sequence alignment and family identification have been widely
approached in the past using the classical profile hidden Markov model (HMM),
which is based on probability theory. Despite its successful use, a profile HMM
is limited by its inherent statistical independence assumptions. To overcome
this limitation, a novel architecture of fuzzy profile HMM incorporating
fuzzy measures and integrals is presented. The superior performance of the
fuzzy profile HMM over the classical profile HMM is established based on
the experiments carried out using widely studied globin and kinase families.
A comparative study using Z-score plots and ROC analysis is also carried
out on three different variants of fuzzy profile HMM based on possibility,
lambda (λ) and belief measures. The possibility measure based fuzzy profile
HMM demonstrated the best performance.
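
As a rough illustration of how a fuzzy measure and a fuzzy integral can stand in for additive probabilities, the sketch below evaluates a discrete Choquet integral with respect to a possibility measure. The abstract does not say which fuzzy integral the thesis uses, so the choice of the Choquet integral is an assumption, and the names `possibility_measure`, `choquet_integral`, `densities` and `values` are hypothetical.

```python
def possibility_measure(densities, subset):
    """Possibility measure: a set is as possible as its most possible member.

    `densities` maps each element to a value in [0, 1]; for a normalised
    measure the largest density should be 1.0 (assumed, not enforced).
    """
    return max((densities[x] for x in subset), default=0.0)


def choquet_integral(values, measure):
    """Discrete Choquet integral of non-negative `values` w.r.t. `measure`.

    `values` maps each element to a score; `measure` is any callable that
    takes a list of elements and returns the fuzzy measure of that set.
    """
    # Visit elements from the highest score to the lowest.
    order = sorted(values, key=values.get, reverse=True)
    total = 0.0
    for i, x in enumerate(order):
        next_value = values[order[i + 1]] if i + 1 < len(order) else 0.0
        # Weight each drop in score by the measure of the elements seen so far.
        total += (values[x] - next_value) * measure(order[: i + 1])
    return total


# Hypothetical toy numbers, chosen only to exercise the functions.
densities = {"a": 1.0, "b": 0.5, "c": 0.2}
values = {"a": 0.2, "b": 0.9, "c": 0.6}
score = choquet_integral(values, lambda s: possibility_measure(densities, s))
```

When `measure` is additive (an ordinary probability), this integral reduces to the usual expectation, which is one way to see the classical profile HMM as a special case of the fuzzy formulation.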
For secondary structure prediction, the prominent methods are mostly based
on neural networks, which involve mappings from a local window of residues
in the sequence to the structural state of the central residue in the window,
thus capturing the local interactions more effectively than distant interactions
among residues. Alternatively, the secondary structure prediction problem
has been approached using generative models based on semi hidden Markov
models. These models have been effective in capturing non-local interactions
through a joint sequence-structure probability distribution based on
structural segments. The work reported in this thesis investigates a
hierarchical model that combines a semi hidden Markov model and a neural
network with the physical-chemical and structural properties of the
amino acids, without using evolutionary information (i.e., single sequence
methods). The proposed hybrid model exploits the relative advantages of
semi hidden Markov models, neural networks, and physical-chemical properties
of the amino acids for secondary structure prediction. The performance of the
proposed architecture is further enhanced using neural network optimization
and ensemble techniques. The novelty of the proposed architecture
lies in its design and integration of different components.
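
The window-based mapping described above can be sketched as follows: each residue is represented by the one-hot codes of the residues in a window centred on it, and that vector would be the input from which a network predicts the central residue's structural state. This is only an illustrative encoding; the window size, the padding slot and the name `window_features` are assumptions, not details taken from the thesis.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def window_features(sequence, half_width=2):
    """Return one feature vector per residue, built from a local window.

    Each window position is one-hot encoded over 20 residues plus one extra
    slot that marks positions falling off either end of the sequence.
    Non-standard residue letters are not handled and raise KeyError.
    """
    dim = len(AMINO_ACIDS) + 1
    index = {aa: i for i, aa in enumerate(AMINO_ACIDS)}
    features = []
    for centre in range(len(sequence)):
        row = []
        for pos in range(centre - half_width, centre + half_width + 1):
            one_hot = [0] * dim
            if 0 <= pos < len(sequence):
                one_hot[index[sequence[pos]]] = 1
            else:
                one_hot[-1] = 1  # padding beyond the sequence ends
            row.extend(one_hot)
        features.append(row)
    return features


# Hypothetical example: a 5-residue window over a short peptide.
vectors = window_features("MKVLAAGIV", half_width=2)
```

Because every vector only sees `2 * half_width + 1` residues, interactions between residues far apart in the sequence are invisible to such a model, which is the limitation the segment-based semi hidden Markov model is meant to offset.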
The secondary structure of proteins is also influenced by residue contact
maps and contact numbers. Novel Residue Contact Order matrices are
proposed to study the preferences of the amino acid residues for structural
types based on contacts at different positions. The complementary
information provided by these matrices is incorporated in the semi hidden
Markov model, which achieves better accuracies compared to conventional
approaches without this information. Further, a detailed theoretical framework
has also been developed for Markov chain Monte Carlo sampling in the
semi hidden Markov model to predict contact maps and numbers. Investigations
show that the proposed approach closely tracks the pattern of contact
maps and contact numbers.
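
For readers unfamiliar with the two quantities, the sketch below derives a contact map and per-residue contact numbers from three-dimensional coordinates, using one common convention: two residues are in contact when their representative atoms lie within a distance cutoff. The 8 Å cutoff, the minimum sequence separation and the function names are assumptions for illustration; the abstract does not give the definitions used in the thesis.

```python
import math


def contact_map(coords, cutoff=8.0, min_separation=3):
    """Binary, symmetric contact map from a list of (x, y, z) coordinates.

    Residues i and j are in contact when they are at least `min_separation`
    apart in the sequence and no more than `cutoff` apart in space.
    """
    n = len(coords)
    cmap = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + min_separation, n):
            if math.dist(coords[i], coords[j]) <= cutoff:
                cmap[i][j] = cmap[j][i] = 1
    return cmap


def contact_numbers(cmap):
    """Contact number of each residue: how many contacts it takes part in."""
    return [sum(row) for row in cmap]


# Hypothetical coordinates: five residues on a straight line, 3.8 units apart.
coords = [(3.8 * k, 0.0, 0.0) for k in range(5)]
numbers = contact_numbers(contact_map(coords, cutoff=8.0, min_separation=2))
```

The contact number is simply the row sum of the contact map, so a model that predicts the map also yields the numbers.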
History
Campus location
Australia
Principal supervisor
Madhu Chetty
Year of Award
2007
Department, School or Centre
Information Technology (Monash University Gippsland)