Genetic algorithm for Ab initio protein structure prediction based on low resolution models
thesis
posted on 2017-01-09, 02:46authored byHoque, Md Tamjidul
Protein is a sequence of amino acids bounded into a linear chain that adopts a specific folded
three-dimensional (3D) shape. This specific folded shape enables protein to perform specific
tasks. Amongst various available computational methods, the protein structure prediction by
the ab initio approach is promising and can help to unravel the relationship between sequence
and its associated structure. This thesis is focused on the ab initio protein structure prediction
(PSP), by developing novel Genetic Algorithm (GA) for an efficient and effective
conformation search of low resolution models derived from the two-bead hydrophobichydrophilic
(HP) models. The thesis also proposes a novel low resolution model, called
hHPNX model providing more accurate predictions compared to the existing low resolution
HP models.
As a search technique, GA shows promise in the complex search landscape for
investigating the PSP problem. However, for longer sequences the performance of GA can
deteriorate and cause the algorithm to frequently stall or become stuck in local minima.
Therefore, in this thesis, a critical analysis of the working principle of GA (i.e., the schemata
theorem) is presented. This analysis leads to the generalisation of the schemata theorem. The
fallacies in the selection procedure of the schemata theorem are removed and its crossover
operation has been fully defined. A novel concept, a chromosome correlation factor (CCF), is
proposed to identify similar chromosomes within the GA population, and the optimal value of
CCF enables GA to perform effectively and thus helps provide superior results.
Further, a non-isomorphic encoding algorithm is proposed for a bijective encoding within
GA that prevents the expansion of the search landscape by maintaining a 1:1 relationship
between the genotype and the phenotype. The non-isomorphic encoding reduces the chances
of GA stalling and also prevents the tendency of the normal stochastic GA search to behave
like a random search.
Since the PSP solutions are compact in nature, the simple GA developed without any
heuristics is further improved as hybrid GA (HGA) by utilising domain-specific knowledge.
For an optimal core cavity, we have defined likely sub-conformations to provide guided
search. Further, the multi-objective formulation of the search problem can overcome possible
stall or stuck conditions by backtracking effectively and performing efficiently. Novel and
effective move operators are designed and applied to efficiently move part of the converging
compact conformation and thus achieve overall superior results.
The simplified HP model and its extension, the HPNX model, are effective in exploring
the convoluted PSP search landscape quickly. With its simplicity maintained, the HPNX is
extended to a novel model called hHPNX model, which reduces the amount of degeneracy
and which additionally captures the characteristics oftwo distinguished amino acids (Alanine
and Valine) from the hydrophobic group. A corrected interaction potential matrix for an
existing YhHX model is proposed, leading to its correct representation. Further, the facecentred-
cube (FCC) model is shown to have the optimal lattice configuration for closely
mapping the real folded protein.
Three novel techniques are developed to compute the fitness function efficiently, to
reduce the computation time. Most importantly, improvement in the speed of computation is
achieved without sacrificing the accuracy of the prediction. All the techniques are
complementary to each other and can work concurrently thereby reducing the computation
time significantly.
History
Campus location
Australia
Principal supervisor
Madhu Chetty
Year of Award
2008
Department, School or Centre
Information Technology (Monash University Gippsland)