Embargoed and Restricted Access
Reason: Under embargo until June 2011. After this date a copy can be supplied under Section 51(2) of the Australian Copyright Act 1968 by submitting a document delivery request through your library
Genetic algorithm for Ab initio protein structure prediction based on low resolution models
thesisposted on 09.01.2017, 02:46 authored by Hoque, Md Tamjidul
Protein is a sequence of amino acids bounded into a linear chain that adopts a specific folded three-dimensional (3D) shape. This specific folded shape enables protein to perform specific tasks. Amongst various available computational methods, the protein structure prediction by the ab initio approach is promising and can help to unravel the relationship between sequence and its associated structure. This thesis is focused on the ab initio protein structure prediction (PSP), by developing novel Genetic Algorithm (GA) for an efficient and effective conformation search of low resolution models derived from the two-bead hydrophobichydrophilic (HP) models. The thesis also proposes a novel low resolution model, called hHPNX model providing more accurate predictions compared to the existing low resolution HP models. As a search technique, GA shows promise in the complex search landscape for investigating the PSP problem. However, for longer sequences the performance of GA can deteriorate and cause the algorithm to frequently stall or become stuck in local minima. Therefore, in this thesis, a critical analysis of the working principle of GA (i.e., the schemata theorem) is presented. This analysis leads to the generalisation of the schemata theorem. The fallacies in the selection procedure of the schemata theorem are removed and its crossover operation has been fully defined. A novel concept, a chromosome correlation factor (CCF), is proposed to identify similar chromosomes within the GA population, and the optimal value of CCF enables GA to perform effectively and thus helps provide superior results. Further, a non-isomorphic encoding algorithm is proposed for a bijective encoding within GA that prevents the expansion of the search landscape by maintaining a 1:1 relationship between the genotype and the phenotype. The non-isomorphic encoding reduces the chances of GA stalling and also prevents the tendency of the normal stochastic GA search to behave like a random search. Since the PSP solutions are compact in nature, the simple GA developed without any heuristics is further improved as hybrid GA (HGA) by utilising domain-specific knowledge. For an optimal core cavity, we have defined likely sub-conformations to provide guided search. Further, the multi-objective formulation of the search problem can overcome possible stall or stuck conditions by backtracking effectively and performing efficiently. Novel and effective move operators are designed and applied to efficiently move part of the converging compact conformation and thus achieve overall superior results. The simplified HP model and its extension, the HPNX model, are effective in exploring the convoluted PSP search landscape quickly. With its simplicity maintained, the HPNX is extended to a novel model called hHPNX model, which reduces the amount of degeneracy and which additionally captures the characteristics oftwo distinguished amino acids (Alanine and Valine) from the hydrophobic group. A corrected interaction potential matrix for an existing YhHX model is proposed, leading to its correct representation. Further, the facecentred- cube (FCC) model is shown to have the optimal lattice configuration for closely mapping the real folded protein. Three novel techniques are developed to compute the fitness function efficiently, to reduce the computation time. Most importantly, improvement in the speed of computation is achieved without sacrificing the accuracy of the prediction. All the techniques are complementary to each other and can work concurrently thereby reducing the computation time significantly.