File(s) under permanent embargo

Reason: Restricted by author. A copy can be supplied under Section 51(2) of the Australian Copyright Act 1968 by submitting a document delivery request through your library or by emailing document.delivery@monash.edu

Multi-modal memetic framework using H-core for low resolution protein structure prediction

thesis
posted on 21.02.2017, 23:47 authored by Nazmul, Rumana
Proteins are cellular macromolecules made up of linear chains of amino acids that adopt a unique three-dimensional structure to carry out specific biological functions. Among the various computational approaches developed for Protein Structure Prediction (PSP), ab initio methods perform the prediction without prior knowledge of known structures. However, because of the high computational complexity of these methods, simplified hydrophobic-polar (HP) models are often used for investigation. Despite this simplification, PSP remains a hard combinatorial optimization problem that requires efficient heuristic search techniques. The memetic algorithm (MA), with its flexible architecture, combines the strengths of local and global search and has shown promise in solving problems with complex search landscapes, including PSP. This thesis focuses on developing an effective ab initio technique under a suitable MA framework for solving the PSP problem using low resolution HP models. To enhance the effectiveness of the MA, a mechanism is devised to include knowledge from the problem domain by exploiting the concept of hydrophobic-core (H-core) formation. The size of the maximal possible H-core is estimated and the approximation is validated by our proposed deterministic search technique. Further, a new knowledge-based initial population generation technique generates conformations that are not only valid but also maintain diversity in the population. This allows the search to start from good seeds, leading to faster convergence. When solving the PSP problem using an MA, the greatest difficulty arises from the complex nature of its multi-modal landscape. Like all evolutionary algorithms (EAs), MAs tend to get trapped in local minima because of selection pressure and genetic drift induced by the genetic operators, and techniques are needed to overcome this difficulty.
A novel parental selection technique called “Adaptive Strategy for Assortative Mating” is developed that dynamically distributes the reproduction opportunity among the individuals of the population to boost the effectiveness of crossover. Further, a new survival selection strategy, namely “Sib-based Survival Selection”, prevents the rapid spread of genetic material from a particular individual to other members of the population, ensuring the concurrent preservation of several potential solutions and restricting premature convergence. A new diversification mechanism, namely “Memory-based Diversification”, systematically diversifies the population by introducing new genetic material whenever it gets stuck at local peaks, allowing the search to progress further. An extensive experimental analysis is also reported on the influence of all three techniques proposed to maintain the diversity of the population. Furthermore, in order to carry out the conformational search efficiently, especially as sequence length increases, the optimization technique is designed to provide an effective balance between exploration and exploitation. A robust Multimodal Memetic Framework has been developed in which optimization is accomplished in hierarchical stages. The population of each stage is partitioned into three different states: an Exploratory state to locate promising peaks, an Exploitative state to fine-tune the solutions found in the Exploratory state of the previous stage, and finally, a Central state to form potential solutions that explore undiscovered regions. Different genetic operators have been implemented in each state to balance exploration and exploitation by applying the necessary selection pressure and diversity. A non-isomorphic encoding-based technique has been developed to capture the sub-structures that represent a particular region.
Furthermore, a novel local search technique exploits already explored fit individuals by employing the knowledge of sub-structures. The efficacy of the proposed framework is demonstrated with an adequate number of distinctive data sets and comparison with state-of-the-art methods.
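
For readers unfamiliar with HP models, the following is a minimal illustrative sketch of the standard HP energy function that such searches minimise. It is not taken from the thesis: it assumes a 2D square lattice (the thesis may use a different lattice), and the names `decode`, `is_valid` and `hp_energy` are hypothetical helpers introduced here. Each pair of H residues that are lattice neighbours but not adjacent in the chain contributes -1, so a compact H-core gives a lower energy.

```python
from typing import Dict, List, Tuple

Coord = Tuple[int, int]

# Unit moves on a 2D square lattice (an assumption made for this sketch).
MOVES: Dict[str, Coord] = {"U": (0, 1), "D": (0, -1), "L": (-1, 0), "R": (1, 0)}


def decode(moves: str) -> List[Coord]:
    """Turn a string of absolute moves into lattice coordinates, starting at the origin."""
    coords = [(0, 0)]
    for m in moves:
        dx, dy = MOVES[m]
        x, y = coords[-1]
        coords.append((x + dx, y + dy))
    return coords


def is_valid(coords: List[Coord]) -> bool:
    """A conformation is valid if it is self-avoiding (no lattice site is used twice)."""
    return len(set(coords)) == len(coords)


def hp_energy(sequence: str, coords: List[Coord]) -> int:
    """Return minus the number of H-H contacts between residues not adjacent in the chain."""
    if len(sequence) != len(coords):
        raise ValueError("sequence and conformation lengths differ")
    if not is_valid(coords):
        raise ValueError("conformation is not self-avoiding")
    index_at = {c: i for i, c in enumerate(coords)}
    contacts = 0
    for i, (x, y) in enumerate(coords):
        if sequence[i] != "H":
            continue
        # Look only right and up so that each neighbouring pair is counted once.
        for dx, dy in ((1, 0), (0, 1)):
            j = index_at.get((x + dx, y + dy))
            if j is not None and sequence[j] == "H" and abs(i - j) > 1:
                contacts += 1
    return -contacts
```

For example, folding the sequence `HPPH` with the moves `RUL` places the two H residues on neighbouring sites, giving an energy of -1, whereas the fully extended conformation `RRR` has an energy of 0.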

Campus location

Australia

Principal supervisor

Madhu Chetty

Year of Award

2015

Department, School or Centre

Gippsland School of IT

Course

Doctor of Philosophy

Degree Type

DOCTORATE

Faculty

Faculty of Information Technology