
Modeling and learning realistic genetic interactions using dynamic Bayesian network and information theory

thesis
posted on 28.02.2017, 03:48 authored by Morshed, Nizamul
Deciphering genetic interactions is of fundamental importance in computational systems biology, with wide applications in a number of associated areas. Realistic modeling of these interactions poses novel challenges. Further, learning these interactions using computational methods becomes increasingly complex with the adoption of advanced and more realistic modeling techniques. In this thesis, we propose methods to address this challenge using a graphical model with sound probabilistic underpinnings, commonly known as the dynamic Bayesian network. Inference of genetic interactions is usually carried out using DNA microarray data. This data provides snapshots of mRNA expression levels of a large number of genes from a single experiment. However, the number of samples from such experiments is small, and additionally, they contain missing values and noise. Bayesian networks are considered one of the most promising ways to tackle these issues. However, traditional Bayesian networks have their own limitations; for example, they neither take time information into account nor can they capture feedback. Further, accurate determination of the direction of regulation requires a significant number of tests to be performed. Dynamic Bayesian networks (DBNs) are extensions of Bayesian networks that can effectively address these limitations. In this thesis, we develop novel techniques for gene regulatory network reconstruction using a DBN-based modeling approach. We start with a basic DBN-based model, and improve it so that it can represent and model both instantaneous and time-delayed genetic interactions. Initially, we aim to detect the occurrence of instantaneous and single-step time-delayed interactions; subsequently, this approach is extended to model instantaneous and multi-step time-delayed interactions.
This approach of modeling both instantaneous and multi-step time-delayed genetic interactions is superior to traditional DBN-based gene regulatory network (GRN) reconstruction techniques, where only the time-delayed interactions are learnt, thereby advancing the state of the art for modeling genetic regulations using DBNs. In addition to modeling interactions, one needs a learning mechanism for inferring genetic interactions. To facilitate detection of nonlinear gene-to-gene interactions (in addition to linear interactions), which are prevalent in all genetic networks, we propose using well-known properties, including fundamental results related to information-theoretic measures, for testing conditional independence relations in a DBN. This enables us to formulate efficient learning techniques for reconstructing GRNs. Using these theoretical underpinnings, we first implement simple hill-climbing techniques that enable detection of various types of interactions among genes. Subsequently, we use these results to devise novel score-and-search-based evolutionary computation techniques, which can effectively explore a significantly larger search space. We carry out investigations using both synthetic and real-life networks. For the real-life network study, we use four different microarray data sources, covering three organisms, namely, yeast, E. coli and cyanobacteria. We use networks of varying sizes, ranging from small five-gene networks (yeast) to large-scale networks of cyanobacteria (730 genes). The performance is evaluated using four widely used performance measures. For some networks where we do not have sufficient information for calculating these performance measures, we use literature mining to perform comparative evaluations of the proposed approaches. For the large-scale network of cyanobacteria, we use gene ontology (GO) based analysis of gene functionalities, in addition to degree distribution analysis of the inferred network.
Due to the inherent difficulties associated with inferring GRNs using DNA microarray data, such data is often supplemented by other sources of data, for example, genomic data and protein-protein interaction data. In this thesis, we propose a framework that jointly learns the structure of a GRN and a protein-protein interaction network (PPIN). Using this process, the GRN reconstruction technique can effectively make use of the vast wealth of knowledge available from these external sources of data. This knowledge is fed to the GRN reconstruction process probabilistically, thereby enabling it to weigh each data source according to the reliability of that source. The approach is applied to yeast networks, where four different interaction data sources and a number of genomic data sources are used. Together with the novel modeling and learning techniques proposed in this thesis, the probabilistic integration of different types of knowledge sources and the co-learning of the GRN with the PPIN represent a significant step towards the reconstruction of GRNs using DBNs.
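
As a rough illustration of the information-theoretic quantities the abstract refers to, the sketch below computes mutual information and conditional mutual information on discretized expression series, and compares an instantaneous (lag 0) dependence with a time-delayed (lag 1 or more) one. It is not the thesis's method: the plug-in (frequency-count) estimator, the assumption that expression values are already discretized, and all function names (`entropy`, `mutual_information`, `conditional_mutual_information`, `lagged_mi`) are assumptions made here for illustration only.

```python
from collections import Counter
from math import log2


def entropy(symbols):
    """Plug-in Shannon entropy (in bits) of a sequence of hashable symbols."""
    n = len(symbols)
    return -sum((c / n) * log2(c / n) for c in Counter(symbols).values())


def mutual_information(x, y):
    """I(X;Y) = H(X) + H(Y) - H(X,Y) for two equal-length discrete sequences."""
    return entropy(x) + entropy(y) - entropy(list(zip(x, y)))


def conditional_mutual_information(x, y, z):
    """I(X;Y|Z) = H(X,Z) + H(Y,Z) - H(X,Y,Z) - H(Z).

    A value close to zero suggests X and Y are conditionally independent
    given Z (up to estimation error on small samples).
    """
    return (entropy(list(zip(x, z))) + entropy(list(zip(y, z)))
            - entropy(list(zip(x, y, z))) - entropy(z))


def lagged_mi(regulator, target, lag):
    """MI between the regulator at time t - lag and the target at time t.

    lag = 0 corresponds to an instantaneous interaction, lag >= 1 to a
    time-delayed one. Both series must have the same length.
    """
    if len(regulator) != len(target):
        raise ValueError("series must have equal length")
    if lag < 0 or lag >= len(target):
        raise ValueError("lag must satisfy 0 <= lag < series length")
    n = len(target) - lag
    return mutual_information(regulator[:n], target[lag:])


if __name__ == "__main__":
    # Toy discretized series (hypothetical data): the target copies the
    # regulator with a one-step delay.
    regulator = [0, 0, 1, 1, 0, 0, 1, 1, 0]
    target = [0] + regulator[:-1]
    for lag in (0, 1):
        print(lag, lagged_mi(regulator, target, lag))
```

In this toy example the lag-1 score is higher than the lag-0 score, which is the kind of signal a structure-learning procedure could use to prefer a time-delayed edge over an instantaneous one. Plug-in estimates are biased on the small sample sizes typical of microarray data, so a real procedure would need a significance threshold or correction.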

History

Campus location

Australia

Principal supervisor

Madhu Chetty

Year of Award

2013

Department, School or Centre

Gippsland School of IT

Course

Doctor of Philosophy

Degree Type

DOCTORATE

Faculty

Faculty of Information Technology