Incorporating and Generating Prior Knowledge to Improve Gene Regulatory Network Inference

Nair, Ajay

doi:10.4225/03/59bf0bad8745a

AjayNair_PhDThesis_2017July19.pdf (7.75 MB)

thesis

posted on 2017-09-17, 23:56 authored by Ajay Nair

Cells regulate gene expression and protein activity to grow and adapt to the external environment. Identifying the regulatory interactions in a cell is critical to understanding and engineering life processes. Gene regulatory network (GRN) inference is the process of reconstructing the network of regulatory interactions from experimental data by using statistical or machine-learning techniques. GRN inference remains an unsolved grand challenge. Incorporating prior knowledge into GRN inference is a promising approach proposed in the literature for accurate GRN reconstruction.

The reported methods for incorporating prior knowledge (termed priors) have three limitations. Firstly, current methods focus on knowledge of the presence of interactions between genes (edge priors). Secondly, only a few methods are known to incorporate priors, and they do so 'before' the inference; thus, many high-performing methods are not known to incorporate priors. Thirdly, priors exist only for a few well-studied organisms.

The thesis demonstrated that the edge priors provide only a limited improvement in the accuracy of GRN inference. It proposed and demonstrated that prior knowledge of the absence of interactions between genes (non-edge priors) is significant in improving the overall accuracy. The specificity, precision, and F1-score improved by 2-10%, 5-40%, and 5-12%, respectively. A method to generate around 70% of non-edge priors was also demonstrated.
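As a rough illustration of why non-edge priors bear on specificity, precision, and F1-score, the sketch below scores a toy predicted network before and after discarding predicted edges that a non-edge prior rules out. The gene names, the toy networks, and the simple filtering step are all assumptions made for illustration; they are not the thesis's data or its method of incorporating priors.

```python
from itertools import permutations

def scores(true_edges, predicted_edges, all_pairs):
    """Compute precision, recall, specificity and F1 for a predicted edge set."""
    tp = len(predicted_edges & true_edges)
    fp = len(predicted_edges - true_edges)
    fn = len(true_edges - predicted_edges)
    tn = len(all_pairs) - tp - fp - fn
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"precision": precision, "recall": recall,
            "specificity": specificity, "f1": f1}

# Hypothetical toy network over four genes (directed edges, no self-loops).
genes = ["A", "B", "C", "D"]
all_pairs = set(permutations(genes, 2))
true_edges = {("A", "B"), ("B", "C"), ("C", "D")}
predicted_edges = {("A", "B"), ("B", "C"), ("A", "C"), ("D", "A")}

# Hypothetical non-edge prior: A is known not to regulate C.
non_edge_priors = {("A", "C")}

before = scores(true_edges, predicted_edges, all_pairs)
# Simplification: drop predicted edges that contradict a non-edge prior.
after = scores(true_edges, predicted_edges - non_edge_priors, all_pairs)

for name in ("precision", "specificity", "f1"):
    print(f"{name}: {before[name]:.3f} -> {after[name]:.3f}")
```

Each non-edge prior that contradicts a predicted edge removes a false positive, which raises precision and specificity together while leaving recall unchanged.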

This thesis analysed the maxP technique, which is widely used to reduce computational time, and identified its limitations. Two algorithms that overcome the limitations but retain the strengths of maxP, by incorporating GRN topology priors 'during' the inference, were proposed and developed. The theoretical and experimental results showed that these algorithms take only one-third of the normal computational time, without sacrificing the accuracy.
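To give a sense of why a technique like maxP reduces computational time, the sketch below counts the candidate parent sets a score-based search would have to consider for one gene. It assumes that maxP denotes a cap on the number of regulators (parents) allowed per gene, as is common in Bayesian network structure learning; the function name and the network size are hypothetical, and the thesis's own algorithms are not reproduced here.

```python
from math import comb

def candidate_parent_sets(n_genes, max_parents):
    """Number of parent sets of size <= max_parents for one gene,
    chosen from the other n_genes - 1 genes."""
    return sum(comb(n_genes - 1, k) for k in range(max_parents + 1))

n_genes = 20
restricted = candidate_parent_sets(n_genes, 3)               # cap of 3 parents
unrestricted = candidate_parent_sets(n_genes, n_genes - 1)   # no cap

print(restricted, unrestricted)
```

The trade-off is that any gene whose true number of regulators exceeds the cap cannot be recovered exactly under such a restriction.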

The thesis proposed and developed two algorithms that integrate priors 'after' the GRN inference process. Further, a method to identify and remove wrong interactions by using priors was proposed and developed. The results showed that accuracy improved and errors decreased: compared to a normal GRN inference, incorporating half of the total priors yielded around 970 additional correct edges and removed 1300 wrong interactions. Moreover, this overcomes the limitation that only a few GRN inference methods can incorporate priors.

A generic mapping pipeline was developed for predicting regulatory interactions, with confidence ranks, in an organism by using the known regulatory interactions from another organism. This mapping pipeline was used to predict 20,280 regulatory interactions in 30 strains of cyanobacteria, which are less studied but scientifically and industrially relevant. A database of these regulatory interactions, RegCyanoDB, was developed and made available for public access.
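The general idea of transferring interactions between organisms can be sketched as follows: a known regulator-target pair is carried over only when both genes have a counterpart in the target organism. This is a minimal sketch under stated assumptions. The ortholog table, the gene identifiers, and the scoring rule (taking the weaker of the two ortholog confidences) are hypothetical and are not taken from the thesis, whose ranking scheme is not described here.

```python
def map_interactions(known_interactions, orthologs):
    """Transfer (regulator, target) pairs to another organism.

    orthologs maps a source gene to (target_gene, confidence in [0, 1]).
    Assumption: a predicted interaction is scored by the weaker of its
    two ortholog assignments.
    """
    predicted = []
    for regulator, target in known_interactions:
        if regulator in orthologs and target in orthologs:
            reg_gene, reg_conf = orthologs[regulator]
            tgt_gene, tgt_conf = orthologs[target]
            predicted.append((reg_gene, tgt_gene, min(reg_conf, tgt_conf)))
    # Most confident predictions first.
    predicted.sort(key=lambda item: item[2], reverse=True)
    return predicted

# Hypothetical source-organism interactions and ortholog assignments.
known_interactions = [("srcR1", "srcG1"), ("srcR1", "srcG2"), ("srcR2", "srcG3")]
orthologs = {
    "srcR1": ("tgtR1", 0.9),
    "srcG1": ("tgtG1", 0.6),
    "srcG2": ("tgtG2", 0.8),
    "srcR2": ("tgtR2", 0.7),
    # srcG3 has no ortholog, so its interaction is not transferred.
}

predicted = map_interactions(known_interactions, orthologs)
for regulator, target, confidence in predicted:
    print(regulator, target, confidence)
```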

Thus, this thesis focused on developing efficient methods for incorporating priors into GRN inference and on generating priors for less-studied organisms. The thesis demonstrated that non-edge priors are significant for methods that incorporate priors 'before' inference. Further, methods that incorporate priors 'during' and 'after' inference were proposed and developed. A bioinformatic pipeline to predict regulatory interactions in less-studied organisms was also developed and applied.

History

Campus location

Australia

Principal supervisor

Madhu Chetty

Additional supervisor 1

Pramod P Wangikar

Additional supervisor 2

Sue McKemmish

Year of Award

2017

Department, School or Centre

Information Technology

Additional Institution or Organisation

Indian Institute of Technology Bombay

Course

Doctor of Philosophy

Degree Type

DOCTORATE

Faculty

Faculty of Information Technology

Keywords

gene regulatory network; gene regulatory network inference; prior knowledge; Bayesian network; reverse best hit; systems biology; bioinformatics; computational biology

Licence

CC BY-NC 4.0
