Incorporating and Generating Prior Knowledge to Improve Gene Regulatory Network Inference
thesis
posted on 2017-09-17, 23:56 authored by Ajay Nair
Cells regulate gene expression and protein activity to grow and adapt to the external environment. Identifying the regulatory interactions in a cell is critical to understanding and engineering life processes. Gene regulatory network (GRN) inference is the process of reconstructing the network of regulatory interactions from experimental data by using statistical or machine-learning techniques. GRN inference remains an unsolved grand challenge. Incorporating prior knowledge into GRN inference is a promising approach proposed in the literature for accurate GRN reconstruction.
There are limitations in the reported methods of incorporating prior knowledge (termed priors). Firstly, the current methods focus on knowledge of the presence of interactions between genes (edge priors). Secondly, only a few methods are known to incorporate priors, and these incorporate them 'before' the inference; thus, many high-performing methods are not known to incorporate priors. Thirdly, priors exist only for a few well-studied organisms.
The thesis demonstrated that the edge priors provide only a limited improvement in the accuracy of GRN inference. It proposed and demonstrated that prior knowledge of the absence of interactions between genes (non-edge priors) is significant in improving the overall accuracy. The specificity, precision, and F1-score improved by 2-10%, 5-40%, and 5-12%, respectively. A method to generate around 70% of non-edge priors was also demonstrated.
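The effect of non-edge priors on these metrics can be illustrated with a small sketch. It is a toy example under stated assumptions, not the method used in the thesis: the gene names, the networks, and the helper functions `confusion_counts`, `scores`, and `apply_non_edge_priors` are all hypothetical, and the network is treated as a set of directed edges without self-loops.

```python
from itertools import permutations


def confusion_counts(predicted, true_edges, genes):
    """Count TP, FP, FN, TN over all ordered gene pairs (no self-loops)."""
    tp = fp = fn = tn = 0
    for pair in permutations(genes, 2):
        in_pred = pair in predicted
        in_true = pair in true_edges
        if in_pred and in_true:
            tp += 1
        elif in_pred:
            fp += 1
        elif in_true:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def scores(predicted, true_edges, genes):
    """Return specificity, precision, and F1-score of a predicted network."""
    tp, fp, fn, tn = confusion_counts(predicted, true_edges, genes)
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"specificity": specificity, "precision": precision, "f1": f1}


def apply_non_edge_priors(predicted, non_edge_priors):
    """Drop every predicted edge that a prior says is absent."""
    return set(predicted) - set(non_edge_priors)


# Hypothetical toy data: four genes, a known network, and an inferred one.
genes = ["g1", "g2", "g3", "g4"]
true_edges = {("g1", "g2"), ("g2", "g3"), ("g3", "g4")}
predicted = {("g1", "g2"), ("g2", "g3"), ("g1", "g3"), ("g4", "g1")}
non_edge_priors = {("g1", "g3")}  # prior knowledge: g1 does not regulate g3

before = scores(predicted, true_edges, genes)
after = scores(apply_non_edge_priors(predicted, non_edge_priors), true_edges, genes)
print(before)
print(after)
```

In this toy case the non-edge prior removes one false positive, so specificity, precision, and F1-score all rise while recall is unchanged. The size of the gain here says nothing about the ranges reported above.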
This thesis analysed the maxP technique, which is widely used to reduce computational time, and identified its limitations. Two algorithms that overcome the limitations but retain the strengths of maxP, by incorporating GRN topology priors 'during' the inference, were proposed and developed. The theoretical and experimental results showed that these algorithms take only one-third of the normal computational time, without sacrificing the accuracy.
The thesis proposed and developed two algorithms that integrate priors 'after' the GRN inference process. Further, a method to identify and remove wrong interactions by using priors was proposed and developed. The results showed that accuracy improved and errors were reduced: compared with a normal GRN inference, around 970 additional correct edges were obtained and 1300 wrong interactions were removed when half of the total priors were incorporated. Moreover, this overcomes the limitation that only a few GRN inference methods can incorporate priors.
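As a rough illustration of what integrating priors 'after' inference can mean, the sketch below post-processes the output of an arbitrary inference method. It is a generic illustration and not either of the two algorithms developed in the thesis; the function `integrate_priors_after_inference`, the score threshold, and the toy data are assumptions made for this example.

```python
def integrate_priors_after_inference(edge_scores, edge_priors, non_edge_priors,
                                     threshold=0.5):
    """Correct an inferred network with priors, after inference has finished.

    edge_scores maps (regulator, target) to a confidence in [0, 1] produced by
    any inference method. Edge priors are added to the thresholded network and
    non-edge priors are removed from it. If an edge appears in both prior sets,
    the non-edge prior wins (an arbitrary choice for this sketch).
    """
    inferred = {edge for edge, s in edge_scores.items() if s >= threshold}
    corrected = (inferred | set(edge_priors)) - set(non_edge_priors)
    added = corrected - inferred      # edges recovered from edge priors
    removed = inferred - corrected    # edges rejected by non-edge priors
    return corrected, added, removed


# Hypothetical scores from some inference method.
edge_scores = {
    ("tfA", "g1"): 0.9,
    ("tfA", "g2"): 0.3,
    ("tfB", "g1"): 0.7,
    ("tfB", "g3"): 0.2,
}
edge_priors = {("tfA", "g2")}      # known interaction missed by inference
non_edge_priors = {("tfB", "g1")}  # known absent interaction that was inferred

network, added, removed = integrate_priors_after_inference(
    edge_scores, edge_priors, non_edge_priors
)
print(sorted(network), sorted(added), sorted(removed))
```

Because the correction works only on the output of the inference step, it does not depend on how the scores were produced, which is why a post-inference approach can be paired with methods that cannot take priors as input.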
A generic mapping pipeline was developed that predicts regulatory interactions, with confidence ranks, in one organism by using the known regulatory interactions of another organism. This mapping pipeline was used to predict 20,280 regulatory interactions in 30 strains of cyanobacteria, which are less-studied but scientifically and industrially relevant organisms. A database of these regulatory interactions, RegCyanoDB, was developed and made available for public access.
Thus, this thesis has focused on developing efficient methods for incorporating priors into GRN inference and generating priors for less-studied organisms. The thesis demonstrated that non-edge priors are significant in methods that incorporate priors 'before' inference. Further, methods that incorporate priors 'during' and 'after' inference were proposed and developed. A bioinformatic pipeline to predict regulatory interactions in less-studied organisms was also developed and applied.
Campus location: Australia
Principal supervisor: Madhu Chetty
Additional supervisor 1: Pramod P Wangikar
Additional supervisor 2: Sue McKemmish
Year of Award: 2017
Department, School or Centre: Information Technology
Additional Institution or Organisation: Indian Institute of Technology Bombay
Course: Doctor of Philosophy
Degree Type: DOCTORATE
Faculty: Faculty of Information Technology