Cells regulate
gene expression and protein activity to grow and adapt to the external
environment. Identifying the regulatory interactions in a cell is critical to
understanding and engineering life processes. Gene regulatory network (GRN)
inference is the process of reconstructing the network of regulatory
interactions from experimental data by using statistical or machine-learning
techniques. GRN inference remains an unsolved grand challenge. Incorporating
prior knowledge into GRN inference is a promising approach proposed in
literature for accurate GRN reconstruction.
The reported methods of incorporating prior knowledge (termed priors)
have limitations. Firstly, current methods
focus on knowledge of the presence of interactions between genes (edge
priors). Secondly, only a few methods are known to incorporate priors, and these
incorporate them 'before' the inference. Thus, many high-performing methods are not known to
incorporate priors. Thirdly, priors exist only for a few well-studied
organisms.
The thesis demonstrated that the edge priors provide only a
limited improvement in the accuracy of GRN inference. It proposed and
demonstrated that prior knowledge of the absence of interactions between genes
(non-edge priors) is significant in improving the overall accuracy. The
specificity, precision, and F1-score improved by 2-10%, 5-40%, and 5-12%,
respectively. A method to generate around 70% of non-edge priors was also
demonstrated.
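The effect of non-edge priors on these metrics can be illustrated with a minimal sketch. It is not the thesis's method: the function names, the 0.5 threshold, and the three-gene score matrix are all hypothetical, and the sketch simply assumes that a non-edge prior forces the score of the corresponding gene pair to zero before thresholding.

```python
import numpy as np


def apply_non_edge_priors(scores, non_edges):
    """Return a copy of the score matrix with known non-edges forced to zero.

    Assumption: a non-edge prior is a (regulator, target) index pair that is
    known not to interact. This is an illustrative rule, not the thesis's.
    """
    masked = scores.copy()
    for regulator, target in non_edges:
        masked[regulator, target] = 0.0
    return masked


def evaluate(scores, truth, threshold=0.5):
    """Compute precision, recall, specificity and F1, ignoring self-loops."""
    n = truth.shape[0]
    off_diag = ~np.eye(n, dtype=bool)
    pred = (scores > threshold) & off_diag
    actual = truth & off_diag
    tp = int(np.sum(pred & actual))
    fp = int(np.sum(pred & ~actual))
    fn = int(np.sum(~pred & actual))
    tn = int(np.sum(~pred & ~actual & off_diag))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"precision": precision, "recall": recall,
            "specificity": specificity, "f1": f1}


# Toy data (made up for illustration): true edges are 0 -> 1 and 1 -> 2.
truth = np.zeros((3, 3), dtype=bool)
truth[0, 1] = True
truth[1, 2] = True

# Hypothetical confidence scores from some inference method.
scores = np.array([[0.0, 0.9, 0.8],
                   [0.2, 0.0, 0.7],
                   [0.6, 0.1, 0.0]])

before = evaluate(scores, truth)
# One non-edge prior: gene 2 is known not to regulate gene 0.
after = evaluate(apply_non_edge_priors(scores, [(2, 0)]), truth)
print(before)
print(after)
```

In this toy case the single non-edge prior removes one false positive, so precision, specificity, and F1 rise while recall is unchanged. The size of the change here says nothing about the percentages reported in the thesis.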
This thesis analysed the maxP technique, which is widely used
to reduce computational time, and identified its limitations. Two algorithms
that overcome the limitations but retain the strengths of maxP, by
incorporating GRN topology priors 'during' the inference, were proposed and
developed. The theoretical and experimental results showed that these
algorithms take only one-third of the normal computational time, without
sacrificing the accuracy.
The thesis proposed and developed two algorithms that
integrate priors 'after' the GRN inference process. Further, a method to
identify and remove wrong interactions by using priors was proposed and
developed. The results showed that accuracy improved and errors were reduced:
compared with a normal GRN inference, around 970 additional correct edges were
obtained and 1300 wrong interactions were removed when half of the total priors
were incorporated. Moreover, this overcomes the limitation that only a few GRN
inference methods can incorporate priors.
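Because priors applied 'after' inference only touch the output of an inference method, the idea is independent of the method used. A minimal sketch under stated assumptions follows. The abstract does not describe the thesis's two algorithms, so this is not them: the function name, the gene labels, and the rule that a non-edge prior wins over a conflicting edge prior are all hypothetical.

```python
def integrate_priors_after_inference(inferred, edge_priors, non_edge_priors):
    """Post-process an inferred edge set with priors (illustrative only).

    Edges are (regulator, target) tuples. Assumed rules:
      * every edge prior is added to the network;
      * every edge matching a non-edge prior is removed;
      * if a pair appears in both prior sets, the non-edge prior wins.
    """
    inferred = set(inferred)
    edge_priors = set(edge_priors)
    non_edge_priors = set(non_edge_priors)

    added = edge_priors - inferred - non_edge_priors
    removed = inferred & non_edge_priors
    corrected = (inferred | edge_priors) - non_edge_priors
    return corrected, added, removed


# Hypothetical output of any GRN inference method.
inferred = {("g1", "g2"), ("g2", "g3"), ("g3", "g1")}
edge_priors = {("g1", "g3")}        # known interaction that was missed
non_edge_priors = {("g3", "g1")}    # known absence that was wrongly inferred

corrected, added, removed = integrate_priors_after_inference(
    inferred, edge_priors, non_edge_priors
)
print(sorted(corrected))
print(sorted(added), sorted(removed))
```

The function takes only an edge set as input, so it could be placed after any inference method without modifying that method.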
A generic mapping pipeline for predicting regulatory
interactions with confidence ranks in an organism by using the known regulatory
interactions from another organism was developed. This mapping pipeline was
used to predict 20,280 regulatory interactions in 30 strains of cyanobacteria,
which are less-studied but scientifically and industrially relevant organisms. A
database, RegCyanoDB, for these regulatory interactions was developed and
made available for public access.
Thus, this thesis has focused on developing efficient methods
for incorporating priors into GRN inference and generating priors for
less-studied organisms. The thesis demonstrated that non-edge priors are
significant in methods that incorporate priors 'before' inference. Further,
methods that incorporate priors 'during' and 'after' inference were proposed
and developed. A bioinformatic pipeline to predict regulatory interactions in
less-studied organisms was also developed and applied.
Campus location: Australia
Principal supervisor: Madhu Chetty
Additional supervisor 1: Pramod P Wangikar
Additional supervisor 2: Sue McKemmish
Year of Award: 2017
Department, School or Centre: Information Technology
Additional Institution or Organisation: Indian Institute of Technology Bombay, India (IITB)