A data driven approach to detect regulatory features in multi-omics high throughput sequencing data

Chen, Tyrone; McGee, Matthew; Rigby, Jason; Tyagi, Sonika

doi:10.26180/5dfc759452985

ABACBS_2019.pdf (669.21 kB)


poster

posted on 2019-12-20, 07:17 authored by Tyrone Chen, Matthew McGee, Jason Rigby, Sonika Tyagi

Coherently integrating multiple biological datasets is a difficult task, especially across different types of experiments such as RNA-Seq, ChIP-Seq, ATAC-Seq, Hi-C and single-cell sequencing. Because the raw sequence data produced by each type of sequencing experiment is not directly comparable, many existing data integration strategies repeatedly summarise layers of information. This process usually collapses the information into sets of gene regulatory networks that can then be compared directly. As a result, a significant volume of quantitative information is lost.

Therefore, a data-driven approach was taken to address this problem, designed to take high-throughput sequencing data directly as input. Known models of gene regulatory patterns, such as position weight matrices, will be incorporated into the overall framework. These will be supplemented with available biological information about the system, such as evolutionary information (in the form of phylogenetic distances) and interaction maps of biomolecules (DNA, RNA or protein).
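Position weight matrices are one of the known models the framework is intended to incorporate. As a minimal sketch, and not the authors' implementation, the Python below converts a motif count matrix into a log-odds PWM and scores every window of a DNA sequence against it. The motif counts (`TOY_COUNTS`), the pseudocount of 0.5, the uniform background and the function names `counts_to_log_odds` and `scan` are all hypothetical choices made for illustration.

```python
import math

BASES = "ACGT"

# Hypothetical counts for a 4-bp TATA-like motif: one dict per motif position.
TOY_COUNTS = [
    {"T": 9, "A": 1},
    {"A": 10},
    {"T": 8, "A": 2},
    {"A": 10},
]


def counts_to_log_odds(counts, pseudocount=0.5, background=None):
    """Turn per-position base counts into a log-odds PWM (in bits).

    Assumes a uniform background unless one is supplied; the pseudocount
    keeps unseen bases from producing log2(0).
    """
    if background is None:
        background = {b: 0.25 for b in BASES}  # assumed uniform background
    pwm = []
    for column in counts:
        total = sum(column.get(b, 0) for b in BASES) + pseudocount * len(BASES)
        pwm.append({
            b: math.log2(((column.get(b, 0) + pseudocount) / total) / background[b])
            for b in BASES
        })
    return pwm


def scan(sequence, pwm):
    """Score every full-length window on the forward strand only.

    Returns a list of (start_position, score) tuples; windows containing
    N or any other non-ACGT character are skipped.
    """
    width = len(pwm)
    sequence = sequence.upper()
    hits = []
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        if any(base not in BASES for base in window):
            continue
        score = sum(pwm[i][base] for i, base in enumerate(window))
        hits.append((start, score))
    return hits


if __name__ == "__main__":
    pwm = counts_to_log_odds(TOY_COUNTS)
    hits = scan("GGTATAGCTATAAC", pwm)
    best = max(hits, key=lambda hit: hit[1])
    print(best)  # the first TATA window, starting at position 2
```

Windows scoring above zero resemble the motif more than the assumed background. A practical scanner would typically also score the reverse complement and apply a score threshold or p-value before calling a site.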

The end result is a framework that is agnostic to data type: it can take any combination of high-throughput sequencing data types and identify any regulatory patterns present within DNA sequences of interest. A major advantage of the design is that it limits the assumptions made about the data, since the user supplies high-throughput sequencing data directly rather than summarised or heavily processed data. At the same time, providing data in its primary form reduces information loss, allowing the algorithm to be more sensitive to weak signals in the data.