Reason: Access restricted by the author. A copy can be requested for private research and study by contacting your institution's library service. This copy cannot be republished.
Computational algorithms for comparative genomics
thesis posted on 2017-02-14, 02:34, authored by Mahmood, Khalid
Advances in high-throughput genome sequencing have presented an opportunity to study how species are related, especially in terms of their evolution and molecular functions. However, the capacity to generate genome sequence data outstrips the ability to decipher this data and translate it into biological information. Computational methods therefore play a key role in deciphering large and complex genome data, which is essential for bridging the growing gap between genes of known and unknown function. To this end, computational comparative genomics is essential for studying the organization, topology and conservation of genes and strings of genes, which leads to a better biological understanding of gene function and annotation. At the core of comparative genomics is the task of identifying gene relationships, or matches, across genomes. However, the high dimensionality of genome data and complex evolutionary artefacts mean that gene matching is a non-trivial task, and new computational approaches are constantly required to address these issues. This thesis presents new algorithms for gene matching to identify gene relationships across genomes (or complete proteomes). Novel computational methods are presented here that (1) perform comparisons between small related species such as microbial strains, (2) calculate gene matching on large-scale genome data to identify gene orthologs, conserved gene strings and evolutionary rearrangements, (3) calculate complex orthologous relationships such as co-orthologs and (4) perform rapid large-scale sequence comparisons. The methods described here are applied to a variety of genome comparisons, ranging from small microbial strains to large eukaryotes such as the human, mouse and rat genomes. The results from these comparisons revealed orthologous and co-orthologous genes, syntenic regions, conserved gene strings and genome rearrangements with high accuracy.
Further experiments have also shown the methods described here to be computationally efficient and robust.