k-Word matches: an alignment-free sequence comparison method

Burden, Conrad J. ; Forêt, Sylvain ; Wilson, Susan R.

2017

The number of k-word matches, that is, the number of words of length k shared between two sequences, is also known as the D2 statistic and is used in alignment-free sequence comparison. This statistic has two advantages over alignment-based methods for comparing nucleotide and amino-acid sequences: first, it does not assume that homologous segments are contiguous; second, the algorithm is computationally extremely fast, with a runtime proportional to the size of the sequence under scrutiny. We summarise our results to date on determining the distributional properties of the D2 statistic for a range of biologically relevant parameters and outline the directions in which the research will proceed.

PRIB 2008 proceedings found at: http://dx.doi.org/10.1007/978-3-540-88436-1
Contributors: Monash University. Faculty of Information Technology. Gippsland School of Information Technology ; Chetty, Madhu ; Ahmad, Shandar ; Ngom, Alioune ; Teng, Shyh Wei ; Third IAPR International Conference on Pattern Recognition in Bioinformatics (PRIB) (3rd : 2008 : Melbourne, Australia)
Rights: Copyright by Third IAPR International Conference on Pattern Recognition in Bioinformatics. All rights reserved.
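As an illustration of the D2 statistic described in the abstract, the following is a minimal Python sketch, not the authors' implementation. It assumes that matches are counted with multiplicity, so that every pair of positions (one in each sequence) carrying the same k-word contributes one match. The function names `kword_counts` and `d2` are hypothetical and chosen only for this example.

```python
from collections import Counter


def kword_counts(seq, k):
    """Count every overlapping word of length k in seq (hypothetical helper)."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def d2(seq_a, seq_b, k):
    """Number of k-word matches between seq_a and seq_b.

    Assumption: a match is a pair of positions (i in seq_a, j in seq_b)
    whose k-words are identical, so D2 is the sum over shared words of
    count_in_a * count_in_b.
    """
    counts_a = kword_counts(seq_a, k)
    counts_b = kword_counts(seq_b, k)
    return sum(n * counts_b[w] for w, n in counts_a.items() if w in counts_b)


# "ACGTACGT" contains AC x2, CG x2, GT x2, TA x1; "CGTA" contains CG, GT, TA.
print(d2("ACGTACGT", "CGTA", 2))  # prints 5
```

Each sequence is scanned once to build its word counts, and the final sum visits each distinct word of the first sequence once, which is in keeping with the linear runtime mentioned in the abstract.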