Monash University

Modelling is more versatile than shuffling

report
posted on 2022-08-31, 03:06 authored by L Allison, D Powell, T I Dix
Sequences with low information content cause problems for standard algorithms, e.g. by producing false-positive matches. Shuffling is a popular technique for correcting the abnormally low alignment costs (or high scores) between such sequences. However, shuffling cannot be used safely on arbitrary populations of sequences, and it is only applied "after the fact" to judge the significance of alignments; it does not change their rank-order. We seek a better solution. An alternative alignment methodology is described which directly models the information content of sequences. It can be used with a very large class of statistical models for different populations of sequences. In general, it not only judges the significance of alignments but can also change their rank-order, promoting some and demoting others. The populations that the sequences come from can be identified, probably. The new methodology is compared to shuffling for the purpose of judging the significance of optimal alignments. The methodology described can be incorporated into any alignment algorithm that allows mutation costs to be treated as negative logarithms of probabilities.
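
For readers unfamiliar with the shuffling baseline that the abstract refers to, the following is a minimal sketch of a shuffle-based significance test. It is not the authors' code or their modelling method: the unit mismatch and indel costs, the function names `edit_cost` and `shuffle_p_value`, and the trial count are all assumptions chosen for illustration. It aligns two sequences, then repeatedly shuffles one of them and counts how often the shuffled alignment costs no more than the observed one.

```python
import random


def edit_cost(a, b, mismatch=1, indel=1):
    """Global alignment cost by dynamic programming (illustrative unit costs)."""
    prev = [j * indel for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        cur = [i * indel]
        for j in range(1, len(b) + 1):
            sub = prev[j - 1] + (0 if a[i - 1] == b[j - 1] else mismatch)
            cur.append(min(sub, prev[j] + indel, cur[j - 1] + indel))
        prev = cur
    return prev[-1]


def shuffle_p_value(a, b, trials=200, seed=0):
    """Estimate how often a shuffled copy of b aligns to a at least as cheaply.

    Returns (observed_cost, estimated_p_value). The +1 terms keep the
    estimate strictly positive for a finite number of trials.
    """
    rng = random.Random(seed)
    observed = edit_cost(a, b)
    hits = 0
    for _ in range(trials):
        letters = list(b)
        rng.shuffle(letters)  # preserves composition, destroys order
        if edit_cost(a, "".join(letters)) <= observed:
            hits += 1
    return observed, (hits + 1) / (trials + 1)
```

Note that the test only attaches a significance estimate to an alignment that has already been computed; the alignment cost itself, and hence the ranking of candidate alignments, is unchanged. For a pair of low-information sequences such as two runs of the same character, every shuffle reproduces the original sequence, so the estimate is 1 and the cheap alignment is judged not significant.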

History

Technical report number

2000/83

Year of publication

2000

    Monash Information Technology Technical Reports
