Monash University

Inferring Phylogenetic Graphs for Natural Languages using MML

Report posted on 2022-07-25, 00:39, authored by J N Ooi and D L Dowe
Languages, like everything around us, evolve and change over time. This report aims to model the evolution that occurs between natural languages. We introduce the idea of inferring phylogenetic (or evolutionary) models for natural languages using the Minimum Message Length (MML) principle. Phylogenetic models show the evolutionary interrelationships among species or other entities. MML is an inductive inference method that measures the goodness of a model. We use MML to infer phylogenetic graphs (including mutation probabilities along arcs) for artificial languages as well as for some European languages (English, French, Spanish and German).

In a phylogenetic tree, each child node has exactly one parent node, so each child language can inherit from only one parent language. In the real world, such a situation is unlikely to occur. We therefore extend phylogenetic trees to phylogenetic graphs, in which a child node can inherit features from more than one parent node, to model the fact that a language can be influenced by more than one other language.

The first part of our modelling assumes only copy and change operations on characters, and is based on words that have the same length in all natural languages considered. The subsequent part uses string alignment techniques to model words of different lengths and allows copy, change, insert and delete operations on characters. All methods have been verified by testing them on artificial languages for which the evolutionary order is known. The phylogenetic model inferred by MML reflects the correct evolutionary order.
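To make the copy-and-change idea concrete, here is a minimal Python sketch of how a message length could be used to compare "child inherits from this parent" against "child is stated independently" for equal-length words. It is not the encoding used in the report. The function names, the adaptive code with initial counts of 1, and the uniform code over the alphabet are all assumptions chosen for illustration.

```python
import math


def independent_bits(words, alphabet_size):
    """Bits to state the words with no parent: uniform code per character."""
    n_chars = sum(len(w) for w in words)
    return n_chars * math.log2(alphabet_size)


def copy_change_bits(parent_words, child_words, alphabet_size):
    """Bits to state child words given same-length parent words.

    Assumed scheme (illustrative only): each character is flagged as a copy
    or a change using an adaptive code whose counts start at 1, and each
    change also names one of the other alphabet_size - 1 characters.
    """
    copies, changes = 1, 1  # pseudo-counts for the adaptive code
    bits = 0.0
    for parent, child in zip(parent_words, child_words):
        if len(parent) != len(child):
            raise ValueError("this sketch only handles equal-length words")
        for p_char, c_char in zip(parent, child):
            total = copies + changes
            if p_char == c_char:
                bits += -math.log2(copies / total)
                copies += 1
            else:
                bits += -math.log2(changes / total)
                bits += math.log2(alphabet_size - 1)
                changes += 1
    return bits


if __name__ == "__main__":
    # Toy word lists, invented for this example.
    child = ["water", "night"]
    near_parent = ["wader", "nicht"]
    far_parent = ["zzzzz", "qqqqq"]
    for name, parent in [("near", near_parent), ("far", far_parent)]:
        print(name, round(copy_change_bits(parent, child, 26), 2))
    print("independent", round(independent_bits(child, 26), 2))
```

Under this scheme, the hypothesis giving the shortest total message is preferred. An arc from a parent is only worth stating when it shortens the description of the child relative to stating the child on its own.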

History

Technical report number: 2005/178

Year of publication: 2005

    Monash Information Technology Technical Reports
