MML Inference of Decision Trees, Graphs and Forests
TanPeter J.
2017
The Minimum
Message Length (MML) principle is a general inductive inference framework which
recasts inductive inference as a coding process. Using assumptions identical to
those from the well-known Bayesian inference, prediction and modelling, MML
represents general inductive and statistical inference problems as a data
encoding process which conveys the data to a receiver in a two-part message. The
first part is the message to encode the inferred model. The second part is the
message to encode the data in light of the inferred model. MML states that the
model with the shortest two-part message is the best model approximating the
true model. <br>
<br>
Decision trees, decision graphs and decision forests are
popular supervised learning methods in machine learning. In this dissertation,
the MML principle is applied to build machine learning schemes for decision
trees, decision graphs and decision forests. Two novel MML inference schemes
are developed. One is a MML coding scheme for Oblique decision trees, which are
decision trees with linear discriminate functions at their internal nodes.
Another is a MML coding scheme for decision graphs with multi-way joins and
dynamic attributes. A decision forests learning scheme based on MML oblique
decision trees is also presented. <br>
<br>
Experiments were conducted across a range of problems using
data from University of California Machine Learning Repository and the
Singapore Data Mining Centre. These experiments showed that compared to other
popular decision tree models such C4.5 and C5, models generated by MML
inference schemes achieved favourable results in both classification and
probabilistic predictive accuracy. The study showed that MML inference schemes
are able to find the optimal trade-off between the complexity of these
structure models and goodness-of-fit for a given set of data.