20170309-Tan-Thesis.pdf (1.64 MB)
MML Inference of Decision Trees, Graphs and Forests
thesis
posted on 2017-03-16, 04:38 authored by Peter J. TanThe Minimum
Message Length (MML) principle is a general inductive inference framework which
recasts inductive inference as a coding process. Using assumptions identical to
those from the well-known Bayesian inference, prediction and modelling, MML
represents general inductive and statistical inference problems as a data
encoding process which conveys the data to a receiver in a two-part message. The
first part is the message to encode the inferred model. The second part is the
message to encode the data in light of the inferred model. MML states that the
model with the shortest two-part message is the best model approximating the
true model.
Decision trees, decision graphs and decision forests are popular supervised learning methods in machine learning. In this dissertation, the MML principle is applied to build machine learning schemes for decision trees, decision graphs and decision forests. Two novel MML inference schemes are developed. One is a MML coding scheme for Oblique decision trees, which are decision trees with linear discriminate functions at their internal nodes. Another is a MML coding scheme for decision graphs with multi-way joins and dynamic attributes. A decision forests learning scheme based on MML oblique decision trees is also presented.
Experiments were conducted across a range of problems using data from University of California Machine Learning Repository and the Singapore Data Mining Centre. These experiments showed that compared to other popular decision tree models such C4.5 and C5, models generated by MML inference schemes achieved favourable results in both classification and probabilistic predictive accuracy. The study showed that MML inference schemes are able to find the optimal trade-off between the complexity of these structure models and goodness-of-fit for a given set of data.
Decision trees, decision graphs and decision forests are popular supervised learning methods in machine learning. In this dissertation, the MML principle is applied to build machine learning schemes for decision trees, decision graphs and decision forests. Two novel MML inference schemes are developed. One is a MML coding scheme for Oblique decision trees, which are decision trees with linear discriminate functions at their internal nodes. Another is a MML coding scheme for decision graphs with multi-way joins and dynamic attributes. A decision forests learning scheme based on MML oblique decision trees is also presented.
Experiments were conducted across a range of problems using data from University of California Machine Learning Repository and the Singapore Data Mining Centre. These experiments showed that compared to other popular decision tree models such C4.5 and C5, models generated by MML inference schemes achieved favourable results in both classification and probabilistic predictive accuracy. The study showed that MML inference schemes are able to find the optimal trade-off between the complexity of these structure models and goodness-of-fit for a given set of data.