Reason: Restricted by author. A copy can be supplied under Section 51(2) of the Australian Copyright Act 1968 by submitting a document delivery request through your library.
Scalability and Performance of Random Forest based Learning-to-Rank for Information Retrieval
thesis
posted on 2016-12-15, 05:26authored byMuhammad Ibrahim
For
a query submitted by a user, the goal of an information retrieval system is to
return a list of documents which are highly relevant with respect to that
query. Traditionally different scoring methods, ranging from simple heuristic
models to probabilistic models, have been used for this task. Recently
researchers have started to use supervised machine learning techniques for
solving this problem which is then called the learning-to-rank (LtR) problem.
Many supervised learning methods have been tested so far with empirical success
over conventional methods.
The random forest is a relatively simple but effective and
efficient learning algorithm which aggregates the predictions of a large number
of independent and variant base learners, namely decision trees. Its major
benefits over other state-of-the-art methods include inherent
parallelizability, ease of tuning and competitive performance. These benefits
attract researchers across various disciplines where a random forest is a very
popular choice. However, for LtR task, the random forest has not been
thoroughly investigated.
In this research, we investigate the random forest based LtR
algorithms. We aim at improving the efficiency, effectiveness, and
understanding of these algorithms. With respect to the first goal, we employ
undersampling techniques and leverage the inherent structure of a random forest
to achieve better scalability, especially for highly imbalanced datasets. We
also reduce the correlation among the trees to reduce learning time and to
improve performance. With respect to the second goal, we investigate various
objective functions ranging from completely randomized splitting criterion to
so-called listwise splitting. We also conduct a thorough study on random forest
based pointwise algorithms. With respect to the third goal, we develop methods
for estimating the bias and variance of rank-learning algorithms, and examine
their empirical behavior against parameters of the learning algorithm.