Restricted Access

Reason: Restricted by author. A copy can be supplied under Section 51(2) of the Australian Copyright Act 1968 by submitting a document delivery request through your library, or by emailing document.delivery@monash.edu

Scalability and Performance of Random Forest based Learning-to-Rank for Information Retrieval

thesis
posted on 15.12.2016, 05:26 authored by Muhammad Ibrahim
For a query submitted by a user, the goal of an information retrieval system is to return a list of documents which are highly relevant with respect to that query. Traditionally different scoring methods, ranging from simple heuristic models to probabilistic models, have been used for this task. Recently researchers have started to use supervised machine learning techniques for solving this problem which is then called the learning-to-rank (LtR) problem. Many supervised learning methods have been tested so far with empirical success over conventional methods.
  The random forest is a relatively simple but effective and efficient learning algorithm which aggregates the predictions of a large number of independent and variant base learners, namely decision trees. Its major benefits over other state-of-the-art methods include inherent parallelizability, ease of tuning and competitive performance. These benefits attract researchers across various disciplines where a random forest is a very popular choice. However, for LtR task, the random forest has not been thoroughly investigated.
  In this research, we investigate the random forest based LtR algorithms. We aim at improving the efficiency, effectiveness, and understanding of these algorithms. With respect to the first goal, we employ undersampling techniques and leverage the inherent structure of a random forest to achieve better scalability, especially for highly imbalanced datasets. We also reduce the correlation among the trees to reduce learning time and to improve performance. With respect to the second goal, we investigate various objective functions ranging from completely randomized splitting criterion to so-called listwise splitting. We also conduct a thorough study on random forest based pointwise algorithms. With respect to the third goal, we develop methods for estimating the bias and variance of rank-learning algorithms, and examine their empirical behavior against parameters of the learning algorithm.

History

Campus location

Australia

Principal supervisor

Mark Carman

Additional supervisor 1

Manzur Murshed

Year of Award

2016

Department, School or Centre

Information Technology

Course

Doctor of Philosophy

Degree Type

DOCTORATE

Faculty

Faculty of Information Technology