Scalability and Performance of Random Forest based Learning-to-Rank for Information Retrieval
thesis
posted on 2016-12-15, 05:26 authored by Muhammad Ibrahim

For a query submitted by a user, the goal of an information retrieval system is to return a list of documents that are highly relevant to that query. Traditionally, various scoring methods, ranging from simple heuristic models to probabilistic models, have been used for this task. More recently, researchers have applied supervised machine learning techniques to this problem, which is then known as the learning-to-rank (LtR) problem. Many supervised learning methods have been tested so far, with empirical success over conventional methods.
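To make the LtR framing concrete, the sketch below shows a pointwise setup in Python: each query-document pair is a feature vector with a graded relevance label, a supervised regressor learns a scoring function, and documents are ranked by predicted score. The data, feature dimension, and choice of learner here are illustrative assumptions, not the thesis's experimental setup, which uses standard LtR benchmark datasets.

    # Minimal pointwise learning-to-rank sketch (hypothetical toy data).
    import numpy as np
    from sklearn.linear_model import LinearRegression

    rng = np.random.default_rng(0)
    X_train = rng.random((1000, 46))    # query-document feature vectors
    y_train = rng.integers(0, 3, 1000)  # graded relevance labels (0..2)

    # Any supervised regressor can serve as the scoring function.
    model = LinearRegression().fit(X_train, y_train)

    # At query time: score the candidate documents for one query, sort.
    X_query_docs = rng.random((20, 46))
    scores = model.predict(X_query_docs)
    ranking = np.argsort(-scores)       # highest predicted relevance first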
The random forest is a relatively simple yet effective and efficient learning algorithm which aggregates the predictions of a large number of independent, diverse base learners, namely decision trees. Its major benefits over other state-of-the-art methods include inherent parallelizability, ease of tuning, and competitive performance. These benefits have made random forests a popular choice across many disciplines. However, for the LtR task, random forests have not been thoroughly investigated.
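The following sketch illustrates the aggregation idea described above, under assumed toy interfaces: each tree is trained independently on a bootstrap sample with random feature subsampling, and the forest's prediction is the average of the per-tree predictions. Because each fit depends only on its own sample, the loop is embarrassingly parallel, which is the source of the parallelizability benefit.

    # Minimal sketch of a regression random forest (illustrative only).
    import numpy as np
    from sklearn.tree import DecisionTreeRegressor

    def fit_forest(X, y, n_trees=50, seed=0):
        rng = np.random.default_rng(seed)
        trees = []
        for _ in range(n_trees):
            idx = rng.integers(0, len(X), len(X))  # bootstrap sample
            tree = DecisionTreeRegressor(
                max_features="sqrt",               # random feature subsampling
                random_state=int(rng.integers(1 << 31)))
            trees.append(tree.fit(X[idx], y[idx])) # each fit is independent
        return trees

    def predict_forest(trees, X):
        # Aggregate by averaging the independent per-tree predictions.
        return np.mean([t.predict(X) for t in trees], axis=0)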
In this research, we investigate random forest based LtR algorithms, aiming to improve their efficiency, effectiveness, and understanding. With respect to the first goal, we employ undersampling techniques and leverage the inherent structure of a random forest to achieve better scalability, especially on highly imbalanced datasets; we also reduce the correlation among the trees to shorten learning time and improve performance. With respect to the second goal, we investigate various objective functions, ranging from completely randomized splitting criteria to so-called listwise splitting, and conduct a thorough study of random forest based pointwise algorithms. With respect to the third goal, we develop methods for estimating the bias and variance of rank-learning algorithms, and examine their empirical behaviour as the parameters of the learning algorithm vary.
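As background for the third goal, the sketch below shows a generic Monte-Carlo estimate of bias and variance for a learner's pointwise scores under the squared-loss decomposition. The thesis develops estimators suited specifically to rank-learning; this is only a standard illustration on hypothetical data, with the resampling scheme and learner chosen for brevity.

    # Generic Monte-Carlo bias-variance sketch (not the thesis's method).
    import numpy as np
    from sklearn.tree import DecisionTreeRegressor

    def bias_variance(X, y, X_test, y_test, n_runs=30, seed=0):
        rng = np.random.default_rng(seed)
        preds = []
        for _ in range(n_runs):
            idx = rng.integers(0, len(X), len(X))     # resample training set
            model = DecisionTreeRegressor().fit(X[idx], y[idx])
            preds.append(model.predict(X_test))
        preds = np.array(preds)
        mean_pred = preds.mean(axis=0)
        bias_sq = np.mean((mean_pred - y_test) ** 2)  # squared bias (plus noise)
        variance = np.mean(preds.var(axis=0))         # average prediction variance
        return bias_sq, variance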