Posted on 2017-10-16, 00:57. Authored by Ehsan Shareghi Nojehdeh.
We show that finite-order Markov models fail to capture the long-range dependencies present in human language, and we propose infinite-order non-Markovian models (both Bayesian and non-Bayesian) capable of capturing unbounded dependencies. Representing the structure of an infinite-order model requires significant memory, and its very large parameter space introduces computational and statistical burdens during learning. We propose a framework based on compressed data structures that keeps the memory usage of the modelling, learning, and inference steps independent of the order of the model. Our approach scales gracefully with both the order of the Markov model and the size of the data, and is highly competitive with the state of the art in memory and runtime while allowing us to develop more accurate models.
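To illustrate why index-based counting keeps memory independent of the Markov order, here is a minimal Python sketch using a plain (uncompressed) suffix array: the index is built once over the corpus, and the count of a context of any length is obtained by binary search, so no per-order n-gram tables are stored. This is an illustrative stand-in, not the thesis's implementation (which uses compressed data structures); all names and the toy corpus below are hypothetical.

```python
import bisect

def build_suffix_array(tokens):
    # Naive O(n^2 log n) construction; adequate for illustration only.
    # (Hypothetical stand-in for the thesis's compressed data structures.)
    return sorted(range(len(tokens)), key=lambda i: tokens[i:])

def count(tokens, sa, pattern):
    # Occurrences of an arbitrary-length pattern, via binary search over
    # the suffix array (requires Python 3.10+ for bisect's `key` argument).
    m = len(pattern)
    lo = bisect.bisect_left(sa, pattern, key=lambda i: tokens[i:i + m])
    hi = bisect.bisect_right(sa, pattern, key=lambda i: tokens[i:i + m])
    return hi - lo

tokens = "the cat sat on the mat the cat sat on the hat".split()
sa = build_suffix_array(tokens)

# Maximum-likelihood probability of the next word given an unbounded
# context: memory is the corpus plus one index, regardless of order.
ctx = "the cat sat on the".split()
p = count(tokens, sa, ctx + ["mat"]) / count(tokens, sa, ctx)
print(p)  # 0.5: the context occurs twice, once followed by "mat"
```

The same two binary searches answer queries for contexts of length 1 or length 100, which is the sense in which memory and query machinery are decoupled from the model order.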
History
Campus location
Australia
Principal supervisor
Gholamreza Haffari
Additional supervisor 1
Trevor Cohn
Additional supervisor 2
Ann Nicholson
Year of Award
2017
Department, School or Centre
Information Technology (Monash University Clayton)