Exploring relational features and learning under distant supervision for information extraction tasks.

Nagesh, Ajay

doi:10.4225/03/58b65b5f397f5

thesis

posted on 2017-03-01, 05:25 authored by Nagesh, Ajay

Information Extraction (IE) has become an indispensable tool in our quest to handle the data deluge of the information age. IE can broadly be decomposed into Named-entity Recognition (NER) and Relation Extraction (RE). In this thesis, we view IE as the task of finding patterns in unstructured data; these patterns can take the form of features, be specified by constraints, or both. In NER, we study the categorization of complex relational features and outline methods to learn feature combinations through induction. We demonstrate the efficacy of induction techniques in learning (i) rules for identifying named entities in text, where the novelty is the application of induction techniques to learn in a highly expressive declarative rule language, and (ii) a richer sequence labeling model, enabling optimal learning of discriminative features. In RE, our investigations are in the paradigm of distant supervision, which facilitates the creation of large, albeit noisy, training data. We devise an inference framework in which constraints can easily be specified when learning relation extractors. In addition, we reformulate the learning objective in a max-margin framework. To the best of our knowledge, our formulation is the first to optimize multivariate, non-linear performance measures such as Fβ for a latent-variable structured prediction task.

Thesis submitted in partial fulfillment of the requirements for the degree of Doctor of Philosophy of the Indian Institute of Technology Bombay, India and Monash University, Australia.
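What makes a measure such as Fβ "multivariate" and non-linear can be illustrated with a short, generic sketch (not code from the thesis). Fβ is computed from true-positive, false-positive and false-negative counts aggregated over the whole set of predictions, so it cannot be written as a sum of per-example scores. The Python below assumes binary labels (1 for a positive, such as a relation mention that holds); the function names `f_beta` and `mean_per_example_f_beta` are hypothetical, chosen for illustration only.

```python
from typing import Sequence


def f_beta(gold: Sequence[int], pred: Sequence[int], beta: float = 1.0) -> float:
    """Corpus-level F-beta for binary labels (1 = positive, e.g. a relation holds).

    Uses F_beta = (1 + b^2) TP / ((1 + b^2) TP + b^2 FN + FP), which equals
    (1 + b^2) P R / (b^2 P + R) whenever precision P and recall R are defined.
    """
    tp = sum(1 for g, p in zip(gold, pred) if g == 1 and p == 1)
    fp = sum(1 for g, p in zip(gold, pred) if g == 0 and p == 1)
    fn = sum(1 for g, p in zip(gold, pred) if g == 1 and p == 0)
    b2 = beta * beta
    denom = (1 + b2) * tp + b2 * fn + fp
    # Convention (an assumption of this sketch): score 0.0 when nothing is counted.
    return (1 + b2) * tp / denom if denom > 0 else 0.0


def mean_per_example_f_beta(gold: Sequence[int], pred: Sequence[int], beta: float = 1.0) -> float:
    """Average of F-beta computed on each example in isolation (for comparison only)."""
    scores = [f_beta([g], [p], beta) for g, p in zip(gold, pred)]
    return sum(scores) / len(scores)


gold = [1, 1, 0, 0]
pred = [1, 0, 1, 0]
print(f_beta(gold, pred))                   # 0.5
print(mean_per_example_f_beta(gold, pred))  # 0.25
```

The two numbers differ because the corpus-level score depends jointly on all predictions at once; a learner that targets Fβ directly therefore has to reason about the whole output vector rather than each example separately. This sketch only computes the measure and says nothing about how the thesis's max-margin formulation optimizes it.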

Campus location

Australia

Principal supervisor

Gholamreza Haffari

Year of Award

2015

Department, School or Centre

Information Technology (Monash University Clayton)

Additional Institution or Organisation

Indian Institute of Technology Bombay

Degree Type

DOCTORATE

Faculty

Faculty of Information Technology

Keywords

monash:162389; Rule induction; Relation extraction; thesis(doctorate); 1959.1/1219901; Multivariate performance measures; Named entity recognition; 2015; Distant supervision; Information extraction; ethesis-20151009-005135; Open access

Licence

In Copyright
