Exploring relational features and learning under distant supervision for information extraction tasks.
thesis
posted on 2017-03-01, 05:25 authored by Nagesh, Ajay

Information Extraction (IE) has become an indispensable tool in our quest to handle the data deluge of the information age. IE can broadly be decomposed into Named-entity Recognition (NER) and Relation Extraction (RE). In this thesis, we view the task of IE as finding patterns in unstructured data, which can take the form of features, be specified by constraints, or both.

In NER, we study the categorization of complex relational features and outline methods to learn feature combinations through induction. We demonstrate the efficacy of induction techniques in learning: i) rules for the identification of named entities in text, where the novelty is the application of induction techniques to learn in a very expressive declarative rule language; and ii) a richer sequence labeling model, enabling optimal learning of discriminative features.

In RE, our investigations are in the paradigm of distant supervision, which facilitates the creation of large, albeit noisy, training data. We devise an inference framework in which constraints can be easily specified in learning relation extractors. In addition, we reformulate the learning objective in a max-margin framework. To the best of our knowledge, our formulation is the first to optimize multivariate non-linear performance measures such as Fβ for a latent variable structure prediction task.

Thesis submitted in partial fulfillment of the requirements for the degree of Doctor of Philosophy of the Indian Institute of Technology Bombay, India and Monash University, Australia.
Campus location
Australia
Principal supervisor
Gholamreza Haffari
Year of Award
2015
Department, School or Centre
Information Technology (Monash University Clayton)
Additional Institution or Organisation
Indian Institute of Technology Bombay
Degree Type
DOCTORATE
Faculty
Faculty of Information Technology