Monash University
Low Data Visual Question Answering

thesis
posted on 2022-07-27, 02:00 authored by Narjes Askarian

Visual question answering (VQA) is the problem of understanding rich image contexts and answering complex natural language questions about them. VQA models have recently achieved remarkable results when trained on large-scale labeled datasets. However, annotating large amounts of data is not feasible in many domains. In this thesis, we address the problem of VQA in low-labeled-data regimes, which is under-explored in the literature. We leverage the inherent compositional properties of natural language to break down complex questions and learn their sub-questions, which are easier to understand. We propose four different approaches to learning sub-questions, providing a strong foundation for learning to answer complex questions with little labeled data. Our results demonstrate significant improvements over the baselines.

Campus location

Australia

Principal supervisor

Gholamreza Haffari

Additional supervisor 1

Wray Buntine

Additional supervisor 2

Ingrid Zukerman

Additional supervisor 3

Sarvnaz Karimi

Year of Award

2022

Department, School or Centre

Data Science & Artificial Intelligence

Course

Doctor of Philosophy

Degree Type

DOCTORATE

Faculty

Faculty of Information Technology
