Monash University
Final Thesis.pdf (6.07 MB)

Low Data Visual Question Answering

posted on 2022-07-27, 02:00 authored by Narjes Askarian

Visual question answering (VQA) is the problem of understanding rich image contexts and answering complex natural language questions about them. VQA models have recently achieved remarkable results when trained on large-scale labeled datasets. However, annotating large amounts of data is not feasible in many domains. In this thesis, we address the under-explored problem of VQA in low labeled data regimes. We leverage the inherent compositional properties of natural language to decompose complex questions into sub-questions that are easier to understand and to learn. We propose four different approaches for learning these sub-questions, which together provide a strong foundation for learning to answer complex questions with little labeled data. Our results demonstrate significant improvements over the baselines.

Principal supervisor

Gholamreza Haffari

Additional supervisor 1

Wray Buntine

Additional supervisor 2

Ingrid Zukerman

Additional supervisor 3

Sarvnaz Karimi

Department, School or Centre

Data Science & Artificial Intelligence

Degree Type

Doctor of Philosophy

Faculty

Faculty of Information Technology