Low-Data Visual Question Answering
Visual question answering (VQA) is the task of understanding rich image contexts and answering complex natural language questions about them. VQA models have recently achieved remarkable results when trained on large-scale labeled datasets. However, annotating large amounts of data is not feasible in many domains. In this thesis, we address the problem of VQA in low-labeled-data regimes, a setting that is under-explored in the literature. We leverage the inherent compositionality of natural language to break complex questions down into simpler sub-questions that are easier to learn. We propose four different approaches to learning these sub-questions, providing a strong foundation for learning to answer complex questions with little data. Our results demonstrate significant improvements over the baselines.