Monash University

Towards Efficient Inference in Video Understanding

thesis
posted on 2025-05-22, 18:55 authored by Yuetian Weng
While Vision Transformers and Large Language Models have advanced video understanding, their computational demands pose significant challenges. This thesis addresses temporal redundancy in video understanding tasks and develops efficient video understanding models that improve inference efficiency while maintaining competitive performance for real-world applications.

Campus location

Australia

Principal supervisor

Bohan Zhuang

Additional supervisor 1

Chung-Hsing Yeh

Additional supervisor 2

Xiaojun Chang

Year of Award

2025

Department, School or Centre

Data Science & Artificial Intelligence

Course

Doctor of Philosophy

Degree Type

DOCTORATE

Faculty

Faculty of Information Technology
