Posted on 2025-05-21, 06:54. Authored by Linh Ngoc Vu.
This research focuses on novel approaches for speech-based emotion recognition (SER). SER technologies can be applied in various contexts, such as assessing customer satisfaction in call centers, tracking personal moods, and monitoring emotions in healthcare settings. Numerous machine learning methods have been proposed, ranging from traditional feature-based models to end-to-end interpretable neural networks and self-supervised learning techniques. These methods have produced explainable representations and identifiable features related to vocal cues, which we refer to as paralinguistic representations (i.e., representations of information beyond the linguistic content of speech).
By incorporating a pre-trained paralinguistic representation, our method achieved accuracy comparable to state-of-the-art techniques while maintaining high efficiency. A detailed analysis of errors and metadata indicated that our proposed method reduces gender bias and generalizes well to unseen speakers and spontaneous emotions, extending beyond recordings of scripted utterances.
Campus location
Malaysia
Principal supervisor
Lim Wern Han
Additional supervisor 1
Prof. Raphael Phan
Additional supervisor 2
Prof. Dinh Phung
Year of Award
2025
Department, School or Centre
School of Information Technology (Monash University Malaysia)