Abstract: In recent years, massive datasets have significantly driven the advancement of visual learning such as multi-modal large model at the expense of high computational costs and extensive ...
Abstract: In video-based emotion recognition, effective multi-modal fusion techniques are essential to leverage the complementary relationship between audio and visual modalities. Recent ...