Abstract
Multimodal learning is a common approach to improving the accuracy of emotion recognition models. However, insufficient learning of modal features during multimodal inference and the scarcity of sample data remain obstacles to be overcome. We therefore propose a novel adaptive lightweight multimodal efficient feature inference network (ALME-FIN). For multimodal feature learning, we introduce a time-domain lightweight adaptive network (TDLAN) and a two-dimensional dynamic focusing network (TDDFN). TDLAN makes denoising an integral part of network training: by continuously optimizing a trainable filtering threshold, it denoises each sample adaptively. It also includes an interactive convolutional sampling module that enables lightweight multi-scale feature extraction in the time domain. TDDFN effectively extracts core image features while filtering out redundant information. During training, a multi-network dynamic gradient adjustment framework (MDGAF) dynamically monitors how effectively the features of each modality are being learned and adjusts the training gradients of the networks in a timely manner, allocating additional optimization time to under-optimized modalities and thereby maximizing the use of multimodal feature information. In addition, a multi-class relationship interaction module placed before the classifier helps the model understand the relationships among samples of different categories, which allows it to recognize emotions relatively accurately even when few samples are available. Compared with existing multimodal learning techniques, ALME-FIN offers a more efficient multimodal feature inference method and achieves satisfactory emotion recognition performance with a limited number of samples.
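The abstract does not say how the trainable filtering threshold or the MDGAF gradient adjustment is computed, so the following is only a minimal sketch of the two general ideas under stated assumptions, not the ALME-FIN implementation. For adaptive denoising it assumes soft thresholding (the shrinkage operator used in, for example, deep residual shrinkage networks) with a hypothetical per-sample threshold `sigmoid(raw_scale) * mean(|x|)`; a finite-difference gradient step against a known clean signal stands in for backpropagation through a real training loss. For gradient adjustment it assumes a hypothetical damping rule, loosely inspired by on-the-fly gradient modulation (OGM-GE), that shrinks the gradients of modalities that are ahead of the weakest one. All function names, modality names, and scores are illustrative.

```python
import math


def soft_threshold(signal, tau):
    """Soft thresholding: shrink each value toward zero by tau (|x| <= tau becomes 0)."""
    return [math.copysign(max(abs(x) - tau, 0.0), x) for x in signal]


def per_sample_threshold(signal, raw_scale):
    """Hypothetical per-sample threshold: sigmoid(raw_scale) * mean(|x|).

    raw_scale stands in for a trainable parameter; mean(|x|) makes the
    threshold adapt to each individual sample.
    """
    mean_abs = sum(abs(x) for x in signal) / len(signal)
    return mean_abs / (1.0 + math.exp(-raw_scale))


def denoise_loss(raw_scale, noisy, clean):
    """Mean squared error between the thresholded noisy signal and the clean one."""
    tau = per_sample_threshold(noisy, raw_scale)
    denoised = soft_threshold(noisy, tau)
    return sum((d - c) ** 2 for d, c in zip(denoised, clean)) / len(clean)


def fit_threshold(noisy, clean, raw_scale=0.0, lr=10.0, steps=200, eps=1e-4):
    """Tune raw_scale by gradient descent using a central finite-difference gradient."""
    for _ in range(steps):
        grad = (denoise_loss(raw_scale + eps, noisy, clean)
                - denoise_loss(raw_scale - eps, noisy, clean)) / (2 * eps)
        raw_scale -= lr * grad
    return raw_scale


def damping_coefficients(scores, alpha=1.0):
    """Hypothetical gradient damping rule (assumes all scores are positive).

    The weakest modality keeps coefficient 1.0; a modality whose score is
    `ratio` times the weakest one gets 1 - tanh(alpha * (ratio - 1)).
    """
    weakest = min(scores.values())
    return {m: 1.0 - math.tanh(alpha * (s / weakest - 1.0)) for m, s in scores.items()}


if __name__ == "__main__":
    clean = [0.0, 0.0, 3.0, 0.0, 0.0, -3.0, 0.0, 0.0]
    noise = [0.2, -0.3, 0.1, 0.25, -0.2, -0.1, 0.3, -0.15]
    noisy = [c + n for c, n in zip(clean, noise)]

    raw = fit_threshold(noisy, clean)
    print("learned tau:", round(per_sample_threshold(noisy, raw), 3))
    print("loss before/after:", denoise_loss(0.0, noisy, clean), denoise_loss(raw, noisy, clean))

    # Illustrative per-modality scores (e.g. mean confidence on the true class).
    coeffs = damping_coefficients({"signal": 0.9, "image": 0.6})
    grads = {"signal": [0.4, -0.2], "image": [0.1, 0.3]}
    # The stronger modality's gradients are scaled down; the weaker one's are untouched.
    scaled = {m: [g * coeffs[m] for g in gs] for m, gs in grads.items()}
    print(coeffs, scaled)
```

Tying the threshold to `mean(|x|)` means a noisier sample automatically gets a larger cutoff, while `raw_scale` is the part that training adjusts. Damping the stronger modality, rather than boosting the weaker one, is one simple way to give an under-optimized branch relatively more of each update.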
Original language | English |
---|---|
Article number | 24 |
Journal | Cognitive Neurodynamics |
Volume | 19 |
Issue number | 1 |
DOIs | |
Publication status | Published - Dec 2025 |
Keywords
- ALME-FIN
- Emotion recognition
- MDGAF
- Multimodal feature
- TDLAN