Abstract
With the continuous development of multimodal learning, emotion recognition from multimodal physiological signals has become a research hotspot. Studies have shown that combining electroencephalogram (EEG) signals with eye movements can significantly improve emotion recognition performance. However, current research still faces the following challenges: (1) individuals' response times and durations vary across emotions, producing variable-length, heterogeneous data; (2) different modalities exhibit spatiotemporal discrepancies, so their semantic relevance and importance can differ under the same spatiotemporal conditions. To address these challenges, we propose a Regional to Global Fusion Network with a Spatial-Temporal Semantic Alignment Mechanism (R2GFANet). First, R2GFANet addresses the variable-length problem by combining padding masks with a 1D-CNN to encode temporal semantic information from variable-length EEG and eye-movement signals. Then, R2GFANet applies a Multi-Region Cross-Modal Attention mechanism for parallel temporal semantic alignment within each brain region and uses region-level spatial attention to highlight the semantic information of critical brain regions, effectively addressing spatiotemporal discrepancies across modalities. Comparisons with numerous state-of-the-art approaches on two public datasets, SEED-IV and SEED-V, demonstrate that R2GFANet achieves superior performance with statistically significant improvements. We further conduct ablation studies and visualization analyses; the results indicate that aligning EEG signals with eye movements not only improves classification performance but also offers neuroscientific interpretability.
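The core idea of combining padding masks with cross-modal attention can be illustrated with a minimal sketch. The snippet below is not the paper's implementation; it is a hypothetical NumPy example in which EEG features act as queries over eye-movement features, and a padding mask excludes padded eye-movement timesteps from the attention weights (all function and variable names are our own):

```python
import numpy as np

def masked_softmax(scores, mask):
    # mask: 1 for valid timesteps, 0 for padding; padded keys get -inf score
    scores = np.where(mask[None, :] > 0, scores, -np.inf)
    e = np.exp(scores - scores.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def cross_modal_attention(eeg_feats, eye_feats, eye_mask):
    """EEG features (T_e, d) query eye-movement features (T_y, d);
    padded eye-movement timesteps are excluded via the mask."""
    d = eeg_feats.shape[-1]
    scores = eeg_feats @ eye_feats.T / np.sqrt(d)  # (T_e, T_y) similarity
    attn = masked_softmax(scores, eye_mask)        # padded columns get weight 0
    return attn @ eye_feats                        # (T_e, d) aligned features

rng = np.random.default_rng(0)
eeg = rng.standard_normal((4, 8))   # 4 EEG timesteps, feature dim 8
eye = rng.standard_normal((6, 8))   # 6 eye-movement timesteps (last 2 padded)
mask = np.array([1, 1, 1, 1, 0, 0])
out = cross_modal_attention(eeg, eye, mask)
print(out.shape)  # (4, 8)
```

Because padded timesteps receive zero attention weight, sequences of different lengths can be batched and aligned without the padding contaminating the fused representation.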
| Field | Value |
|---|---|
| Original language | English |
| Article number | 104224 |
| Journal | Information Fusion |
| Volume | 132 |
| DOIs | |
| Publication status | Published - Aug 2026 |
Keywords
- Emotion recognition
- Multimodal fusion
- Multimodal physiological signals
- Semantic alignment
Title
Emotion recognition using multimodal physiological signals through regional to global fusion with a spatial-temporal semantic alignment mechanism