A Novel Multimodal Method for Decoding Speech Perception from Brain Activities

  • Peking University
  • National Key Laboratory of General Artificial Intelligence

Research output: Contribution to journal › Conference article › peer-review

Abstract

Decoding speech from neural recordings is of critical importance for both practical applications and scientific research. However, the task remains challenging with non-invasive recordings. Previous work has shown significant improvement on the speech perception decoding task by leveraging wav2vec vectors, suggesting potential for real-world applications. To further explore this problem, we propose a novel multimodal method that uses both functional magnetic resonance imaging (fMRI) and magnetoencephalography (MEG). In our method, separate encoders are employed for fMRI and MEG; features extracted from the two modalities are then integrated and aligned with wav2vec vectors extracted from the speech. The multimodal method reaches an average top-10 accuracy of 72.6% with a negative sample size of 128. Performance evaluated with various metrics improves consistently across subjects, demonstrating the effectiveness of the proposed data fusion method. To interpret the performance gain, we tested the correlation between encoder hidden outputs and different levels of features extracted from the speech. The results show that the MEG encoder learns more low-level information while the fMRI encoder learns more high-level information, indicating that these complementary characteristics drive the improvement. This work demonstrates the potential of multimodal methods for speech decoding.
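The evaluation pipeline described above (fuse modality features, align them with wav2vec vectors, then rank one true speech segment against 128 negatives for top-10 accuracy) can be sketched as follows. This is a minimal toy illustration with numpy, not the paper's actual architecture: all dimensions, the linear encoders, and the cosine-similarity ranking are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions chosen for illustration (not from the paper).
D_FMRI, D_MEG, D_HID, D_WAV2VEC = 64, 32, 24, 16

def encode(x, w):
    """Toy linear encoder standing in for a modality-specific encoder."""
    return np.tanh(x @ w)

def fuse_and_align(f_fmri, f_meg, w_proj):
    """Concatenate modality features and project into the wav2vec space."""
    fused = np.concatenate([f_fmri, f_meg], axis=-1)
    z = fused @ w_proj
    return z / np.linalg.norm(z, axis=-1, keepdims=True)

def top_k_hit(query, positive, negatives, k=10):
    """Rank the true speech segment against negatives by cosine similarity."""
    candidates = np.vstack([positive[None, :], negatives])
    candidates = candidates / np.linalg.norm(candidates, axis=-1, keepdims=True)
    sims = candidates @ query
    rank = np.argsort(-sims).tolist().index(0)  # position of the true segment
    return rank < k

# Toy data: one brain recording and 128 negative speech segments.
x_fmri = rng.normal(size=D_FMRI)
x_meg = rng.normal(size=D_MEG)
w_f = rng.normal(size=(D_FMRI, D_HID))
w_m = rng.normal(size=(D_MEG, D_HID))
w_p = rng.normal(size=(2 * D_HID, D_WAV2VEC))

z = fuse_and_align(encode(x_fmri, w_f), encode(x_meg, w_m), w_p)
# Pretend the true segment's wav2vec vector lies near the decoded vector.
positive = z + 0.1 * rng.normal(size=D_WAV2VEC)
negatives = rng.normal(size=(128, D_WAV2VEC))

print(top_k_hit(z, positive, negatives, k=10))
```

A trained system would replace the random weights with learned encoders and the synthetic positive with a real wav2vec embedding; the ranking step, however, matches the negative-sample evaluation the abstract reports.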

Keywords

  • brain-computer interface
  • data fusion
  • fMRI
  • MEG
  • speech perception decoding
