Self-Supervised Speech Representation and Contextual Text Embedding for Match-Mismatch Classification with EEG Recording

Bo Wang, Xiran Xu, Zechen Zhang, Haolin Zhu, Yu Jie Yan, Xihong Wu, Jing Chen*

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding; Conference contribution; peer-review

Abstract

Relating speech to EEG holds considerable importance but is challenging. In this study, a deep convolutional network was employed to extract spatiotemporal features from EEG data. Self-supervised speech representation and contextual text embedding were used as speech features. Contrastive learning was used to relate EEG features to speech features. The experimental results demonstrate the benefits of using self-supervised speech representation and contextual text embedding. Through feature fusion and model ensemble, an accuracy of 60.29% was achieved, ranking No. 2 in Task 1 of the Auditory EEG Challenge (ICASSP 2024). The code to implement our work is available on GitHub: https://github.com/bobwangPKU/EEG-Stimulus-Match-Mismatch.
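
As a rough illustration of the contrastive match-mismatch idea described in the abstract, the following NumPy sketch scores an EEG embedding against candidate speech embeddings by cosine similarity and computes an InfoNCE-style loss over a batch of matched pairs. It is not the authors' implementation: the function names, the temperature value, and the use of plain cosine similarity are all assumptions made for illustration, and the embeddings are assumed to come from some upstream EEG and speech encoders that are not shown.

```python
import numpy as np


def l2_normalize(x, axis=-1, eps=1e-8):
    """Scale vectors to (approximately) unit length along `axis`."""
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)


def info_nce_loss(eeg_emb, speech_emb, temperature=0.1):
    """InfoNCE-style loss (hypothetical helper, not from the paper).

    eeg_emb, speech_emb: arrays of shape (batch, dim), where row i of each
    array is assumed to be a matched EEG/speech pair.
    """
    eeg = l2_normalize(eeg_emb)
    speech = l2_normalize(speech_emb)
    # Cosine similarity of every EEG row with every speech row.
    logits = eeg @ speech.T / temperature
    # Subtract the row maximum for a numerically stable log-softmax.
    logits = logits - logits.max(axis=1, keepdims=True)
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    # The matched pair sits on the diagonal.
    return float(-np.mean(np.diag(log_probs)))


def pick_match(eeg_emb, candidate_embs):
    """Return the index of the candidate most similar to the EEG embedding.

    eeg_emb: shape (dim,); candidate_embs: shape (n_candidates, dim).
    """
    sims = l2_normalize(candidate_embs) @ l2_normalize(eeg_emb)
    return int(np.argmax(sims))
```

At inference time, a sketch like `pick_match` would be applied to one EEG segment and its set of candidate speech segments, with the highest-scoring candidate taken as the match.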

Original language: English
Title of host publication: 2024 IEEE International Conference on Acoustics, Speech, and Signal Processing Workshops, ICASSPW 2024 - Proceedings
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 111-112
Number of pages: 2
ISBN (Electronic): 9798350374513
Publication status: Published - 2024
Externally published: Yes
Event: 49th IEEE International Conference on Acoustics, Speech, and Signal Processing Workshops, ICASSPW 2024 - Seoul, Korea, Republic of
Duration: 14 Apr 2024 to 19 Apr 2024

Publication series

Name: 2024 IEEE International Conference on Acoustics, Speech, and Signal Processing Workshops, ICASSPW 2024 - Proceedings

Conference

Conference: 49th IEEE International Conference on Acoustics, Speech, and Signal Processing Workshops, ICASSPW 2024
Country/Territory: Korea, Republic of
City: Seoul
Period: 14/04/24 to 19/04/24

Keywords

  • Auditory EEG decoding
  • contextual text embedding
  • self-supervised speech representation
