Abstract
Recent literature has shown that motion sensors mounted on smartphones and AR/VR headsets can be exploited for speech eavesdropping due to their sensitivity to subtle vibrations. The popularity of motion sensors in earphones has fueled a rise in their sampling rates, which enables various enhanced features. This paper investigates a new eavesdropping threat via the motion sensors of earphones by developing EarSpy, which builds on our observation that an earphone's accelerometer can capture bone conduction vibrations (BCVs) and ear canal dynamic motions (ECDMs) associated with speaking; these enable EarSpy to derive unique information about the wearer's speech. Leveraging a study of the motion sensor measurements captured from earphones, EarSpy gains the ability to disentangle the wearer's live speech from interference caused by body motions and by vibrations generated when the earphone's speaker plays audio. To enable user-independent attacks, EarSpy incorporates novel techniques, including a trajectory instability reduction method to calibrate the waveform of ECDMs and a data augmentation method to enrich the diversity of BCVs. Moreover, EarSpy extracts effective representations from BCVs and ECDMs, and develops a neural network combining character-level and word-level speech recognition models. Extensive experiments involving 14 participants demonstrate that EarSpy achieves promising recognition accuracy on the wearer's speech.
Original language | English
---|---
Pages (from-to) | 7284-7300
Number of pages | 17
Journal | IEEE Transactions on Mobile Computing
Volume | 23
Issue | 6
DOI |
Publication status | Published - 1 Jun 2024