TY - GEN
T1 - I Can Hear You Without a Microphone
T2 - 42nd IEEE International Conference on Computer Communications, INFOCOM 2023
AU - Cao, Yetong
AU - Li, Fan
AU - Chen, Huijie
AU - Liu, Xiaochen
AU - Duan, Chunhui
AU - Wang, Yu
N1 - Publisher Copyright:
© 2023 IEEE.
PY - 2023
Y1 - 2023
AB - Recent literature has advanced speech eavesdropping using motion sensors mounted on smartphones and AR/VR headsets, owing to their sensitivity to subtle vibrations. The popularity of motion sensors in earphones has fueled a rise in their sampling rates, which enables various enhanced features. This paper investigates a new eavesdropping threat posed by earphone motion sensors by developing EarSpy, which builds on our observation that the earphone's accelerometer can capture bone conduction vibrations (BCVs) and ear canal dynamic motions (ECDMs) associated with speaking; these enable EarSpy to derive unique information about the wearer's speech. Leveraging a study of the motion sensor measurements captured from earphones, EarSpy gains the ability to disentangle the wearer's live speech from interference caused by body motions and by vibrations generated when the earphone's speaker plays audio. To enable user-independent attacks, EarSpy introduces novel techniques, including a trajectory instability reduction method to calibrate the waveform of ECDMs and a data augmentation method to enrich the diversity of BCVs. Moreover, EarSpy explores effective representations of BCVs and ECDMs, and develops a convolutional neural model with Connectionist Temporal Classification (CTC) to realize accurate speech recognition. Extensive experiments involving 14 participants demonstrate that EarSpy achieves promising recognition of the wearer's speech.
UR - http://www.scopus.com/inward/record.url?scp=85171627993&partnerID=8YFLogxK
U2 - 10.1109/INFOCOM53939.2023.10228943
DO - 10.1109/INFOCOM53939.2023.10228943
M3 - Conference contribution
AN - SCOPUS:85171627993
T3 - Proceedings - IEEE INFOCOM
BT - INFOCOM 2023 - IEEE Conference on Computer Communications
PB - Institute of Electrical and Electronics Engineers Inc.
Y2 - 17 May 2023 through 20 May 2023
ER -