TY - GEN
T1 - SepVAMark
T2 - 33rd ACM International Conference on Multimedia, MM 2025
AU - Zhang, Chuan
AU - Li, Zihan
AU - Xu, Zihao
AU - Ren, Xuhao
AU - Zhu, Liehuang
N1 - Publisher Copyright: © 2025 ACM.
PY - 2025/10/27
Y1 - 2025/10/27
N2 - Visual-audio Deepfakes have become increasingly prevalent in today's online environment. Passive detection methods lack preventive measures and struggle to detect unknown forgery techniques, which limits their effectiveness. While proactive detection methods offer greater robustness, unimodal watermarking approaches remain vulnerable in visual-audio Deepfake scenarios, posing challenges to reliable forensics. To address these challenges, we propose a novel Separable Visual-Audio waterMark framework, called SepVAMark, for proactive Deepfake detection. SepVAMark incorporates a multi-layer perceptron-based mixer layer to fuse intra-modality and inter-modality features from both audio and visual data. We introduce the concept of a separable visual-audio watermark, along with a bimodal robust extractor for traceability and two unimodal semi-robust extractors for Deepfake detection. This design ensures reliable copyright protection for source audio-video content while enabling authenticity verification for redistributed content. Experimental results on the FakeAVCeleb dataset demonstrate that SepVAMark effectively detects a wide range of advanced Deepfake manipulations, outperforming existing single-modal and multi-modal watermarking methods in robustness.
KW - deep watermarking
KW - deepfake forensics
KW - visual-audio
UR - https://www.scopus.com/pages/publications/105024069575
U2 - 10.1145/3746027.3755783
DO - 10.1145/3746027.3755783
M3 - Conference contribution
AN - SCOPUS:105024069575
T3 - MM 2025 - Proceedings of the 33rd ACM International Conference on Multimedia, Co-Located with MM 2025
SP - 8910
EP - 8919
BT - MM 2025 - Proceedings of the 33rd ACM International Conference on Multimedia, Co-Located with MM 2025
PB - Association for Computing Machinery, Inc
Y2 - 27 October 2025 through 31 October 2025
ER -