SepVAMark: Deep Separable Visual-Audio Fusion Watermarking for Source Tracing and Deepfake Detection

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

1 Citation (Scopus)

Abstract

Visual-audio Deepfake has become increasingly prevalent in today's online environment. Passive detection methods, lacking preventive measures, struggle with detecting unknown forgery techniques, limiting their effectiveness. While proactive detection methods offer greater robustness, unimodal watermarking approaches remain vulnerable in visual-audio Deepfake scenarios, posing challenges to reliable forensics. To address these challenges, we propose a novel Separable Visual-Audio waterMark framework, called SepVAMark, for proactive Deepfake detection. SepVAMark incorporates a multi-layer perceptron-based mixer layer to fuse intra-modality and inter-modality features from both audio and visual data. We introduce the concept of separable visual-audio watermark, along with a bimodal robust extractor for traceability and two unimodal semi-robust extractors for Deepfake detection. This design ensures reliable copyright protection for source audio-video content while enabling authenticity verification for redistributed content. Experimental results on the FakeAVCeleb dataset demonstrate that SepVAMark effectively detects a wide range of advanced Deepfake manipulations, outperforming existing single-modal and multi-modal watermarking methods with superior robustness.
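
As a rough illustration of how the separable design described in the abstract could be used at verification time, the sketch below treats the bimodal robust extractor as the source-tracing signal and the two unimodal semi-robust extractors as per-modality forgery signals. It is not the authors' implementation. The function names, the use of one shared bit message for all three extractors, and both thresholds are assumptions made for this example only; the abstract does not specify the decision rule.

```python
from dataclasses import dataclass
from typing import Sequence


def bit_error_rate(embedded: Sequence[int], extracted: Sequence[int]) -> float:
    """Fraction of positions where the extracted bits differ from the embedded bits."""
    if len(embedded) == 0 or len(embedded) != len(extracted):
        raise ValueError("bit sequences must be non-empty and of equal length")
    return sum(a != b for a, b in zip(embedded, extracted)) / len(embedded)


@dataclass
class Verdict:
    source_traced: bool   # robust watermark survived, so the source can be identified
    visual_forged: bool   # semi-robust visual watermark was destroyed
    audio_forged: bool    # semi-robust audio watermark was destroyed


def assess(
    message: Sequence[int],
    robust_bits: Sequence[int],
    visual_bits: Sequence[int],
    audio_bits: Sequence[int],
    trace_max_ber: float = 0.1,    # hypothetical threshold, not from the paper
    forgery_min_ber: float = 0.3,  # hypothetical threshold, not from the paper
) -> Verdict:
    """Combine the outputs of three (assumed) extractors into one verdict.

    robust_bits: output of the bimodal robust extractor (source tracing).
    visual_bits / audio_bits: outputs of the unimodal semi-robust extractors,
    expected to degrade when the corresponding modality has been manipulated.
    """
    return Verdict(
        source_traced=bit_error_rate(message, robust_bits) <= trace_max_ber,
        visual_forged=bit_error_rate(message, visual_bits) >= forgery_min_ber,
        audio_forged=bit_error_rate(message, audio_bits) >= forgery_min_ber,
    )


# Example: the visual track was manipulated, the audio track was left intact.
message = [1, 0, 1, 1, 0, 0, 1, 0]
robust = list(message)                 # robust watermark recovered exactly
visual = [0, 1, 0, 0, 0, 0, 1, 0]      # 4 of 8 bits flipped
audio = list(message)                  # audio watermark intact
verdict = assess(message, robust, visual, audio)
```

In this toy case the robust message is still recoverable, so the source can be traced, while only the visual semi-robust watermark exceeds the forgery threshold.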

Original language: English
Title of host publication: MM 2025 - Proceedings of the 33rd ACM International Conference on Multimedia, Co-Located with MM 2025
Publisher: Association for Computing Machinery, Inc
Pages: 8910-8919
Number of pages: 10
ISBN (Electronic): 9798400720352
DOIs
Publication status: Published - 27 Oct 2025
Event: 33rd ACM International Conference on Multimedia, MM 2025 - Dublin, Ireland
Duration: 27 Oct 2025 - 31 Oct 2025

Publication series

Name: MM 2025 - Proceedings of the 33rd ACM International Conference on Multimedia, Co-Located with MM 2025

Conference

Conference: 33rd ACM International Conference on Multimedia, MM 2025
Country/Territory: Ireland
City: Dublin
Period: 27/10/25 - 31/10/25

Keywords

  • deep watermarking
  • deepfake forensics
  • visual-audio
