Abstract
The proliferation of artificial intelligence has heightened the need for robust intellectual property protection of machine learning models, especially in black-box settings where model internals are inaccessible. While backdoor watermarking has emerged as a promising mechanism for model verification, existing methods are predominantly designed for homogeneous data types such as images and text, leaving a critical gap for heterogeneous tabular data, a form prevalent in healthcare, finance, and industrial applications. This paper introduces a unified, high-fidelity backdoor watermarking framework specifically tailored for classification models trained on heterogeneous tabular data. Our framework systematically deconstructs watermarking into four core components: trigger set generation, watermark embedding, ownership verification, and security analysis. Heterogeneous tabular data presents unique challenges, including mixed discrete-continuous features and compact model architectures. To address these, we propose the DiscreteFool algorithm for generating minimal adversarial perturbations and a micro-perturbation sample selection mechanism to preserve model fidelity. Extensive experiments on five public datasets and eight model types demonstrate that our method achieves the highest watermark accuracy (98.3%) with minimal degradation in primary task performance (average ∆Acc < 0.018, ∆AUC < 0.006), while maintaining strong robustness against removal and watermark-overwriting attacks. The proposed approach not only establishes a new benchmark for tabular model watermarking but also offers a generalizable framework adaptable to diverse data types and model architectures.
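The ownership verification component can be illustrated with a minimal black-box sketch: the verifier queries a suspect model only through its prediction interface, measures how often it returns the owner's secret labels on the trigger set, and compares that watermark accuracy with a decision threshold. This is not the paper's implementation. The function names, the 0.9 threshold, and the toy trigger rows are hypothetical, chosen only to show the idea on mixed discrete-continuous rows.

```python
from typing import Callable, Hashable, Sequence, Tuple

Row = Tuple[Hashable, ...]  # one tabular record mixing discrete and continuous fields


def watermark_accuracy(
    predict: Callable[[Row], int],
    trigger_rows: Sequence[Row],
    trigger_labels: Sequence[int],
) -> float:
    """Fraction of trigger rows on which the suspect model returns the owner's label."""
    if not trigger_rows or len(trigger_rows) != len(trigger_labels):
        raise ValueError("trigger rows and labels must be non-empty and aligned")
    hits = sum(1 for row, label in zip(trigger_rows, trigger_labels) if predict(row) == label)
    return hits / len(trigger_rows)


def verify_ownership(
    predict: Callable[[Row], int],
    trigger_rows: Sequence[Row],
    trigger_labels: Sequence[int],
    threshold: float = 0.9,  # hypothetical decision threshold, not taken from the paper
) -> bool:
    """Claim ownership only if watermark accuracy reaches the threshold."""
    return watermark_accuracy(predict, trigger_rows, trigger_labels) >= threshold


# Toy trigger set (hypothetical values): (category, numeric reading, grade).
trigger_rows = [("male", 37.5, "A"), ("female", 12.0, "C"), ("male", 80.1, "B"), ("female", 55.3, "A")]
trigger_labels = [1, 1, 0, 1]

# Stand-in for a watermarked model: it has memorised the trigger labels.
_memorised = dict(zip(trigger_rows, trigger_labels))


def watermarked_predict(row: Row) -> int:
    return _memorised.get(row, 0)


# Stand-in for an independent model that never saw the trigger set.
def clean_predict(row: Row) -> int:
    return 0


if __name__ == "__main__":
    print(verify_ownership(watermarked_predict, trigger_rows, trigger_labels))
    print(verify_ownership(clean_predict, trigger_rows, trigger_labels))
```

In practice the threshold would be set from the agreement rate expected of unrelated models on the trigger set, so that a high watermark accuracy is unlikely to occur by chance.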
| Original language | English |
|---|---|
| Article number | 133646 |
| Journal | Neurocomputing |
| Volume | 687 |
| DOIs | |
| Publication status | Published - 28 Jul 2026 |
| Externally published | Yes |
Keywords
- Adversarial Samples
- Classification models
- Frontier perturbing
- High Fidelity
- Tabular data
- Watermarking
Fingerprint
Dive into the research topics of 'High-fidelity backdoor watermark embedding framework for classification models in heterogeneous tabular data'. Together they form a unique fingerprint.