A semantics-maintained differential privacy protection for high-utility text

  • Zhouting Wu*
  • Senlin Luo
  • Limin Pan

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

Abstract

Preserving utility is essential for downstream applications of privatized text data. Widely adopted differentially private mechanisms offer formal privacy guarantees, but they often compromise utility. This degradation stems primarily from two issues: treating all tokens equally regardless of how privacy-critical each token is, and allowing semantically irrelevant candidates to replace input tokens, which causes unnecessary semantic distortion. To address these problems, a Utility-oriented Truncated Exponential Mechanism (UTEM) is proposed, which employs a two-stage utility enhancement strategy. First, UTEM distinguishes between sensitive and non-sensitive tokens and reduces unnecessary perturbation of non-sensitive tokens, thereby preserving the original semantics. Second, it applies distance truncation and a tail penalty mechanism to prune semantically irrelevant tokens from the candidate set, further improving utility. Theoretical derivations of privacy and utility are presented, establishing formal guarantees for UTEM. Experiments on SST-2, QNLI, and ChnSentiCorp demonstrate accuracy scores of 85.09%, 87.04%, and 87.64%, respectively, with corresponding defense success rates of 75.51%, 78.40%, and 51.09%. On the IMDB dataset, UTEM achieves an accuracy of 85.80% while reducing the attack AUC to 51.22%. Extensive experiments on four public benchmark datasets under ten evaluation settings demonstrate that UTEM consistently outperforms state-of-the-art baselines in utility while maintaining comparable privacy protection.
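
The two-stage idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not give UTEM's scoring function, truncation rule, or tail penalty, so the sketch assumes a plain exponential mechanism over Euclidean embedding distance with a hard distance cutoff and omits the tail penalty entirely. The function name `perturb_token`, the parameters `radius` and `epsilon`, the sensitivity value of 1, and the toy embeddings are all hypothetical, and the sketch makes no claim about the formal privacy guarantee derived in the paper.

```python
import math
import random


def perturb_token(token, embeddings, sensitive, epsilon, radius, rng):
    """Replace `token` with a nearby candidate, or keep it if non-sensitive.

    Hypothetical helper for illustration only; not the paper's algorithm.
    `radius` must be non-negative so the token itself stays a candidate.
    """
    # Stage 1 (assumed form): non-sensitive tokens pass through unperturbed.
    if token not in sensitive:
        return token

    origin = embeddings[token]
    candidates, weights = [], []
    for cand, vec in embeddings.items():
        dist = math.dist(origin, vec)
        # Stage 2 (assumed form): distance truncation drops far-away candidates.
        if dist <= radius:
            candidates.append(cand)
            # Exponential-mechanism weight with utility u = -dist and an
            # assumed sensitivity of 1: weight = exp(epsilon * u / 2).
            weights.append(math.exp(-epsilon * dist / 2.0))

    # The token itself is at distance 0, so the candidate list is never empty.
    return rng.choices(candidates, weights=weights, k=1)[0]


# Toy 2-D embeddings and sensitivity labels, invented for this example.
toy_embeddings = {
    "paris": (0.0, 0.0),
    "london": (0.1, 0.0),
    "the": (1.0, 1.0),
    "banana": (5.0, 5.0),
}
toy_sensitive = {"paris", "london"}

rng = random.Random(0)
sentence = ["the", "paris", "banana"]
private = [
    perturb_token(t, toy_embeddings, toy_sensitive, epsilon=4.0, radius=0.5, rng=rng)
    for t in sentence
]
```

In this toy run, "the" and "banana" are returned unchanged because they are labeled non-sensitive, while "paris" can only become "paris" or "london", the two candidates inside the assumed radius. A larger `epsilon` concentrates probability on the closest candidates, and a smaller `radius` shrinks the candidate set.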

Original language: English
Article number: 115478
Journal: Knowledge-Based Systems
Volume: 338
Publication status: Published - 8 Apr 2026
Externally published: Yes

Keywords

  • Differential privacy
  • Privacy protection
  • Semantics-maintaining
  • Text data
