Abstract
Preserving utility is essential for downstream applications of privatized text data. While widely adopted differentially private mechanisms offer formal privacy guarantees, they often compromise utility. This degradation stems primarily from two issues: treating all tokens equally regardless of each token's privacy criticality, and allowing semantically irrelevant candidates to replace input tokens, which causes unnecessary semantic distortion. To address these problems, a Utility-oriented Truncated Exponential Mechanism (UTEM) is proposed, which employs a two-stage utility enhancement strategy. First, UTEM distinguishes between sensitive and non-sensitive tokens, reducing unnecessary perturbation of non-sensitive tokens and thereby preserving the original semantics. Second, it uses distance truncation and a tail penalty mechanism to optimize the candidate set by pruning semantically irrelevant tokens, further improving utility. Theoretical derivations of privacy and utility are presented, establishing formal guarantees for UTEM. Experiments on SST-2, QNLI, and ChnSentiCorp demonstrate accuracy scores of 85.09%, 87.04%, and 87.64%, respectively, with corresponding defense success rates of 75.51%, 78.40%, and 51.09%. On the IMDB dataset, UTEM achieves an accuracy of 85.80% while reducing the attack AUC to 51.22%. Extensive experiments on four public benchmark datasets under ten evaluation settings demonstrate that UTEM consistently outperforms state-of-the-art baselines in terms of utility while maintaining comparable privacy protection.
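The two-stage idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the toy embedding table, the `radius` threshold, the `sensitivity` parameter, and the use of negative Euclidean distance as the utility score are all assumptions made for illustration. The tail penalty is not modeled, and the sketch makes no claim about the formal privacy accounting that the paper derives.

```python
import math
import random


def sample_replacement(token, embeddings, epsilon, radius, rng, sensitivity=1.0):
    """Pick a replacement for `token` from candidates within `radius` (hypothetical helper)."""
    origin = embeddings[token]
    # Distance truncation (assumed form): drop candidates farther than `radius`.
    candidates = []
    for cand, vec in embeddings.items():
        d = math.dist(origin, vec)
        if d <= radius:
            candidates.append((cand, d))
    # Exponential-mechanism weighting with utility = -distance:
    # weight is proportional to exp(epsilon * utility / (2 * sensitivity)).
    weights = [math.exp(-epsilon * d / (2.0 * sensitivity)) for _, d in candidates]
    names = [cand for cand, _ in candidates]
    return rng.choices(names, weights=weights, k=1)[0]


def privatize(tokens, sensitive, embeddings, epsilon, radius, seed=0):
    """Perturb only tokens marked sensitive; leave the rest untouched."""
    rng = random.Random(seed)
    out = []
    for tok in tokens:
        if tok in sensitive and tok in embeddings:
            out.append(sample_replacement(tok, embeddings, epsilon, radius, rng))
        else:
            out.append(tok)
    return out


# Toy 2-D embeddings, invented for this example only.
toy_embeddings = {
    "paris": (0.0, 0.0),
    "london": (0.5, 0.0),
    "visit": (5.0, 5.0),
    "banana": (10.0, 10.0),
}
result = privatize(["visit", "paris"], {"paris"}, toy_embeddings, epsilon=1.0, radius=1.0)
```

In this toy run, the non-sensitive token `visit` is passed through unchanged, and `paris` can only be replaced by itself or `london`, because `banana` and `visit` fall outside the assumed truncation radius.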
| Original language | English |
|---|---|
| Article number | 115478 |
| Journal | Knowledge-Based Systems |
| Volume | 338 |
| Publication status | Published - 8 Apr 2026 |
| Externally published | Yes |
Keywords
- Differential privacy
- Privacy protection
- Semantics-maintaining
- Text data
Fingerprint
Dive into the research topics of 'A semantics-maintained differential privacy protection for high-utility text'. Together they form a unique fingerprint.