TY - JOUR
T1 - Robust AI generated text detection through multi-grained latent feature denoising and contrastive representation learning
AU - Liu, Xin
AU - Wang, Shuo
AU - Li, Yang
AU - Li, Kan
N1 - Publisher Copyright: © The Author(s) 2026
PY - 2026
Y1 - 2026
AB - As large language models (LLMs) evolve rapidly, distinguishing AI-generated text (AIGT) from human-written text (HWT) is becoming increasingly challenging. Recently, several AIGT detectors have been developed to address this challenge and have achieved decent accuracy. However, their brittle text representations make them highly susceptible to text perturbations, so that even minor character-level perturbations can reverse their predictions. In this work, we propose a multi-grained latent feature denoising and contrastive representation learning architecture that enhances text representations in terms of feature granularity, robustness, and distinguishability, thereby achieving robust AIGT detection. Specifically, we first extract both document-level and fine-grained segment-level features using a dual network, which captures both the global and the subtle local differences between AIGT and HWT. To encourage feature stability under perturbations, we inject random noise into both the document-level and segment-level latent features and employ a denoising network to reconstruct the original representations. Although this does not precisely simulate discrete character-level perturbations, it acts as a feature-level regularizer that suppresses non-essential variations and promotes smoother, more stable representations. Because AIGT and HWT are often similar, we further design a contrastive augmentation mechanism to increase the distinguishability between them. Extensive experiments demonstrate that our method not only outperforms baseline models in classification accuracy but also exhibits superior robustness against various text perturbations.
KW - AI-generated text detection
KW - contrastive representation learning
KW - latent feature denoising
KW - model robustness
UR - https://www.scopus.com/pages/publications/105036707985
DO - 10.1177/1088467X261441866
M3 - Article
AN - SCOPUS:105036707985
SN - 1088-467X
JO - Intelligent Data Analysis
JF - Intelligent Data Analysis
ER -