TY - GEN
T1 - Who is the Writer? Identifying the Generative Model by Writing Style
AU - Yan, Jiawen
AU - Zhang, Baohua
AU - Cui, Wenyao
AU - Zhang, Huaping
N1 - Publisher Copyright: © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2025.
PY - 2025
Y1 - 2025
N2 - In the current digital landscape, distinguishing between human-written texts and those generated by large language models (LLMs) is essential for information security and the prevention of academic fraud. Texts generated by LLMs often closely resemble high-quality human-written content, posing significant challenges for accurate identification. To tackle this, we introduce the Identify the Writer by Writing Style (IWWS) model, which integrates perplexity scores with text embeddings through feature fusion. Our innovative approach employs a similarity matrix and contrastive learning to improve the model’s ability to detect unique writing styles. Additionally, we present the HumanGenTextify dataset, which reflects real-world text generation scenarios and serves as a robust foundation for distinguishing between human and model-generated texts. Experimental results show that our IWWS model has superior performance over existing methods, achieving high accuracy in text source detection and offering insights into distinctive writing styles. In addition, our research paves the way for future advancements in automated LLM-generated text detection and authenticity verification.
KW - Contrastive Learning
KW - Feature Fusion
KW - Large Language Model
KW - Text Detection
KW - Text Generation
UR - https://www.scopus.com/pages/publications/105009968796
U2 - 10.1007/978-981-96-6591-4_22
DO - 10.1007/978-981-96-6591-4_22
M3 - Conference contribution
AN - SCOPUS:105009968796
SN - 9789819665907
T3 - Lecture Notes in Computer Science
SP - 319
EP - 331
BT - Neural Information Processing - 31st International Conference, ICONIP 2024, Proceedings
A2 - Mahmud, Mufti
A2 - Doborjeh, Maryam
A2 - Wong, Kevin
A2 - Leung, Andrew Chi Sing
A2 - Doborjeh, Zohreh
A2 - Tanveer, M.
PB - Springer Science and Business Media Deutschland GmbH
T2 - 31st International Conference on Neural Information Processing, ICONIP 2024
Y2 - 2 December 2024 through 6 December 2024
ER -