TY - GEN
T1 - TOSWT
T2 - 13th International Conference on Information and Education Technology, ICIET 2025
AU - Lu, Pinren
AU - Lin, Zhifeng
AU - Zhang, Lin
AU - Liu, Jiawen
AU - Qu, Shaojie
AU - Li, Kan
N1 - Publisher Copyright: © 2025 IEEE.
PY - 2025
Y1 - 2025
AB - In recent years, generative large language models (LLMs) have developed rapidly and now produce content that is nearly indistinguishable from human-written text. While these models have been widely applied across many fields, they have also raised significant concerns among educators about the authenticity of student submissions. Consequently, addressing the misuse of AI-generated text (AIGT) in education has become an urgent priority. Current detection strategies focus primarily on whole documents, which does not fully satisfy practical requirements. Because students are likely to modify AI-generated content to some extent before incorporating it into their essays, fine-grained detection, particularly at the sentence level, is essential, and the task of tracing text provenance has attracted increasing attention. This study therefore proposes the task of text provenance tracing within the educational domain and constructs a corresponding dataset named TOSWT (Tracing the Origins of Students' Writing Texts). The dataset is based on argumentative essays written by students, includes texts generated by five leading large language models, and contains a total of 53,328 document-level and 147,976 sentence-level data samples. The study evaluates multiple deep learning detection models on both the document-level and sentence-level data. The results indicate that text provenance tracing is highly challenging, with the sentence-level task proving particularly difficult.
KW - detect
KW - education
KW - large language model
KW - text provenance
KW - tracing origin
UR - https://www.scopus.com/pages/publications/105010679940
U2 - 10.1109/ICIET66371.2025.11046280
DO - 10.1109/ICIET66371.2025.11046280
M3 - Conference contribution
AN - SCOPUS:105010679940
T3 - 2025 13th International Conference on Information and Education Technology, ICIET 2025
SP - 456
EP - 461
BT - 2025 13th International Conference on Information and Education Technology, ICIET 2025
PB - Institute of Electrical and Electronics Engineers Inc.
Y2 - 18 April 2025 through 20 April 2025
ER -