TY - GEN
T1 - Construction of a Tibetan Dataset and Large Language Model for Mental Health Counseling
AU - Zhu, Mengxiao
AU - Shajiu,
AU - Feng, Chong
N1 - Publisher Copyright:
© 2024 China National Conference on Computational Linguistics.
PY - 2024
Y1 - 2024
AB - Anxiety and depression have become prevalent psychological disorders, and appropriate counseling plays a critical role in relieving mental and psychological stress. However, owing to factors such as a sense of shame, many individuals do not receive timely counseling and treatment. With the advancement of artificial intelligence, large language models (LLMs), with their strong capabilities in knowledge integration and chain-of-thought reasoning, have become effective tools for psychological counseling. Nevertheless, existing mental health LLMs focus primarily on resource-rich languages such as English and Chinese, and little research has addressed low-resource languages. This paper focuses on Tibetan, a representative low-resource language, and explores the construction of Tibetan psychological counseling datasets and Tibetan mental health LLMs. First, we collect high-quality Chinese psychological counseling dialogue data and process it into a multi-turn mental health dialogue dataset. We then develop a Chinese-Tibetan translation tool to translate this dataset into Tibetan, applying multiple filtering mechanisms to produce high-quality Tibetan multi-turn mental health dialogue data. Using the constructed data, we fine-tune the existing general-purpose LLMs Baichuan2 and LLaMA2 to develop a Tibetan mental health LLM, which will be open-sourced for scientific research. Finally, experiments validate the effectiveness of the released Tibetan multi-turn mental health dialogue dataset and the Tibetan mental health counseling LLM.
KW - Large language model
KW - Psychological health support
KW - Tibetan
UR - http://www.scopus.com/inward/record.url?scp=105001922494&partnerID=8YFLogxK
M3 - Conference contribution
AN - SCOPUS:105001922494
T3 - CCL 2024 - 23rd Chinese National Conference on Computational Linguistics
SP - 326
EP - 339
BT - Main Conference
A2 - Sun, Maosong
A2 - Liang, Jiye
A2 - Han, Xianpei
A2 - Liu, Zhiyuan
A2 - He, Yulan
PB - Chinese National Conference on Computational Linguistics (CCL)
T2 - 23rd Chinese National Conference on Computational Linguistics, CCL 2024
Y2 - 24 July 2024 through 28 July 2024
ER -