TY - JOUR
T1 - A survey on the safety of large language model
T2 - Classification, evaluation, attribution, mitigation and prospect
AU - Huang, Heyan
AU - Li, Silin
AU - Lan, Tianwei
AU - Qiu, Yuli
AU - Liu, Zeming
AU - Yao, Jiashu
AU - Zeng, Li
AU - Shan, Yingyu
AU - Shi, Xiaoming
AU - Guo, Yuhang
N1 - Publisher Copyright:
© 2025, Author. All rights reserved.
PY - 2025
Y1 - 2025
N2 - Large language models can provide answers comparable to human level in multiple fields and demonstrate a wealth of emergent capabilities on fields and tasks for which they have not been trained. However, artificial intelligence systems based on large language models currently have many potential safety hazards. For example, large language models are vulnerable to undetectable attacks, including intricately elusive ones, and the content they generate may be illegal, leak private information, or contain hatred, bias, and errors. Moreover, in practical applications, the abuse of large language models is also an important issue: the generated content may cause trouble at multiple levels, including countries, social groups, and specific fields. This paper aims to explore and classify the safety risks faced by large language models, review existing evaluation methods, study the causal mechanisms behind these risks, and summarize existing solutions. Specifically, this paper identifies 10 safety risks of large language models and categorizes them into two aspects: the safety risks of the model itself and the safety risks of the generated content. Furthermore, this paper systematically analyzes the safety risks of the large language model itself from the two perspectives of life cycle and hazard level, and introduces methods for assessing the risks of existing large language models, the causes of their safety risks, and corresponding mitigation methods. The safety risk of large language models is an important issue that urgently needs to be solved.
AB - Large language models can provide answers comparable to human level in multiple fields and demonstrate a wealth of emergent capabilities on fields and tasks for which they have not been trained. However, artificial intelligence systems based on large language models currently have many potential safety hazards. For example, large language models are vulnerable to undetectable attacks, including intricately elusive ones, and the content they generate may be illegal, leak private information, or contain hatred, bias, and errors. Moreover, in practical applications, the abuse of large language models is also an important issue: the generated content may cause trouble at multiple levels, including countries, social groups, and specific fields. This paper aims to explore and classify the safety risks faced by large language models, review existing evaluation methods, study the causal mechanisms behind these risks, and summarize existing solutions. Specifically, this paper identifies 10 safety risks of large language models and categorizes them into two aspects: the safety risks of the model itself and the safety risks of the generated content. Furthermore, this paper systematically analyzes the safety risks of the large language model itself from the two perspectives of life cycle and hazard level, and introduces methods for assessing the risks of existing large language models, the causes of their safety risks, and corresponding mitigation methods. The safety risk of large language models is an important issue that urgently needs to be solved.
KW - generated content safety
KW - large language model
KW - model safety
KW - safety classification
KW - safety research prospect
KW - safety risk attribution
KW - safety risk evaluation
KW - safety risk mitigation measures
UR - https://www.scopus.com/pages/publications/105000258892
U2 - 10.11992/tis.202401006
DO - 10.11992/tis.202401006
M3 - Article
AN - SCOPUS:105000258892
SN - 1673-4785
VL - 20
SP - 2
EP - 32
JO - CAAI Transactions on Intelligent Systems
JF - CAAI Transactions on Intelligent Systems
IS - 1
ER -