ChatGPT 中文性能测评与风险应对

Huaping Zhang*, Linhan Li, Chunjin Li

*此作品的通讯作者

科研成果: 期刊稿件文章同行评审

12 引用 (Scopus)

摘要

[Objective] This paper briefly introduces the main technical innovations of ChatGPT, and evaluates the performance of ChatGPT in Chinese on four tasks using nine datasets, analyzes the risk with ChatGPT and proposes our solutions. [Methods] ChatGPT and WeLM models were tested using the ChnSentiCorp dataset, and ChatGPT and ERNIE 3.0 Titan were tested using the EPRSTMT dataset, and it was found that ChatGPT did not differ much from the large domestic models in sentiment analysis tasks. The LCSTS and TTNews datasets were used to test the ChatGPT and WeLM models, and both ChatGPT outperformed the WeLM model; CMRC2018 and DRCD were used for extractive machine reading comprehension(MRC), and the C3 dataset was used for common sense MRC, and it was found that ERNIE 3.0 Titan outperformed ChatGPT in this task. WebQA and CKBQA were used to do Chinese closed-book quiz testing, and it was found that ChatGPT was prone to make factual errors in this task, and the domestic model outperformed ChatGPT. [Results] ChatGPT performed well on classic tasks of natural language processing, such as sentiment analysis with an accuracy rate of more than 85% and a higher probability of factual errors on closed-book questions. [Limitations] The error of evaluation score may be introduced in the process of converting discriminative tasks into generative ones. This paper only evaluated ChatGPT in zero-shot case, so it is not clear how it performs in other cases. ChatGPT may be updated iteratively in subsequent releases, and the profiling results may be time-sensitive. [Conclusions] ChatGPT is powerful but still has some drawbacks, for the large model of Chinese need to be national strategy oriented and pay attention to the limitations of the language model.

投稿的翻译标题ChatGPT Performance Evaluation on Chinese Language and Risk Measures
源语言繁体中文
页(从-至)16-25
页数10
期刊Data Analysis and Knowledge Discovery
7
3
DOI
出版状态已出版 - 3月 2023

关键词

  • Artificial Intelligence
  • ChatGPT
  • Language Model

指纹

探究 'ChatGPT 中文性能测评与风险应对' 的科研主题。它们共同构成独一无二的指纹。

引用此