ChatGPT 中文性能测评与风险应对

Translated title of the contribution: ChatGPT Performance Evaluation on Chinese Language and Risk Measures

Huaping Zhang*, Linhan Li, Chunjin Li

*Corresponding author for this work

Research output: Contribution to journal › Article › Peer-review

12 Citations (Scopus)

Abstract

[Objective] This paper briefly introduces the main technical innovations of ChatGPT, evaluates its Chinese-language performance on four tasks using nine datasets, analyzes the risks posed by ChatGPT, and proposes countermeasures.

[Methods] For sentiment analysis, ChatGPT was compared with WeLM on the ChnSentiCorp dataset and with ERNIE 3.0 Titan on the EPRSTMT dataset. ChatGPT performed on par with these large domestic (Chinese) models. For text summarization, ChatGPT and WeLM were tested on the LCSTS and TTNews datasets, and ChatGPT outperformed WeLM on both. For machine reading comprehension (MRC), CMRC2018 and DRCD were used for extractive MRC and C3 for commonsense MRC. ERNIE 3.0 Titan outperformed ChatGPT on this task. For Chinese closed-book question answering, WebQA and CKBQA were used. ChatGPT was prone to factual errors on this task, and the domestic models outperformed it.

[Results] ChatGPT performed well on classic natural language processing tasks. For example, it achieved an accuracy above 85% in sentiment analysis. However, it showed a higher rate of factual errors on closed-book questions.

[Limitations] Converting discriminative tasks into generative ones may introduce errors into the evaluation scores. This paper evaluates ChatGPT only in the zero-shot setting, so its performance in other settings remains unclear. ChatGPT may also be updated iteratively in later releases, so these evaluation results may be time-sensitive.

[Conclusions] ChatGPT is powerful but still has shortcomings. The development of Chinese large models should be oriented toward national strategy and should take the limitations of language models into account.
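The abstract reports sentiment-analysis accuracy for a generative model and notes that converting discriminative tasks into generative ones may introduce scoring errors. The snippet below is a minimal sketch of that conversion step, not the authors' procedure. The paper does not state how replies were mapped to labels, so the keyword lists and the names `reply_to_label` and `accuracy` are hypothetical. Replies that cannot be mapped are counted as wrong.

```python
from typing import Optional

# Hypothetical keyword "verbalizer"; the paper does not specify its mapping rules.
POSITIVE_KEYWORDS = ("positive", "正面", "积极")
NEGATIVE_KEYWORDS = ("negative", "负面", "消极")


def reply_to_label(reply: str) -> Optional[int]:
    """Map a free-text model reply to 1 (positive), 0 (negative), or None if unparseable."""
    text = reply.lower()
    has_pos = any(k in text for k in POSITIVE_KEYWORDS)
    has_neg = any(k in text for k in NEGATIVE_KEYWORDS)
    if has_pos == has_neg:
        # Both or neither keyword present: the reply cannot be mapped reliably.
        return None
    return 1 if has_pos else 0


def accuracy(replies: list[str], gold: list[int]) -> float:
    """Accuracy over all examples; unmappable replies (None) never match a gold label."""
    if len(replies) != len(gold):
        raise ValueError("replies and gold labels must have the same length")
    correct = sum(reply_to_label(r) == g for r, g in zip(replies, gold))
    return correct / len(gold)


replies = ["The sentiment is positive.", "负面", "I am not sure."]
gold = [1, 0, 1]
print(accuracy(replies, gold))  # 2 of 3 replies map to the correct label
```

This mapping shows how conversion can distort scores. A reply such as "not positive" contains the keyword "positive" and is mapped to 1, and a hedged or off-format reply is scored as an error even if the model's intent was correct.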

Original language: Chinese (Traditional)
Pages (from-to): 16-25
Number of pages: 10
Journal: Data Analysis and Knowledge Discovery
Volume: 7
Issue number: 3
Publication status: Published (Mar 2023)
