生成式大语言模型在中文放射医学领域的应用研究

Longfei Chen; Xin Gao; Haotian Hou; Chuyang Ye; Ya'ou Liu; Meihui Zhang

doi:10.3778/j.issn.1673-9418.2406041

Translated title of the contribution: Application of Generative Large Language Models in Chinese Radiology Domain

Longfei Chen, Xin Gao, Haotian Hou, Chuyang Ye, Ya'ou Liu, Meihui Zhang*

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

Abstract

In Chinese radiology, reports serve as a crucial basis for clinical decision-making. Using natural language processing (NLP) to understand and learn from the text of radiology reports, and thereby support clinical radiology work, has therefore become an important research direction. However, traditional methods for classification and generation tasks on Chinese radiology reports still face challenges such as scarce training corpora, privacy concerns, and poor model generalization, which limit overall performance. To address these issues, this work proposes a solution for natural language tasks in the Chinese radiology domain based on efficient local fine-tuning of large language models. A large-scale, high-quality dataset for natural language tasks on Chinese radiology reports is collected and constructed, and the open-source large language model Baichuan2 is trained by supervised fine-tuning with the LoRA efficient fine-tuning method, yielding “RadGPT”, a model capable of handling four types of clinical tasks in the Chinese radiology domain simultaneously. An evaluation framework for natural language classification and generation tasks in the Chinese radiology domain is also introduced. Multiple sets of experiments are conducted on three types of radiology report datasets from two centers, with comparisons against several typical existing methods. The results show that the proposed method performs better in classification, text summarization and expansion, and model generalization.

Original language	Chinese (Traditional)
Pages (from-to)	2337-2348
Number of pages	12
Journal	Journal of Frontiers of Computer Science and Technology
Volume	18
Issue number	9
DOIs	https://doi.org/10.3778/j.issn.1673-9418.2406041
Publication status	Published - 1 Sept 2024

Cite this

Chen, L., Gao, X., Hou, H., Ye, C., Liu, Y., & Zhang, M. (2024). 生成式大语言模型在中文放射医学领域的应用研究. Journal of Frontiers of Computer Science and Technology, 18(9), 2337-2348. https://doi.org/10.3778/j.issn.1673-9418.2406041

@article{143f082dde22410cb9648edf10b96b24,

title = "生成式大语言模型在中文放射医学领域的应用研究",

abstract = "In Chinese radiology, reports serve as a crucial basis for clinical decision-making. Using natural language processing (NLP) to understand and learn from the text of radiology reports, and thereby support clinical radiology work, has therefore become an important research direction. However, traditional methods for classification and generation tasks on Chinese radiology reports still face challenges such as scarce training corpora, privacy concerns, and poor model generalization, which limit overall performance. To address these issues, this work proposes a solution for natural language tasks in the Chinese radiology domain based on efficient local fine-tuning of large language models. A large-scale, high-quality dataset for natural language tasks on Chinese radiology reports is collected and constructed, and the open-source large language model Baichuan2 is trained by supervised fine-tuning with the LoRA efficient fine-tuning method, yielding “RadGPT”, a model capable of handling four types of clinical tasks in the Chinese radiology domain simultaneously. An evaluation framework for natural language classification and generation tasks in the Chinese radiology domain is also introduced. Multiple sets of experiments are conducted on three types of radiology report datasets from two centers, with comparisons against several typical existing methods. The results show that the proposed method performs better in classification, text summarization and expansion, and model generalization.",

keywords = "efficient fine-tuning strategy, large language model, radiology report, text classification, text generation",

author = "Longfei Chen and Xin Gao and Haotian Hou and Chuyang Ye and Ya'ou Liu and Meihui Zhang",

year = "2024",

month = sep,

day = "1",

doi = "10.3778/j.issn.1673-9418.2406041",

language = "Chinese (Traditional)",

volume = "18",

pages = "2337--2348",

journal = "Journal of Frontiers of Computer Science and Technology",

issn = "1673-9418",

publisher = "Journal of Computer Engineering and Applications Beijing Co., Ltd.; Science Press",

number = "9",

}

TY - JOUR

T1 - 生成式大语言模型在中文放射医学领域的应用研究

AU - Chen, Longfei

AU - Gao, Xin

AU - Hou, Haotian

AU - Ye, Chuyang

AU - Liu, Ya'ou

AU - Zhang, Meihui

PY - 2024/9/1

Y1 - 2024/9/1

N2 - In Chinese radiology, reports serve as a crucial basis for clinical decision-making. Using natural language processing (NLP) to understand and learn from the text of radiology reports, and thereby support clinical radiology work, has therefore become an important research direction. However, traditional methods for classification and generation tasks on Chinese radiology reports still face challenges such as scarce training corpora, privacy concerns, and poor model generalization, which limit overall performance. To address these issues, this work proposes a solution for natural language tasks in the Chinese radiology domain based on efficient local fine-tuning of large language models. A large-scale, high-quality dataset for natural language tasks on Chinese radiology reports is collected and constructed, and the open-source large language model Baichuan2 is trained by supervised fine-tuning with the LoRA efficient fine-tuning method, yielding “RadGPT”, a model capable of handling four types of clinical tasks in the Chinese radiology domain simultaneously. An evaluation framework for natural language classification and generation tasks in the Chinese radiology domain is also introduced. Multiple sets of experiments are conducted on three types of radiology report datasets from two centers, with comparisons against several typical existing methods. The results show that the proposed method performs better in classification, text summarization and expansion, and model generalization.

AB - In Chinese radiology, reports serve as a crucial basis for clinical decision-making. Using natural language processing (NLP) to understand and learn from the text of radiology reports, and thereby support clinical radiology work, has therefore become an important research direction. However, traditional methods for classification and generation tasks on Chinese radiology reports still face challenges such as scarce training corpora, privacy concerns, and poor model generalization, which limit overall performance. To address these issues, this work proposes a solution for natural language tasks in the Chinese radiology domain based on efficient local fine-tuning of large language models. A large-scale, high-quality dataset for natural language tasks on Chinese radiology reports is collected and constructed, and the open-source large language model Baichuan2 is trained by supervised fine-tuning with the LoRA efficient fine-tuning method, yielding “RadGPT”, a model capable of handling four types of clinical tasks in the Chinese radiology domain simultaneously. An evaluation framework for natural language classification and generation tasks in the Chinese radiology domain is also introduced. Multiple sets of experiments are conducted on three types of radiology report datasets from two centers, with comparisons against several typical existing methods. The results show that the proposed method performs better in classification, text summarization and expansion, and model generalization.

KW - efficient fine-tuning strategy

KW - large language model

KW - radiology report

KW - text classification

KW - text generation

UR - http://www.scopus.com/inward/record.url?scp=85203299216&partnerID=8YFLogxK

U2 - 10.3778/j.issn.1673-9418.2406041

DO - 10.3778/j.issn.1673-9418.2406041

M3 - Article

AN - SCOPUS:85203299216

SN - 1673-9418

VL - 18

SP - 2337

EP - 2348

JO - Journal of Frontiers of Computer Science and Technology

JF - Journal of Frontiers of Computer Science and Technology

IS - 9

ER -
