Using a classification model for determining the value of liver radiological reports of patients with colorectal cancer

Wenjuan Liu; Xi Zhang; Han Lv; Jia Li; Yawen Liu; Zhenghan Yang; Xutao Weng; Yucong Lin; Hong Song; Zhenchang Wang

doi:10.3389/fonc.2022.913806

Using a classification model for determining the value of liver radiological reports of patients with colorectal cancer

Wenjuan Liu, Xi Zhang, Han Lv^*, Jia Li, Yawen Liu, Zhenghan Yang, Xutao Weng, Yucong Lin, Hong Song^*, Zhenchang Wang^*

^*此作品的通讯作者

计算机学院

科研成果: 期刊稿件 › 文章 › 同行评审

2 引用（Scopus）

摘要

Background: Medical imaging is critical in clinical practice, and high value radiological reports can positively assist clinicians. However, there is a lack of methods for determining the value of reports. Objective: The purpose of this study was to establish an ensemble learning classification model using natural language processing (NLP) applied to the Chinese free text of radiological reports to determine their value for liver lesion detection in patients with colorectal cancer (CRC). Methods: Radiological reports of upper abdominal computed tomography (CT) and magnetic resonance imaging (MRI) were divided into five categories according to the results of liver lesion detection in patients with CRC. The NLP methods including word segmentation, stop word removal, and n-gram language model establishment were applied for each dataset. Then, a word-bag model was built, high-frequency words were selected as features, and an ensemble learning classification model was constructed. Several machine learning methods were applied, including logistic regression (LR), random forest (RF), and so on. We compared the accuracy between priori choosing pertinent word strings and our machine language methodologies. Results: The dataset of 2790 patients included CT without contrast (10.2%), CT with/without contrast (73.3%), MRI without contrast (1.8%), and MRI with/without contrast (14.6%). The ensemble learning classification model determined the value of reports effectively, reaching 95.91% in the CT with/without contrast dataset using XGBoost. The logistic regression, random forest, and support vector machine also achieved good classification accuracy, reaching 95.89%, 95.04%, and 95.00% respectively. The results of XGBoost were visualized using a confusion matrix. The numbers of errors in categories I, II and V were very small. ELI5 was used to select important words for each category. Words such as “no abnormality”, “suggest”, “fatty liver”, and “transfer” showed a relatively large degree of positive correlation with classification accuracy. The accuracy based on string pattern search method model was lower than that of machine learning. Conclusions: The learning classification model based on NLP was an effective tool for determining the value of radiological reports focused on liver lesions. The study made it possible to analyze the value of medical imaging examinations on a large scale.

源语言	英语
文章编号	913806
期刊	Frontiers in Oncology
卷	12
DOI	https://doi.org/10.3389/fonc.2022.913806
出版状态	已出版 - 21 11月 2022

联合国可持续发展目标

此成果有助于实现下列可持续发展目标：

访问文件

10.3389/fonc.2022.913806

其它文件与链接

链接到 Scopus 的出版物

引用此

@article{364dd6db563748fe8382e01039ebd35f,

title = "Using a classification model for determining the value of liver radiological reports of patients with colorectal cancer",

abstract = "Background: Medical imaging is critical in clinical practice, and high value radiological reports can positively assist clinicians. However, there is a lack of methods for determining the value of reports. Objective: The purpose of this study was to establish an ensemble learning classification model using natural language processing (NLP) applied to the Chinese free text of radiological reports to determine their value for liver lesion detection in patients with colorectal cancer (CRC). Methods: Radiological reports of upper abdominal computed tomography (CT) and magnetic resonance imaging (MRI) were divided into five categories according to the results of liver lesion detection in patients with CRC. The NLP methods including word segmentation, stop word removal, and n-gram language model establishment were applied for each dataset. Then, a word-bag model was built, high-frequency words were selected as features, and an ensemble learning classification model was constructed. Several machine learning methods were applied, including logistic regression (LR), random forest (RF), and so on. We compared the accuracy between priori choosing pertinent word strings and our machine language methodologies. Results: The dataset of 2790 patients included CT without contrast (10.2%), CT with/without contrast (73.3%), MRI without contrast (1.8%), and MRI with/without contrast (14.6%). The ensemble learning classification model determined the value of reports effectively, reaching 95.91% in the CT with/without contrast dataset using XGBoost. The logistic regression, random forest, and support vector machine also achieved good classification accuracy, reaching 95.89%, 95.04%, and 95.00% respectively. The results of XGBoost were visualized using a confusion matrix. The numbers of errors in categories I, II and V were very small. ELI5 was used to select important words for each category. Words such as “no abnormality”, “suggest”, “fatty liver”, and “transfer” showed a relatively large degree of positive correlation with classification accuracy. The accuracy based on string pattern search method model was lower than that of machine learning. Conclusions: The learning classification model based on NLP was an effective tool for determining the value of radiological reports focused on liver lesions. The study made it possible to analyze the value of medical imaging examinations on a large scale.",

keywords = "classification model, colorectal cancer, liver lesion, medical imaging report, natural language processing",

author = "Wenjuan Liu and Xi Zhang and Han Lv and Jia Li and Yawen Liu and Zhenghan Yang and Xutao Weng and Yucong Lin and Hong Song and Zhenchang Wang",

note = "Publisher Copyright: Copyright {\textcopyright} 2022 Liu, Zhang, Lv, Li, Liu, Yang, Weng, Lin, Song and Wang.",

year = "2022",

month = nov,

day = "21",

doi = "10.3389/fonc.2022.913806",

language = "English",

volume = "12",

journal = "Frontiers in Oncology",

issn = "2234-943X",

publisher = "Frontiers Media SA",

}

TY - JOUR

T1 - Using a classification model for determining the value of liver radiological reports of patients with colorectal cancer

AU - Liu, Wenjuan

AU - Zhang, Xi

AU - Lv, Han

AU - Li, Jia

AU - Liu, Yawen

AU - Yang, Zhenghan

AU - Weng, Xutao

AU - Lin, Yucong

AU - Song, Hong

AU - Wang, Zhenchang

PY - 2022/11/21

Y1 - 2022/11/21

N2 - Background: Medical imaging is critical in clinical practice, and high value radiological reports can positively assist clinicians. However, there is a lack of methods for determining the value of reports. Objective: The purpose of this study was to establish an ensemble learning classification model using natural language processing (NLP) applied to the Chinese free text of radiological reports to determine their value for liver lesion detection in patients with colorectal cancer (CRC). Methods: Radiological reports of upper abdominal computed tomography (CT) and magnetic resonance imaging (MRI) were divided into five categories according to the results of liver lesion detection in patients with CRC. The NLP methods including word segmentation, stop word removal, and n-gram language model establishment were applied for each dataset. Then, a word-bag model was built, high-frequency words were selected as features, and an ensemble learning classification model was constructed. Several machine learning methods were applied, including logistic regression (LR), random forest (RF), and so on. We compared the accuracy between priori choosing pertinent word strings and our machine language methodologies. Results: The dataset of 2790 patients included CT without contrast (10.2%), CT with/without contrast (73.3%), MRI without contrast (1.8%), and MRI with/without contrast (14.6%). The ensemble learning classification model determined the value of reports effectively, reaching 95.91% in the CT with/without contrast dataset using XGBoost. The logistic regression, random forest, and support vector machine also achieved good classification accuracy, reaching 95.89%, 95.04%, and 95.00% respectively. The results of XGBoost were visualized using a confusion matrix. The numbers of errors in categories I, II and V were very small. ELI5 was used to select important words for each category. Words such as “no abnormality”, “suggest”, “fatty liver”, and “transfer” showed a relatively large degree of positive correlation with classification accuracy. The accuracy based on string pattern search method model was lower than that of machine learning. Conclusions: The learning classification model based on NLP was an effective tool for determining the value of radiological reports focused on liver lesions. The study made it possible to analyze the value of medical imaging examinations on a large scale.

AB - Background: Medical imaging is critical in clinical practice, and high value radiological reports can positively assist clinicians. However, there is a lack of methods for determining the value of reports. Objective: The purpose of this study was to establish an ensemble learning classification model using natural language processing (NLP) applied to the Chinese free text of radiological reports to determine their value for liver lesion detection in patients with colorectal cancer (CRC). Methods: Radiological reports of upper abdominal computed tomography (CT) and magnetic resonance imaging (MRI) were divided into five categories according to the results of liver lesion detection in patients with CRC. The NLP methods including word segmentation, stop word removal, and n-gram language model establishment were applied for each dataset. Then, a word-bag model was built, high-frequency words were selected as features, and an ensemble learning classification model was constructed. Several machine learning methods were applied, including logistic regression (LR), random forest (RF), and so on. We compared the accuracy between priori choosing pertinent word strings and our machine language methodologies. Results: The dataset of 2790 patients included CT without contrast (10.2%), CT with/without contrast (73.3%), MRI without contrast (1.8%), and MRI with/without contrast (14.6%). The ensemble learning classification model determined the value of reports effectively, reaching 95.91% in the CT with/without contrast dataset using XGBoost. The logistic regression, random forest, and support vector machine also achieved good classification accuracy, reaching 95.89%, 95.04%, and 95.00% respectively. The results of XGBoost were visualized using a confusion matrix. The numbers of errors in categories I, II and V were very small. ELI5 was used to select important words for each category. Words such as “no abnormality”, “suggest”, “fatty liver”, and “transfer” showed a relatively large degree of positive correlation with classification accuracy. The accuracy based on string pattern search method model was lower than that of machine learning. Conclusions: The learning classification model based on NLP was an effective tool for determining the value of radiological reports focused on liver lesions. The study made it possible to analyze the value of medical imaging examinations on a large scale.

KW - classification model

KW - colorectal cancer

KW - liver lesion

KW - medical imaging report

KW - natural language processing

UR - http://www.scopus.com/inward/record.url?scp=85143304971&partnerID=8YFLogxK

U2 - 10.3389/fonc.2022.913806

DO - 10.3389/fonc.2022.913806

M3 - Article

AN - SCOPUS:85143304971

SN - 2234-943X

VL - 12

JO - Frontiers in Oncology

JF - Frontiers in Oncology

M1 - 913806

ER -

Using a classification model for determining the value of liver radiological reports of patients with colorectal cancer

摘要

联合国可持续发展目标

访问文件

其它文件与链接

指纹

引用此