TY - JOUR
T1 - Research on eight machine learning algorithms applicability on different characteristics data sets in medical classification tasks
AU - Zhang, Yiyan
AU - Li, Qin
AU - Xin, Yi
N1 - Publisher Copyright: Copyright © 2024 Zhang, Li and Xin.
PY - 2024
Y1 - 2024
N2 - With the vigorous development of the data mining field, more and more algorithms have been proposed or improved. Quickly selecting a data mining algorithm suited to data sets in the medical field is a challenge for some medical workers. The purpose of this paper is to compare the characteristics of general medical data sets with those of general data sets in other fields, and to find applicability rules for data mining algorithms that match the characteristics of the data set under study. The study quantified the characteristics of the research data sets with 26 indicators, including simple indicators, statistical indicators and information theory indicators. Eight machine learning algorithms with high maturity, low user involvement and strong family representation were selected as the base algorithms. Algorithm performance was evaluated on three aspects: prediction accuracy, running speed and memory consumption. By constructing decision tree and stepwise regression models to learn from this metadata, algorithm applicability knowledge for medical data sets was obtained. In cross-validation, the accuracy of all the algorithm applicability prediction models was above 75%, which demonstrates the validity and feasibility of the applicability knowledge.
KW - algorithm applicability
KW - data mining
KW - dataset characteristic quantization
KW - decision tree
KW - medical dataset
UR - http://www.scopus.com/inward/record.url?scp=85184690085&partnerID=8YFLogxK
U2 - 10.3389/fncom.2024.1345575
DO - 10.3389/fncom.2024.1345575
M3 - Article
AN - SCOPUS:85184690085
SN - 1662-5188
VL - 18
JO - Frontiers in Computational Neuroscience
JF - Frontiers in Computational Neuroscience
M1 - 1345575
ER -