Research on eight machine learning algorithms applicability on different characteristics data sets in medical classification tasks

Yiyan Zhang; Qin Li; Yi Xin

doi:10.3389/fncom.2024.1345575

Yiyan Zhang^*, Qin Li, Yi Xin

^*Corresponding author for this work

School of Medical and Technology

Qingdao Huanghai University

Research output: Contribution to journal › Article › peer-review

4 Citations (Scopus)

Abstract

With the rapid development of the data mining field, more and more algorithms have been proposed or improved. Quickly selecting a data mining algorithm suited to a medical data set is a challenge for some medical workers. This paper compares the characteristics of general medical data sets with those of general data sets from other fields, and derives applicability rules that match data mining algorithms to the characteristics of the data set under study. The study quantified the characteristics of each data set with 26 indicators, including simple, statistical, and information-theoretic indicators. Eight machine learning algorithms with high maturity, low user involvement, and strong representation of their algorithm families were selected as the base algorithms. Algorithm performance was evaluated on three aspects: prediction accuracy, running speed, and memory consumption. Decision tree and stepwise regression models were trained on this metadata to obtain algorithm applicability knowledge for medical data sets. In cross-validation, the accuracy of every algorithm applicability prediction model was above 75%, which supports the validity and feasibility of the applicability knowledge.
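The abstract names three families of data set indicators (simple, statistical, and information-theoretic) but does not list the 26 indicators themselves. The following is a minimal sketch of what one indicator from each family might look like, using only the Python standard library. The function names, the chosen indicators, and the toy data are all assumptions made for illustration and are not taken from the paper.

```python
import math
from collections import Counter
from statistics import mean, pstdev


def class_entropy(labels):
    """Shannon entropy (in bits) of the class label distribution."""
    counts = Counter(labels)
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def skewness(values):
    """Population skewness of one numeric attribute; 0.0 if it is constant."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        return 0.0
    return mean(((v - mu) / sigma) ** 3 for v in values)


def describe_dataset(rows, labels):
    """Return a small dictionary of illustrative (hypothetical) meta-features.

    rows   -- list of equal-length lists of numeric attribute values
    labels -- list of class labels, one per row
    """
    columns = list(zip(*rows))
    return {
        # simple indicators
        "n_instances": len(rows),
        "n_attributes": len(columns),
        "n_classes": len(set(labels)),
        # statistical indicator
        "mean_abs_skewness": mean(abs(skewness(col)) for col in columns),
        # information-theoretic indicator
        "class_entropy": class_entropy(labels),
    }


# Toy data, invented for illustration only.
rows = [[1.0, 10.0], [2.0, 10.0], [3.0, 10.0], [4.0, 10.0]]
labels = ["sick", "healthy", "sick", "healthy"]
features = describe_dataset(rows, labels)
print(features)
```

A vector like `features`, computed once per data set and paired with the measured performance of each base algorithm, is the kind of metadata on which an applicability model could be trained.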

Original language	English
Article number	1345575
Journal	Frontiers in Computational Neuroscience
Volume	18
DOIs	https://doi.org/10.3389/fncom.2024.1345575
Publication status	Published - 2024

Keywords

algorithm applicability
data mining
dataset characteristic quantization
decision tree
medical dataset

Cite this

Zhang, Y., Li, Q., & Xin, Y. (2024). Research on eight machine learning algorithms applicability on different characteristics data sets in medical classification tasks. Frontiers in Computational Neuroscience, 18, Article 1345575. https://doi.org/10.3389/fncom.2024.1345575

@article{2067fbe788754655915d286dae0980f7,
  title = "Research on eight machine learning algorithms applicability on different characteristics data sets in medical classification tasks",
  abstract = "With the vigorous development of data mining field, more and more algorithms have been proposed or improved. How to quickly select a data mining algorithm that is suitable for data sets in medical field is a challenge for some medical workers. The purpose of this paper is to study the comparative characteristics of the general medical data set and the general data sets in other fields, and find the applicability rules of the data mining algorithm suitable for the characteristics of the current research data set. The study quantified characteristics of the research data set with 26 indicators, including simple indicators, statistical indicators and information theory indicators. Eight machine learning algorithms with high maturity, low user involvement and strong family representation were selected as the base algorithms. The algorithm performances were evaluated by three aspects: prediction accuracy, running speed and memory consumption. By constructing decision tree and stepwise regression model to learn the above metadata, the algorithm applicability knowledge of medical data set is obtained. Through cross-verification, the accuracy of all the algorithm applicability prediction models is above 75%, which proves the validity and feasibility of the applicability knowledge.",
  keywords = "algorithm applicability, data mining, dataset characteristic quantization, decision tree, medical dataset",
  author = "Yiyan Zhang and Qin Li and Yi Xin",
  note = "Publisher Copyright: Copyright {\textcopyright} 2024 Zhang, Li and Xin.",
  year = "2024",
  doi = "10.3389/fncom.2024.1345575",
  language = "English",
  volume = "18",
  journal = "Frontiers in Computational Neuroscience",
  issn = "1662-5188",
  publisher = "Frontiers Media SA",
}

TY  - JOUR
T1  - Research on eight machine learning algorithms applicability on different characteristics data sets in medical classification tasks
AU  - Zhang, Yiyan
AU  - Li, Qin
AU  - Xin, Yi
PY  - 2024
Y1  - 2024
N2  - With the vigorous development of data mining field, more and more algorithms have been proposed or improved. How to quickly select a data mining algorithm that is suitable for data sets in medical field is a challenge for some medical workers. The purpose of this paper is to study the comparative characteristics of the general medical data set and the general data sets in other fields, and find the applicability rules of the data mining algorithm suitable for the characteristics of the current research data set. The study quantified characteristics of the research data set with 26 indicators, including simple indicators, statistical indicators and information theory indicators. Eight machine learning algorithms with high maturity, low user involvement and strong family representation were selected as the base algorithms. The algorithm performances were evaluated by three aspects: prediction accuracy, running speed and memory consumption. By constructing decision tree and stepwise regression model to learn the above metadata, the algorithm applicability knowledge of medical data set is obtained. Through cross-verification, the accuracy of all the algorithm applicability prediction models is above 75%, which proves the validity and feasibility of the applicability knowledge.
AB  - With the vigorous development of data mining field, more and more algorithms have been proposed or improved. How to quickly select a data mining algorithm that is suitable for data sets in medical field is a challenge for some medical workers. The purpose of this paper is to study the comparative characteristics of the general medical data set and the general data sets in other fields, and find the applicability rules of the data mining algorithm suitable for the characteristics of the current research data set. The study quantified characteristics of the research data set with 26 indicators, including simple indicators, statistical indicators and information theory indicators. Eight machine learning algorithms with high maturity, low user involvement and strong family representation were selected as the base algorithms. The algorithm performances were evaluated by three aspects: prediction accuracy, running speed and memory consumption. By constructing decision tree and stepwise regression model to learn the above metadata, the algorithm applicability knowledge of medical data set is obtained. Through cross-verification, the accuracy of all the algorithm applicability prediction models is above 75%, which proves the validity and feasibility of the applicability knowledge.
KW  - algorithm applicability
KW  - data mining
KW  - dataset characteristic quantization
KW  - decision tree
KW  - medical dataset
UR  - http://www.scopus.com/inward/record.url?scp=85184690085&partnerID=8YFLogxK
U2  - 10.3389/fncom.2024.1345575
DO  - 10.3389/fncom.2024.1345575
M3  - Article
AN  - SCOPUS:85184690085
SN  - 1662-5188
VL  - 18
JO  - Frontiers in Computational Neuroscience
JF  - Frontiers in Computational Neuroscience
M1  - 1345575
ER  - 
