研究八种机器学习算法在医疗分类任务中对不同特征数据集的适用性

IF 2.1 4区医学 Q2 MATHEMATICAL & COMPUTATIONAL BIOLOGY Frontiers in Computational Neuroscience Pub Date : 2024-01-15 DOI:10.3389/fncom.2024.1345575

Yiyan Zhang, Qin Li, Yi Xin

{"title":"研究八种机器学习算法在医疗分类任务中对不同特征数据集的适用性","authors":"Yiyan Zhang, Qin Li, Yi Xin","doi":"10.3389/fncom.2024.1345575","DOIUrl":null,"url":null,"abstract":"<p>With the vigorous development of data mining field, more and more algorithms have been proposed or improved. How to quickly select a data mining algorithm that is suitable for data sets in medical field is a challenge for some medical workers. The purpose of this paper is to study the comparative characteristics of the general medical data set and the general data sets in other fields, and find the applicability rules of the data mining algorithm suitable for the characteristics of the current research data set. The study quantified characteristics of the research data set with 26 indicators, including simple indicators, statistical indicators and information theory indicators. Eight machine learning algorithms with high maturity, low user involvement and strong family representation were selected as the base algorithms. The algorithm performances were evaluated by three aspects: prediction accuracy, running speed and memory consumption. By constructing decision tree and stepwise regression model to learn the above metadata, the algorithm applicability knowledge of medical data set is obtained. Through cross-verification, the accuracy of all the algorithm applicability prediction models is above 75%, which proves the validity and feasibility of the applicability knowledge.</p>","PeriodicalId":12363,"journal":{"name":"Frontiers in Computational Neuroscience","volume":null,"pages":null},"PeriodicalIF":2.1000,"publicationDate":"2024-01-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Research on eight machine learning algorithms applicability on different characteristics data sets in medical classification tasks\",\"authors\":\"Yiyan Zhang, Qin Li, Yi Xin\",\"doi\":\"10.3389/fncom.2024.1345575\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<p>With the vigorous development of data mining field, more and more algorithms have been proposed or improved. How to quickly select a data mining algorithm that is suitable for data sets in medical field is a challenge for some medical workers. The purpose of this paper is to study the comparative characteristics of the general medical data set and the general data sets in other fields, and find the applicability rules of the data mining algorithm suitable for the characteristics of the current research data set. The study quantified characteristics of the research data set with 26 indicators, including simple indicators, statistical indicators and information theory indicators. Eight machine learning algorithms with high maturity, low user involvement and strong family representation were selected as the base algorithms. The algorithm performances were evaluated by three aspects: prediction accuracy, running speed and memory consumption. By constructing decision tree and stepwise regression model to learn the above metadata, the algorithm applicability knowledge of medical data set is obtained. Through cross-verification, the accuracy of all the algorithm applicability prediction models is above 75%, which proves the validity and feasibility of the applicability knowledge.</p>\",\"PeriodicalId\":12363,\"journal\":{\"name\":\"Frontiers in Computational Neuroscience\",\"volume\":null,\"pages\":null},\"PeriodicalIF\":2.1000,\"publicationDate\":\"2024-01-15\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Frontiers in Computational Neuroscience\",\"FirstCategoryId\":\"3\",\"ListUrlMain\":\"https://doi.org/10.3389/fncom.2024.1345575\",\"RegionNum\":4,\"RegionCategory\":\"医学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q2\",\"JCRName\":\"MATHEMATICAL & COMPUTATIONAL BIOLOGY\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Frontiers in Computational Neuroscience","FirstCategoryId":"3","ListUrlMain":"https://doi.org/10.3389/fncom.2024.1345575","RegionNum":4,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"MATHEMATICAL & COMPUTATIONAL BIOLOGY","Score":null,"Total":0}

引用次数: 0

摘要

随着数据挖掘领域的蓬勃发展，越来越多的算法被提出或改进。如何快速选择适合医学领域数据集的数据挖掘算法，是摆在一些医学工作者面前的难题。本文旨在研究一般医学数据集与其他领域一般数据集的比较特征，找到适合当前研究数据集特征的数据挖掘算法的适用性规则。研究用 26 个指标量化了研究数据集的特征，包括简单指标、统计指标和信息论指标。选择了成熟度高、用户参与度低、家族代表性强的 8 种机器学习算法作为基础算法。从预测精度、运行速度和内存消耗三个方面对算法性能进行了评估。通过构建决策树和逐步回归模型来学习上述元数据，从而获得医疗数据集的算法适用性知识。通过交叉验证，所有算法适用性预测模型的准确率都在 75% 以上，证明了适用性知识的有效性和可行性。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文

微信好友朋友圈 QQ好友复制链接

本刊更多论文

Research on eight machine learning algorithms applicability on different characteristics data sets in medical classification tasks

With the vigorous development of data mining field, more and more algorithms have been proposed or improved. How to quickly select a data mining algorithm that is suitable for data sets in medical field is a challenge for some medical workers. The purpose of this paper is to study the comparative characteristics of the general medical data set and the general data sets in other fields, and find the applicability rules of the data mining algorithm suitable for the characteristics of the current research data set. The study quantified characteristics of the research data set with 26 indicators, including simple indicators, statistical indicators and information theory indicators. Eight machine learning algorithms with high maturity, low user involvement and strong family representation were selected as the base algorithms. The algorithm performances were evaluated by three aspects: prediction accuracy, running speed and memory consumption. By constructing decision tree and stepwise regression model to learn the above metadata, the algorithm applicability knowledge of medical data set is obtained. Through cross-verification, the accuracy of all the algorithm applicability prediction models is above 75%, which proves the validity and feasibility of the applicability knowledge.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

Frontiers in Computational Neuroscience MATHEMATICAL & COMPUTATIONAL BIOLOGY-NEUROSCIENCES

CiteScore

5.30

自引率

3.10%

发文量

166

审稿时长

6-12 weeks

期刊介绍： Frontiers in Computational Neuroscience is a first-tier electronic journal devoted to promoting theoretical modeling of brain function and fostering interdisciplinary interactions between theoretical and experimental neuroscience. Progress in understanding the amazing capabilities of the brain is still limited, and we believe that it will only come with deep theoretical thinking and mutually stimulating cooperation between different disciplines and approaches. We therefore invite original contributions on a wide range of topics that present the fruits of such cooperation, or provide stimuli for future alliances. We aim to provide an interactive forum for cutting-edge theoretical studies of the nervous system, and for promulgating the best theoretical research to the broader neuroscience community. Models of all styles and at all levels are welcome, from biophysically motivated realistic simulations of neurons and synapses to high-level abstract models of inference and decision making. While the journal is primarily focused on theoretically based and driven research, we welcome experimental studies that validate and test theoretical conclusions. Also: comp neuro