Stacking Ensemble Technique for Multiple Medical Datasets Classification: A Generalized Prediction Model

Nahrin Jannat, S. M. Mahedy Hasan, Anwar Hossain Efat, Md Fakrul Taraque, Mostarina Mitu, Md. Al Mamun, Md. Farukuzzaman Faruk
Published in: 2023 International Conference on Electrical, Computer and Communication Engineering (ECCE)
Publication date: 2023-02-23
DOI: 10.1109/ECCE57851.2023.10101523
URL: https://doi.org/10.1109/ECCE57851.2023.10101523
Citation count: 2

Abstract

Precise early detection of diseases can reduce their progression and lethality, but working with complex medical data is not straightforward. Machine Learning (ML) can help the research community considerably in this respect by predicting disease status at early stages. This study aims to develop a generalized ML-based model that can classify frequently occurring diseases with better performance and reliability. Four datasets were collected from different repositories: the MRI and Alzheimer's Dataset (MAD), the SPECTF Heart Dataset (SHD), the Early Stage Diabetes Dataset (ESDD), and the Lower Back Pain Dataset (LBPD). They were analyzed and evaluated according to classifier performance in order to propose the prediction model. Numerous studies on this topic are available, but there is still room for improvement. To overcome the shortcomings of previous research, we first preprocess the data and then apply six classification techniques: Logistic Regression (LR), Support Vector Machine (SVM), Naive Bayes (NB), Decision Tree (DT), Random Forest (RF), and Extra Trees (ET). Each is evaluated with 10-fold cross-validation after its best parameters are selected by randomized search. In addition, the three best-performing classifiers (LR, RF, and SVM) are selected with their hyper-parameters to create an ensemble model through the stacking ensemble technique. Overall, our generalized stacking ensemble model outperformed all other classifiers used in this study, as well as results reported by other researchers, in terms of accuracy, obtaining 96.97% on MAD, 95.08% on SHD, 98.90% on ESDD, and 91.34% on LBPD.
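
The stacking setup described in the abstract can be sketched roughly as follows. This is a minimal illustration using scikit-learn, not the authors' implementation: the synthetic data from make_classification stands in for the four medical datasets, the logistic-regression meta-learner, the feature scaling, and every hyper-parameter value are assumptions, and the randomized hyper-parameter search is omitted for brevity.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in data (assumption): NOT one of the paper's four datasets.
X, y = make_classification(
    n_samples=300, n_features=12, n_informative=6, random_state=0
)

# The three base learners named in the abstract (LR, RF, SVM).
# Hyper-parameters here are illustrative defaults, not the tuned values.
base_learners = [
    ("lr", make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))),
    ("rf", RandomForestClassifier(n_estimators=50, random_state=0)),
    ("svm", make_pipeline(StandardScaler(), SVC(random_state=0))),
]

# Stacking: out-of-fold predictions of the base learners become the
# input features of a meta-learner (assumed here to be logistic regression).
stack = StackingClassifier(
    estimators=base_learners,
    final_estimator=LogisticRegression(max_iter=1000),
    cv=5,  # inner folds used to build the meta-learner's training features
)

# 10-fold cross-validation as the evaluation measure, as in the abstract.
outer_cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
scores = cross_val_score(stack, X, y, cv=outer_cv, scoring="accuracy")
mean_accuracy = scores.mean()
print(f"Mean 10-fold accuracy on synthetic data: {mean_accuracy:.3f}")
```

The accuracy printed here reflects only the synthetic data and says nothing about the figures reported in the abstract. To approximate the tuning step, each base learner could be wrapped in scikit-learn's RandomizedSearchCV before being passed to the stack.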