Nahrin Jannat, S. M. Mahedy Hasan, Anwar Hossain Efat, Md Fakrul Taraque, Mostarina Mitu, Md. Al Mamun, Md. Farukuzzaman Faruk
Title: Stacking Ensemble Technique for Multiple Medical Datasets Classification: A Generalized Prediction Model
Venue: 2023 International Conference on Electrical, Computer and Communication Engineering (ECCE)
Publication date: 2023-02-23
DOI: 10.1109/ECCE57851.2023.10101523 (https://doi.org/10.1109/ECCE57851.2023.10101523)
Citation count: 2
Abstract
Precise early detection of diseases can reduce their progression and lethality, but working with complex medical data is not straightforward. Machine Learning (ML) can help the research community considerably here by predicting disease status at early stages. This study aims to develop a generalized ML-based model that classifies frequently occurring diseases with better performance and reliability. Four datasets were collected from different repositories: the MRI and Alzheimer's Dataset (MAD), the SPECTF Heart Dataset (SHD), the Early Stage Diabetes Dataset (ESDD), and the Lower Back Pain Dataset (LBPD). They were analyzed and evaluated on classifier performance to propose the prediction model. Numerous earlier studies address this problem, but there is still scope for improvement. To overcome the shortcomings of previous research, we first preprocess the data and then apply six classification techniques: Logistic Regression (LR), Support Vector Machine (SVM), Naive Bayes (NB), Decision Tree (DT), Random Forest (RF), and Extra Tree (ET). Each classifier is evaluated with 10-fold cross-validation after its best parameters, found by randomized search, have been assigned. In addition, the three best-performing classifiers (LR, RF, and SVM) are selected with their hyper-parameters to create an ensemble model through the stacking ensemble technique. Overall, our generalized stacking ensemble model outperformed all other classifiers used in this study, as well as those reported by other researchers, in terms of accuracy, obtaining 96.97% on MAD, 95.08% on SHD, 98.90% on ESDD, and 91.34% on LBPD.
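The stacking setup described in the abstract can be illustrated with a minimal sketch in scikit-learn. It is not the authors' implementation. The abstract does not name the meta-learner, the tuned hyper-parameters, or the preprocessing steps, so the logistic-regression meta-learner, the default hyper-parameters, the feature scaling, and the synthetic dataset from `make_classification` are all assumptions made for illustration. The helper names `build_stacking_model` and `evaluate` are hypothetical.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC


def build_stacking_model(random_state=0):
    """Stack LR, RF and SVM base learners under a meta-learner.

    Assumption: hyper-parameters are library defaults (plus a small forest),
    not the tuned values from the paper, which the abstract does not list.
    """
    base_learners = [
        ("lr", make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))),
        ("rf", RandomForestClassifier(n_estimators=50, random_state=random_state)),
        ("svm", make_pipeline(StandardScaler(), SVC(random_state=random_state))),
    ]
    return StackingClassifier(
        estimators=base_learners,
        # Assumption: the abstract does not say which meta-learner was used.
        final_estimator=LogisticRegression(max_iter=1000),
        # Inner folds used to produce out-of-fold predictions for the meta-learner.
        cv=5,
    )


def evaluate(model, X, y, n_splits=10, random_state=0):
    """Return per-fold accuracy from stratified k-fold cross-validation."""
    folds = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=random_state)
    return cross_val_score(model, X, y, cv=folds, scoring="accuracy")


# Synthetic stand-in data; the paper's four medical datasets are not used here.
X, y = make_classification(
    n_samples=300, n_features=12, n_informative=6, random_state=0
)
scores = evaluate(build_stacking_model(), X, y)
print(f"mean 10-fold accuracy: {scores.mean():.3f}")
```

To approximate the randomized-search step mentioned in the abstract, each base learner could be tuned with `sklearn.model_selection.RandomizedSearchCV` before being passed to `StackingClassifier`. The search spaces would have to be chosen by the reader, since the abstract does not give them.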