An enhanced machine learning approach with stacking ensemble learner for accurate liver cancer diagnosis using feature selection and gene expression data
Authors: Amena Mahmoud, Eiko Takaoka
Journal: Healthcare analytics (New York, N.Y.), volume 7, Article 100373
Publication type: Journal Article
Published: 2024-12-12
DOI: 10.1016/j.health.2024.100373
URL: https://www.sciencedirect.com/science/article/pii/S2772442524000753
Citations: 0
Abstract
Liver cancer is a significant global health concern, and effective treatment depends on accurate and timely diagnosis. In recent years, machine learning approaches have emerged as promising tools for improving liver cancer classification from gene expression data. This study presents a machine learning approach for liver cancer diagnosis using gene expression data that combines feature selection with a stacking ensemble learning model. The method addresses the high dimensionality and complex patterns of genomic data to improve diagnostic accuracy and interpretability. We employed a feature selection process to identify the gene expressions most relevant to liver cancer, which reduced the dimensionality of the data while preserving crucial biological information. The selected features were then used to train a stacking ensemble model that combined multiple base learners, including Multi-Layer Perceptron (MLP), Random Forest (RF), K-Nearest Neighbor (KNN), and Support Vector Machine (SVM), with an Extreme Gradient Boosting (XGBoost) meta-learner that makes the final predictions. The stacking ensemble achieved an accuracy of 97%, outperforming individual machine learning algorithms and traditional diagnostic methods. The model also demonstrated high sensitivity (96.8%) and specificity (98.1%), which are crucial for early detection and for minimizing false positives.
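
The pipeline described in the abstract (feature selection, then a stacking ensemble, then accuracy, sensitivity, and specificity) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the data are synthetic, the feature selector (`SelectKBest` with an ANOVA F-test) and all hyperparameters are hypothetical choices, since the abstract does not specify them, and scikit-learn's `GradientBoostingClassifier` stands in for the XGBoost meta-learner to avoid an extra dependency. The function names `build_stacking_pipeline`, `evaluate`, and `run_demo` are invented for this example.

```python
# Illustrative sketch only: synthetic data and assumed settings,
# not the pipeline or results reported in the paper.
from sklearn.datasets import make_classification
from sklearn.ensemble import (
    GradientBoostingClassifier,
    RandomForestClassifier,
    StackingClassifier,
)
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC


def build_stacking_pipeline(n_selected_features=50, seed=0):
    """Scale, select features, then stack four base learners (assumed settings)."""
    base_learners = [
        ("mlp", MLPClassifier(hidden_layer_sizes=(32,), max_iter=1000, random_state=seed)),
        ("rf", RandomForestClassifier(n_estimators=100, random_state=seed)),
        ("knn", KNeighborsClassifier(n_neighbors=5)),
        ("svm", SVC()),
    ]
    # Stand-in meta-learner: the abstract names XGBoost; a scikit-learn
    # gradient boosting model is substituted here for self-containment.
    meta_learner = GradientBoostingClassifier(random_state=seed)
    stack = StackingClassifier(estimators=base_learners, final_estimator=meta_learner, cv=5)
    # Selection sits inside the pipeline, so it is fitted on training data only.
    return make_pipeline(
        StandardScaler(),
        SelectKBest(f_classif, k=n_selected_features),
        stack,
    )


def evaluate(y_true, y_pred):
    """Accuracy, sensitivity, and specificity from a binary confusion matrix.

    Assumes both classes occur in y_true, so no denominator is zero.
    """
    tn, fp, fn, tp = confusion_matrix(y_true, y_pred, labels=[0, 1]).ravel()
    return {
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "sensitivity": tp / (tp + fn),  # true positive rate
        "specificity": tn / (tn + fp),  # true negative rate
    }


def run_demo(seed=0):
    # Synthetic stand-in for a high-dimensional gene expression matrix:
    # 200 samples, 500 "genes", 20 of them informative.
    X, y = make_classification(
        n_samples=200, n_features=500, n_informative=20, random_state=seed
    )
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.25, stratify=y, random_state=seed
    )
    model = build_stacking_pipeline(seed=seed)
    model.fit(X_train, y_train)
    return model, evaluate(y_test, model.predict(X_test))


if __name__ == "__main__":
    _, metrics = run_demo()
    print(metrics)
```

The numbers printed by this sketch describe the synthetic data only and say nothing about the 97% accuracy, 96.8% sensitivity, or 98.1% specificity reported in the abstract. Sensitivity is the share of true cancer cases that are detected, and specificity is the share of non-cancer cases correctly ruled out, so the first bears on missed diagnoses and the second on false positives.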