A comparative study of machine learning models with LASSO and SHAP feature selection for breast cancer prediction

Md. Shazzad Hossain Shaon, Tasmin Karim, Md. Shahriar Shakil, Md. Zahid Hasan
DOI: 10.1016/j.health.2024.100353
Published: 2024-06-25
URL: https://www.sciencedirect.com/science/article/pii/S2772442524000558
Citations: 0

Abstract

In recent decades, breast cancer has become the most prevalent cancer among women worldwide and a major contributor to female mortality. Early identification of breast cancer may substantially reduce patient mortality and improve the chance of effective treatment. Machine learning models have become crucial for classifying cancer and for improving both the accuracy and the efficiency of diagnostic and treatment strategies. This study therefore focuses on early detection of breast cancer using a variety of machine learning algorithms and aims to identify the most effective feature selection process on an amalgamated dataset. Initially, we evaluated five traditional models and two meta-models on separate datasets. To find the most valuable features, the study used the Least Absolute Shrinkage and Selection Operator (LASSO) and SHapley Additive exPlanations (SHAP) as selection methods and analyzed them across a wide range of performance metrics. We then applied these models to the combined dataset and observed that the merged dataset was significantly beneficial for breast cancer diagnosis. Analysis of the feature selection strategies showed that the majority of models were more accurate when using SHAP-based selection. Notably, three traditional models and two meta-classifiers obtained an accuracy of 99.82%, outperforming state-of-the-art methods. This advancement lays a foundation for refining diagnostic tools and advancing medical science in this field.
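The workflow summarized in the abstract (select features, then train base models and a meta-classifier) can be illustrated with a minimal sketch. Everything specific here is an assumption rather than the authors' setup: the abstract does not name its datasets, base models, or hyperparameters, so the sketch uses scikit-learn's bundled Wisconsin breast cancer data, a random forest and an SVM as base models, a logistic-regression meta-learner, and an arbitrary LASSO penalty of `alpha=0.01`. SHAP-based selection is not shown because it requires the third-party `shap` package.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.feature_selection import SelectFromModel
from sklearn.linear_model import Lasso, LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Assumption: scikit-learn's bundled dataset stands in for the paper's data.
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

# LASSO fitted on the 0/1 labels; features whose coefficients shrink to
# (near) zero are dropped. alpha=0.01 is an illustrative, untuned value.
lasso_selector = SelectFromModel(Lasso(alpha=0.01, max_iter=10000))

# Hypothetical choice of base models and meta-learner (stacking).
stack = StackingClassifier(
    estimators=[
        ("rf", RandomForestClassifier(n_estimators=100, random_state=0)),
        ("svc", SVC(random_state=0)),
    ],
    final_estimator=LogisticRegression(max_iter=1000),
)

# Scaling, selection, and classification are fitted on training data only,
# so the held-out test set does not leak into feature selection.
model = make_pipeline(StandardScaler(), lasso_selector, stack)
model.fit(X_train, y_train)

n_selected = int(model.named_steps["selectfrommodel"].get_support().sum())
accuracy = model.score(X_test, y_test)
print(f"features kept: {n_selected} of {X.shape[1]}")
print(f"held-out accuracy: {accuracy:.4f}")
```

Placing the selector inside the pipeline is a deliberate choice: it keeps feature selection inside the training fold, which matters when comparing selection methods by held-out accuracy. The numbers this sketch prints are not comparable to the 99.82% reported in the abstract, since the data and models differ.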

Source journal
Healthcare analytics (New York, N.Y.)
Subject areas: Applied Mathematics, Modelling and Simulation, Nursing and Health Professions (General)
CiteScore: 4.40
Self-citation rate: 0.00%
Articles published: 0
Review time: 79 days