A comparative study of machine learning models with LASSO and SHAP feature selection for breast cancer prediction
Md. Shazzad Hossain Shaon, Tasmin Karim, Md. Shahriar Shakil, Md. Zahid Hasan
Healthcare Analytics, Volume 6, Article 100353, published 2024-06-25
DOI: 10.1016/j.health.2024.100353
https://www.sciencedirect.com/science/article/pii/S2772442524000558
Over recent decades, breast cancer has become the most prevalent cancer among women worldwide and poses a significant risk of death for women. Early detection could drastically reduce patient mortality and greatly improve the chances of effective treatment. Machine learning models have become important tools for classifying cancer and for improving both the accuracy and the efficiency of diagnostic and treatment strategies. This study therefore focuses on the early detection of breast cancer using a variety of machine learning algorithms and aims to identify the most effective feature selection method on a merged dataset. We first evaluated five traditional models and two meta-models on separate datasets. To identify the most informative features, we applied two selection methods, the Least Absolute Shrinkage and Selection Operator (LASSO) and SHapley Additive exPlanations (SHAP), and compared them across a wide range of performance metrics. We then applied the models to the combined dataset and observed that the merged dataset substantially benefited breast cancer diagnosis. Comparing the two feature selection strategies showed that most models achieved higher accuracy with SHAP-selected features than with LASSO-selected features. Notably, three traditional models and two meta-classifiers reached an accuracy of 99.82%, outperforming the state-of-the-art methods they were compared against. These results lay a foundation for refining diagnostic tools and advancing medical research in this field.
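The abstract does not say how LASSO and SHAP were configured, so the following is only a minimal, self-contained Python sketch of the two selection styles being compared, under assumptions that are not taken from the paper. The data are a hypothetical synthetic set in which only features 0 and 1 affect the 0/1 label (not the study's breast cancer datasets). LASSO selection here keeps the features whose coefficients stay nonzero after fitting an L1-penalized linear model to the centered labels by coordinate descent. SHAP selection here ranks features by the mean absolute Shapley value of a logistic-regression model, computed exactly by enumerating feature coalitions against a single mean baseline; this is a simplification of what the `shap` package estimates with a background dataset. All function names and the parameters `alpha` and `k` are hypothetical.

```python
import math
import random
from itertools import combinations

def make_synthetic_data(n=400, p=5, seed=0):
    """Hypothetical toy data (not the paper's): only features 0 and 1 drive the 0/1 label."""
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n):
        row = [rng.gauss(0.0, 1.0) for _ in range(p)]
        score = 2.0 * row[0] - 1.5 * row[1] + 0.5 * rng.gauss(0.0, 1.0)
        X.append(row)
        y.append(1.0 if score > 0 else 0.0)
    return X, y

def standardize(X):
    """Scale each column to zero mean and unit (population) variance."""
    n, p = len(X), len(X[0])
    means = [sum(row[j] for row in X) / n for j in range(p)]
    stds = [math.sqrt(sum((row[j] - means[j]) ** 2 for row in X) / n) or 1.0 for j in range(p)]
    return [[(row[j] - means[j]) / stds[j] for j in range(p)] for row in X]

def soft_threshold(z, t):
    """Shrink z toward 0 by t; anything within [-t, t] becomes exactly 0."""
    return math.copysign(max(abs(z) - t, 0.0), z)

def lasso_coordinate_descent(X, y, alpha, n_iter=100):
    """Minimize (1/2n)||y - Xw||^2 + alpha*||w||_1 by cyclic coordinate descent (no intercept)."""
    n, p = len(X), len(X[0])
    cols = [[row[j] for row in X] for j in range(p)]
    sq_norms = [sum(c * c for c in col) / n for col in cols]  # 1.0 for standardized columns
    w = [0.0] * p
    residual = list(y)  # y - Xw, starting from w = 0
    for _ in range(n_iter):
        for j in range(p):
            if sq_norms[j] == 0.0:
                continue  # constant column: coefficient stays 0
            rho = sum(c * r for c, r in zip(cols[j], residual)) / n + sq_norms[j] * w[j]
            new_wj = soft_threshold(rho, alpha) / sq_norms[j]
            delta = new_wj - w[j]
            if delta != 0.0:
                residual = [r - c * delta for r, c in zip(residual, cols[j])]
                w[j] = new_wj
    return w

def sigmoid(z):
    """Numerically stable logistic function."""
    return 1.0 / (1.0 + math.exp(-z)) if z >= 0 else math.exp(z) / (1.0 + math.exp(z))

def train_logistic(X, y, lr=0.5, epochs=200):
    """Logistic regression fitted by plain batch gradient descent on the mean log-loss."""
    n, p = len(X), len(X[0])
    w, b = [0.0] * p, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * p, 0.0
        for row, target in zip(X, y):
            err = sigmoid(b + sum(wj * xj for wj, xj in zip(w, row))) - target
            grad_w = [g + err * xj for g, xj in zip(grad_w, row)]
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b

def exact_shapley(f, x, baseline):
    """Exact Shapley values of f at x against one baseline point (simplifies SHAP's background set)."""
    p, phi = len(x), [0.0] * len(x)
    for j in range(p):
        others = [k for k in range(p) if k != j]
        for size in range(p):
            weight = math.factorial(size) * math.factorial(p - size - 1) / math.factorial(p)
            for subset in combinations(others, size):
                z = list(baseline)  # features outside the coalition keep their baseline value
                for k in subset:
                    z[k] = x[k]
                without_j = f(z)
                z[j] = x[j]
                phi[j] += weight * (f(z) - without_j)
    return phi

def run_demo(alpha=0.1, k=2):
    """LASSO keeps nonzero-coefficient features; SHAP keeps the top-k by mean |phi| (illustrative)."""
    X, y = make_synthetic_data()
    Xs = standardize(X)
    y_mean = sum(y) / len(y)
    w_lasso = lasso_coordinate_descent(Xs, [t - y_mean for t in y], alpha)
    lasso_selected = [j for j, wj in enumerate(w_lasso) if wj != 0.0]
    w, b = train_logistic(Xs, y)
    model = lambda z: sigmoid(b + sum(wj * zj for wj, zj in zip(w, z)))
    baseline = [sum(col) / len(Xs) for col in zip(*Xs)]  # mean of each feature
    mean_abs = [0.0] * len(w)
    for row in Xs:
        for j, phi_j in enumerate(exact_shapley(model, row, baseline)):
            mean_abs[j] += abs(phi_j) / len(Xs)
    shap_selected = sorted(sorted(range(len(w)), key=lambda j: -mean_abs[j])[:k])
    return lasso_selected, shap_selected

if __name__ == "__main__":
    print(run_demo())
```

Running the file prints the LASSO-selected and SHAP-selected feature indices; on this toy data both are expected to recover features 0 and 1. Exact enumeration needs 2^(p-1) coalitions per feature, so it only suits a handful of features; for real datasets with dozens of features one would normally rely on an approximation such as the explainers in the `shap` package.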