An ensemble classification approach for cervical cancer prediction using behavioral risk factors

Healthcare analytics (New York, N.Y.), Volume 5, Article 100324. Pub Date: 2024-03-28. DOI: 10.1016/j.health.2024.100324

Md Shahin Ali, Md Maruf Hossain, Moutushi Akter Kona, Kazi Rubaya Nowrin, Md Khairul Islam


Citations: 0

Abstract

Cervical cancer is a significant public health concern among females worldwide. Despite being preventable, it remains a leading cause of mortality. Early detection is crucial for successful treatment and improved survival rates. This study proposes an ensemble Machine Learning (ML) classifier for efficient and accurate identification of cervical cancer using medical data. The proposed methodology involves preparing two datasets using effective preprocessing techniques, extracting essential features using the scikit-learn package, and developing an ensemble classifier based on Random Forest, Support Vector Machine, Gaussian Naïve Bayes, and Decision Tree classifier traits. Comparison with other state-of-the-art algorithms using several ML techniques, including support vector machine, decision tree, random forest, Naïve Bayes, logistic regression, CatBoost, and AdaBoost, demonstrates that the proposed ensemble classifier outperforms them significantly, achieving accuracies of 98.06% and 95.45% for Dataset 1 and Dataset 2, respectively. The proposed ensemble classifier outperforms current state-of-the-art algorithms by 1.50% and 6.67% for Dataset 1 and Dataset 2, respectively, highlighting its superior performance compared to existing methods. The study also utilizes a five-fold cross-validation technique to analyze the benefits and drawbacks of the proposed methodology for predicting cervical cancer using medical data. The Receiver Operating Characteristic (ROC) curves with corresponding Area Under the Curve (AUC) values are 0.95 for Dataset 1 and 0.97 for Dataset 2, indicating the overall performance of the classifiers in distinguishing between the classes. Additionally, we employed SHapley Additive exPlanations (SHAP) as an Explainable Artificial Intelligence (XAI) technique to visualize the classifier’s performance, providing insights into the important features contributing to cervical cancer identification. The results demonstrate that the proposed ensemble classifier can efficiently and accurately identify cervical cancer and potentially improve cervical cancer diagnosis and treatment.
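The abstract names the ingredients of the method (four base classifiers combined into an ensemble, five-fold cross-validation, ROC AUC) but does not say how the base predictions are combined. The sketch below is therefore only an illustration under stated assumptions: it assumes hard majority voting, uses made-up toy labels rather than either of the paper's datasets, and relies on the Python standard library only. The function names `majority_vote`, `roc_auc`, and `k_fold_indices` are hypothetical helpers, not taken from the paper. In practice one would more likely use scikit-learn, which the abstract mentions.

```python
from collections import Counter


def majority_vote(predictions_per_model):
    """Combine per-model label lists into one list by hard (majority) voting.

    Assumption: hard voting; the abstract does not state the combination rule.
    """
    combined = []
    for votes in zip(*predictions_per_model):
        # On a tie, most_common returns the label seen first (model order).
        combined.append(Counter(votes).most_common(1)[0][0])
    return combined


def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label matches the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def roc_auc(y_true, scores):
    """AUC as the probability that a random positive outscores a random
    negative, with ties counted as one half."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def k_fold_indices(n_samples, k=5):
    """Yield (train_idx, test_idx) pairs for unshuffled k-fold cross-validation."""
    sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in sizes:
        test_idx = list(range(start, start + size))
        train_idx = [i for i in range(n_samples) if i < start or i >= start + size]
        yield train_idx, test_idx
        start += size


# Toy labels (1 = positive, 0 = negative); illustrative values only.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
# Hypothetical predictions standing in for the four base classifiers.
rf_pred = [1, 0, 1, 1, 0, 0, 0, 0]
svm_pred = [1, 0, 1, 0, 0, 0, 1, 0]
gnb_pred = [1, 1, 1, 1, 0, 0, 1, 0]
dt_pred = [0, 0, 1, 1, 0, 1, 1, 0]

base_preds = [rf_pred, svm_pred, gnb_pred, dt_pred]
ensemble_pred = majority_vote(base_preds)

# Use the share of models voting "positive" as a simple ensemble score.
ensemble_score = [sum(votes) / len(votes) for votes in zip(*base_preds)]

print("base accuracies:", [accuracy(y_true, p) for p in base_preds])
print("ensemble accuracy:", accuracy(y_true, ensemble_pred))
print("ensemble AUC:", roc_auc(y_true, ensemble_score))
print("fold sizes:", [len(test) for _, test in k_fold_indices(len(y_true))])
```

In this toy setup each base model makes at least one mistake, but the mistakes fall on different samples, so the vote corrects them. That is the general motivation for ensembling and says nothing about the accuracies reported in the paper. With an even number of voters, ties are possible; here they are broken by model order, which is a design choice of this sketch.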


Source journal