An ensemble classification approach for cervical cancer prediction using behavioral risk factors

Md Shahin Ali, Md Maruf Hossain, Moutushi Akter Kona, Kazi Rubaya Nowrin, Md Khairul Islam
{"title":"An ensemble classification approach for cervical cancer prediction using behavioral risk factors","authors":"Md Shahin Ali,&nbsp;Md Maruf Hossain,&nbsp;Moutushi Akter Kona,&nbsp;Kazi Rubaya Nowrin,&nbsp;Md Khairul Islam","doi":"10.1016/j.health.2024.100324","DOIUrl":null,"url":null,"abstract":"<div><p>Cervical cancer is a significant public health concern among females worldwide. Despite being preventable, it remains a leading cause of mortality. Early detection is crucial for successful treatment and improved survival rates. This study proposes an ensemble Machine Learning (ML) classifier for efficient and accurate identification of cervical cancer using medical data. The proposed methodology involves preparing two datasets using effective preprocessing techniques, extracting essential features using the scikit-learn package, and developing an ensemble classifier based on Random Forest, Support Vector Machine, Gaussian Naïve Bayes, and Decision Tree classifier traits. Comparison with other state-of-the-art algorithms using several ML techniques, including support vector machine, decision tree, random forest, Naïve Bayes, logistic regression, CatBoost, and AdaBoost, demonstrates that the proposed ensemble classifier outperforms them significantly, achieving accuracies of 98.06% and 95.45% for Dataset 1 and Dataset 2, respectively. The proposed ensemble classifier outperforms current state-of-the-art algorithms by 1.50% and 6.67% for Dataset 1 and Dataset 2, respectively, highlighting its superior performance compared to existing methods. The study also utilizes a five-fold cross-validation technique to analyze the benefits and drawbacks of the proposed methodology for predicting cervical cancer using medical data. The Receiver Operating Characteristic (ROC) curves with corresponding Area Under the Curve (AUC) values are 0.95 for Dataset 1 and 0.97 for Dataset 2, indicating the overall performance of the classifiers in distinguishing between the classes. Additionally, we employed SHapley Additive exPlanations (SHAP) as an Explainable Artificial Intelligence (XAI) technique to visualize the classifier’s performance, providing insights into the important features contributing to cervical cancer identification. The results demonstrate that the proposed ensemble classifier can efficiently and accurately identify cervical cancer and potentially improve cervical cancer diagnosis and treatment.</p></div>","PeriodicalId":73222,"journal":{"name":"Healthcare analytics (New York, N.Y.)","volume":"5 ","pages":"Article 100324"},"PeriodicalIF":0.0000,"publicationDate":"2024-03-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.sciencedirect.com/science/article/pii/S2772442524000261/pdfft?md5=70cb57a926b1a9a3779e32e8685de5dc&pid=1-s2.0-S2772442524000261-main.pdf","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Healthcare analytics (New York, N.Y.)","FirstCategoryId":"1085","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S2772442524000261","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 0

Abstract

Cervical cancer is a significant public health concern among females worldwide. Despite being preventable, it remains a leading cause of mortality. Early detection is crucial for successful treatment and improved survival rates. This study proposes an ensemble Machine Learning (ML) classifier for efficient and accurate identification of cervical cancer using medical data. The proposed methodology involves preparing two datasets using effective preprocessing techniques, extracting essential features using the scikit-learn package, and developing an ensemble classifier based on Random Forest, Support Vector Machine, Gaussian Naïve Bayes, and Decision Tree classifier traits. Comparison with other state-of-the-art algorithms using several ML techniques, including support vector machine, decision tree, random forest, Naïve Bayes, logistic regression, CatBoost, and AdaBoost, demonstrates that the proposed ensemble classifier outperforms them significantly, achieving accuracies of 98.06% and 95.45% for Dataset 1 and Dataset 2, respectively. The proposed ensemble classifier outperforms current state-of-the-art algorithms by 1.50% and 6.67% for Dataset 1 and Dataset 2, respectively, highlighting its superior performance compared to existing methods. The study also utilizes a five-fold cross-validation technique to analyze the benefits and drawbacks of the proposed methodology for predicting cervical cancer using medical data. The Receiver Operating Characteristic (ROC) curves with corresponding Area Under the Curve (AUC) values are 0.95 for Dataset 1 and 0.97 for Dataset 2, indicating the overall performance of the classifiers in distinguishing between the classes. Additionally, we employed SHapley Additive exPlanations (SHAP) as an Explainable Artificial Intelligence (XAI) technique to visualize the classifier’s performance, providing insights into the important features contributing to cervical cancer identification. The results demonstrate that the proposed ensemble classifier can efficiently and accurately identify cervical cancer and potentially improve cervical cancer diagnosis and treatment.

查看原文
分享 分享
微信好友 朋友圈 QQ好友 复制链接
本刊更多论文
利用行为风险因素预测宫颈癌的集合分类法
宫颈癌是全世界女性关注的一个重大公共卫生问题。尽管宫颈癌是可以预防的,但它仍然是导致死亡的主要原因。早期发现对于成功治疗和提高生存率至关重要。本研究提出了一种集合式机器学习(ML)分类器,用于利用医疗数据高效、准确地识别宫颈癌。建议的方法包括使用有效的预处理技术准备两个数据集,使用 scikit-learn 软件包提取基本特征,并开发基于随机森林、支持向量机、高斯奈夫贝叶斯和决策树分类器特征的集合分类器。与使用支持向量机、决策树、随机森林、奈夫贝叶斯、逻辑回归、CatBoost和AdaBoost等多种ML技术的其他先进算法相比,所提出的集合分类器的性能明显优于它们,在数据集1和数据集2中的准确率分别达到了98.06%和95.45%。就数据集 1 和数据集 2 而言,所提出的集合分类器分别比目前最先进的算法高出 1.50% 和 6.67%,凸显了其优于现有方法的性能。研究还利用五重交叉验证技术分析了所提方法在利用医疗数据预测宫颈癌方面的优缺点。数据集 1 和数据集 2 的接收方操作特征曲线(ROC)及相应的曲线下面积(AUC)值分别为 0.95 和 0.97,表明分类器在区分类别方面的整体性能良好。此外,我们还采用了可解释人工智能(XAI)技术--SHAPLE Additive exPlanations(SHAP)来可视化分类器的性能,从而深入了解有助于宫颈癌识别的重要特征。结果表明,所提出的集合分类器可以高效、准确地识别宫颈癌,并有望改善宫颈癌的诊断和治疗。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 去求助
来源期刊
Healthcare analytics (New York, N.Y.)
Healthcare analytics (New York, N.Y.) Applied Mathematics, Modelling and Simulation, Nursing and Health Professions (General)
CiteScore
4.40
自引率
0.00%
发文量
0
审稿时长
79 days
期刊最新文献
Optimized early fusion of handcrafted and deep learning descriptors for voice pathology detection and classification A deep neural network model with spectral correlation function for electrocardiogram classification and diagnosis of atrial fibrillation An ensemble convolutional neural network model for brain stroke prediction using brain computed tomography images A hierarchical Bayesian approach for identifying socioeconomic factors influencing self-rated health in Japan An electrocardiogram signal classification using a hybrid machine learning and deep learning approach
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
现在去查看 取消
×
提示
确定
0
微信
客服QQ
Book学术公众号 扫码关注我们
反馈
×
意见反馈
请填写您的意见或建议
请填写您的手机或邮箱
已复制链接
已复制链接
快去分享给好友吧!
我知道了
×
扫码分享
扫码分享
Book学术官方微信
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术
文献互助 智能选刊 最新文献 互助须知 联系我们:info@booksci.cn
Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。
Copyright © 2023 Book学术 All rights reserved.
ghs 京公网安备 11010802042870号 京ICP备2023020795号-1