Md Shahin Ali, Md Maruf Hossain, Moutushi Akter Kona, Kazi Rubaya Nowrin, Md Khairul Islam
{"title":"利用行为风险因素预测宫颈癌的集合分类法","authors":"Md Shahin Ali, Md Maruf Hossain, Moutushi Akter Kona, Kazi Rubaya Nowrin, Md Khairul Islam","doi":"10.1016/j.health.2024.100324","DOIUrl":null,"url":null,"abstract":"<div><p>Cervical cancer is a significant public health concern among females worldwide. Despite being preventable, it remains a leading cause of mortality. Early detection is crucial for successful treatment and improved survival rates. This study proposes an ensemble Machine Learning (ML) classifier for efficient and accurate identification of cervical cancer using medical data. The proposed methodology involves preparing two datasets using effective preprocessing techniques, extracting essential features using the scikit-learn package, and developing an ensemble classifier based on Random Forest, Support Vector Machine, Gaussian Naïve Bayes, and Decision Tree classifier traits. Comparison with other state-of-the-art algorithms using several ML techniques, including support vector machine, decision tree, random forest, Naïve Bayes, logistic regression, CatBoost, and AdaBoost, demonstrates that the proposed ensemble classifier outperforms them significantly, achieving accuracies of 98.06% and 95.45% for Dataset 1 and Dataset 2, respectively. The proposed ensemble classifier outperforms current state-of-the-art algorithms by 1.50% and 6.67% for Dataset 1 and Dataset 2, respectively, highlighting its superior performance compared to existing methods. The study also utilizes a five-fold cross-validation technique to analyze the benefits and drawbacks of the proposed methodology for predicting cervical cancer using medical data. The Receiver Operating Characteristic (ROC) curves with corresponding Area Under the Curve (AUC) values are 0.95 for Dataset 1 and 0.97 for Dataset 2, indicating the overall performance of the classifiers in distinguishing between the classes. Additionally, we employed SHapley Additive exPlanations (SHAP) as an Explainable Artificial Intelligence (XAI) technique to visualize the classifier’s performance, providing insights into the important features contributing to cervical cancer identification. The results demonstrate that the proposed ensemble classifier can efficiently and accurately identify cervical cancer and potentially improve cervical cancer diagnosis and treatment.</p></div>","PeriodicalId":73222,"journal":{"name":"Healthcare analytics (New York, N.Y.)","volume":"5 ","pages":"Article 100324"},"PeriodicalIF":0.0000,"publicationDate":"2024-03-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.sciencedirect.com/science/article/pii/S2772442524000261/pdfft?md5=70cb57a926b1a9a3779e32e8685de5dc&pid=1-s2.0-S2772442524000261-main.pdf","citationCount":"0","resultStr":"{\"title\":\"An ensemble classification approach for cervical cancer prediction using behavioral risk factors\",\"authors\":\"Md Shahin Ali, Md Maruf Hossain, Moutushi Akter Kona, Kazi Rubaya Nowrin, Md Khairul Islam\",\"doi\":\"10.1016/j.health.2024.100324\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<div><p>Cervical cancer is a significant public health concern among females worldwide. Despite being preventable, it remains a leading cause of mortality. Early detection is crucial for successful treatment and improved survival rates. This study proposes an ensemble Machine Learning (ML) classifier for efficient and accurate identification of cervical cancer using medical data. The proposed methodology involves preparing two datasets using effective preprocessing techniques, extracting essential features using the scikit-learn package, and developing an ensemble classifier based on Random Forest, Support Vector Machine, Gaussian Naïve Bayes, and Decision Tree classifier traits. Comparison with other state-of-the-art algorithms using several ML techniques, including support vector machine, decision tree, random forest, Naïve Bayes, logistic regression, CatBoost, and AdaBoost, demonstrates that the proposed ensemble classifier outperforms them significantly, achieving accuracies of 98.06% and 95.45% for Dataset 1 and Dataset 2, respectively. The proposed ensemble classifier outperforms current state-of-the-art algorithms by 1.50% and 6.67% for Dataset 1 and Dataset 2, respectively, highlighting its superior performance compared to existing methods. The study also utilizes a five-fold cross-validation technique to analyze the benefits and drawbacks of the proposed methodology for predicting cervical cancer using medical data. The Receiver Operating Characteristic (ROC) curves with corresponding Area Under the Curve (AUC) values are 0.95 for Dataset 1 and 0.97 for Dataset 2, indicating the overall performance of the classifiers in distinguishing between the classes. Additionally, we employed SHapley Additive exPlanations (SHAP) as an Explainable Artificial Intelligence (XAI) technique to visualize the classifier’s performance, providing insights into the important features contributing to cervical cancer identification. The results demonstrate that the proposed ensemble classifier can efficiently and accurately identify cervical cancer and potentially improve cervical cancer diagnosis and treatment.</p></div>\",\"PeriodicalId\":73222,\"journal\":{\"name\":\"Healthcare analytics (New York, N.Y.)\",\"volume\":\"5 \",\"pages\":\"Article 100324\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2024-03-28\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"https://www.sciencedirect.com/science/article/pii/S2772442524000261/pdfft?md5=70cb57a926b1a9a3779e32e8685de5dc&pid=1-s2.0-S2772442524000261-main.pdf\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Healthcare analytics (New York, N.Y.)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://www.sciencedirect.com/science/article/pii/S2772442524000261\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Healthcare analytics (New York, N.Y.)","FirstCategoryId":"1085","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S2772442524000261","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
An ensemble classification approach for cervical cancer prediction using behavioral risk factors
Cervical cancer is a significant public health concern among females worldwide. Despite being preventable, it remains a leading cause of mortality. Early detection is crucial for successful treatment and improved survival rates. This study proposes an ensemble Machine Learning (ML) classifier for efficient and accurate identification of cervical cancer using medical data. The proposed methodology involves preparing two datasets using effective preprocessing techniques, extracting essential features using the scikit-learn package, and developing an ensemble classifier based on Random Forest, Support Vector Machine, Gaussian Naïve Bayes, and Decision Tree classifier traits. Comparison with other state-of-the-art algorithms using several ML techniques, including support vector machine, decision tree, random forest, Naïve Bayes, logistic regression, CatBoost, and AdaBoost, demonstrates that the proposed ensemble classifier outperforms them significantly, achieving accuracies of 98.06% and 95.45% for Dataset 1 and Dataset 2, respectively. The proposed ensemble classifier outperforms current state-of-the-art algorithms by 1.50% and 6.67% for Dataset 1 and Dataset 2, respectively, highlighting its superior performance compared to existing methods. The study also utilizes a five-fold cross-validation technique to analyze the benefits and drawbacks of the proposed methodology for predicting cervical cancer using medical data. The Receiver Operating Characteristic (ROC) curves with corresponding Area Under the Curve (AUC) values are 0.95 for Dataset 1 and 0.97 for Dataset 2, indicating the overall performance of the classifiers in distinguishing between the classes. Additionally, we employed SHapley Additive exPlanations (SHAP) as an Explainable Artificial Intelligence (XAI) technique to visualize the classifier’s performance, providing insights into the important features contributing to cervical cancer identification. The results demonstrate that the proposed ensemble classifier can efficiently and accurately identify cervical cancer and potentially improve cervical cancer diagnosis and treatment.