A precise machine learning model: Detecting cervical cancer using feature selection and explainable AI

Journal of Pathology Informatics (Q2, Medicine) | Pub Date: 2024-09-26 | DOI: 10.1016/j.jpi.2024.100398

Rashiduzzaman Shakil, Sadia Islam, Bonna Akter

Volume 15, Article 100398. Full text: https://www.sciencedirect.com/science/article/pii/S2153353924000373

Citations: 0

Abstract

Cervical cancer remains a significant global health challenge. Because of inadequate early-stage screening and healthcare disparities, a large number of women suffer from this disease, and the mortality rate continues to rise. In this study, we present an approach that uses six machine learning models (decision tree, logistic regression, naïve Bayes, random forest, k-nearest neighbors, and support vector machine) to predict early-stage cervical cancer by analysing 36 risk-factor attributes of 858 individuals. Two data balancing techniques, the Synthetic Minority Oversampling Technique (SMOTE) and Adaptive Synthetic Sampling (ADASYN), were used to mitigate class imbalance. Two feature selection methods, Chi-square and the Least Absolute Shrinkage and Selection Operator (LASSO), were applied to rank the features most strongly associated with the disease, and an explainable artificial intelligence technique, Shapley Additive Explanations (SHAP), was integrated to clarify the model outcomes. Model performance was evaluated with the following metrics: accuracy, sensitivity, specificity, precision, F1-score, false-positive rate, false-negative rate, and area under the receiver operating characteristic curve. The decision tree (DT) performed best with Chi-square feature selection, achieving 97.60% accuracy, 98.73% sensitivity, 80% specificity, and 98.73% precision. On the imbalanced data, DT achieved 97% accuracy, 99.35% sensitivity, 69.23% specificity, and 97.45% precision. This research focuses on developing diagnostic frameworks with automated tools to improve the detection and management of cervical cancer and to help healthcare professionals deliver more efficient and personalized care to their patients.
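The threshold-based metrics listed in the abstract all follow from the four cells of a binary confusion matrix. The block below is a minimal sketch of those standard definitions, not the authors' code: the names `ConfusionCounts` and `screening_metrics` are hypothetical, and the counts in the demo are made-up illustration values rather than results from the paper. Area under the ROC curve is left out because it requires predicted scores, not just counts.

```python
from dataclasses import dataclass
from math import nan


@dataclass(frozen=True)
class ConfusionCounts:
    """Binary confusion-matrix counts (hypothetical container, not from the paper)."""
    tp: int  # actual positives predicted positive
    fn: int  # actual positives predicted negative
    fp: int  # actual negatives predicted positive
    tn: int  # actual negatives predicted negative


def _ratio(numerator: int, denominator: int) -> float:
    """Return numerator / denominator, or NaN when the denominator is zero."""
    return numerator / denominator if denominator else nan


def screening_metrics(c: ConfusionCounts) -> dict[str, float]:
    """Compute the count-based metrics named in the abstract."""
    total = c.tp + c.fn + c.fp + c.tn
    return {
        "accuracy": _ratio(c.tp + c.tn, total),
        "sensitivity": _ratio(c.tp, c.tp + c.fn),   # true-positive rate (recall)
        "specificity": _ratio(c.tn, c.tn + c.fp),   # true-negative rate
        "precision": _ratio(c.tp, c.tp + c.fp),
        "f1_score": _ratio(2 * c.tp, 2 * c.tp + c.fp + c.fn),
        "false_positive_rate": _ratio(c.fp, c.fp + c.tn),  # 1 - specificity
        "false_negative_rate": _ratio(c.fn, c.fn + c.tp),  # 1 - sensitivity
    }


if __name__ == "__main__":
    # Illustrative counts only; these are NOT the study's confusion matrix.
    demo = ConfusionCounts(tp=90, fn=10, fp=5, tn=95)
    for name, value in screening_metrics(demo).items():
        print(f"{name}: {value:.4f}")
```

Because specificity depends only on the actual negatives, it can sit well below accuracy and sensitivity when that class is small, which is one reason the abstract reports it separately.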


Source journal: Journal of Pathology Informatics (Medicine: Pathology and Forensic Medicine)

CiteScore: 3.70

Self-citation rate: 0.00%

Review time: 18 weeks

About the journal: The Journal of Pathology Informatics (JPI) is an open access peer-reviewed journal dedicated to the advancement of pathology informatics. This is the official journal of the Association for Pathology Informatics (API). The journal aims to publish broadly about pathology informatics and freely disseminate all articles worldwide. This journal is of interest to pathologists, informaticians, academics, researchers, health IT specialists, information officers, IT staff, vendors, and anyone with an interest in informatics. We encourage submissions from anyone with an interest in the field of pathology informatics. We publish all types of papers related to pathology informatics including original research articles, technical notes, reviews, viewpoints, commentaries, editorials, symposia, meeting abstracts, book reviews, and correspondence to the editors. All submissions are subject to rigorous peer review by the well-regarded editorial board and by expert referees in appropriate specialties.