A precise machine learning model: Detecting cervical cancer using feature selection and explainable AI

Rashiduzzaman Shakil, Sadia Islam, Bonna Akter
{"title":"A precise machine learning model: Detecting cervical cancer using feature selection and explainable AI","authors":"Rashiduzzaman Shakil,&nbsp;Sadia Islam,&nbsp;Bonna Akter","doi":"10.1016/j.jpi.2024.100398","DOIUrl":null,"url":null,"abstract":"<div><div>Cervical cancer is a cancer that remains a significant global health challenge all over the world. Due to improper screening in the early stages, and healthcare disparities, a large number of women are suffering from this disease, and the mortality rate increases day by day. Hence, in these studies, we presented a precise approach utilizing six different machine learning models (decision tree, logistic regression, naïve bayes, random forest, k nearest neighbors, support vector machine), which can predict the early stage of cervical cancer by analysing 36 risk factor attributes of 858 individuals. In addition, two data balancing techniques—Synthetic Minority Oversampling Technique and Adaptive Synthetic Sampling—were used to mitigate the data imbalance issues. Furthermore, Chi-square and Least Absolute Shrinkage and Selection Operator are two distinct feature selection processes that have been applied to evaluate the feature rank, which are mostly correlated to identify the particular disease, and also integrate an explainable artificial intelligence technique, namely Shapley Additive Explanations, for clarifying the model outcome. The applied machine learning model outcome is evaluated by performance evaluation matrices, namely accuracy, sensitivity, specificity, precision, f1-score, false-positive rate and false-negative rate, and area under the Receiver operating characteristic curve score. The decision tree outperformed in Chi-square feature selection with outstanding accuracy with 97.60%, 98.73% sensitivity, 80% specificity, and 98.73% precision, respectively. During the data imbalance, DT performed 97% accuracy, 99.35% sensitivity, 69.23% specificity, and 97.45% precision. This research is focused on developing diagnostic frameworks with automated tools to improve the detection and management of cervical cancer, as well as on helping healthcare professionals deliver more efficient and personalized care to their patients.</div></div>","PeriodicalId":37769,"journal":{"name":"Journal of Pathology Informatics","volume":"15 ","pages":"Article 100398"},"PeriodicalIF":0.0000,"publicationDate":"2024-09-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Journal of Pathology Informatics","FirstCategoryId":"1085","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S2153353924000373","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"Medicine","Score":null,"Total":0}
引用次数: 0

Abstract

Cervical cancer is a cancer that remains a significant global health challenge all over the world. Due to improper screening in the early stages, and healthcare disparities, a large number of women are suffering from this disease, and the mortality rate increases day by day. Hence, in these studies, we presented a precise approach utilizing six different machine learning models (decision tree, logistic regression, naïve bayes, random forest, k nearest neighbors, support vector machine), which can predict the early stage of cervical cancer by analysing 36 risk factor attributes of 858 individuals. In addition, two data balancing techniques—Synthetic Minority Oversampling Technique and Adaptive Synthetic Sampling—were used to mitigate the data imbalance issues. Furthermore, Chi-square and Least Absolute Shrinkage and Selection Operator are two distinct feature selection processes that have been applied to evaluate the feature rank, which are mostly correlated to identify the particular disease, and also integrate an explainable artificial intelligence technique, namely Shapley Additive Explanations, for clarifying the model outcome. The applied machine learning model outcome is evaluated by performance evaluation matrices, namely accuracy, sensitivity, specificity, precision, f1-score, false-positive rate and false-negative rate, and area under the Receiver operating characteristic curve score. The decision tree outperformed in Chi-square feature selection with outstanding accuracy with 97.60%, 98.73% sensitivity, 80% specificity, and 98.73% precision, respectively. During the data imbalance, DT performed 97% accuracy, 99.35% sensitivity, 69.23% specificity, and 97.45% precision. This research is focused on developing diagnostic frameworks with automated tools to improve the detection and management of cervical cancer, as well as on helping healthcare professionals deliver more efficient and personalized care to their patients.
查看原文
分享 分享
微信好友 朋友圈 QQ好友 复制链接
本刊更多论文
精确的机器学习模型:利用特征选择和可解释人工智能检测宫颈癌
宫颈癌是一种癌症,在全世界仍然是一项重大的全球健康挑战。由于早期筛查不当和医疗保健方面的不平等,大量妇女罹患此病,死亡率与日俱增。因此,在这些研究中,我们提出了一种精确的方法,利用六种不同的机器学习模型(决策树、逻辑回归、奈夫贝叶斯、随机森林、k 近邻、支持向量机),通过分析 858 人的 36 个风险因素属性,预测宫颈癌的早期阶段。此外,还采用了两种数据平衡技术--合成少数群体过度采样技术和自适应合成采样技术,以缓解数据不平衡问题。此外,Chi-square 和 Least Absolute Shrinkage and Selection Operator 是两种不同的特征选择过程,用于评估特征等级,这些特征等级大多与特定疾病的识别相关,同时还集成了一种可解释的人工智能技术,即 Shapley Additive Explanations,用于澄清模型结果。应用机器学习模型的结果由性能评价矩阵进行评价,即准确度、灵敏度、特异性、精确度、f1-分数、假阳性率、假阴性率和接收者工作特征曲线下面积分数。决策树的准确率、灵敏度、特异度和精确度分别为 97.60%、98.73%、80% 和 98.73%,优于 Chi-square 特征选择。在数据不平衡时,决策树的准确率为 97%,灵敏度为 99.35%,特异度为 69.23%,精确度为 97.45%。这项研究的重点是开发带有自动化工具的诊断框架,以改善宫颈癌的检测和管理,并帮助医疗保健专业人员为患者提供更高效、更个性化的护理。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 去求助
来源期刊
Journal of Pathology Informatics
Journal of Pathology Informatics Medicine-Pathology and Forensic Medicine
CiteScore
3.70
自引率
0.00%
发文量
2
审稿时长
18 weeks
期刊介绍: The Journal of Pathology Informatics (JPI) is an open access peer-reviewed journal dedicated to the advancement of pathology informatics. This is the official journal of the Association for Pathology Informatics (API). The journal aims to publish broadly about pathology informatics and freely disseminate all articles worldwide. This journal is of interest to pathologists, informaticians, academics, researchers, health IT specialists, information officers, IT staff, vendors, and anyone with an interest in informatics. We encourage submissions from anyone with an interest in the field of pathology informatics. We publish all types of papers related to pathology informatics including original research articles, technical notes, reviews, viewpoints, commentaries, editorials, symposia, meeting abstracts, book reviews, and correspondence to the editors. All submissions are subject to rigorous peer review by the well-regarded editorial board and by expert referees in appropriate specialties.
期刊最新文献
Improving the generalizability of white blood cell classification with few-shot domain adaptation Pathology Informatics Summit 2024 Abstracts Ann Arbor Marriott at Eagle Crest Resort May 20-23, 2024 Ann Arbor, Michigan Deep learning-based classification of breast cancer molecular subtypes from H&E whole-slide images. Enhancing human phenotype ontology term extraction through synthetic case reports and embedding-based retrieval: A novel approach for improved biomedical data annotation. Prioritizing cases from a multi-institutional cohort for a dataset of pathologist annotations.
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
现在去查看 取消
×
提示
确定
0
微信
客服QQ
Book学术公众号 扫码关注我们
反馈
×
意见反馈
请填写您的意见或建议
请填写您的手机或邮箱
已复制链接
已复制链接
快去分享给好友吧!
我知道了
×
扫码分享
扫码分享
Book学术官方微信
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术
文献互助 智能选刊 最新文献 互助须知 联系我们:info@booksci.cn
Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。
Copyright © 2023 Book学术 All rights reserved.
ghs 京公网安备 11010802042870号 京ICP备2023020795号-1