Unveiling Cancer: A Data-Driven Approach for Early Identification and Prediction Using F-RUS-RF Model

International Journal of Imaging Systems and Technology. Pub Date: 2024-11-20. DOI: 10.1002/ima.23221. Impact factor: 3; CAS journal ranking: Zone 4 (Computer Science); JCR: Q2 (Engineering, Electrical & Electronic).
Ashir Javeed, Peter Anderberg, Muhammad Asim Saleem, Ahmad Nauman Ghazi, Johan Sanmartin Berglund
Volume 34, Issue 6. Full text: https://onlinelibrary.wiley.com/doi/10.1002/ima.23221
Citations: 0

Abstract

Globally, cancer is the second-leading cause of death after cardiovascular disease. To improve survival rates, risk factors and cancer predictors must be identified early. Researchers have developed several kinds of machine learning-based diagnostic systems for early cancer prediction. This study presents a diagnostic system that identifies the risk factors linked to the onset of cancer so that cancer can be anticipated early. The system consists of two modules: the first ranks the variables in the dataset using a statistical F-score method, and the second uses a random forest (RF) model for classification. The hyperparameters of the RF model were optimized with a genetic algorithm to improve accuracy. A dataset of 10,765 samples with 74 variables per sample was gathered from the Swedish National Study on Aging and Care (SNAC). The dataset is extremely imbalanced between the classes, which can bias a model trained on it. To address this, we balanced the classes using random undersampling (RUS). The components are integrated into a single model called F-RUS-RF. Using only the six variables ranked highest by the F-score method, the F-RUS-RF model achieved its highest accuracy of 86.15%, with a sensitivity of 92.25% and a specificity of 85.14%. Addressing the risk factors identified by the F-RUS-RF model may help lower the incidence of cancer in the aging population.
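
The abstract does not spell out the formulas behind the pipeline, so the following is only a minimal sketch of its non-classifier steps under stated assumptions. It assumes the "statistical F-score" is the common two-class form (between-class spread of the means divided by the sum of the within-class sample variances), and it balances the classes by randomly discarding majority-class samples. The function names and the toy data are hypothetical, not taken from the paper; the random forest and the genetic-algorithm tuning are omitted.

```python
import random
from statistics import mean, variance


def f_score(values, labels):
    """Two-class F-score of one variable (assumed form, not confirmed by the paper).

    Needs at least two samples in each class so the sample variances exist.
    """
    pos = [v for v, y in zip(values, labels) if y == 1]
    neg = [v for v, y in zip(values, labels) if y == 0]
    overall = mean(values)
    between = (mean(pos) - overall) ** 2 + (mean(neg) - overall) ** 2
    within = variance(pos) + variance(neg)  # sample variances (n - 1 denominator)
    if within == 0:
        return 0.0 if between == 0 else float("inf")
    return between / within


def rank_variables(rows, labels):
    """Return column indices ordered from highest to lowest F-score."""
    n_cols = len(rows[0])
    scores = [f_score([r[j] for r in rows], labels) for j in range(n_cols)]
    return sorted(range(n_cols), key=lambda j: scores[j], reverse=True)


def random_undersample(rows, labels, seed=0):
    """Randomly drop majority-class samples until both classes are the same size."""
    rng = random.Random(seed)
    pos_idx = [i for i, y in enumerate(labels) if y == 1]
    neg_idx = [i for i, y in enumerate(labels) if y == 0]
    minority, majority = sorted((pos_idx, neg_idx), key=len)
    keep = sorted(minority + rng.sample(majority, len(minority)))
    return [rows[i] for i in keep], [labels[i] for i in keep]


def sensitivity_specificity_accuracy(y_true, y_pred):
    """Compute the three metrics reported in the abstract from binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == 1 and p == 1 for t, p in pairs)
    tn = sum(t == 0 and p == 0 for t, p in pairs)
    fp = sum(t == 0 and p == 1 for t, p in pairs)
    fn = sum(t == 1 and p == 0 for t, p in pairs)
    return tp / (tp + fn), tn / (tn + fp), (tp + tn) / len(pairs)


if __name__ == "__main__":
    # Hypothetical toy data: column 0 separates the classes, column 1 does not.
    toy_labels = [1, 1, 0, 0, 0, 0, 0, 0]
    toy_rows = [[5.0, 1.0], [6.0, 2.0], [1.0, 1.0], [2.0, 2.0],
                [1.0, 1.5], [2.0, 1.5], [1.5, 1.0], [1.5, 2.0]]
    balanced_rows, balanced_labels = random_undersample(toy_rows, toy_labels)
    print("ranking:", rank_variables(toy_rows, toy_labels))
    print("balanced class counts:", balanced_labels.count(1), balanced_labels.count(0))
```

In a full pipeline of this kind, the top-ranked columns (six in the reported result) would be passed to the classifier. The abstract does not say whether the ranking was computed before or after balancing, so the order of those two steps is left open here.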

Source Journal

International Journal of Imaging Systems and Technology (subject category: Engineering and Technology, Imaging Science and Photographic Technology)
CiteScore: 6.90
Self-citation rate: 6.10%
Articles published: 138
Review time: 3 months
Journal description: The International Journal of Imaging Systems and Technology (IMA) is a forum for the exchange of ideas and results relevant to imaging systems, including imaging physics and informatics. The journal covers all imaging modalities in humans and animals. IMA accepts technically sound and scientifically rigorous research in the interdisciplinary field of imaging, including relevant algorithmic research and hardware and software development, and their applications relevant to medical research. The journal provides a platform to publish original research in structural and functional imaging. The journal is also open to imaging studies of the human body and on animals that describe novel diagnostic imaging and analysis methods. Technical, theoretical, and clinical research in both normal and clinical populations is encouraged. Submissions describing methods, software, databases, replication studies as well as negative results are also considered. The scope of the journal includes, but is not limited to, the following in the context of biomedical research: Imaging and neuro-imaging modalities: structural MRI, functional MRI, PET, SPECT, CT, ultrasound, EEG, MEG, NIRS etc.; Neuromodulation and brain stimulation techniques such as TMS and tDCS; Software and hardware for imaging, especially related to human and animal health; Image segmentation in normal and clinical populations; Pattern analysis and classification using machine learning techniques; Computational modeling and analysis; Brain connectivity and connectomics; Systems-level characterization of brain function; Neural networks and neurorobotics; Computer vision, based on human/animal physiology; Brain-computer interface (BCI) technology; Big data, databasing and data mining.