Feature reduction for hepatocellular carcinoma prediction using machine learning algorithms

Journal of Big Data (IF 8.6; Zone 2, Computer Science; JCR Q1, Computer Science, Theory & Methods) Pub Date: 2024-06-18 DOI: 10.1186/s40537-024-00944-3
Ghada Mostafa, Hamdi Mahmoud, Tarek Abd El-Hafeez, Mohamed E. ElAraby

Abstract

Hepatocellular carcinoma (HCC) is a highly prevalent form of liver cancer, and accurate prediction models are needed for early diagnosis and effective treatment. Machine learning algorithms have demonstrated promising results in various medical domains, including cancer prediction. In this study, we propose a comprehensive approach to HCC prediction that compares the performance of different machine learning algorithms before and after applying feature reduction methods. We employ popular feature reduction techniques, such as feature weighting, hidden feature correlation, feature selection, and optimized selection, to extract a reduced feature subset that captures the information most relevant to HCC. We then apply multiple algorithms, including Naive Bayes, support vector machines (SVM), neural networks, decision trees, and k-nearest neighbors (KNN), to both the original high-dimensional dataset and the reduced feature set. By comparing the accuracy, precision, recall, F-score, and execution time of each algorithm, we assess how effectively feature reduction enhances the performance of HCC prediction models. Our experimental results, obtained on a comprehensive dataset of clinical features of HCC patients, demonstrate that feature reduction significantly improves the performance of all examined algorithms. Notably, models trained on the reduced feature set consistently outperform those trained on the original high-dimensional dataset in both prediction accuracy and execution time. After feature reduction, decision trees, Naive Bayes, KNN, neural networks, and SVM achieved accuracies of 96.00%, 97.33%, 94.67%, 96.00%, and 96.00%, respectively.
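
As a rough illustration of two ideas mentioned in the abstract, the sketch below ranks features by a simple weight and computes the reported evaluation metrics for a binary classifier. It is not the authors' pipeline: the choice of absolute Pearson correlation as the feature weight, the function names (`pearson_weight`, `select_top_k`, `classification_metrics`), and the toy data are all assumptions made for the example.

```python
import math
from statistics import mean


def pearson_weight(column, labels):
    """Absolute Pearson correlation between one feature column and the labels.

    Assumption: correlation stands in for a generic "feature weight";
    the paper's actual weighting scheme is not specified in the abstract.
    """
    mx, my = mean(column), mean(labels)
    cov = sum((x - mx) * (y - my) for x, y in zip(column, labels))
    sx = math.sqrt(sum((x - mx) ** 2 for x in column))
    sy = math.sqrt(sum((y - my) ** 2 for y in labels))
    if sx == 0 or sy == 0:
        return 0.0  # a constant column carries no information
    return abs(cov / (sx * sy))


def select_top_k(rows, labels, k):
    """Return the (sorted) indices of the k highest-weighted features."""
    n_features = len(rows[0])
    weights = [
        pearson_weight([row[j] for row in rows], labels)
        for j in range(n_features)
    ]
    ranked = sorted(range(n_features), key=lambda j: weights[j], reverse=True)
    return sorted(ranked[:k])


def classification_metrics(y_true, y_pred):
    """Accuracy, precision, recall, and F1 for binary labels (1 = positive)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == 1 and p == 1 for t, p in pairs)
    tn = sum(t == 0 and p == 0 for t, p in pairs)
    fp = sum(t == 0 and p == 1 for t, p in pairs)
    fn = sum(t == 1 and p == 0 for t, p in pairs)
    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Hypothetical toy data: feature 0 tracks the label, feature 1 is
# uncorrelated with it, and feature 2 is constant.
rows = [[1.0, 5.0, 2.0], [2.0, 1.0, 2.0], [8.0, 4.0, 2.0], [9.0, 2.0, 2.0]]
labels = [0, 0, 1, 1]
kept = select_top_k(rows, labels, k=1)
reduced_rows = [[row[j] for j in kept] for row in rows]

# Hypothetical predictions, used only to exercise the metric function.
metrics = classification_metrics([1, 1, 1, 0, 0, 0, 0, 1],
                                 [1, 1, 0, 0, 0, 1, 0, 1])
```

In a before-and-after comparison of the kind the abstract describes, the same classifier would be trained once on `rows` and once on `reduced_rows`, with `classification_metrics` and wall-clock time recorded for each run.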
