Performance Analysis of Data Mining Techniques for the Prediction Breast Cancer Risk on Big Data

Solmaz Sohrabei, Alireza Atashi
Journal: Frontiers in Health Informatics
Publication type: Journal Article
Publication date: 2021-07-25
DOI: 10.30699/FHI.V10I1.296 (https://doi.org/10.30699/FHI.V10I1.296)
Citations: 2

Abstract

Introduction: Early detection makes breast cancer the most curable among cancer types, and early detection together with accurate examination extends the survival of patients. Risk factors are an important parameter in breast cancer and have an important effect on the disease. Data mining techniques have a growing reputation in the medical field because of their high predictive capability and useful classification. These methods can help practitioners develop tools for detecting breast cancer at an early stage.

Material and Methods: The database used in this paper was provided by Motamed Cancer Institute, ACECR, Tehran, Iran. It contains 7834 records of clinical and risk factor data for breast cancer patients. There were 4008 patients (52.4%) with breast cancer (malignant) and the remaining 3617 patients (47.6%) without breast cancer (benign). Support vector machine, multi-layer perceptron, decision tree, K-nearest neighbor, random forest, and naïve Bayesian models were developed using 20 fields (risk factors) of the database, because the available database features were restricted. The models were evaluated with 10-fold cross-validation. Finally, the models were compared on sensitivity, specificity, and accuracy.

Results: The naïve Bayesian and artificial neural network models were the better models for predicting breast cancer risk. The naïve Bayesian model had an accuracy of 93%, a specificity of 93.32%, a sensitivity of 95056% [sic], and an ROC of 0.95; the artificial neural network had an accuracy of 93.23%, a specificity of 91.98%, a sensitivity of 92.69%, and an ROC of 0.8.

Conclusion: Notably, the different artificial intelligence algorithms used in this study yielded similar accuracy, so these techniques could be used as alternative predictive tools in breast cancer risk studies. The significant prognostic factors affecting breast cancer risk identified in this study, which were validated by risk, are useful and could be translated into decision support tools in the clinical domain.
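The evaluation protocol named in the abstract (a 10-fold split, then comparison on sensitivity, specificity, and accuracy) can be illustrated with a minimal Python sketch. This is not the authors' code: the function names, the label encoding (1 = malignant, 0 = benign), and the unshuffled contiguous folds are all assumptions made for illustration, and the toy labels are invented.

```python
from typing import Dict, Iterator, List, Sequence, Tuple


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[int, int, int, int]:
    """Return (tp, fp, tn, fn), assuming 1 = malignant and 0 = benign."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, fp, tn, fn


def summary_metrics(tp: int, fp: int, tn: int, fn: int) -> Dict[str, float]:
    """Compute the three indicators; assumes both classes are present."""
    return {
        "sensitivity": tp / (tp + fn),   # share of malignant cases detected
        "specificity": tn / (tn + fp),   # share of benign cases cleared
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
    }


def k_fold_indices(n_samples: int, k: int = 10) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train_idx, test_idx) for k contiguous folds of near-equal size."""
    sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in sizes:
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        yield train, test
        start += size


# Toy labels, invented for illustration only.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 1, 0, 0, 0, 1, 1, 0]
print(summary_metrics(*confusion_counts(y_true, y_pred)))
```

In each of the ten rounds a model would be fitted on the `train` indices and scored on the `test` indices, with the indicators averaged over the rounds. In practice the records would usually be shuffled or stratified by class before splitting, which this sketch omits.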