Improvement of the Performance of Machine Learning Algorithms in Predicting Breast Cancer

Maryam Poornajaf, Sajad Yosefi
Frontiers in Health Informatics, Journal Article, published 2023-03-18
DOI: 10.30699/fhi.v12i0.400 (https://doi.org/10.30699/fhi.v12i0.400)
Citations: 1

Abstract

Introduction: Breast cancer is one of the most common cancers among women. Machine learning has become a research hotspot and has proved to be a strong technique that can contribute substantially to the prediction and early diagnosis of breast cancer. By applying machine learning models to a multidimensional dataset, this article aims to identify the most efficient and accurate models for tumor classification.

Materials & Methods: Several supervised machine learning algorithms, including Logistic Regression, Decision Tree, Random Forest, and KNN, were used for the diagnosis and prediction of cancerous tumors. The algorithms were applied to a breast cancer dataset of 699 samples taken from the UCI repository. To enhance the algorithms' performance, the dataset's features were analyzed, and feature importance scores and cross-validation were taken into account. The algorithms were then paired with a limited, selected set of features to achieve high accuracy in tumor classification.

Results: In the evaluation, the Logistic Regression algorithm (accuracy 99.14%, AUC ROC 99.6%) and the Extra Tree algorithm (accuracy 99.14%, AUC ROC 99.1%) performed better than the other algorithms. These techniques can therefore be useful for the diagnosis and prediction of cancerous tumors and for guiding correct prescription.

Conclusions: Machine learning can be used in medicine to analyze data collected about a disease and to predict it. The area under the ROC curve and other evaluation criteria were compared across several machine learning classification algorithms for the diagnosis and prediction of breast cancer in order to determine the most appropriate classifier.
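
The abstract ranks classifiers by accuracy and by the area under the ROC curve (AUC ROC) but does not define either metric. The sketch below shows one common way to compute both in plain Python. It is not the authors' code: the function names, the 0.5 decision threshold, and the toy labels and scores are assumptions made up for illustration, and none of the numbers come from the paper.

```python
from typing import Sequence


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of predicted labels that match the true labels."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a random positive outranks a random negative.

    A tie between a positive and a negative score counts as half a win.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy data invented for illustration only (1 = malignant, 0 = benign).
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
scores = [0.9, 0.8, 0.4, 0.35, 0.2, 0.6, 0.1, 0.7]

# Hypothetical decision rule: call a sample malignant when its score is >= 0.5.
y_pred = [1 if s >= 0.5 else 0 for s in scores]

acc = accuracy(y_true, y_pred)
auc = roc_auc(y_true, scores)
print(f"accuracy = {acc:.4f}, AUC = {auc:.4f}")
```

Accuracy depends on the chosen threshold, while AUC depends only on how the scores rank positives against negatives. This is one reason two classifiers can share the same accuracy, as Logistic Regression and Extra Tree do in the reported results, and still differ in AUC ROC.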