Breast Cancer Prediction Using Random Forest and Gaussian Naïve Bayes Algorithms

Ita Sulistiani, Windu Wulandari, Fathia Dwi Astuti, Widodo
Published in: 2022 1st International Conference on Information System & Information Technology (ICISIT)
Publication date: 2022-07-27
DOI: 10.1109/ICISIT54091.2022.9872808 (https://doi.org/10.1109/ICISIT54091.2022.9872808)

Abstract

Breast cancer is the second deadliest cancer after lung cancer. In 2021, the American Society of Clinical Oncology (ASCO) stated that invasive breast cancer in women increased by half a percent from 2008 to 2017. Breast cancer begins with a "misspelling" in a cell's instructions, which causes the cell to grow uncontrollably. If the problem is not treated within a few months, a large number of cells carrying the wrong instructions can be detected as cancer. Machine learning has been widely used to develop breast cancer prediction models. However, previous machine learning research has paid little to no attention to the problem of imbalanced datasets. This research aimed to develop breast cancer prediction models using Random Forest and Gaussian Naïve Bayes classifiers. The Borderline Synthetic Minority Oversampling Technique (BSM) was applied to handle the imbalanced dataset problem, and the Random Forest and Gaussian Naïve Bayes algorithms were used to build the prediction models. On the Wisconsin Breast Cancer Dataset (WBCD) from the UCI Machine Learning Repository, the combination of BSM and Random Forest achieved the highest recall, approximately 99.8%, while the combination of BSM and Gaussian Naïve Bayes produced the lowest recall among the generated models, 78.2%.
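The abstract does not describe how the oversampling step was implemented. As a rough illustration of the idea behind borderline oversampling only, the following Python sketch generates synthetic minority samples near the class border. The function name `borderline_smote`, its parameters (`m`, `k`, `n_new`, `seed`), and the toy data are all assumptions made for this example and are not taken from the paper. It is also simplified: it picks border points at random instead of generating a fixed number of samples per point.

```python
import math
import random


def borderline_smote(X, y, minority=1, m=3, k=2, n_new=None, seed=0):
    """Return synthetic minority samples created near the class border.

    Simplified, hypothetical sketch in the spirit of Borderline-SMOTE;
    not the authors' implementation. X is a list of feature lists and
    y the matching list of class labels.
    """
    rng = random.Random(seed)
    minority_pts = [x for x, label in zip(X, y) if label == minority]
    if len(minority_pts) < 2:
        return []  # interpolation needs at least two minority points

    def nearest(p, pool, count):
        # Indices of the `count` points in `pool` closest to p (p itself excluded).
        order = sorted(range(len(pool)), key=lambda i: math.dist(p, pool[i]))
        return [i for i in order if pool[i] is not p][:count]

    # A minority point is "in danger" when at least half, but not all,
    # of its m nearest neighbours belong to the majority class.
    # Points whose neighbours are all majority are treated as noise and skipped.
    danger = []
    for p in minority_pts:
        n_major = sum(1 for i in nearest(p, X, m) if y[i] != minority)
        if m / 2 <= n_major < m:
            danger.append(p)
    if not danger:
        return []

    if n_new is None:
        # Default: create enough samples to balance the two classes.
        n_new = max(0, len(X) - 2 * len(minority_pts))

    synthetic = []
    for _ in range(n_new):
        p = rng.choice(danger)
        q = minority_pts[rng.choice(nearest(p, minority_pts, k))]
        gap = rng.random()  # position along the segment from p towards q
        synthetic.append([a + gap * (b - a) for a, b in zip(p, q)])
    return synthetic


# Toy data (made up): 10 majority points and 4 minority points on a line.
majority = [[float(i), 0.0] for i in range(10)]
minority = [[9.4, 0.0], [10.5, 0.0], [11.5, 0.0], [12.5, 0.0]]
X = majority + minority
y = [0] * len(majority) + [1] * len(minority)

synthetic = borderline_smote(X, y)
print(len(synthetic), "synthetic minority samples")
```

In this toy set only the minority point at 9.4 sits next to majority points, so every synthetic sample is interpolated between it and one of its two nearest minority neighbours. In a pipeline like the one the abstract outlines, such samples would be added to the training data only, before fitting a classifier and measuring recall on held-out data.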