Application of Data Mining in Lung Cancer Prediction Analysis Using the Random Forest Algorithm

Laura Sari, Annisa Romadloni, R. Listyaningrum
Journal: Infotekmesin Media Komunikasi Ilmiah Politeknik Cilacap
DOI: 10.35970/infotekmesin.v14i1.1751 (https://doi.org/10.35970/infotekmesin.v14i1.1751)
Published: 2023-01-29
Publication type: Journal Article
Citations: 0

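Abstract

Cancer is the second leading cause of death worldwide, and in Indonesia it carries a high mortality rate. Most patients do not realize that they have lung cancer, so treatment sometimes begins too late. Detecting lung cancer earlier requires a prediction method with high accuracy. Previous research predicted lung cancer with a data mining classification method based on the Naïve Bayes algorithm; it achieved high recall for the positive class (Yes class) but low recall for the negative class (No class). This study instead uses the Random Forest algorithm, which is known to perform well, and optimizes the modeling with K-fold cross-validation. Random Forest reaches an accuracy of 98.4%, higher than Naïve Bayes. It yields 100% recall for the positive class and 80% for the negative class, and its AUC value of 1 is taken to indicate 100% correct prediction. However, a statistical test at the 5% significance level shows that the results of the two algorithms do not differ significantly.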
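The evaluation measures named in the abstract (accuracy, per-class recall, AUC) can be illustrated with a minimal pure-Python sketch. The labels, scores, the 0.5 decision threshold, and all function names below are hypothetical and chosen only for illustration; they are not the study's data or code. The toy example is built so that positive-class recall is 100%, negative-class recall is 80%, and AUC is 1, which shows that these three values can occur together.

```python
from typing import Dict, Sequence


def class_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Accuracy and per-class recall for binary labels (1 = Yes, 0 = No)."""
    tp = fn = tn = fp = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == 1 and pred == 1:
            tp += 1
        elif truth == 1:
            fn += 1
        elif pred == 0:
            tn += 1
        else:
            fp += 1
    return {
        "accuracy": (tp + tn) / (tp + fn + tn + fp),
        "recall_yes": tp / (tp + fn),  # share of true Yes cases that were found
        "recall_no": tn / (tn + fp),   # share of true No cases that were found
    }


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a random positive outscores a random negative.

    Ties count as half a win (the rank-based definition of AUC).
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 10 Yes cases and 5 No cases with made-up model scores.
y_true = [1] * 10 + [0] * 5
scores = [0.60, 0.65, 0.70, 0.75, 0.80, 0.85, 0.90, 0.92, 0.95, 0.99,
          0.10, 0.20, 0.30, 0.40, 0.55]

# Assumed decision threshold of 0.5: one No case (score 0.55) is misclassified.
y_pred = [1 if s >= 0.5 else 0 for s in scores]

metrics = class_metrics(y_true, y_pred)
auc = roc_auc(y_true, scores)
print(metrics, auc)
```

In this toy case every Yes score exceeds every No score, so the ranking is perfect and AUC equals 1, yet one No case still falls above the assumed threshold. AUC describes how well the scores rank the classes, while recall depends on the threshold applied to those scores.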
Original title (Indonesian): Penerapan Data Mining dalam Analisis Prediksi Kanker Paru Menggunakan Algoritma Random Forest
Source journal
Self-citation rate: 0.00%
Articles published: 30
Review time: 12 weeks