Optimization of Lung Cancer Classification Method Using EDA-Based Machine Learning

Windania Purba, Sumita Wardani, Diana Febrina Lumbantoruan, Fransiska Celia Ivoi Silalahi, Thomas Leo Edison
Journal: Jusikom : Jurnal Sistem Informasi Ilmu Komputer
Published: 2023-02-28
DOI: https://doi.org/10.34012/jurnalsisteminformasidanilmukomputer.v6i2.3413
Citations: 0

Abstract

Lung cancer is one of the three deadliest diseases in the world and has developed rapidly. This study therefore sets out to predict the factors that influence lung cancer, using data mining methods and classification techniques. Several popular classification algorithms are compared to find the most accurate one for lung cancer classification: K-Nearest Neighbor, Random Forest Classifier, Logistic Regression, Linear SVM, Naïve Bayes, Decision Tree, Random Forest, Gradient Boosting, Kernel SVM, and MLPClassifier. These algorithms were chosen because an earlier comparison, found on the Kaggle platform, evaluated them on a breast cancer dataset. Previous breast cancer studies reported accuracies of 96.47% for SVM, 97.06% for Neural Networks, and 91.18% for Naïve Bayes. Unlike that earlier work, this study applies the full set of algorithms listed above to a lung cancer dataset, in order to see whether the accuracy results differ. The most accurate algorithms were Random Forest and Gradient Boosting, each with an accuracy of 100%. The previous study found the same two algorithms to be the most accurate, although there Gradient Boosting scored higher than Random Forest. Based on the data used in this study, the most influential factors in predicting a lung cancer diagnosis are obesity and coughing up blood.
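The kind of comparison described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: it uses scikit-learn, a synthetic dataset from make_classification in place of the lung cancer data, default hyperparameters, and 5-fold cross-validation, none of which are specified in the abstract. The helper names scaled and compare are hypothetical, and Random Forest appears once although the abstract lists it twice.

```python
# Illustrative sketch only: synthetic data stands in for the lung cancer
# dataset, and every hyperparameter below is an assumption.
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic binary classification problem (placeholder for the real data).
X, y = make_classification(
    n_samples=300, n_features=10, n_informative=5, random_state=0
)


def scaled(model):
    """Standardize features before fitting a scale-sensitive model."""
    return make_pipeline(StandardScaler(), model)


models = {
    "K-Nearest Neighbor": scaled(KNeighborsClassifier()),
    "Logistic Regression": scaled(LogisticRegression(max_iter=1000)),
    "Linear SVM": scaled(SVC(kernel="linear")),
    "Kernel SVM": scaled(SVC(kernel="rbf")),
    "Naive Bayes": GaussianNB(),
    "Decision Tree": DecisionTreeClassifier(random_state=0),
    "Random Forest": RandomForestClassifier(random_state=0),
    "Gradient Boosting": GradientBoostingClassifier(random_state=0),
    "MLPClassifier": scaled(MLPClassifier(max_iter=2000, random_state=0)),
}


def compare(models, X, y, folds=5):
    """Return {model name: mean cross-validated accuracy}."""
    return {
        name: cross_val_score(model, X, y, cv=folds, scoring="accuracy").mean()
        for name, model in models.items()
    }


scores = compare(models, X, y)
for name, acc in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:20s} {acc:.3f}")

# Rank features by Random Forest importance, analogous to asking which
# factors most influence the predicted diagnosis.
forest = RandomForestClassifier(random_state=0).fit(X, y)
ranked = sorted(
    enumerate(forest.feature_importances_), key=lambda kv: kv[1], reverse=True
)
for index, importance in ranked[:3]:
    print(f"feature {index}: importance {importance:.3f}")
```

With real data, X and y would be replaced by the feature matrix and the diagnosis labels. Cross-validation is used here instead of a single train/test split so that each accuracy figure is averaged over several held-out folds.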