Annisa Nugraheni, R. Ramadhani, Amalia Beladinna Arifa, Agi Prasetiadi
{"title":"Perbandingan Performa Antara Algoritma Naive Bayes Dan K-Nearest Neighbour Pada Klasifikasi Kanker Payudara","authors":"Annisa Nugraheni, R. Ramadhani, Amalia Beladinna Arifa, Agi Prasetiadi","doi":"10.20895/dinda.v2i1.391","DOIUrl":null,"url":null,"abstract":"Breast cancer is the second most common cause of death from cancer after lung cancer is in the first place. Breast cancer occurs when cells in breast tissue begin to grow uncontrollably and can disrupt existing healthy tissue. Therefore, there is a need for a classification to distinguish breast cancer patients and healthy people. Based on previous research, the Naïve Bayes and K-Nearest Neighbor algorithms are considered capable of classifying breast cancer. In the research process using the breast cancer dataset from the Breast Cancer Coimbra dataset in 2018 UCI Machine Learning Repository with a total of 116 data, while for the calculation of the feasibility of the method using the Confusion Matrix (Accuracy, Precision, and Recall) and the ROC-AUC curve. The purpose of this study is to compare the performance of the Naïve Bayes and K-Nearest Neighbor algorithms. In testing using the Naïve Bayes algorithm and the K-Nearest Neighbor algorithm, there are several test scenarios, namely, data testing before and after normalization, model testing based on a comparison of training data and testing data, model testing based on K values in K-Nearest Neighbors, and model testing. based on the selection of the strongest attribute with the Pearson correlation test. The results of this study indicate that the Naïve Bayes algorithm has the highest average accuracy of 69.12%, healthy precision 64.90%, pain precision 83%, healthy recall 88%, sick recall 61.11% and AUC 0.82 which is included in the good classification category. Meanwhile, the highest average results of the K-Nearest Neighbor algorithm are 76.83% for accuracy, 76% healthy precision, 80.21% pain precision, 74.18% for healthy recall, 80.81% sick recall and 0.91 AUC which is included in the excellent classification category.","PeriodicalId":419119,"journal":{"name":"Journal of Dinda : Data Science, Information Technology, and Data Analytics","volume":"4 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2022-02-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"2","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Journal of Dinda : Data Science, Information Technology, and Data Analytics","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.20895/dinda.v2i1.391","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 2
摘要
乳腺癌是仅次于肺癌的第二大常见癌症死亡原因。当乳腺组织中的细胞开始不受控制地生长并破坏现有的健康组织时,就会发生乳腺癌。因此,有必要对乳腺癌患者和健康人进行分类。根据之前的研究,Naïve贝叶斯和k近邻算法被认为能够对乳腺癌进行分类。在研究过程中使用了来自2018年UCI机器学习存储库中乳腺癌科英布拉数据集的乳腺癌数据集共116个数据,同时使用混淆矩阵(Accuracy, Precision, and Recall)和ROC-AUC曲线来计算该方法的可行性。本研究的目的是比较Naïve贝叶斯和k -最近邻算法的性能。在使用Naïve贝叶斯算法和K近邻算法进行测试时,有几种测试场景,分别是归一化前后的数据测试、基于训练数据和测试数据对比的模型测试、基于K近邻中K值的模型测试、模型测试。通过Pearson相关检验选择最强属性。研究结果表明,Naïve贝叶斯算法的平均准确率最高,为69.12%,健康准确率为64.90%,疼痛准确率为83%,健康召回率为88%,疾病召回率为61.11%,AUC为0.82,属于良好分类类别。同时,k -最邻近算法的最高平均结果为准确率76.83%,健康准确率76%,疼痛准确率80.21%,健康召回率74.18%,疾病召回率80.81%,AUC 0.91,属于优秀分类类别。
Perbandingan Performa Antara Algoritma Naive Bayes Dan K-Nearest Neighbour Pada Klasifikasi Kanker Payudara
Breast cancer is the second most common cause of death from cancer after lung cancer is in the first place. Breast cancer occurs when cells in breast tissue begin to grow uncontrollably and can disrupt existing healthy tissue. Therefore, there is a need for a classification to distinguish breast cancer patients and healthy people. Based on previous research, the Naïve Bayes and K-Nearest Neighbor algorithms are considered capable of classifying breast cancer. In the research process using the breast cancer dataset from the Breast Cancer Coimbra dataset in 2018 UCI Machine Learning Repository with a total of 116 data, while for the calculation of the feasibility of the method using the Confusion Matrix (Accuracy, Precision, and Recall) and the ROC-AUC curve. The purpose of this study is to compare the performance of the Naïve Bayes and K-Nearest Neighbor algorithms. In testing using the Naïve Bayes algorithm and the K-Nearest Neighbor algorithm, there are several test scenarios, namely, data testing before and after normalization, model testing based on a comparison of training data and testing data, model testing based on K values in K-Nearest Neighbors, and model testing. based on the selection of the strongest attribute with the Pearson correlation test. The results of this study indicate that the Naïve Bayes algorithm has the highest average accuracy of 69.12%, healthy precision 64.90%, pain precision 83%, healthy recall 88%, sick recall 61.11% and AUC 0.82 which is included in the good classification category. Meanwhile, the highest average results of the K-Nearest Neighbor algorithm are 76.83% for accuracy, 76% healthy precision, 80.21% pain precision, 74.18% for healthy recall, 80.81% sick recall and 0.91 AUC which is included in the excellent classification category.