Decision Tree C4.5 Performance Improvement using Synthetic Minority Oversampling Technique (SMOTE) and K-Nearest Neighbor for Debtor Eligibility Evaluation

Edi Priyanto, Enny Itje Sela, Luther Alexander Latumakulita, Noourul Islam
{"title":"Decision Tree C4.5 Performance Improvement using Synthetic Minority Oversampling Technique (SMOTE) and K-Nearest Neighbor for Debtor Eligibility Evaluation","authors":"Edi Priyanto, Enny Itje Sela, Luther Alexander Latumakulita, Noourul Islam","doi":"10.33096/ilkom.v15i2.1676.373-381","DOIUrl":null,"url":null,"abstract":"Nowadays, information technology especially machine learning has been used to evaluate the feasibility of debtors. One of the challenges in this classification model is the occurrence of imbalanced datasets, especially in the German Credit Dataset. Another challenge is developing an optimal model for evaluating debtor eligibility. Based on these challenges, this study aims to develop an optimal model for evaluating debtor eligibility on the German Credit Dataset, using the decision trees, k-Nearest Neighbor (k-NN) and Synthetic Minority Oversampling Technique (SMOTE). SMOTE and k-NN is used to overcome challenges regarding imbalanced datasets. While the decision tree are applied to produce a debtor classification model. In general, the steps taken are preparing datasets, pre-processing data, dividing datasets, oversampling with SMOTE, and classification models using decision trees, and testing. Model performance evaluation is represented by accuracy values obtained from the confusion matrix and area under curve (AUC) values generated by the Receiver Operating Characteristic (ROC). Based on the tests that have been carried out, the best accuracy value in the test is obtained at 73.00% and the AUC value is 0.708, in parameters k = 3 and Max-Depth = 25. Based on the analysis produced, the proposed model can improve performance compared to if the dataset is not applied SMOTE.","PeriodicalId":33690,"journal":{"name":"Ilkom Jurnal Ilmiah","volume":"46 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2023-08-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Ilkom Jurnal Ilmiah","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.33096/ilkom.v15i2.1676.373-381","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

Abstract

Nowadays, information technology, especially machine learning, is used to evaluate the eligibility of debtors. One of the challenges for this kind of classification model is class imbalance, which occurs in the German Credit Dataset. Another challenge is developing an optimal model for evaluating debtor eligibility. Based on these challenges, this study aims to develop an optimal model for evaluating debtor eligibility on the German Credit Dataset using decision trees, k-Nearest Neighbor (k-NN), and the Synthetic Minority Oversampling Technique (SMOTE). SMOTE and k-NN are used to overcome the imbalanced dataset, while the decision tree is applied to produce the debtor classification model. In general, the steps taken are preparing the dataset, pre-processing the data, dividing the dataset, oversampling with SMOTE, building the classification model with a decision tree, and testing. Model performance is evaluated by the accuracy obtained from the confusion matrix and the area under the curve (AUC) of the Receiver Operating Characteristic (ROC). In the tests carried out, the best accuracy of 73.00% and an AUC of 0.708 were obtained with the parameters k = 3 and Max-Depth = 25. Based on this analysis, the proposed model improves performance compared to when SMOTE is not applied to the dataset.
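
The pipeline described above can be illustrated with a minimal sketch in Python. It assumes the German Credit Dataset is stored in a hypothetical german_credit.csv file with a binary "class" column; the choice of scikit-learn and imbalanced-learn, the column names, and the 80/20 split ratio are assumptions, not details taken from the paper. SMOTE's k_neighbors=3 and the tree's max_depth=25 mirror the reported best parameters, and scikit-learn's DecisionTreeClassifier (CART) stands in for C4.5.

```python
# Minimal sketch of the described pipeline: prepare data, split, oversample
# with SMOTE, train a decision tree, and evaluate accuracy and AUC.
# File name, column names, and library choices are assumptions.
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score, roc_auc_score, confusion_matrix
from imblearn.over_sampling import SMOTE

# 1. Prepare and pre-process the dataset (hypothetical file and target column).
data = pd.read_csv("german_credit.csv")
X = pd.get_dummies(data.drop(columns=["class"]))  # simple one-hot encoding
y = data["class"]

# 2. Divide the dataset into training and test sets.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

# 3. Oversample the minority class with SMOTE; k_neighbors=3 mirrors the
#    reported k = 3 (SMOTE synthesizes samples from k nearest neighbors).
smote = SMOTE(k_neighbors=3, random_state=42)
X_res, y_res = smote.fit_resample(X_train, y_train)

# 4. Train the decision tree; scikit-learn implements CART rather than C4.5,
#    so this only approximates the paper's classifier. Max-Depth = 25
#    follows the reported best parameter.
tree = DecisionTreeClassifier(max_depth=25, random_state=42)
tree.fit(X_res, y_res)

# 5. Evaluate with the confusion matrix, accuracy, and ROC AUC.
y_pred = tree.predict(X_test)
y_prob = tree.predict_proba(X_test)[:, 1]
print(confusion_matrix(y_test, y_pred))
print("Accuracy:", accuracy_score(y_test, y_pred))
print("AUC:", roc_auc_score(y_test, y_prob))
```

Note that SMOTE is applied only to the training split so synthetic samples do not leak into the test set, which is the usual way to combine oversampling with a held-out evaluation.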