Comparison of Ensemble Learning Method: Random Forest, Support Vector Machine, AdaBoost for Classification Human Development Index (HDI)

Ressa Isnaini Arumnisaa, Arie Wahyu Wijayanto
Jurnal Sistem Informasi. Published 2023-01-31. DOI: 10.32520/stmsi.v12i1.2501 (https://doi.org/10.32520/stmsi.v12i1.2501)

Abstract

Classification in supervised learning is a way to find patterns in data whose classes are already known. One approach to classification in machine learning is the ensemble classifier, which aims to improve model accuracy and optimize classification performance. This study compares algorithms that work with ensemble learning, including Random Forest, Support Vector Machine (SVM), and AdaBoost. The data used are the Human Development Index (HDI) of districts/cities in Indonesia. Other variables that are strongly related to human development are GRDP per capita, gross enrollment rate, net enrollment rate, labor force participation rate, unemployment rate, poverty rate, poverty depth, poverty severity, and average consumption per capita. HDI was chosen because, apart from being an important macroeconomic variable for describing the condition of human resources in Indonesia, it already has a clear classification defined by Badan Pusat Statistik (BPS), so supervised learning can be applied. The models are compared using accuracy, specificity, sensitivity, and the kappa statistic. The analysis flow starts with data preprocessing, resampling, and cross-validation, followed by modeling with the Random Forest, SVM, and AdaBoost algorithms. The final stage is model evaluation, comparing the best models for classifying districts/cities according to HDI. The results show that the Random Forest model performed best compared with the SVM and AdaBoost models, with an accuracy of 85.23%, specificity of 71.63%, sensitivity of 95.05%, and a kappa coefficient of 0.7698. Based on this research, an ensemble classifier can be developed to help classify Human Development Index scores in Indonesia.
Keywords: AdaBoost, Random Forest, Support Vector Machine, Ensemble Learning, Human Development Index
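
The four evaluation measures named in the abstract (accuracy, specificity, sensitivity, and the kappa statistic) can all be derived from a confusion matrix. The sketch below is illustrative only and is not the authors' code: the function name classification_metrics, the macro-averaging of the per-class sensitivity and specificity, and the three-class toy matrix are assumptions made for this example, and the numbers are not taken from the study.

```python
from typing import Dict, List


def classification_metrics(cm: List[List[int]]) -> Dict[str, float]:
    """Accuracy, macro sensitivity, macro specificity, and Cohen's kappa.

    cm[i][j] counts samples whose true class is i and predicted class is j.
    Assumes every class has at least one true sample and that the expected
    agreement is below 1 (otherwise a division by zero occurs).
    """
    k = len(cm)
    total = sum(sum(row) for row in cm)
    true_counts = [sum(row) for row in cm]
    pred_counts = [sum(cm[i][j] for i in range(k)) for j in range(k)]
    correct = sum(cm[c][c] for c in range(k))

    accuracy = correct / total

    sensitivities, specificities = [], []
    for c in range(k):
        tp = cm[c][c]
        fn = true_counts[c] - tp   # class c samples predicted as another class
        fp = pred_counts[c] - tp   # other classes predicted as class c
        tn = total - tp - fn - fp
        sensitivities.append(tp / (tp + fn))
        specificities.append(tn / (tn + fp))

    # Agreement expected by chance, from the row and column marginals.
    expected = sum(t * p for t, p in zip(true_counts, pred_counts)) / (total * total)
    kappa = (accuracy - expected) / (1 - expected)

    return {
        "accuracy": accuracy,
        "sensitivity": sum(sensitivities) / k,   # macro average (assumption)
        "specificity": sum(specificities) / k,   # macro average (assumption)
        "kappa": kappa,
    }


if __name__ == "__main__":
    # Hypothetical 3-class confusion matrix, NOT data from the paper.
    toy_cm = [
        [8, 2, 0],
        [1, 7, 2],
        [0, 1, 9],
    ]
    for name, value in classification_metrics(toy_cm).items():
        print(f"{name}: {value:.4f}")
```

Kappa corrects accuracy for the agreement expected by chance, which is why it is reported alongside accuracy when class sizes may be unequal. How the study aggregated sensitivity and specificity across HDI categories is not stated in the abstract, so the macro average used here is only one possible choice.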