Comparison of Ensemble Learning Method: Random Forest, Support Vector Machine, AdaBoost for Classification Human Development Index (HDI)

Jurnal Sistem Informasi | Pub Date: 2023-01-31 | DOI: 10.32520/stmsi.v12i1.2501

Ressa Isnaini Arumnisaa, Arie Wahyu Wijayanto


Citations: 0

Abstract

Classification in supervised learning is a way to find patterns in data whose classes are already known. In machine learning classification, an ensemble classifier combines multiple models with the aim of improving accuracy and optimizing classification performance. This study compares algorithms that work with ensemble learning, including Random Forest, Support Vector Machine (SVM), and AdaBoost. The data are the Human Development Index (HDI) of districts/cities in Indonesia, together with other variables strongly related to human development: GRDP (gross regional domestic product) per capita, gross enrollment rate, net enrollment rate, labor force participation rate, unemployment rate, poverty rate, poverty depth, poverty severity, and average consumption per capita. HDI was chosen because, besides being an important macroeconomic variable describing the condition of human resources in Indonesia, it already has an established classification from Badan Pusat Statistik (BPS, Statistics Indonesia), so supervised learning can be applied. The models are compared using accuracy, specificity, sensitivity, and the kappa statistic. The analysis starts with data preprocessing, resampling, and cross-validation, followed by modeling with the Random Forest, SVM, and AdaBoost algorithms. The final stage is model evaluation, comparing the models to identify the best one for classifying districts/cities by HDI. The results show that Random Forest performed best compared with SVM and AdaBoost, with an accuracy of 85.23%, specificity of 71.63%, sensitivity of 95.05%, and a kappa coefficient of 0.7698. Based on this research, an ensemble classifier can be developed to help classify Human Development Index scores in Indonesia.
Keywords: AdaBoost, Random Forest, Support Vector Machine, Ensemble Learning, Human Development Index
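The evaluation stage described in the abstract (accuracy, sensitivity, specificity, and the kappa statistic) can be sketched in plain Python. This is a minimal sketch under stated assumptions, not the authors' code. The abstract does not say how sensitivity and specificity were averaged across HDI classes, so here they are computed one-vs-rest per class and macro-averaged (an assumption). The four bands (low < 60, medium 60 to < 70, high 70 to < 80, very high ≥ 80) are assumed to correspond to BPS's HDI categories. The helper names `hdi_class`, `confusion_matrix`, and `evaluate` are hypothetical.

```python
from __future__ import annotations

from typing import Sequence

# Assumed BPS HDI bands: low < 60, medium 60-<70, high 70-<80, very high >= 80.
LABELS = ["low", "medium", "high", "very high"]


def hdi_class(score: float) -> str:
    """Map an HDI score (0-100 scale) to an assumed BPS category."""
    if score < 60:
        return "low"
    if score < 70:
        return "medium"
    if score < 80:
        return "high"
    return "very high"


def confusion_matrix(y_true: Sequence[str], y_pred: Sequence[str],
                     labels: Sequence[str]) -> list[list[int]]:
    """Rows are true classes, columns are predicted classes."""
    index = {label: i for i, label in enumerate(labels)}
    cm = [[0] * len(labels) for _ in labels]
    for t, p in zip(y_true, y_pred):
        cm[index[t]][index[p]] += 1
    return cm


def evaluate(cm: list[list[int]]) -> dict[str, float]:
    """Accuracy, macro one-vs-rest sensitivity/specificity, and Cohen's kappa."""
    k = len(cm)
    n = sum(sum(row) for row in cm)
    correct = sum(cm[i][i] for i in range(k))
    row_tot = [sum(cm[i]) for i in range(k)]                      # true counts
    col_tot = [sum(cm[r][c] for r in range(k)) for c in range(k)]  # predicted counts

    accuracy = correct / n
    # Chance agreement p_e from the true and predicted marginals.
    p_e = sum(row_tot[i] * col_tot[i] for i in range(k)) / (n * n)
    kappa = (accuracy - p_e) / (1 - p_e)

    sens, spec = [], []
    for i in range(k):
        tp = cm[i][i]
        fn = row_tot[i] - tp
        fp = col_tot[i] - tp
        tn = n - tp - fn - fp
        if tp + fn:  # skip classes absent from the true labels
            sens.append(tp / (tp + fn))
        if tn + fp:  # skip if every sample belongs to this class
            spec.append(tn / (tn + fp))
    return {
        "accuracy": accuracy,
        "sensitivity": sum(sens) / len(sens),
        "specificity": sum(spec) / len(spec),
        "kappa": kappa,
    }


# Usage with made-up scores and hypothetical predictions (not results from the paper).
scores = [55.2, 64.9, 71.3, 82.0, 68.4, 75.0]
y_true = [hdi_class(s) for s in scores]
y_pred = ["low", "medium", "high", "high", "high", "high"]
cm = confusion_matrix(y_true, y_pred, LABELS)
m = evaluate(cm)
print(m)
```

In the study itself, `y_pred` would come from the Random Forest, SVM, or AdaBoost models after resampling and cross-validation. Macro-averaging gives each HDI class equal weight, which matters when the "low" and "very high" bands contain few districts/cities.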


Source journal

Jurnal Sistem Informasi

Self-citation rate

0.00%

Publication count

Review time

12 weeks