Comparison of Ensemble Learning Method: Random Forest, Support Vector Machine, AdaBoost for Classification Human Development Index (HDI)
Authors: Ressa Isnaini Arumnisaa, Arie Wahyu Wijayanto
Journal: Jurnal Sistem Informasi
Published: 2023-01-31
DOI: 10.32520/stmsi.v12i1.2501 (https://doi.org/10.32520/stmsi.v12i1.2501)
Classification in supervised learning finds patterns in data whose classes are already known. In machine learning classification, an ensemble classifier aims to improve model accuracy and optimize classification performance. This study compares algorithms that work with ensemble learning, including Random Forest, Support Vector Machine (SVM), and AdaBoost. The data used are the Human Development Index (HDI) of districts/cities in Indonesia. Other variables that are strongly related to human development are GRDP per capita, gross enrollment rate, net enrollment rate, labor force participation rate, unemployment rate, poverty rate, poverty depth, poverty severity, and average consumption per capita. HDI was chosen because, apart from being an important macroeconomic variable for describing the condition of human resources in Indonesia, it already has a clear classification defined by Badan Pusat Statistik (BPS), so supervised learning can be applied. The models are compared using accuracy, specificity, sensitivity, and the kappa statistic. The analysis flow starts with data preprocessing, resampling, and cross-validation, followed by modeling with the Random Forest, SVM, and AdaBoost algorithms. The final stage is model evaluation, which compares the best models for classifying districts/cities according to HDI. The results show that the Random Forest model performed best compared with the SVM and AdaBoost models, with an accuracy of 85.23%, a specificity of 71.63%, a sensitivity of 95.05%, and a kappa coefficient of 0.7698. Based on this research, an ensemble classifier can be developed to help classify Human Development Index scores in Indonesia.
Keywords: AdaBoost, Random Forest, Support Vector Machine, Ensemble Learning, Human Development Index
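
The four evaluation measures named in the abstract (accuracy, sensitivity, specificity, and the kappa statistic) can all be derived from a confusion matrix. The following is a minimal sketch of that calculation, not the authors' code. It assumes a multi-class setting with one-vs-rest sensitivity and specificity averaged equally across classes (macro averaging); the abstract does not say how the reported values were averaged. The function name `evaluate` and the example counts are hypothetical and are not data from the study.

```python
from typing import Dict, List


def evaluate(confusion: List[List[int]]) -> Dict[str, float]:
    """Accuracy, macro sensitivity, macro specificity, and Cohen's kappa.

    confusion[i][j] is the number of samples whose true class is i and
    whose predicted class is j. Assumes every class occurs at least once
    and that more than one class is present (otherwise a division by
    zero would occur).
    """
    k = len(confusion)
    total = sum(sum(row) for row in confusion)
    true_counts = [sum(row) for row in confusion]
    pred_counts = [sum(confusion[i][j] for i in range(k)) for j in range(k)]

    # Observed agreement: share of samples on the diagonal.
    accuracy = sum(confusion[c][c] for c in range(k)) / total

    sensitivities, specificities = [], []
    for c in range(k):
        # One-vs-rest counts for class c.
        tp = confusion[c][c]
        fn = true_counts[c] - tp
        fp = pred_counts[c] - tp
        tn = total - tp - fn - fp
        sensitivities.append(tp / (tp + fn))
        specificities.append(tn / (tn + fp))

    # Agreement expected by chance, from the marginal class frequencies.
    expected = sum(t * p for t, p in zip(true_counts, pred_counts)) / total ** 2
    kappa = (accuracy - expected) / (1 - expected)

    return {
        "accuracy": accuracy,
        "sensitivity": sum(sensitivities) / k,
        "specificity": sum(specificities) / k,
        "kappa": kappa,
    }


# Hypothetical 3-class confusion matrix (illustrative counts only).
example = [
    [8, 2, 0],
    [1, 15, 4],
    [0, 3, 17],
]
for name, value in evaluate(example).items():
    print(f"{name}: {value:.4f}")
```

Kappa corrects accuracy for the agreement that would be expected by chance given the class frequencies, which is why it is often reported alongside accuracy when classes are unevenly sized.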