Implementation of Feature Selection Strategies to Enhance Classification Using XGBoost and Decision Tree

Fhara Elvina Pingky Nadya, M.Firdaus Ibadi Ferdiansyah, Vinna Rahmayanti Setyaning Nastiti, Christian Sri Kusuma Aditya
Scientific Journal of Informatics. Published 2024-02-29. DOI: 10.15294/sji.v11i1.48145 (https://doi.org/10.15294/sji.v11i1.48145)
Citations: 0

Abstract

Purpose: In education, grades are often the benchmark by which students are judged successful or not during a learning period. Schools that provide the same facilities and teaching staff to all students still do not produce equal grades; a gap in grades persists in every school. This research aims to improve classification accuracy by applying four feature selection methods, Information Gain (IG), Recursive Feature Elimination (RFE), Lasso, and a hybrid of RFE and Mutual Information, with XGBoost and Decision Tree models.

Methods: The study used data on 649 students in a Portuguese course. The data were pre-processed according to the data requirements, feature selection was applied to select the features that affect the target, the data were then classified with XGBoost and Decision Tree, and finally the results were evaluated and reported.

Results: Information Gain feature selection combined with the XGBoost algorithm achieved the best accuracy of all the combinations tested, 81.53%.

Novelty: This research improves on the classification accuracy reported in previous research by using two traditional machine learning algorithms and several feature selection methods.
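The abstract does not show how the feature selection was implemented, so the following is only a minimal sketch of one of the named methods, Information Gain ranking, written with the Python standard library. The toy records, the column names (`studytime`, `internet`, `sex`), the labels, and the helper names are all hypothetical and chosen for illustration; they are not taken from the paper's data or code. Each feature is scored by how much it reduces the entropy of the target, and the top-scoring features are kept.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a sequence of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())


def information_gain(feature_values, labels):
    """Entropy of the labels minus their expected entropy after splitting on the feature."""
    total = len(labels)
    groups = {}
    for value, label in zip(feature_values, labels):
        groups.setdefault(value, []).append(label)
    remainder = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


def select_top_k(rows, labels, k):
    """Return the names of the k features with the highest information gain."""
    names = rows[0].keys()
    scores = {name: information_gain([r[name] for r in rows], labels) for name in names}
    return sorted(scores, key=scores.get, reverse=True)[:k]


# Hypothetical toy records; column names and values are invented for illustration.
rows = [
    {"studytime": "high", "internet": "yes", "sex": "F"},
    {"studytime": "high", "internet": "no", "sex": "M"},
    {"studytime": "low", "internet": "yes", "sex": "F"},
    {"studytime": "low", "internet": "no", "sex": "M"},
]
labels = ["pass", "pass", "fail", "fail"]

selected = select_top_k(rows, labels, k=1)
print(selected)  # ['studytime']
```

In this toy data `studytime` separates the two classes perfectly (gain of 1 bit) while the other two columns carry no information about the label (gain of 0), so only `studytime` is kept. In a workflow like the one the abstract describes, the retained columns would then be passed to a classifier such as XGBoost or a decision tree.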