Empowering education: Harnessing ensemble machine learning approach and ACO-DT classifier for early student academic performance prediction

Impact Factor: 4.8 | Zone 2 (Education) | JCR Q1 (Education & Educational Research) | Education and Information Technologies | Pub Date: 2024-09-02 | DOI: 10.1007/s10639-024-12976-6

Kajal Mahawar, Punam Rattan


Citations: 0

Abstract

Higher education institutions have consistently strived to provide students with a high-quality education. Machine learning (ML) algorithms can greatly simplify the task of predicting outcomes, and academicians can use ML to gain insight into student data and mine it to forecast performance. In this paper, the authors propose an ML-based student performance prediction model that considers demographic, social, psychological, and economic factors together. The dataset was compiled from a purpose-designed questionnaire administered to second-year undergraduate students. The objective of the study is to uncover factors that could assist in predicting students' performance. Eight ML classifiers (logistic regression, random forest, support vector machine, XGBoost, support vector machine with a linear kernel, naïve Bayes, K-Nearest Neighbor, and decision tree) are used to forecast student performance. Additionally, nine feature selection techniques (variance threshold, XGBoost, feature importance, recursive feature elimination, chi-square, ridge, Pearson correlation, lasso, and random forest) are employed to determine the optimal factors. The authors experimented with each technique on two train/test splits, 80:20 and 70:30. The ensemble DXK (DT + XGB + KNN) model with cross-validation and the 80:20 split outperformed the standard classifiers, achieving an accuracy of 97.83%, an r-square of 96.17%, a precision of 97.94%, a recall of 97.83%, and an f1-score of 97.88%. These were the highest among all models tested. Additionally, the authors propose the ACO-DT model, which improves the prediction performance of the top-performing DT classifier by applying the Ant Colony Optimization technique. The findings demonstrate that the proposed model with the 80:20 split achieves an accuracy of 98.15%, an f1-score of 98.16%, a precision of 98.18%, a recall of 98.15%, and an r-square of 84.75%, surpassing all other models for forecasting student performance. For the specified data size, model creation takes 8.49 s. The authors also recommend future research directions to further enhance this study.
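To make the ensemble idea and the reported metrics concrete, here is a minimal pure-Python sketch. It is not the authors' implementation. The abstract does not say how the DT, XGB, and KNN predictions are combined or how the per-class metrics are averaged, so hard (majority) voting and support-weighted averaging are both assumptions. The labels and the three prediction lists are invented toy data, and the names `majority_vote` and `weighted_metrics` are hypothetical.

```python
from collections import Counter


def majority_vote(*model_predictions):
    """Hard-voting ensemble (assumed scheme): per sample, keep the label
    predicted by the most models. Ties go to the earliest-listed model."""
    voted = []
    for votes in zip(*model_predictions):
        counts = Counter(votes)
        best = max(counts.values())
        # First label, in model order, that reaches the top vote count.
        voted.append(next(v for v in votes if counts[v] == best))
    return voted


def weighted_metrics(y_true, y_pred):
    """Accuracy plus support-weighted precision, recall, and F1
    (weighted averaging is an assumption, not stated in the abstract)."""
    n = len(y_true)
    support = Counter(y_true)
    accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / n
    precision = recall = f1 = 0.0
    for label, count in support.items():
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        predicted = sum(p == label for p in y_pred)
        prec = tp / predicted if predicted else 0.0
        rec = tp / count
        f = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        weight = count / n
        precision += weight * prec
        recall += weight * rec
        f1 += weight * f
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Invented toy labels and per-model predictions, for illustration only.
y_true = ["pass", "pass", "fail", "pass", "fail", "fail"]
dt_pred = ["pass", "fail", "fail", "pass", "fail", "pass"]
xgb_pred = ["pass", "pass", "fail", "fail", "fail", "fail"]
knn_pred = ["fail", "pass", "fail", "pass", "pass", "fail"]

ensemble_pred = majority_vote(dt_pred, xgb_pred, knn_pred)

for name, pred in [("DT", dt_pred), ("XGB", xgb_pred),
                   ("KNN", knn_pred), ("DXK", ensemble_pred)]:
    print(name, weighted_metrics(y_true, pred))
```

In this toy case each model errs on different samples, so the vote corrects every individual mistake. Under support-weighted averaging, recall always equals accuracy. The abstract's figures show the same pattern (97.83% for both, and 98.15% for both), although it does not state which averaging scheme was used.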




Source journal

Education and Information Technologies (Education & Educational Research)

CiteScore: 10.00

Self-citation rate: 12.70%

Articles published: 610

Journal description: The Journal of Education and Information Technologies (EAIT) is a platform for the range of debates and issues in the field of Computing Education as well as the many uses of information and communication technology (ICT) across many educational subjects and sectors. It probes the use of computing to improve education and learning in a variety of settings, platforms and environments. The journal aims to provide perspectives at all levels, from the micro level of specific pedagogical approaches in Computing Education and applications or instances of use in classrooms, to macro concerns of national policies and major projects; from pre-school classes to adults in tertiary institutions; from teachers and administrators to researchers and designers; from institutions to online and lifelong learning. The journal is embedded in the research and practice of professionals within the contemporary global context and its breadth and scope encourage debate on fundamental issues at all levels and from different research paradigms and learning theories. The journal does not proselytize on behalf of the technologies (whether they be mobile, desktop, interactive, virtual, games-based or learning management systems) but rather provokes debate on all the complex relationships within and between computing and education, whether they are in informal or formal settings. It probes state-of-the-art technologies in Computing Education and it also considers the design and evaluation of digital educational artefacts. The journal aims to maintain and expand its international standing by careful selection on merit of the papers submitted, thus providing a credible ongoing forum for debate and scholarly discourse. Special Issues are occasionally published to cover particular issues in depth. EAIT invites readers to submit papers that draw inferences, probe theory and create new knowledge that informs practice, policy and scholarship. 
Readers are also invited to comment and reflect upon the argument and opinions published. EAIT is the official journal of the Technical Committee on Education of the International Federation for Information Processing (IFIP) in partnership with UNESCO.