Empowering education: Harnessing ensemble machine learning approach and ACO-DT classifier for early student academic performance prediction
Authors: Kajal Mahawar, Punam Rattan
Journal: Education and Information Technologies (JCR Q1, Education & Educational Research; impact factor 4.8)
Published: 2024-09-02
DOI: https://doi.org/10.1007/s10639-024-12976-6
Citations: 0
Abstract
Higher education institutions consistently strive to provide students with a high-quality education. Machine learning (ML) algorithms can support this goal by simplifying the prediction of student outcomes: academics can use ML to gain insight into student data and to mine it for performance forecasts. In this paper, the authors propose an ML-based student performance prediction model that draws jointly on demographic, social, psychological, and economic factors. The dataset was compiled from a purpose-designed questionnaire administered to second-year undergraduate students. The objective of the study is to uncover factors that could assist in predicting students' performance. Eight ML classifiers (logistic regression, random forest, support vector machine, XGBoost, support vector machine with a linear kernel, naïve Bayes, K-Nearest Neighbor, and decision tree) are used to forecast student performance. In addition, nine feature selection techniques (variance threshold, XGBoost, feature importance, recursive feature elimination, chi-square, ridge, Pearson correlation, lasso, and random forest) are employed to determine the optimal factors. The authors evaluated each technique on two train/test splits, with 80:20 and 70:30 proportions. The ensemble DXK (DT + XGB + KNN) model with cross-validation and the 80:20 split outperformed the other standard classifiers, achieving an accuracy of 97.83%, an r-square of 96.17%, a precision of 97.94%, a recall of 97.83%, and an f1-score of 97.88%. These were the highest among all models tested. The authors also propose the ACO-DT model, which improves the prediction performance of the top-performing DT classifier by using the Ant Colony Optimization technique. The findings show that the proposed model with the 80:20 split achieves an accuracy of 98.15%, an f1-score of 98.16%, a precision of 98.18%, a recall of 98.15%, and an r-square of 84.75%, surpassing all other models for forecasting student performance. For the specified data size, the model creation time is 8.49 s. The authors also recommend future research directions to extend this study.
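In both reported results the recall equals the accuracy (97.83% and 98.15%). That pattern is what support-weighted averaging over classes produces, although the abstract does not state which averaging scheme was used, so treat this as an assumption. The sketch below computes accuracy and weighted precision, recall, and f1 in plain Python. The function name `weighted_metrics` and the three-class labels are hypothetical and are not taken from the paper.

```python
from collections import Counter


def weighted_metrics(y_true, y_pred):
    """Return accuracy and support-weighted precision, recall, and F1."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and of equal length")
    n = len(y_true)
    support = Counter(y_true)    # true samples per class
    predicted = Counter(y_pred)  # predictions per class
    true_pos = Counter(t for t, p in zip(y_true, y_pred) if t == p)

    accuracy = sum(true_pos.values()) / n
    precision = recall = f1 = 0.0
    for label, count in support.items():
        tp = true_pos[label]
        p = tp / predicted[label] if predicted[label] else 0.0
        r = tp / count
        f = 2 * p * r / (p + r) if (p + r) else 0.0
        weight = count / n  # each class contributes in proportion to its support
        precision += weight * p
        recall += weight * r
        f1 += weight * f
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Hypothetical labels for a three-class performance target (not data from the paper).
y_true = ["low", "low", "medium", "medium", "medium",
          "high", "high", "high", "high", "high"]
y_pred = ["low", "medium", "medium", "medium", "medium",
          "high", "high", "high", "high", "medium"]

scores = weighted_metrics(y_true, y_pred)
print(scores)
```

Weighted recall always equals accuracy, because weighting each class's recall by its share of the samples reduces to the total number of correct predictions divided by the sample count. Weighted precision and f1 can differ from it, as they do in this example.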
About the journal:
The Journal of Education and Information Technologies (EAIT) is a platform for the range of debates and issues in the field of Computing Education as well as the many uses of information and communication technology (ICT) across many educational subjects and sectors. It probes the use of computing to improve education and learning in a variety of settings, platforms and environments.
The journal aims to provide perspectives at all levels, from the micro level of specific pedagogical approaches in Computing Education and applications or instances of use in classrooms, to macro concerns of national policies and major projects; from pre-school classes to adults in tertiary institutions; from teachers and administrators to researchers and designers; from institutions to online and lifelong learning. The journal is embedded in the research and practice of professionals within the contemporary global context, and its breadth and scope encourage debate on fundamental issues at all levels and from different research paradigms and learning theories. The journal does not proselytize on behalf of the technologies (whether they be mobile, desktop, interactive, virtual, games-based or learning management systems) but rather provokes debate on all the complex relationships within and between computing and education, whether they are in informal or formal settings. It probes state-of-the-art technologies in Computing Education and also considers the design and evaluation of digital educational artefacts. The journal aims to maintain and expand its international standing by careful selection on merit of the papers submitted, thus providing a credible ongoing forum for debate and scholarly discourse. Special Issues are occasionally published to cover particular issues in depth. EAIT invites readers to submit papers that draw inferences, probe theory and create new knowledge that informs practice, policy and scholarship. Readers are also invited to comment and reflect upon the arguments and opinions published. EAIT is the official journal of the Technical Committee on Education of the International Federation for Information Processing (IFIP) in partnership with UNESCO.