Bridgitte Owusu-Boadu, Isaac Kofi Nti, O. Nyarko-Boateng, J. Aning, Victoria Boafo
Title: Academic Performance Modelling with Machine Learning Based on Cognitive and Non-Cognitive Features
Journal: Applied Computer Systems, pp. 122-131
Published: 2021-12-01
DOI: 10.2478/acss-2021-0015
Citations: 3
Abstract
The academic performance of students is essential for academic progression at all levels of education. However, the many cognitive and non-cognitive factors that influence students’ academic performance make it challenging for academic authorities to extract hidden knowledge from educational data with conventional analytical tools. Educational Data Mining (EDM) therefore requires computational techniques that simplify planning and identify students who might be at risk of failing or dropping out of school due to academic performance, which helps address student retention. The paper studies several cognitive and non-cognitive factors (academic, demographic, social and behavioural) and their effect on student academic performance using machine learning algorithms. Heterogeneous lazy and eager machine learning classifiers, including Decision Tree (DT), K-Nearest-Neighbour (KNN), Artificial Neural Network (ANN), Logistic Regression (LR), Random Forest (RF), AdaBoost and Support Vector Machine (SVM), were adopted, and training was performed with k-fold (k = 10) and leave-one-out cross-validation. We evaluated their predictive performance using well-known evaluation metrics: Area under Curve (AUC), F1 score, Precision, Accuracy, Kappa, Matthews correlation coefficient (MCC) and Recall. The study outcome shows that Student Absence Days (SAD) is the most significant predictor of students’ academic performance. In terms of prediction accuracy and AUC, RF (Acc = 0.771, AUC = 0.903), LR (Acc = 0.779, AUC = 0.90) and ANN (Acc = 0.760, AUC = 0.895) outperformed all other algorithms, namely KNN (Acc = 0.638, AUC = 0.826), SVM (Acc = 0.727, AUC = 0.80), DT (Acc = 0.733, AUC = 0.876) and AdaBoost (Acc = 0.748, AUC = 0.808), making them more suitable for predicting students’ academic performance.
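The evaluation protocol named in the abstract (k-fold and leave-one-out splits, scored with accuracy, precision, recall, F1, Kappa and MCC) can be illustrated with a minimal Python sketch. This is not the authors' code. It assumes binary labels (1 = at risk, 0 = not at risk), uses contiguous folds without shuffling, and the function names `k_fold_indices` and `binary_metrics` as well as the toy labels are hypothetical, introduced only for illustration.

```python
import math


def k_fold_indices(n_samples, k):
    """Yield (train_idx, test_idx) for k contiguous folds.

    Setting k == n_samples gives leave-one-out cross-validation.
    Simplification (assumption): no shuffling or stratification.
    """
    # Spread the remainder over the first folds so sizes differ by at most 1.
    sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in sizes:
        stop = start + size
        test_idx = list(range(start, stop))
        train_idx = [i for i in range(n_samples) if i < start or i >= stop]
        yield train_idx, test_idx
        start = stop


def binary_metrics(y_true, y_pred):
    """Compute the metrics listed in the abstract from binary labels (0/1)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    n = tp + tn + fp + fn

    accuracy = (tp + tn) / n
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)

    # Cohen's kappa: observed agreement corrected for chance agreement.
    p_chance = ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / n ** 2
    kappa = (accuracy - p_chance) / (1 - p_chance) if p_chance != 1 else 0.0

    # Matthews correlation coefficient.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0

    return {"accuracy": accuracy, "precision": precision, "recall": recall,
            "f1": f1, "kappa": kappa, "mcc": mcc}


# Toy labels and predictions (made up for illustration, not from the paper).
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 1, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 1, 1, 0]
metrics = binary_metrics(y_true, y_pred)

ten_fold = list(k_fold_indices(25, 10))       # k-fold with k = 10
leave_one_out = list(k_fold_indices(10, 10))  # k == n_samples
```

On these toy labels the confusion counts are 4 true positives, 4 true negatives, 1 false positive and 1 false negative, so accuracy, precision, recall and F1 are all 0.8, while Kappa and MCC are both 0.6. The gap shows why chance-corrected metrics are reported alongside accuracy. In practice, a library such as scikit-learn would supply shuffled, stratified splits and multi-class versions of these metrics.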