Explainable artificial intelligence-machine learning models to estimate overall scores in tertiary preparatory general science course

Sujan Ghimire, Shahab Abdulla, Lionel P. Joseph, Salvin Prasad, Angela Murphy, Aruna Devi, Prabal Datta Barua, Ravinesh C. Deo, Rajendra Acharya, Zaher Mundher Yaseen
Journal: Computers and Education Artificial Intelligence, volume 7, Article 100331 (JCR Q1, Social Sciences)
Publication type: Journal Article
Published: 2024-11-20
DOI: 10.1016/j.caeai.2024.100331
Source: https://www.sciencedirect.com/science/article/pii/S2666920X24001346

Abstract

Educational data mining is valuable for uncovering latent relationships in educational settings, particularly for predicting students' academic performance. This study introduces an interpretable hybrid model, a Support Vector Regression (SVR) model optimised through Tree-structured Parzen Estimation (TPE), to predict overall scores (OT) using five assignment marks and one examination mark as predictors. Neural network-based, tree-based, ensemble-based, and boosting-based methods are evaluated against the hybrid TPE-optimised SVR model for forecasting final examination grades among 492 students enrolled in the TPP7155 (General Science) course at the University of Southern Queensland, Australia, during the 2020–2021 academic year. Additionally, Local Interpretable Model-agnostic Explanations (LIME) and SHapley Additive exPlanations (SHAP) are employed to elucidate the inner workings of these prediction models. The proposed model performs best, exhibiting the lowest Root Mean Squared Error (RMSE) and Relative Root Mean Squared Error (RRMSE), as well as the highest Willmott's index (WI), Legates–McCabe index (LM), and Nash–Sutcliffe Efficiency (NS). Assignment and examination marks are identified as pivotal predictors of OT. SHAP and LIME analyses reveal the examination score (ET) as the most influential feature, shifting predicted OT by an average of ±4.93. Conversely, Assignment 1 is the least informative feature, contributing only ±0.64 to OT predictions. This research underscores the efficacy of the proposed interpretable hybrid TPE-optimised SVR model in discerning relationships among continuous learning variables, giving educators the means to anticipate student performance and intervene before course completion.
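The five evaluation metrics named in the abstract can be illustrated with a minimal sketch. It uses the commonly cited textbook definitions of RMSE, RRMSE (expressed as a percentage of the mean observed score), Willmott's index, the Legates–McCabe index, and Nash–Sutcliffe Efficiency. The paper's exact formulas are not reproduced in this listing, so these definitions are an assumption, as are the function name `evaluate` and the marks in the usage lines, which are hypothetical and not taken from the study.

```python
import math


def evaluate(observed, predicted):
    """Return RMSE, RRMSE (%), WI, LM and NS for paired lists of scores.

    Assumes the standard definitions of each metric; the denominators
    are zero if every observed score is identical.
    """
    if not observed or len(observed) != len(predicted):
        raise ValueError("observed and predicted must be non-empty and equal in length")

    n = len(observed)
    mean_obs = sum(observed) / n
    pairs = list(zip(observed, predicted))

    sq_err = sum((o - p) ** 2 for o, p in pairs)   # sum of squared errors
    abs_err = sum(abs(o - p) for o, p in pairs)    # sum of absolute errors

    rmse = math.sqrt(sq_err / n)
    rrmse = 100.0 * rmse / mean_obs                # RMSE relative to the mean observed score

    # Willmott's index: squared error against the "potential error" term.
    wi_den = sum((abs(p - mean_obs) + abs(o - mean_obs)) ** 2 for o, p in pairs)
    # Nash-Sutcliffe: squared error against the variance around the observed mean.
    ns_den = sum((o - mean_obs) ** 2 for o in observed)
    # Legates-McCabe: same idea as NS, using absolute rather than squared deviations.
    lm_den = sum(abs(o - mean_obs) for o in observed)

    return {
        "RMSE": rmse,
        "RRMSE": rrmse,
        "WI": 1.0 - sq_err / wi_den,
        "LM": 1.0 - abs_err / lm_den,
        "NS": 1.0 - sq_err / ns_den,
    }


# Hypothetical overall scores (OT) and model predictions, for illustration only.
observed_ot = [50, 60, 70, 80]
predicted_ot = [52, 58, 71, 79]
metrics = evaluate(observed_ot, predicted_ot)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

Lower RMSE and RRMSE indicate smaller errors, while WI, LM, and NS approach 1 as predictions approach the observed scores, which is why the abstract reports the lowest values of the first two and the highest values of the other three for the proposed model.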
Source journal
CiteScore: 16.80
Self-citation rate: 0.00%
Articles published: 66
Review time: 50 days