Balancing Performance and Explainability in Academic Dropout Prediction

IEEE Transactions on Learning Technologies (IF 2.9; Region 3, Education; JCR Q2, Computer Science, Interdisciplinary Applications). Pub Date: 2024-07-26. DOI: 10.1109/TLT.2024.3425959

Andrea Zanellati;Stefano Pio Zingaro;Maurizio Gabbrielli

IEEE Transactions on Learning Technologies, vol. 17, pp. 2140-2153. Journal article, not open access. https://ieeexplore.ieee.org/document/10612222/

Citations: 0

Abstract

Academic dropout remains a significant challenge for education systems, necessitating rigorous analysis and targeted interventions. This study employs machine learning techniques, specifically random forest (RF) and feature tokenizer transformer (FTT), to predict academic attrition. Utilizing a comprehensive dataset of over 40 000 students from an Italian university, the research incorporates a range of variables, including demographic information, prior educational metrics, and real-time academic performance indicators. We present a nuanced comparative evaluation of the RF and FTT models, highlighting their predictive accuracy and interpretative capabilities. Our empirical results demonstrate the effectiveness of machine learning in managing student attrition, with FTT models outperforming RF models in terms of predictive accuracy and achieving a sensitivity rate of 81%. Significantly, the inclusion of historical academic data enhances the models' ability to identify students at increased risk of dropping out. Furthermore, we apply advanced explanatory techniques, such as Shapley additive explanations, to investigate the discriminative power of these models across different student profiles. This provides valuable insights into the key variables influencing dropout risk, contributing to a more holistic understanding of the issue. In addition, we conduct a fairness analysis to ensure the ethical robustness of our predictive models, making them not only effective but also equitable tools.
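For readers unfamiliar with the headline metric, sensitivity (recall) is TP / (TP + FN): the share of students who actually dropped out that the model flagged. The sketch below is illustrative only and is not the authors' code. The labels, predictions, and group tags are made-up toy values, the function names are hypothetical, and comparing sensitivity across groups is just one plausible form a fairness check might take; the abstract does not say which fairness measure the paper uses.

```python
from collections import defaultdict


def sensitivity(y_true, y_pred, positive=1):
    """Return TP / (TP + FN) for the positive (dropout) class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    if tp + fn == 0:
        raise ValueError("no positive examples in y_true")
    return tp / (tp + fn)


def sensitivity_by_group(y_true, y_pred, groups, positive=1):
    """Compute sensitivity separately for each group label."""
    buckets = defaultdict(lambda: ([], []))
    for t, p, g in zip(y_true, y_pred, groups):
        buckets[g][0].append(t)
        buckets[g][1].append(p)
    return {g: sensitivity(ts, ps, positive) for g, (ts, ps) in buckets.items()}


# Toy data (hypothetical): 1 = dropped out, 0 = stayed enrolled.
y_true = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 1, 0, 0, 0, 1, 0, 0]
groups = ["A", "A", "A", "B", "B", "A", "B", "B", "A", "B"]

overall = sensitivity(y_true, y_pred)
per_group = sensitivity_by_group(y_true, y_pred, groups)
gap = max(per_group.values()) - min(per_group.values())

print(f"overall sensitivity: {overall:.2f}")
print(f"per-group sensitivity: {per_group}")
print(f"largest gap between groups: {gap:.2f}")
```

On this toy data four of the five actual dropouts are flagged, so overall sensitivity is 0.80, while the two groups differ (1.0 versus 0.5). A gap of that kind is what a group-wise comparison is meant to surface.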

Source journal

IEEE Transactions on Learning Technologies (Computer Science, Interdisciplinary Applications)

CiteScore: 7.50
Self-citation rate: 5.40%
Review time: >12 weeks

Journal description: The IEEE Transactions on Learning Technologies covers all advances in learning technologies and their applications, including but not limited to the following topics: innovative online learning systems; intelligent tutors; educational games; simulation systems for education and training; collaborative learning tools; learning with mobile devices; wearable devices and interfaces for learning; personalized and adaptive learning systems; tools for formative and summative assessment; tools for learning analytics and educational data mining; ontologies for learning systems; standards and web services that support learning; authoring tools for learning materials; computer support for peer tutoring; learning via computer-mediated inquiry, field, and lab work; social learning techniques; social networks and infrastructures for learning and knowledge sharing; and creation and management of learning objects.