An Analysis of Student Representation, Representative Features and Classification Algorithms to Predict Degree Dropout

R. Manrique, B. Nunes, O. Mariño, M. Casanova, Terhi Nurmikko-Fuller
Published in: Proceedings of the 9th International Conference on Learning Analytics & Knowledge
Publication date: 2019-03-04
DOI: 10.1145/3303772.3303800 (https://doi.org/10.1145/3303772.3303800)
Citations: 27

Abstract

Identifying and monitoring students who are likely to drop out is a vital issue for universities. Early detection allows institutions to intervene, address problems, and retain students. Prior research into the early detection of at-risk students has relied on predictive models, but a comprehensive assessment of the suitability of different algorithms and approaches is complicated by the large number of variable features that constitute a student's educational experience. Predictive models vary in their amplitude, their temporality, and the learning algorithms they employ. Amplitude refers to the ability of a model to operate across multiple degrees; temporality is often considered because student data are inherently temporal. In the absence of a comparative framework of learning algorithms, this paper provides such an analysis, based on a proposed classification of strategies for predicting dropout in Higher Education Institutions. Three student representations are implemented (Global Feature-Based, Local Feature-Based, and Time Series), each in conjunction with the learning algorithms appropriate to it. A description of each approach and of its implementation process is presented as a technical contribution. The experiments use a dataset of student information from two degrees, Business Administration and Architecture, acquired through an automated management system at a university in Brazil. Our findings can be summarized as follows: (i) of the three proposed student representations, the Local Feature-Based representation was the most suitable for predicting dropout; in addition to providing high-quality results, Local Feature-Based representations are simple to build, and the model is less expensive to construct than more complex ones; (ii) the Local Feature-Based results indicate that dropout can be accurately predicted from the grades of a few core courses, so there is no need for a complex feature extraction process; (iii) considering temporal aspects of the data does not seem to improve prediction performance, although it increases computational cost as model complexity increases.
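To make the three student representations more concrete, the sketch below builds each one from a single toy transcript. It is only one plausible reading of the terms: the abstract does not list the actual features, so the transcript layout, the course codes, the pass mark, and the function names `global_features`, `local_features`, and `time_series` are all assumptions made for illustration, not the authors' implementation.

```python
from statistics import mean

# Hypothetical transcript rows: (semester, course_code, grade on a 0-10 scale).
TRANSCRIPT = [
    (1, "CALC1", 4.0),
    (1, "INTRO", 7.0),
    (2, "CALC1", 6.0),
    (2, "STAT1", 5.0),
]

PASS_MARK = 5.0                             # assumed pass threshold
CORE_COURSES = ["CALC1", "STAT1", "ACCT1"]  # assumed core courses of one degree


def global_features(transcript):
    """Degree-independent aggregates that could be computed for any degree."""
    grades = [grade for _, _, grade in transcript]
    return {
        "mean_grade": mean(grades),
        "failed_count": sum(grade < PASS_MARK for grade in grades),
        "semesters": len({semester for semester, _, _ in transcript}),
    }


def local_features(transcript, core_courses):
    """Most recent grade in each core course of one degree (None if not taken)."""
    latest = {}
    for _, course, grade in sorted(transcript):  # later semesters overwrite earlier
        latest[course] = grade
    return [latest.get(course) for course in core_courses]


def time_series(transcript):
    """Mean grade per semester, ordered by semester."""
    by_semester = {}
    for semester, _, grade in transcript:
        by_semester.setdefault(semester, []).append(grade)
    return [mean(by_semester[s]) for s in sorted(by_semester)]


if __name__ == "__main__":
    print(global_features(TRANSCRIPT))
    print(local_features(TRANSCRIPT, CORE_COURSES))
    print(time_series(TRANSCRIPT))
```

Under this reading, the local vector is the cheapest to build because it needs only a short list of core courses per degree, which is consistent with finding (ii); the trade-off is that the list must be redefined for every degree, whereas the global aggregates carry over unchanged.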