Predictive Modeling of Student Dropout in MOOCs and Self-Regulated Learning

Computers (IF 2.6, Q2, Computer Science, Interdisciplinary Applications) | Pub Date: 2023-09-27 | DOI: 10.3390/computers12100194

Georgios Psathas, Theano K. Chatzidaki, Stavros N. Demetriadis


Citations: 1

Abstract

The primary objective of this study is to examine the factors that contribute to the early prediction of dropout in Massive Open Online Courses (MOOCs), in order to identify and support at-risk students. We use MOOC data of a specific duration, with a guided study pace. The dataset exhibits class imbalance, so we apply oversampling techniques to balance the data and support unbiased prediction. We examine the predictive performance of five classic machine learning (ML) classification algorithms under four different oversampling techniques, using various evaluation metrics. Additionally, we explore the influence of self-reported self-regulated learning (SRL) data provided by students, along with various other prominent MOOC features, as potential indicators for early-stage dropout prediction. The research questions focus on (1) the performance of the classic ML classification models, measured with various evaluation metrics before and after different methods of oversampling; (2) which self-reported data may constitute crucial predictors of dropout propensity; and (3) the effect of the SRL factor on dropout prediction performance. The main conclusions are: (1) prominent predictors, including employment status, frequency of chat tool usage, prior subject-related experience, gender, education, and willingness to participate, achieve high to excellent recall, particularly when specific combinations of algorithms and oversampling methods are applied; (2) the self-reported SRL factor, combined with easily provided, self-reported features, performed well as a predictor in terms of recall when the LR and SVM algorithms were employed; and (3) it is crucial to test diverse machine learning algorithms and oversampling methods in predictive modeling.
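To make the two central ideas of the abstract concrete, the following is a minimal sketch of class balancing by oversampling and of the recall metric. It rests on several assumptions: the abstract does not name the four oversampling techniques used, so plain random oversampling stands in here as one common option; the function names `random_oversample` and `recall`, the toy feature values, and the choice of which class is the minority are all hypothetical and are not taken from the study.

```python
import random
from collections import Counter


def random_oversample(rows, labels, seed=0):
    """Duplicate randomly chosen rows of smaller classes until every class
    has as many rows as the largest class (random oversampling)."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_rows, out_labels = list(rows), list(labels)
    for cls, n in counts.items():
        # Original rows of this class; duplicates are drawn only from these.
        pool = [r for r, y in zip(rows, labels) if y == cls]
        for _ in range(target - n):
            out_rows.append(rng.choice(pool))
            out_labels.append(cls)
    return out_rows, out_labels


def recall(y_true, y_pred, positive=1):
    """Recall = TP / (TP + FN) for the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    return tp / (tp + fn) if (tp + fn) else 0.0


# Made-up toy data (assumption): label 1 = dropout, label 0 = completed.
rows = [[0.1], [0.2], [0.3], [0.4], [0.9], [0.8]]
labels = [0, 0, 0, 0, 1, 1]

bal_rows, bal_labels = random_oversample(rows, labels)
balanced_counts = Counter(bal_labels)  # both classes now have 4 rows

# Recall on a hypothetical set of predictions: 2 of 3 true dropouts are found.
example_recall = recall([1, 1, 0, 0, 1], [1, 0, 0, 1, 1])
```

In practice, oversampling is normally applied only to the training split, so that duplicated rows do not leak into the data used to compute recall.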
