A Predictive Model for the Early Identification of Student Dropout Using Data Classification, Clustering, and Association Methods

Revista Iberoamericana de Tecnologias del Aprendizaje (IF 1, JCR Q4, Computer Science, Interdisciplinary Applications). Pub Date: 2025-01-13. DOI: 10.1109/RITA.2025.3528369

Patricia Mariotto Mozzaquatro Chicon; Leo Natan Paschoal; Sandro Sawicki; Fabricia Roos-Frantz; Rafael Z. Frantz

Vol. 20, pp. 12-21. Journal article, not open access. Full text: https://ieeexplore.ieee.org/document/10839030/

Abstract

Technology development has increased the volume of data generated in education, sparking interest in information extraction to support educational management through automated data analysis. Using such data to build models that identify students likely to drop out has drawn research interest. A crucial factor in reducing dropout rates is the systematic and early identification of the level of student engagement, especially by detecting students’ behavior profiles in the virtual environment, for example through grades in assessments. Existing predictive models based on data mining processes identify students prone to dropping out, but they do not characterize the profiles of these students or the specific trends associated with those profiles. This article aims to fill that gap by presenting a study that identifies and tracks the profiles of undergraduate students likely to drop out, starting with an analysis of academic performance. We propose a predictive model that goes beyond classification by combining data mining techniques such as decision trees, clustering, and frequent pattern analysis. Decision trees, which use a tree-like graph to represent decisions and their possible consequences, identify students at risk of failure from the entire dataset. Clustering analysis, which groups similar data points together, groups students based on similar characteristics (e.g., students who scored between 0 and 30 points on a specific activity). Frequent pattern analysis, which identifies patterns that occur frequently in a dataset, uncovers the underlying factors contributing to low performance (e.g., identifying which activities had the most significant influence on a specific group’s low performance). This integrated approach predicts dropout risk with 93.9% precision and provides a deeper understanding of student profiles and the trends associated with academic failure. The model’s practical application is demonstrated through a study.
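
The three stages named in the abstract can be illustrated with a minimal, standard-library sketch. It is not the authors' implementation: the toy records, the activity names (`quiz`, `forum`, `exam`), the 30-point `LOW_SCORE` cutoff, and all function names are assumptions made for illustration. A one-level decision tree (a "stump") stands in for a full decision tree, plain k-means stands in for whatever clustering algorithm the paper uses, and brute-force itemset counting stands in for a frequent pattern miner such as Apriori.

```python
from collections import Counter
from itertools import combinations

# Hypothetical toy data (not from the paper): scores from 0 to 100 on three
# activities, plus whether the student ultimately failed.
STUDENTS = [
    {"id": "s1", "quiz": 20, "forum": 10, "exam": 25, "failed": True},
    {"id": "s2", "quiz": 25, "forum": 80, "exam": 30, "failed": True},
    {"id": "s3", "quiz": 15, "forum": 20, "exam": 70, "failed": True},
    {"id": "s4", "quiz": 85, "forum": 90, "exam": 80, "failed": False},
    {"id": "s5", "quiz": 70, "forum": 25, "exam": 75, "failed": False},
    {"id": "s6", "quiz": 90, "forum": 60, "exam": 95, "failed": False},
]
ACTIVITIES = ["quiz", "forum", "exam"]
LOW_SCORE = 30  # assumed cutoff for a "low" score on one activity


def fit_stump(students):
    """Stage 1: one-level decision tree. Pick the (activity, threshold)
    rule 'score <= threshold means at risk' with the fewest errors."""
    best = None
    for act in ACTIVITIES:
        for t in sorted({s[act] for s in students}):
            errors = sum((s[act] <= t) != s["failed"] for s in students)
            if best is None or errors < best[0]:
                best = (errors, act, t)
    return best[1], best[2]


def kmeans(points, k=2, iters=10):
    """Stage 2: plain k-means, initialised from the first k points so the
    result is deterministic. Returns one cluster label per point."""
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iters):
        labels = [
            min(range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])))
            for p in points
        ]
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def frequent_itemsets(transactions, min_count=2, max_size=2):
    """Stage 3: count every itemset of up to max_size items and keep those
    that appear in at least min_count transactions."""
    counts = Counter()
    for items in transactions:
        for size in range(1, max_size + 1):
            counts.update(combinations(sorted(items), size))
    return {itemset: n for itemset, n in counts.items() if n >= min_count}


# Stage 1: flag at-risk students with the learned rule.
activity, threshold = fit_stump(STUDENTS)
at_risk = [s for s in STUDENTS if s[activity] <= threshold]

# Stage 2: group the at-risk students by their full score vectors.
labels = kmeans([[s[a] for a in ACTIVITIES] for s in at_risk], k=2)

# Stage 3: find which low-scored activities tend to co-occur among them.
low_sets = [{a for a in ACTIVITIES if s[a] <= LOW_SCORE} for s in at_risk]
patterns = frequent_itemsets(low_sets)

print("rule:", activity, "<=", threshold)
print("clusters:", dict(zip((s["id"] for s in at_risk), labels)))
print("frequent low-score patterns:", patterns)
```

In this sketch, each stage narrows the question: the stump decides who is at risk, k-means separates the at-risk students into profiles, and the itemset counts show which activities are jointly associated with low performance. A real study would replace each stand-in with a library implementation and validate it on held-out data.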


