Towards Finding a Minimal Set of Features for Predicting Students' Performance Using Educational Data Mining

S. Sengupta
International Journal of Modern Education and Computer Science. Published 2023-06-08. DOI: https://doi.org/10.5815/ijmecs.2023.03.04
Citations: 1

Abstract

Early prediction of students' academic performance helps to identify at-risk students and enables management to take corrective action to prevent them from going astray. Most research in this field has applied supervised machine learning to purpose-built datasets with numerous attributes, or features. Because these datasets are not publicly available, it is hard to understand and compare the significance of the chosen features and the efficacy of the machine learning models used for the classification task. In this work, we analyzed 27 research papers published in the last ten years (2011-2021) that used machine learning models to predict students' performance. We identify the most frequently used features in the private datasets, their interrelationships, and their abstraction levels. We also explored three popular public datasets and performed statistical analyses, such as the Chi-square test and Pearson's correlation, on their features. A minimal set of essential features is prepared by fusing the frequent features with the statistically significant features. We propose an algorithm for selecting a minimal set of features from any dataset with a given set of features. We compared the performance of different machine learning models on the three public datasets in two experimental setups: one with the complete feature set and the other with the minimal feature set. Most supervised models perform nearly identically with the reduced feature set and, in some cases, even better than with the complete feature set. The proposed method is capable of identifying the most essential feature set from any new dataset for predicting students' performance.
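The abstract does not spell out the proposed algorithm, so the following is only a minimal sketch of the fusion idea it describes: keep a feature if it is statistically significant (Chi-square for categorical features, Pearson's correlation for numeric ones) or if it is among the features used frequently in the literature. The function names, the thresholds, the toy data, and the `frequent` set are all hypothetical; in particular, the Chi-square critical value 3.841 (significance level 0.05) only applies to 2x2 tables with one degree of freedom.

```python
from collections import Counter
from math import sqrt


def pearson(xs, ys):
    """Pearson's correlation coefficient between two numeric sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant column carries no linear signal
    return cov / sqrt(var_x * var_y)


def chi_square(xs, ys):
    """Chi-square statistic of independence for two categorical sequences."""
    n = len(xs)
    observed = Counter(zip(xs, ys))
    x_totals, y_totals = Counter(xs), Counter(ys)
    stat = 0.0
    for x, x_count in x_totals.items():
        for y, y_count in y_totals.items():
            expected = x_count * y_count / n
            stat += (observed[(x, y)] - expected) ** 2 / expected
    return stat


def minimal_feature_set(columns, target, categorical, frequent,
                        chi2_critical=3.841, min_abs_r=0.5):
    """Union of literature-frequent and statistically significant features.

    Assumption: thresholds are illustrative, not taken from the paper.
    """
    selected = set()
    for name, values in columns.items():
        if name in categorical:
            significant = chi_square(values, target) > chi2_critical
        else:
            significant = abs(pearson(values, target)) >= min_abs_r
        if significant or name in frequent:
            selected.add(name)
    return sorted(selected)


# Hypothetical toy data for eight students; 1 = passed, 0 = failed.
target = [1, 1, 1, 1, 0, 0, 0, 0]
columns = {
    "attendance": [95, 90, 85, 80, 60, 55, 70, 50],            # numeric
    "shoe_size": [40, 42, 38, 41, 41, 39, 42, 39],             # numeric, irrelevant
    "internet": ["yes", "yes", "yes", "yes", "no", "no", "no", "yes"],
    "extra_classes": ["a", "b", "a", "b", "a", "b", "a", "b"],  # no association
}
categorical = {"internet", "extra_classes"}
frequent = {"extra_classes", "parent_education"}  # hypothetical literature list

result = minimal_feature_set(columns, target, categorical, frequent)
print(result)
```

On this toy data, `attendance` and `internet` pass the statistical filters, `extra_classes` is retained only because it appears in the hypothetical frequent list, and `shoe_size` is dropped. With very small samples like this one the Chi-square approximation is unreliable, so the thresholds here should be read as placeholders.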