Examining Techniques to Solving Imbalanced Datasets in Educational Data Mining Systems

International Journal of Computing (Q3, Computer Science) | Pub Date: 2022-06-30 | DOI: 10.47839/ijc.21.2.2589
Ahmed Al-Ashoor, S. Abdullah
Citations: 2

Abstract

Research in educational data mining has contributed to the development of policies that improve student learning at different levels of educational institutions. A common challenge in building accurate classification and prediction systems is the imbalanced distribution of classes in the collected data. This study investigates data-level and algorithm-level techniques for addressing that imbalance. Six classifiers from each technique are used to assess how effectively they handle imbalanced data when predicting students’ graduation grades from their first-stage performance. The classifiers are tested with k-fold cross-validation before and after the data-level and algorithm-level techniques are applied. Accuracy, precision, recall, and F1-score are used as evaluation metrics. The results show that the classifiers do not perform well on the imbalanced dataset and that these techniques can improve performance, although the degree of improvement varies from one technique to another. Additionally, statistical hypothesis testing indicated no statistically significant differences between the classifiers of the two techniques.
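As a rough illustration of why accuracy alone can mislead on imbalanced data, and of what a data-level technique does, here is a minimal standard-library Python sketch. The abstract does not name the specific techniques or classifiers used, so random oversampling is assumed here only as one common data-level example, and the "pass"/"fail" labels, the 90/10 split, and all function names are hypothetical.

```python
import random
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def precision_recall_f1(y_true, y_pred, positive):
    """Precision, recall and F1 with `positive` treated as the class of interest."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


def random_oversample(X, y, seed=0):
    """Assumed data-level technique: duplicate randomly chosen rows of each
    smaller class until every class is as large as the largest one.
    In a k-fold setup this would be applied to the training folds only."""
    rng = random.Random(seed)
    counts = Counter(y)
    target = max(counts.values())
    X_out, y_out = list(X), list(y)
    for label, n in counts.items():
        rows = [x for x, lab in zip(X, y) if lab == label]
        for _ in range(target - n):
            X_out.append(rng.choice(rows))
            y_out.append(label)
    return X_out, y_out


# Hypothetical toy data: 90 majority-class and 10 minority-class students.
y_true = ["pass"] * 90 + ["fail"] * 10
y_pred = ["pass"] * 100  # a baseline that always predicts the majority class

acc = accuracy(y_true, y_pred)
fail_metrics = precision_recall_f1(y_true, y_pred, "fail")

X = [[i] for i in range(100)]  # placeholder feature rows
X_bal, y_bal = random_oversample(X, y_true)
balanced_counts = Counter(y_bal)

print(acc, fail_metrics, dict(balanced_counts))
```

The always-majority baseline scores high accuracy while never detecting the minority class, which is why precision, recall, and F1-score are reported alongside accuracy. Oversampling equalizes the class counts that a classifier sees during training; it does not by itself guarantee better minority-class metrics.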
Source journal
International Journal of Computing (Computer Science: Computer Science (miscellaneous))
CiteScore: 2.20
Self-citation rate: 0.00%
Articles published: 39
About the journal: The International Journal of Computing was established in 2002 on the basis of the Branch Research Laboratory for Automated Systems and Networks, which was renamed the Research Institute of Intelligent Computer Systems in 2005. The journal aims to publish papers presenting novel results in computing science, computer engineering, information technologies, software engineering, and information systems within its topics. The official language of the journal is English; paper abstracts are also published in Ukrainian and Russian. Issues are published quarterly. The editorial board consists of about 30 internationally recognized scientists.