Optimizing the Design of a Classifier Ensemble: A Case Study in Educational Data Mining

Y. K. Salal, S. Abdullaev
Journal: Bulletin of the South Ural State University. Ser. Computer Technologies, Automatic Control & Radioelectronics
Published: 2019-11-18
DOI: 10.14529/CTCR190414 (https://doi.org/10.14529/CTCR190414)
Citations: 2

Abstract

Choosing the best method for predicting educational outcomes is a major challenge in Educational Data Mining (EDM). This paper compares student performance forecasts produced by individual binary classifiers (Naive Bayes, Decision Tree, Multi-Layer Perceptron, Nearest Neighbors, and Support Vector Machine) and by their ensembles. The classifiers were trained (tested) on a dataset containing up to 38 input attributes (weekly attendance in mathematics, intensity of study, interim assessment) for 84 (36) secondary school students from Nasiriyah, Iraq. The predicted target was a two-class school performance outcome: passing or not passing the final exam. The comparison was carried out in three stages. In the first stage, the dependence of the classifiers on the input attributes was investigated. Forecast accuracy rose from 61.1–77.7% when all 38 attributes were used to 75.0–80.5% when the base classifiers were trained on five attributes pre-selected by the Ranker Search method. In the second stage, the AdaBoost M1 procedure was applied to each base classifier, creating five homogeneous ensembles. Only two of these ensembles showed a small rise of 3% in accuracy over the corresponding stand-alone classifier, and the overall maximum prediction accuracy remained 80.5%. Finally, the heterogeneous ensemble of five base classifiers combined by simple voting reached 77.7% accuracy, while the heterogeneous meta-ensemble of five AdaBoost homogeneous ensembles combined by simple voting reached 83.3%. We conclude that improving the quality of the individual classifiers or homogeneous ensembles makes it possible to construct more powerful EDM prediction methods.
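The simple voting step mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the toy votes, and the labels are hypothetical, and the code assumes that "simple voting" means an unweighted majority vote over the five classifiers' predicted classes.

```python
from collections import Counter


def majority_vote(predictions):
    """Return the label chosen by most classifiers for one student.

    Assumption: unweighted majority voting. With five voters and two
    classes a tie cannot occur.
    """
    return Counter(predictions).most_common(1)[0][0]


def accuracy(predicted, actual):
    """Fraction of students whose predicted label matches the true label."""
    return sum(p == a for p, a in zip(predicted, actual)) / len(actual)


# Hypothetical toy data: one row per student, one column per classifier.
votes = [
    ["pass", "pass", "fail", "pass", "fail"],
    ["fail", "fail", "fail", "pass", "fail"],
    ["pass", "fail", "pass", "pass", "pass"],
    ["fail", "pass", "fail", "fail", "pass"],
]
actual = ["pass", "fail", "pass", "pass"]

ensemble = [majority_vote(row) for row in votes]
ensemble_accuracy = accuracy(ensemble, actual)

# Assumption: accuracy is measured on the 36 test students, so one
# student corresponds to this many percentage points of accuracy.
accuracy_step_percent = 100 / 36

print(ensemble, ensemble_accuracy, round(accuracy_step_percent, 2))
```

If the accuracies are indeed computed on the 36 test students, one additional correctly classified student changes accuracy by roughly 2.8 percentage points, which is in line with the "small rise of 3%" reported for the boosted ensembles.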