Unsupervised Characterization and Visualization of Students' Academic Performance Features

Udoinyang Godwin Inyang, U. Umoh, Ifeoma Nnaemeka, Samuel A. Robinson
DOI: 10.5539/CIS.V12N2P103 (https://doi.org/10.5539/CIS.V12N2P103)
Journal (as listed): J. Chem. Inf. Comput. Sci., vol. 98 1, pp. 103-116
Published: 2019-04-30 (Journal Article)
Citations: 5

Abstract

The sheer size of students’ datasets has made it difficult to find patterns associated with students’ academic performance (AP) using conventional methods. This has increased the rate of drop-outs, of graduands with a weak class of degree (CoD), and of students who spend more than the minimum stipulated duration of studies. It is therefore necessary to assess students’ AP with educational data mining (EDM) tools so that students who are likely to perform poorly can be identified at an early stage of their studies. This paper uses k-means and the self-organizing map (SOM) to mine knowledge about the natural number of clusters in a students’ dataset and the association among the input features, using selected demographic, pre-admission and first-year performance attributes. Matlab 2015a was the programming environment, and the dataset consists of nine sets of computer science graduands. Cluster validity assessment with k-means found four (4) clusters, with the correlation metric yielding the highest mean silhouette value of 0.5912. SOM provided a hexagonal-grid visualization of the feature component planes and scatter plots of each significant input attribute. The results show that the significant attributes were highly correlated with each other except entry mode (EM), indicating that the impact of EM on CoD varies with students irrespective of mode of admission. SOM also discovered four distinct clusters in the dataset: 7.7% belonged to cluster 1 (first class) and 25% to cluster 2 (second class upper), while clusters 3 and 4 accounted for 35% each. This validates the k-means results, confirms the importance of early detection of students’ AP, and confirms the effectiveness of SOM as a cluster validity tool. As further work, the labels from SOM will be associated with records in the dataset for association rule mining, supervised learning and prediction of students’ AP.
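
The cluster validity step described in the abstract (scoring a candidate clustering by its mean silhouette value under a correlation metric) can be sketched as follows. This is a minimal illustration in plain Python, not the authors' Matlab code. The function names `correlation_distance` and `mean_silhouette` and the toy feature vectors are assumptions made for this example and do not come from the paper or its dataset.

```python
from math import sqrt
from statistics import mean


def correlation_distance(u, v):
    """Return 1 - Pearson correlation between two feature vectors."""
    mu, mv = mean(u), mean(v)
    du = [a - mu for a in u]
    dv = [b - mv for b in v]
    num = sum(a * b for a, b in zip(du, dv))
    den = sqrt(sum(a * a for a in du) * sum(b * b for b in dv))
    if den == 0:
        raise ValueError("correlation is undefined for a constant vector")
    return 1.0 - num / den


def mean_silhouette(points, labels, dist=correlation_distance):
    """Mean silhouette value over all points for one cluster assignment."""
    clusters = {}
    for idx, lab in enumerate(labels):
        clusters.setdefault(lab, []).append(idx)
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")
    scores = []
    for i, lab in enumerate(labels):
        own = [j for j in clusters[lab] if j != i]
        if not own:
            scores.append(0.0)  # common convention: a singleton scores 0
            continue
        # a: mean distance to the other members of the point's own cluster
        a = mean(dist(points[i], points[j]) for j in own)
        # b: smallest mean distance to the members of any other cluster
        b = min(
            mean(dist(points[i], points[j]) for j in members)
            for other, members in clusters.items()
            if other != lab
        )
        scores.append(0.0 if max(a, b) == 0 else (b - a) / max(a, b))
    return mean(scores)


# Hypothetical toy records (three made-up features each), NOT the paper's data:
# three "rising" profiles followed by three "falling" profiles.
toy_points = [
    [1, 2, 3], [2, 4, 6.5], [1, 3, 4],
    [3, 2, 1], [6, 4, 1.5], [5, 3, 1],
]
good_labels = [0, 0, 0, 1, 1, 1]  # groups profiles with the same shape
bad_labels = [0, 1, 0, 1, 0, 1]   # mixes the two shapes

print(mean_silhouette(toy_points, good_labels))
print(mean_silhouette(toy_points, bad_labels))
```

In a study like this one, a score of this kind would be computed for each candidate number of clusters and each distance metric, and the configuration with the highest mean silhouette would be retained; the abstract reports four clusters and a value of 0.5912 for the correlation metric.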