Application of Neighborhood Components Analysis to Process and Survey Data to Predict Student Learning of Statistics

Yikai Lu, Teresa M. Ober, Cheng Liu, Ying Cheng
{"title":"Application of Neighborhood Components Analysis to Process and Survey Data to Predict Student Learning of Statistics","authors":"Yikai Lu, Teresa M. Ober, Cheng Liu, Ying Cheng","doi":"10.1109/ICALT55010.2022.00051","DOIUrl":null,"url":null,"abstract":"Machine learning methods for predictive analytics have great potential for uncovering trends in educational data. However, simple linear models still appear to be most widely used, in part, because of their interpretability. This study aims to address the issues of interpretability of complex machine learning classifiers by conducting feature extraction by neighborhood components analysis (NCA). Our dataset comprises 287 features from both process data indicators (i.e., derived from log data of an online statistics learning platform) and self-report data from high school students enrolled in Advanced Placement (AP) Statistics (N=733). As a label for prediction, we use students’ scores on the AP Statistics exam. We evaluated the performance of machine learning classifiers with a given feature extraction method by evaluation criteria including F1 scores, the area under the receiver operating characteristic curve (AUC), and Cohen’s Kappas. We find that NCA effectively reduces the dimensionality of training datasets, stabilizes machine learning predictions, and produces interpretable scores. However, interpreting the NCA weights of features, while feasible, is not very straightforward compared to linear regression. Future research should consider developing guidelines to interpret NCA weights.","PeriodicalId":221464,"journal":{"name":"2022 International Conference on Advanced Learning Technologies (ICALT)","volume":"103 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2022-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2022 International Conference on Advanced Learning Technologies (ICALT)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ICALT55010.2022.00051","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

Abstract

Machine learning methods for predictive analytics have great potential for uncovering trends in educational data. However, simple linear models still appear to be the most widely used, in part because of their interpretability. This study aims to address the interpretability of complex machine learning classifiers by performing feature extraction with neighborhood components analysis (NCA). Our dataset comprises 287 features drawn from both process data indicators (i.e., derived from log data of an online statistics learning platform) and self-report data from high school students enrolled in Advanced Placement (AP) Statistics (N = 733). As the prediction label, we use students' scores on the AP Statistics exam. We evaluated the performance of machine learning classifiers paired with a given feature extraction method using criteria including F1 scores, the area under the receiver operating characteristic curve (AUC), and Cohen's kappa. We find that NCA effectively reduces the dimensionality of the training data, stabilizes machine learning predictions, and produces interpretable scores. However, interpreting the NCA feature weights, while feasible, is less straightforward than interpreting linear regression coefficients. Future research should consider developing guidelines for interpreting NCA weights.
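
The abstract describes an NCA-based feature extraction step followed by classifier evaluation with F1, AUC, and Cohen's kappa. The sketch below is a minimal, hypothetical illustration of such a pipeline using scikit-learn; the synthetic data, the choice of NeighborhoodComponentsAnalysis with a k-NN classifier, and all hyperparameters are assumptions for illustration, not the authors' actual implementation.

```python
# Minimal, hypothetical sketch of an NCA + classifier pipeline as described
# in the abstract. Assumptions: scikit-learn's NeighborhoodComponentsAnalysis
# stands in for the authors' NCA step; data, labels (binarized here for
# simplicity), and hyperparameters are synthetic.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.neighbors import NeighborhoodComponentsAnalysis, KNeighborsClassifier
from sklearn.metrics import f1_score, roc_auc_score, cohen_kappa_score

# Synthetic stand-in for the 287-feature process/survey dataset (N = 733).
X, y = make_classification(n_samples=733, n_features=287, n_informative=30,
                           random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, stratify=y, random_state=0)

# NCA reduces dimensionality before the downstream classifier.
nca = NeighborhoodComponentsAnalysis(n_components=20, random_state=0)
clf = make_pipeline(StandardScaler(), nca, KNeighborsClassifier(n_neighbors=5))
clf.fit(X_train, y_train)

# Evaluate with the criteria named in the abstract.
y_pred = clf.predict(X_test)
y_prob = clf.predict_proba(X_test)[:, 1]
print("F1   :", f1_score(y_test, y_pred))
print("AUC  :", roc_auc_score(y_test, y_prob))
print("Kappa:", cohen_kappa_score(y_test, y_pred))

# Rough per-feature importance proxy: column norms of the learned NCA
# transformation. Note this differs from per-feature NCA weights (e.g., as
# produced by MATLAB's fscnca), which the paper may have used instead.
importance = np.linalg.norm(nca.components_, axis=0)
top = np.argsort(importance)[::-1][:10]
print("Top feature indices by transformation column norm:", top)
```

Note that scikit-learn's NCA learns a linear transformation rather than explicit per-feature weights, so the column-norm proxy above is only an approximation of the per-feature NCA weights whose interpretation the abstract discusses.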