Application of Neighborhood Components Analysis to Process and Survey Data to Predict Student Learning of Statistics

2022 International Conference on Advanced Learning Technologies (ICALT). Pub Date: 2022-07-01. DOI: 10.1109/ICALT55010.2022.00051

Yikai Lu, Teresa M. Ober, Cheng Liu, Ying Cheng


Citations: 0

Abstract



Application of Neighborhood Components Analysis to Process and Survey Data to Predict Student Learning of Statistics

Machine learning methods for predictive analytics have great potential for uncovering trends in educational data. However, simple linear models still appear to be the most widely used, in part because of their interpretability. This study aims to address the interpretability of complex machine learning classifiers by using neighborhood components analysis (NCA) for feature extraction. Our dataset comprises 287 features drawn from both process data indicators (i.e., derived from log data of an online statistics learning platform) and self-report data from high school students enrolled in Advanced Placement (AP) Statistics (N=733). As the label for prediction, we use students’ scores on the AP Statistics exam. We evaluated the performance of machine learning classifiers with a given feature extraction method using F1 scores, the area under the receiver operating characteristic curve (AUC), and Cohen’s kappa. We find that NCA effectively reduces the dimensionality of training datasets, stabilizes machine learning predictions, and produces interpretable scores. However, interpreting the NCA weights of features, while feasible, is less straightforward than interpreting linear regression coefficients. Future research should consider developing guidelines for interpreting NCA weights.
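To make the idea of "NCA weights of features" concrete, the following is a minimal sketch of the quantity NCA maximizes: the expected leave-one-out accuracy of a stochastic nearest-neighbor classifier. It assumes a diagonal transform (one weight per feature); the abstract does not say which NCA variant or software the authors used, so this is an illustration and not their implementation. The function name `nca_objective`, the toy data, and the two weight vectors are all hypothetical.

```python
import math


def nca_objective(X, y, w):
    """Expected leave-one-out accuracy of a stochastic 1-NN classifier.

    Assumption: the NCA transform is diagonal, so w holds one weight per
    feature and distances are squared Euclidean in the weighted space.
    """
    n = len(X)
    total = 0.0
    for i in range(n):
        # Squared distance from point i to every point j in the weighted space.
        d = [
            sum((wr * (a - b)) ** 2 for wr, a, b in zip(w, X[i], X[j]))
            for j in range(n)
        ]
        # Shift by the smallest non-self distance for numerical stability;
        # the softmax ratios below are unchanged by this shift.
        m = min(d[j] for j in range(n) if j != i)
        # A point never selects itself as its neighbor.
        sims = [0.0 if j == i else math.exp(-(d[j] - m)) for j in range(n)]
        z = sum(sims)
        # Probability that point i selects a neighbor sharing its label.
        total += sum(s for s, label in zip(sims, y) if label == y[i]) / z
    return total / n


# Made-up toy data: feature 0 separates the classes, feature 1 is noise.
X = [[0.0, 5.0], [0.1, -3.0], [0.2, 4.0], [1.0, -4.0], [1.1, 3.5], [0.9, 5.5]]
y = [0, 0, 0, 1, 1, 1]

informative = nca_objective(X, y, [5.0, 0.0])  # weight only feature 0
noisy = nca_objective(X, y, [0.0, 1.0])        # weight only feature 1
print(informative, noisy)
```

Under this assumption, a feature whose weight is driven toward zero contributes nothing to the neighbor distances, which is one way such weights can be read as feature relevance. Unlike a regression coefficient, a weight here carries no sign or per-unit effect on the outcome, which is consistent with the abstract's remark that interpretation is less direct.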

