Exploring techniques to improve machine learning’s identification of at-risk students in physics classes

Physical Review Physics Education Research, published 2024-05-31. DOI: 10.1103/physrevphyseducres.20.010149

John Pace, John Hansen, John Stewart

Abstract

Machine learning models were constructed to predict student performance in an introductory mechanics class at a large land-grant university in the United States using data from 2061 students. Students were classified as either at risk of failing the course (earning a D or F) or not at risk (earning an A, B, or C). The models focused on variables available in the first few weeks of the class, which could allow for early interventions to help at-risk students. Multiple types of variables were used in the models: in-class variables (average homework and clicker quiz scores), institutional variables [college grade point average (GPA)], and noncognitive variables (self-efficacy). The substantial imbalance between the pass and fail rates of the course, with only about 10% of students failing, required modifications to the machine learning algorithms. Decision threshold tuning and upsampling were successful in improving performance for at-risk students. Logistic regression combined with a decision threshold tuned to maximize balanced accuracy yielded the strongest classifier, with a DF accuracy (the fraction of D/F students correctly identified) of 83% and an ABC accuracy of 81%. Measures of variable importance based on changes in balanced accuracy identified homework grades, clicker grades, college GPA, and the fraction of college classes successfully completed as the most important variables for predicting success in introductory physics. Noncognitive variables added little predictive power to the models. Classification models with performance near that of the best-performing models using the full set of variables could be constructed with very few variables (homework average, clicker scores, and college GPA) using straightforward-to-implement algorithms, suggesting that these techniques may be fairly easy to incorporate into many physics classes.
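
The decision threshold tuning mentioned in the abstract can be sketched in plain Python. This is a minimal illustration, not the authors' code: it assumes a fitted probabilistic classifier (for example, logistic regression) has already produced a probability of being at risk for each student, encodes DF as label 1 and ABC as label 0, and scans the distinct predicted probabilities as candidate thresholds. Balanced accuracy is taken here as the mean of the DF and ABC accuracies, so the reported 83% and 81% would correspond to a balanced accuracy of (0.83 + 0.81)/2 = 0.82. The function names and the toy data are hypothetical.

```python
from typing import Sequence, Tuple


def class_accuracies(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[float, float]:
    """Return (DF accuracy, ABC accuracy).

    Assumed encoding: 1 = at risk (D or F), 0 = not at risk (A, B, or C).
    """
    n_pos = sum(1 for t in y_true if t == 1)
    n_neg = len(y_true) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("both classes must be present")
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return tp / n_pos, tn / n_neg


def balanced_accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Mean of the per-class accuracies, so the small DF group counts as much as ABC."""
    df_acc, abc_acc = class_accuracies(y_true, y_pred)
    return (df_acc + abc_acc) / 2


def tune_threshold(y_true: Sequence[int], probs: Sequence[float]) -> Tuple[float, float]:
    """Return the threshold on P(at risk) that maximizes balanced accuracy, and that score.

    A student is flagged at risk when prob >= threshold. Candidates are scanned in
    ascending order with a strict comparison, so ties keep the lowest threshold.
    """
    best_t, best_score = 0.5, -1.0
    for t in sorted(set(probs)):
        preds = [1 if p >= t else 0 for p in probs]
        score = balanced_accuracy(y_true, preds)
        if score > best_score:
            best_t, best_score = t, score
    return best_t, best_score


# Hypothetical toy data: 10 students, 2 of whom earned a D or F.
y = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
p = [0.05, 0.10, 0.08, 0.30, 0.12, 0.20, 0.07, 0.15, 0.25, 0.40]

default_preds = [1 if q >= 0.5 else 0 for q in p]
print(balanced_accuracy(y, default_preds))  # 0.5: nobody is flagged at the default 0.5
print(tune_threshold(y, p))                 # (0.25, 0.9375)
```

The toy run shows why tuning matters under heavy class imbalance: at the conventional 0.5 cutoff every student is predicted to pass, which looks accurate overall but catches no at-risk students. In practice the threshold would be chosen on training or validation data rather than on the students it is evaluated on.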
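
Upsampling can be sketched as random oversampling: at-risk (DF) records in the training set are duplicated at random, with replacement, until the two classes are the same size. This is one simple form of upsampling, assumed here for illustration; the paper's exact resampling scheme may differ. The helper name `upsample_minority` and the feature values are hypothetical.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

Row = TypeVar("Row")


def upsample_minority(
    rows: Sequence[Row], labels: Sequence[int], seed: int = 0
) -> Tuple[List[Row], List[int]]:
    """Duplicate minority-class rows at random (with replacement) until classes are balanced.

    Assumed encoding: 1 = at risk (D or F), 0 = not at risk (A, B, or C).
    Intended for the training split only.
    """
    rng = random.Random(seed)  # seeded so the resampling is reproducible
    pos = [(r, y) for r, y in zip(rows, labels) if y == 1]
    neg = [(r, y) for r, y in zip(rows, labels) if y == 0]
    minority, majority = (pos, neg) if len(pos) < len(neg) else (neg, pos)
    if not minority:
        raise ValueError("the minority class has no examples to upsample")
    # Draw just enough extra minority examples to close the gap in class counts.
    extra = [rng.choice(minority) for _ in range(len(majority) - len(minority))]
    combined = majority + minority + extra
    rng.shuffle(combined)  # mix duplicates in rather than leaving them at the end
    return [r for r, _ in combined], [y for _, y in combined]


# Hypothetical training rows: (homework average, clicker average, college GPA).
train_X = [
    (0.92, 0.88, 3.6), (0.85, 0.90, 3.2), (0.78, 0.70, 3.0), (0.95, 0.93, 3.9),
    (0.81, 0.76, 2.9), (0.88, 0.84, 3.4), (0.74, 0.69, 2.8), (0.90, 0.91, 3.7),
    (0.40, 0.35, 2.1), (0.55, 0.48, 2.3),
]
train_y = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]

X_up, y_up = upsample_minority(train_X, train_y)
print(len(X_up), sum(y_up))  # 16 8: eight ABC rows and eight (partly duplicated) DF rows
```

Resampling only the training data keeps duplicated students out of the evaluation set, so reported DF and ABC accuracies still reflect the real class balance.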
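
A variable-importance measure based on changes in balanced accuracy can be illustrated with permutation importance: shuffle one variable's column, re-score the already-fitted model, and record how far balanced accuracy falls. Permutation is an assumption made for this sketch (refitting the model without the variable is another way to measure the change), and the names `permutation_importance_ba` and `predict`, the toy model, and the data are hypothetical. A large drop suggests the model relies heavily on that variable; a drop near zero would be consistent with a variable adding little predictive power.

```python
import random
from typing import Callable, List, Sequence


def balanced_accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Mean of the accuracy on DF students (label 1) and ABC students (label 0).

    Assumes both classes are present in y_true.
    """
    pos = [p for t, p in zip(y_true, y_pred) if t == 1]
    neg = [p for t, p in zip(y_true, y_pred) if t == 0]
    return (sum(p == 1 for p in pos) / len(pos) + sum(p == 0 for p in neg) / len(neg)) / 2


def permutation_importance_ba(
    predict: Callable[[List[List[float]]], List[int]],
    X: Sequence[Sequence[float]],
    y: Sequence[int],
    n_repeats: int = 20,
    seed: int = 0,
) -> List[float]:
    """Mean drop in balanced accuracy when each column of X is shuffled in turn."""
    rng = random.Random(seed)
    rows = [list(r) for r in X]
    baseline = balanced_accuracy(y, predict(rows))
    importances = []
    for j in range(len(rows[0])):
        drops = []
        for _ in range(n_repeats):
            column = [r[j] for r in rows]
            rng.shuffle(column)
            # Build new rows so the shuffled column never overwrites the original data.
            permuted = [r[:j] + [v] + r[j + 1:] for r, v in zip(rows, column)]
            drops.append(baseline - balanced_accuracy(y, predict(permuted)))
        importances.append(sum(drops) / n_repeats)
    return importances


# Hypothetical fitted model: flags a student when the homework average is below 0.6
# and ignores the second column (standing in for a weak noncognitive score).
def predict(rows: List[List[float]]) -> List[int]:
    return [1 if r[0] < 0.6 else 0 for r in rows]


X = [[0.92, 3.1], [0.85, 4.0], [0.78, 2.5], [0.95, 3.8], [0.81, 2.9],
     [0.88, 3.3], [0.74, 4.2], [0.90, 3.0], [0.40, 3.6], [0.55, 2.7]]
y = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]

importances = permutation_importance_ba(predict, X, y)
print(importances)  # the second column, ignored by the model, scores exactly 0.0
```

Scoring the drop with balanced accuracy rather than plain accuracy keeps the measure sensitive to the roughly 10% of students who fail; with plain accuracy, a variable could look unimportant even if removing it stopped the model from catching any at-risk students.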

Source journal

Physical Review Physics Education Research (subject area: Social Sciences, Education)

CiteScore: 5.70

Self-citation rate: 41.90%

Review time: 32 weeks

Journal description: PRPER covers all educational levels, from elementary through graduate education. All topics in experimental and theoretical physics education research are accepted, including, but not limited to: educational policy; instructional strategies and materials development; research methodology; epistemology, attitudes, and beliefs; learning environment; scientific reasoning and problem solving; diversity and inclusion; learning theory; student participation; and faculty and teacher professional development.