Predicting problem-solving success in an office simulation applying N-grams and a random forest to behavioral process data

IF 8.9 · CAS Zone 1 (Education) · JCR Q1, COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS · Computers & Education · Pub Date: 2024-06-03 · DOI: 10.1016/j.compedu.2024.105093

Sabrina Ludwig , Andreas Rausch , Viola Deutscher , Jürgen Seifried

Computers & Education, Volume 218, Article 105093
https://www.sciencedirect.com/science/article/pii/S0360131524001076

Citations: 0

Abstract

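Predicting students' problem-solving success in computer-based simulations at an early stage allows adaptive educational systems to provide learners with personalized support. In this paper, we predict problem-solving success by applying a machine-learning model, the random forest, to a binary classification task (more vs. less successful students). During a 55-minute business-related problem scenario, we recorded early behavioral data of 234 trainees, such as mouse clicks and keystrokes, from the first 5, 10, and 20 min (approximately 29,800 clickstream events and keystrokes within the first 20 min); these data mirror the students' problem-solving behavior. We used n-gram sequence mining, a technique originally introduced in natural language processing, text mining, and machine learning that has proven effective, particularly for analyzing online behavior. We trained the random forest on datasets containing either all features (bigrams) or only selected features (the most predictive bigrams, i.e., those explaining differences between the two groups). Our results show that data from the first 10 and 20 min contained sufficient information to predict problem-solving success, whereas data from the first 5 min did not. Classification performance improved as the initial time window grew. Moreover, selecting the most predictive features improved model performance in all three time windows. The model trained only on selected robust features from the first 20 min achieved the highest ROC AUC, almost 0.70, which falls within the range of performance reported in similar studies. From the instructor's perspective, such predictions help identify weaker students early so that they can receive personalized learning prompts, while tasks can be adaptively enriched for more successful students.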
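The feature-extraction and evaluation steps described in the abstract can be sketched in plain Python. This is a minimal, hypothetical illustration, not the authors' code: the event-log format `(seconds_since_start, action_label)`, the action names, and the helper names `bigram_counts`, `feature_matrix`, and `roc_auc` are all assumptions. It shows how a learner's action sequence inside an early time window (5, 10, or 20 min) becomes a vector of bigram counts, and how ROC AUC can be computed as the probability that a randomly chosen positive (more successful) learner receives a higher score than a randomly chosen negative one. The random forest and the feature-selection step are omitted to keep the sketch dependency-free; the matrix `X` could, for example, be passed to scikit-learn's `RandomForestClassifier`, whose `predict_proba` scores would then go into `roc_auc`.

```python
from collections import Counter

# Assumed (hypothetical) log format: one (seconds_since_start, action_label)
# tuple per mouse click or keystroke event.
Event = tuple[float, str]


def bigram_counts(events: list[Event], window_min: float) -> Counter:
    """Count consecutive action pairs (bigrams) in the first `window_min` minutes."""
    # Keep only events inside the early window, ordered by timestamp.
    actions = [a for t, a in sorted(events) if t < window_min * 60]
    # Pair every action with its successor: these are the n = 2 grams.
    return Counter(zip(actions, actions[1:]))


def feature_matrix(logs: list[list[Event]], window_min: float):
    """Map each learner's log to a count vector over a shared bigram vocabulary."""
    counts = [bigram_counts(log, window_min) for log in logs]
    vocab = sorted(set().union(*counts))
    rows = [[c[bg] for bg in vocab] for c in counts]
    return vocab, rows


def roc_auc(labels: list[int], scores: list[float]) -> float:
    """ROC AUC as P(random positive outscores random negative); ties count 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy example with invented action labels (not data from the study).
logs = [
    [(5, "open_mail"), (40, "open_sheet"), (70, "type"), (400, "open_mail")],
    [(10, "open_sheet"), (20, "type"), (30, "type")],
]
vocab, X = feature_matrix(logs, window_min=5)
# vocab: [('open_mail', 'open_sheet'), ('open_sheet', 'type'), ('type', 'type')]
# X:     [[1, 1, 0], [0, 1, 1]]
auc = roc_auc([1, 0, 1, 0], [0.9, 0.4, 0.6, 0.6])  # 0.875
```

Because the vocabulary is rebuilt for each window, the feature set itself changes with window length. In a real pipeline, any feature selection (such as keeping only the most predictive bigrams) would be fitted on training folds only, so that test labels do not leak into the chosen features.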




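Source journal

Computers & Education (Engineering & Technology; Computer Science, Interdisciplinary Applications)

CiteScore: 27.10
Self-citation rate: 5.80%
Articles published: 204
Review time: 42 days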

Journal description: Computers & Education seeks to advance understanding of how digital technology can improve education by publishing high-quality research that expands both theory and practice. The journal welcomes research papers exploring the pedagogical applications of digital technology, with a focus broad enough to appeal to the wider education community.