Sabrina Ludwig, Andreas Rausch, Viola Deutscher, Jürgen Seifried
Computers & Education, Volume 218, Article 105093 (published 2024-06-03). DOI: 10.1016/j.compedu.2024.105093. https://www.sciencedirect.com/science/article/pii/S0360131524001076
Predicting problem-solving success in an office simulation applying N-grams and a random forest to behavioral process data
Predicting students' problem-solving success in computer-based simulations at an early stage allows adaptive educational systems to provide learners with personalized support. In this paper, we predict students' problem-solving success by applying a machine-learning model, the random forest, to produce a binary classification (more vs. less successful students). During a business-related problem scenario that lasted 55 min, early behavioral data (from the first 5, 10, and 20 min), such as mouse clicks and keystrokes, were recorded for 234 trainees, mirroring their problem-solving behavior; the first 20 min alone yielded approximately 29,800 clickstream and keystroke events. We used n-gram sequence mining, a technique that was originally introduced in the emerging disciplines of natural language processing, text mining, and machine learning, and that has proven effective, particularly for examining online behavior. We trained the random forest on datasets containing either all features (bigrams) or only selected features (the most predictive bigrams, i.e., those that best explained differences between the groups). Our results show that early predictions based on the first 10 and 20 min contained sufficient information to accurately predict problem-solving success, whereas predictions made too early (based on the first 5 min) did not. Classification performance improved as the initial time window expanded. Moreover, selecting the most predictive features improved model performance for all three time windows. The model trained only on selected robust features from the first 20 min achieved the highest ROC AUC score, almost 0.70, which falls within the range of scores observed in similar studies. From the instructor's perspective, such predictions help identify weaker students early so that they can be given personalized learning prompts, while tasks can be adaptively enriched for more successful students.
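The abstract does not include the authors' preprocessing or evaluation code. As a rough illustration only, the sketch below shows how bigram features might be built from a time-stamped action log truncated to an early window (5, 10, or 20 min), and how a ROC AUC score like the one reported can be computed from classifier scores. The `(seconds, label)` event format, the function names `window_events`, `bigram_counts`, `feature_matrix`, and `roc_auc`, and the raw-count encoding are all hypothetical assumptions, not the study's pipeline; the random forest and the feature-selection step are not reproduced.

```python
from collections import Counter
from typing import Dict, List, Optional, Sequence, Tuple

# Hypothetical log format (assumption): (seconds since scenario start, action label)
Event = Tuple[float, str]
Bigram = Tuple[str, str]


def window_events(events: Sequence[Event], minutes: float) -> List[str]:
    """Return action labels, in time order, logged strictly before the cutoff."""
    cutoff = minutes * 60.0
    return [label for t, label in sorted(events) if t < cutoff]


def bigram_counts(labels: Sequence[str]) -> Counter:
    """Count each pair of consecutive actions (an n-gram with n = 2)."""
    return Counter(zip(labels, labels[1:]))


def feature_matrix(
    sessions: Dict[str, Sequence[Event]],
    minutes: float,
    vocab: Optional[List[Bigram]] = None,
) -> Tuple[List[str], List[Bigram], List[List[int]]]:
    """Build one bigram-count row per session from its early time window.

    Pass the training vocabulary as `vocab` when encoding held-out sessions,
    so unseen bigrams are ignored instead of adding new columns.
    """
    counts = {sid: bigram_counts(window_events(ev, minutes)) for sid, ev in sessions.items()}
    if vocab is None:
        # Sorted for a deterministic column order.
        vocab = sorted({bg for c in counts.values() for bg in c})
    ids = sorted(counts)
    # Counter returns 0 for bigrams a session never produced.
    rows = [[counts[sid][bg] for bg in vocab] for sid in ids]
    return ids, vocab, rows


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """ROC AUC as the probability that a random positive outscores a random negative.

    Ties count as half a win (the Mann-Whitney U formulation).
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))
```

To mirror the study's setup, the rows for each time window would be used to train a random forest (for example scikit-learn's `RandomForestClassifier`), and the positive-class column of its `predict_proba` output on held-out sessions would be passed to `roc_auc` together with the true labels. Building the vocabulary from training sessions only keeps held-out data from shaping the feature set. A raw-count encoding is just one option; the abstract does not specify the paper's exact encoding or its criterion for selecting the most predictive bigrams.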
About the journal:
Computers & Education seeks to advance understanding of how digital technology can improve education by publishing high-quality research that expands both theory and practice. The journal welcomes research papers exploring the pedagogical applications of digital technology, with a focus broad enough to appeal to the wider education community.