Laila Alrajhi, Ahmed Alamri, Filipe Dwan Pereira, Alexandra I. Cristea, Elaine H. T. Oliveira
{"title":"解决数据不平衡问题:MOOC论坛讲师协助的自动紧急检测","authors":"Laila Alrajhi, Ahmed Alamri, Filipe Dwan Pereira, Alexandra I. Cristea, Elaine H. T. Oliveira","doi":"10.1007/s11257-023-09381-y","DOIUrl":null,"url":null,"abstract":"<p>In MOOCs, identifying urgent comments on discussion forums is an ongoing challenge. Whilst urgent comments require immediate reactions from instructors, to improve interaction with their learners, and potentially reducing drop-out rates—the task is difficult, as truly urgent comments are rare. From a data analytics perspective, this represents a <i>highly unbalanced (sparse) dataset</i>. Here, we aim to <i>automate the urgent comments identification process, based on fine-grained learner modelling</i>—to be used for automatic recommendations to instructors. To showcase and compare these models, we apply them to the <i>first gold standard dataset for </i><b><i>U</i></b><i>rgent i</i><b><i>N</i></b><i>structor </i><b><i>I</i></b><i>n</i><b><i>TE</i></b><i>rvention (UNITE)</i>, which we created by labelling FutureLearn MOOC data. We implement both benchmark shallow classifiers and deep learning. Importantly, we not only <i>compare, for the first time for the unbalanced problem, several data balancing techniques</i>, comprising text augmentation, text augmentation with undersampling, and undersampling, but also <i>propose several new pipelines for combining different augmenters for text augmentation</i>. Results show that models with undersampling can predict most urgent cases; and 3X <i>augmentation</i> + <i>undersampling</i> usually attains the best performance. We additionally validate the best models via a generic benchmark dataset (Stanford). As a case study, we showcase how the naïve Bayes with count vector can adaptively support instructors in answering learner questions/comments, potentially saving time or increasing efficiency in supporting learners. 
Finally, we show that the errors from the classifier mirrors the disagreements between annotators. Thus, our proposed algorithms perform at least as well as a ‘super-diligent’ human instructor (with the time to consider all comments).</p>","PeriodicalId":49388,"journal":{"name":"User Modeling and User-Adapted Interaction","volume":"202 1","pages":""},"PeriodicalIF":3.0000,"publicationDate":"2023-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Solving the imbalanced data issue: automatic urgency detection for instructor assistance in MOOC discussion forums\",\"authors\":\"Laila Alrajhi, Ahmed Alamri, Filipe Dwan Pereira, Alexandra I. Cristea, Elaine H. T. Oliveira\",\"doi\":\"10.1007/s11257-023-09381-y\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<p>In MOOCs, identifying urgent comments on discussion forums is an ongoing challenge. Whilst urgent comments require immediate reactions from instructors, to improve interaction with their learners, and potentially reducing drop-out rates—the task is difficult, as truly urgent comments are rare. From a data analytics perspective, this represents a <i>highly unbalanced (sparse) dataset</i>. Here, we aim to <i>automate the urgent comments identification process, based on fine-grained learner modelling</i>—to be used for automatic recommendations to instructors. To showcase and compare these models, we apply them to the <i>first gold standard dataset for </i><b><i>U</i></b><i>rgent i</i><b><i>N</i></b><i>structor </i><b><i>I</i></b><i>n</i><b><i>TE</i></b><i>rvention (UNITE)</i>, which we created by labelling FutureLearn MOOC data. We implement both benchmark shallow classifiers and deep learning. 
Importantly, we not only <i>compare, for the first time for the unbalanced problem, several data balancing techniques</i>, comprising text augmentation, text augmentation with undersampling, and undersampling, but also <i>propose several new pipelines for combining different augmenters for text augmentation</i>. Results show that models with undersampling can predict most urgent cases; and 3X <i>augmentation</i> + <i>undersampling</i> usually attains the best performance. We additionally validate the best models via a generic benchmark dataset (Stanford). As a case study, we showcase how the naïve Bayes with count vector can adaptively support instructors in answering learner questions/comments, potentially saving time or increasing efficiency in supporting learners. Finally, we show that the errors from the classifier mirrors the disagreements between annotators. Thus, our proposed algorithms perform at least as well as a ‘super-diligent’ human instructor (with the time to consider all comments).</p>\",\"PeriodicalId\":49388,\"journal\":{\"name\":\"User Modeling and User-Adapted Interaction\",\"volume\":\"202 1\",\"pages\":\"\"},\"PeriodicalIF\":3.0000,\"publicationDate\":\"2023-12-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"User Modeling and User-Adapted Interaction\",\"FirstCategoryId\":\"94\",\"ListUrlMain\":\"https://doi.org/10.1007/s11257-023-09381-y\",\"RegionNum\":3,\"RegionCategory\":\"计算机科学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q2\",\"JCRName\":\"COMPUTER SCIENCE, CYBERNETICS\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"User Modeling and User-Adapted 
Interaction","FirstCategoryId":"94","ListUrlMain":"https://doi.org/10.1007/s11257-023-09381-y","RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"COMPUTER SCIENCE, CYBERNETICS","Score":null,"Total":0}
Solving the imbalanced data issue: automatic urgency detection for instructor assistance in MOOC discussion forums
In MOOCs, identifying urgent comments on discussion forums is an ongoing challenge. Urgent comments require immediate reactions from instructors, to improve interaction with their learners and potentially reduce drop-out rates; yet the task is difficult, as truly urgent comments are rare. From a data analytics perspective, this represents a highly unbalanced (sparse) dataset. Here, we aim to automate the urgent comment identification process, based on fine-grained learner modelling, to be used for automatic recommendations to instructors. To showcase and compare these models, we apply them to the first gold standard dataset for Urgent iNstructor InTErvention (UNITE), which we created by labelling FutureLearn MOOC data. We implement both benchmark shallow classifiers and deep learning models. Importantly, we not only compare, for the first time for the unbalanced problem, several data balancing techniques, comprising text augmentation, text augmentation with undersampling, and undersampling alone, but also propose several new pipelines for combining different augmenters for text augmentation. Results show that models with undersampling can predict most urgent cases, and that 3X augmentation + undersampling usually attains the best performance. We additionally validate the best models via a generic benchmark dataset (Stanford). As a case study, we showcase how naïve Bayes with count vectors can adaptively support instructors in answering learner questions/comments, potentially saving time or increasing efficiency in supporting learners. Finally, we show that the errors from the classifier mirror the disagreements between annotators. Thus, our proposed algorithms perform at least as well as a 'super-diligent' human instructor (one with the time to consider all comments).
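The combination the abstract highlights, undersampling the majority class and then classifying with naïve Bayes over word-count vectors, can be illustrated with a minimal, stdlib-only sketch. The toy comments, the `undersample` helper, and the `CountNB` class below are hypothetical illustrations of the general technique, not the authors' UNITE pipeline (which also evaluates text-augmentation pipelines and deep models).

```python
import math
import random
from collections import Counter, defaultdict


def undersample(texts, labels, seed=0):
    """Randomly drop majority-class examples until all classes are the same size."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for x, y in zip(texts, labels):
        by_class[y].append(x)
    n_min = min(len(v) for v in by_class.values())
    xs, ys = [], []
    for y, v in by_class.items():
        for x in rng.sample(v, n_min):  # keep n_min examples per class
            xs.append(x)
            ys.append(y)
    return xs, ys


class CountNB:
    """Multinomial naive Bayes over bag-of-words counts, with add-one smoothing."""

    def fit(self, texts, labels):
        self.classes = sorted(set(labels))
        n = len(labels)
        self.prior = {c: math.log(labels.count(c) / n) for c in self.classes}
        self.counts = {c: Counter() for c in self.classes}  # per-class word counts
        self.vocab = set()
        for text, y in zip(texts, labels):
            words = text.lower().split()
            self.counts[y].update(words)
            self.vocab.update(words)
        self.total = {c: sum(self.counts[c].values()) for c in self.classes}
        return self

    def predict(self, text):
        vocab_size = len(self.vocab)

        def log_score(c):
            # log P(c) + sum over words of log P(word | c), Laplace-smoothed
            s = self.prior[c]
            for w in text.lower().split():
                s += math.log((self.counts[c][w] + 1) / (self.total[c] + vocab_size))
            return s

        return max(self.classes, key=log_score)
```

In the paper's best-performing pipelines, undersampling is applied after 3X text augmentation of the minority (urgent) class; this sketch omits the augmentation step and shows only the balancing-plus-classification skeleton.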
About the journal:
User Modeling and User-Adapted Interaction provides an interdisciplinary forum for the dissemination of novel and significant original research results about interactive computer systems that can adapt themselves to their users, and on the design, use, and evaluation of user models for adaptation. The journal publishes high-quality original papers from, e.g., the following areas: acquisition and formal representation of user models; conceptual models and user stereotypes for personalization; student modeling and adaptive learning; models of groups of users; user model driven personalised information discovery and retrieval; recommender systems; adaptive user interfaces and agents; adaptation for accessibility and inclusion; generic user modeling systems and tools; interoperability of user models; personalization in areas such as: affective computing; ubiquitous and mobile computing; language based interactions; multi-modal interactions; virtual and augmented reality; social media and the Web; human-robot interaction; behaviour change interventions; personalized applications in specific domains; privacy, accountability, and security of information for personalization; responsible adaptation: fairness, accountability, explainability, transparency and control; methods for the design and evaluation of user models and adaptive systems.