揭示 MOOC 论坛中的建议：基于转换器的方法

IF 10.7 2区计算机科学 Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Artificial Intelligence Review Pub Date : 2024-11-04 DOI:10.1007/s10462-024-10997-8

Karen Reina Sánchez, Gonzalo Vaca Serrano, Juan Pedro Arbáizar Gómez, Alfonso Duran-Heras

{"title":"揭示 MOOC 论坛中的建议：基于转换器的方法","authors":"Karen Reina Sánchez, Gonzalo Vaca Serrano, Juan Pedro Arbáizar Gómez, Alfonso Duran-Heras","doi":"10.1007/s10462-024-10997-8","DOIUrl":null,"url":null,"abstract":"<div><p>The field of natural language processing has experienced significant advances in recent years, but these advances have not yet resulted in improved analytics for instructors on MOOC platforms. Valuable information, such as suggestions, is generated in the comment forums of these courses, but due to their volume, manual processing is often impractical. This study examines the feasibility of fine-tuning and effectively utilizing state-of-the-art deep learning models to identify comments that contain suggestions in MOOC forums. The main challenges encountered are the lack of labeled datasets from the MOOC context for fine-tuning classification models and the soaring computational cost of this training. For this study, we manually collected and labeled 2228 comments in Spanish and English from 5 MOOCs and scraped 1.4 million MOOC reviews from 3 platforms. We fine-tuned and evaluated 4 pretrained models based on the transformer architecture and 3 traditional machine learning models to compare their effectiveness in the suggestion mining task in this domain. Transformer-based models proved to be highly effective in this task/domain combination, achieving performance levels that matched or exceeded those deemed appropriate in other contexts and were significantly greater than those achieved by traditional models. Domain adaptation led to improved linguistic understanding of the target domain; however, in this project, this approach did not translate into an observable improvement in suggestion mining. The automated identification of comments that can be labeled as suggestions can result in considerable time savings for instructors, especially considering that less than a quarter of the analyzed comments contain suggestions.</p></div>","PeriodicalId":8449,"journal":{"name":"Artificial Intelligence Review","volume":"58 1","pages":""},"PeriodicalIF":10.7000,"publicationDate":"2024-11-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://link.springer.com/content/pdf/10.1007/s10462-024-10997-8.pdf","citationCount":"0","resultStr":"{\"title\":\"Uncovering suggestions in MOOC discussion forums: a transformer-based approach\",\"authors\":\"Karen Reina Sánchez, Gonzalo Vaca Serrano, Juan Pedro Arbáizar Gómez, Alfonso Duran-Heras\",\"doi\":\"10.1007/s10462-024-10997-8\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<div><p>The field of natural language processing has experienced significant advances in recent years, but these advances have not yet resulted in improved analytics for instructors on MOOC platforms. Valuable information, such as suggestions, is generated in the comment forums of these courses, but due to their volume, manual processing is often impractical. This study examines the feasibility of fine-tuning and effectively utilizing state-of-the-art deep learning models to identify comments that contain suggestions in MOOC forums. The main challenges encountered are the lack of labeled datasets from the MOOC context for fine-tuning classification models and the soaring computational cost of this training. For this study, we manually collected and labeled 2228 comments in Spanish and English from 5 MOOCs and scraped 1.4 million MOOC reviews from 3 platforms. We fine-tuned and evaluated 4 pretrained models based on the transformer architecture and 3 traditional machine learning models to compare their effectiveness in the suggestion mining task in this domain. Transformer-based models proved to be highly effective in this task/domain combination, achieving performance levels that matched or exceeded those deemed appropriate in other contexts and were significantly greater than those achieved by traditional models. Domain adaptation led to improved linguistic understanding of the target domain; however, in this project, this approach did not translate into an observable improvement in suggestion mining. The automated identification of comments that can be labeled as suggestions can result in considerable time savings for instructors, especially considering that less than a quarter of the analyzed comments contain suggestions.</p></div>\",\"PeriodicalId\":8449,\"journal\":{\"name\":\"Artificial Intelligence Review\",\"volume\":\"58 1\",\"pages\":\"\"},\"PeriodicalIF\":10.7000,\"publicationDate\":\"2024-11-04\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"https://link.springer.com/content/pdf/10.1007/s10462-024-10997-8.pdf\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Artificial Intelligence Review\",\"FirstCategoryId\":\"94\",\"ListUrlMain\":\"https://link.springer.com/article/10.1007/s10462-024-10997-8\",\"RegionNum\":2,\"RegionCategory\":\"计算机科学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q1\",\"JCRName\":\"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Artificial Intelligence Review","FirstCategoryId":"94","ListUrlMain":"https://link.springer.com/article/10.1007/s10462-024-10997-8","RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}

引用次数: 0

摘要

近年来，自然语言处理领域取得了长足的进步，但这些进步尚未为 MOOC 平台上的教师带来更好的分析效果。在这些课程的评论论坛中会产生诸如建议等有价值的信息，但由于其数量庞大，人工处理往往不切实际。本研究探讨了微调和有效利用最先进的深度学习模型来识别 MOOC 论坛中包含建议的评论的可行性。所遇到的主要挑战是，缺乏用于微调分类模型的MOOC背景下的标注数据集，以及这种训练的高昂计算成本。在这项研究中，我们手动收集并标注了来自 5 个 MOOC 的 2228 条西班牙语和英语评论，并从 3 个平台上获取了 140 万条 MOOC 评论。我们对 4 个基于转换器架构的预训练模型和 3 个传统机器学习模型进行了微调和评估，以比较它们在该领域建议挖掘任务中的有效性。事实证明，基于变换器的模型在这一任务/领域组合中非常有效，其性能水平达到或超过了其他语境中的适当水平，并且明显高于传统模型。领域适应性提高了对目标领域的语言理解能力；但是，在本项目中，这种方法并没有转化为建议挖掘方面的明显改善。自动识别可标记为建议的评论可为教师节省大量时间，特别是考虑到只有不到四分之一的分析评论包含建议。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文

微信好友朋友圈 QQ好友复制链接

本刊更多论文

Uncovering suggestions in MOOC discussion forums: a transformer-based approach

The field of natural language processing has experienced significant advances in recent years, but these advances have not yet resulted in improved analytics for instructors on MOOC platforms. Valuable information, such as suggestions, is generated in the comment forums of these courses, but due to their volume, manual processing is often impractical. This study examines the feasibility of fine-tuning and effectively utilizing state-of-the-art deep learning models to identify comments that contain suggestions in MOOC forums. The main challenges encountered are the lack of labeled datasets from the MOOC context for fine-tuning classification models and the soaring computational cost of this training. For this study, we manually collected and labeled 2228 comments in Spanish and English from 5 MOOCs and scraped 1.4 million MOOC reviews from 3 platforms. We fine-tuned and evaluated 4 pretrained models based on the transformer architecture and 3 traditional machine learning models to compare their effectiveness in the suggestion mining task in this domain. Transformer-based models proved to be highly effective in this task/domain combination, achieving performance levels that matched or exceeded those deemed appropriate in other contexts and were significantly greater than those achieved by traditional models. Domain adaptation led to improved linguistic understanding of the target domain; however, in this project, this approach did not translate into an observable improvement in suggestion mining. The automated identification of comments that can be labeled as suggestions can result in considerable time savings for instructors, especially considering that less than a quarter of the analyzed comments contain suggestions.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

Artificial Intelligence Review 工程技术-计算机：人工智能

CiteScore

22.00

自引率

3.30%

发文量

194

审稿时长

5.3 months

期刊介绍： Artificial Intelligence Review, a fully open access journal, publishes cutting-edge research in artificial intelligence and cognitive science. It features critical evaluations of applications, techniques, and algorithms, providing a platform for both researchers and application developers. The journal includes refereed survey and tutorial articles, along with reviews and commentary on significant developments in the field.