Using ChatGPT to annotate a dataset: A case study in intelligent tutoring systems

Machine learning with applications, Pub Date: 2024-05-09, DOI: 10.1016/j.mlwa.2024.100557

Aleksandar Vujinović, Nikola Luburić, Jelena Slivka, Aleksandar Kovačević

Volume 16, Article 100557. Journal Article. https://www.sciencedirect.com/science/article/pii/S2666827024000331

Citations: 0

Abstract



Large language models like ChatGPT can learn in-context (ICL) from examples. Studies have shown that, due to ICL, ChatGPT achieves impressive performance in various natural language processing tasks. However, to the best of our knowledge, this is the first study that assesses ChatGPT's effectiveness in annotating a dataset for training instructor models in intelligent tutoring systems (ITSs). The task of an ITS instructor model is to automatically provide effective tutoring instruction given a student's state, mimicking human instructors. These models are typically implemented as hardcoded rules, which requires expertise and limits their ability to generalize and personalize instructions. These problems could be mitigated by utilizing machine learning (ML). However, developing ML models requires a large dataset of student states annotated with corresponding tutoring instructions. Using human experts to annotate such a dataset is expensive, time-consuming, and requires pedagogical expertise. Thus, this study explores ChatGPT's potential to act as a pedagogy expert annotator. Using prompt engineering, we created a list of instructions a tutor could recommend to a student. We manually filtered this list and instructed ChatGPT to select the appropriate instruction from the list for the given student's state. We manually analyzed ChatGPT's responses that could be considered incorrectly annotated. Our results indicate that using ChatGPT as an annotator is an effective alternative to human experts. The contributions of our work are (1) a novel dataset annotation methodology for ITSs, (2) a publicly available dataset of student states annotated with tutoring instructions, and (3) a list of possible tutoring instructions.
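The selection step described in the abstract (give the model a student's state and a fixed list of instructions, then set aside questionable answers for manual analysis) might look roughly like the sketch below. It is an illustration under stated assumptions, not the authors' implementation: the instruction texts, the prompt wording, and the names `build_prompt`, `parse_choice`, `annotate`, and `ask_model` are all hypothetical, and `ask_model` stands in for whatever client sends a prompt to ChatGPT and returns its text reply.

```python
from typing import Callable, List, Optional, Sequence, Tuple

# Hypothetical candidate instructions; the paper's actual list is not reproduced here.
INSTRUCTIONS = [
    "Review the prerequisite material",
    "Try a simpler practice problem",
    "Proceed to the next topic",
]


def build_prompt(student_state: str, instructions: Sequence[str]) -> str:
    """Compose a selection prompt: the student's state plus numbered options."""
    options = "\n".join(
        f"{i}. {text}" for i, text in enumerate(instructions, start=1)
    )
    return (
        "You are an expert in pedagogy. Choose the single most appropriate "
        "tutoring instruction for the student described below.\n"
        f"Student state: {student_state}\n"
        f"Instructions:\n{options}\n"
        "Answer with the number of the chosen instruction only."
    )


def parse_choice(reply: str, instructions: Sequence[str]) -> Optional[str]:
    """Map the model's reply to an instruction, or None if it is not a valid option."""
    token = reply.strip().rstrip(".")
    try:
        index = int(token)
    except ValueError:
        return None
    if 1 <= index <= len(instructions):
        return instructions[index - 1]
    return None


def annotate(
    states: Sequence[str],
    ask_model: Callable[[str], str],
    instructions: Sequence[str] = INSTRUCTIONS,
) -> Tuple[List[Tuple[str, str]], List[Tuple[str, str]]]:
    """Label each state; replies that cannot be parsed are kept for manual review."""
    annotated: List[Tuple[str, str]] = []
    needs_review: List[Tuple[str, str]] = []
    for state in states:
        reply = ask_model(build_prompt(state, instructions))
        label = parse_choice(reply, instructions)
        if label is None:
            needs_review.append((state, reply))
        else:
            annotated.append((state, label))
    return annotated, needs_review
```

Passing the model as a plain callable keeps the sketch independent of any particular API client and lets the parsing logic be checked with a stub. Constraining the reply to an option number is one possible design choice for making invalid answers easy to detect; the abstract does not say how the authors formatted their prompts or identified responses for manual analysis.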


Source journal

Machine learning with applications. Subject areas: Management Science and Operations Research, Artificial Intelligence, Computer Science Applications

Self-citation rate

0.00%

Publication count

Review time

98 days