Christian W. F. Mayer, Sabrina Ludwig, Steffen Brandt
{"title":"提示变压器模型的文本分类!使用大型语言模型进行基于提示的学习的示例介绍","authors":"Christian W. F. Mayer, Sabrina Ludwig, Steffen Brandt","doi":"10.1080/15391523.2022.2142872","DOIUrl":null,"url":null,"abstract":"Abstract This study investigates the potential of automated classification using prompt-based learning approaches with transformer models (large language models trained in an unsupervised manner) for a domain-specific classification task. Prompt-based learning with zero or few shots has the potential to (1) make use of artificial intelligence without sophisticated programming skills and (2) make use of artificial intelligence without fine-tuning models with large amounts of labeled training data. We apply this novel method to perform an experiment using so-called zero-shot classification as a baseline model and a few-shot approach for classification. For comparison, we also fine-tune a language model on the given classification task and conducted a second independent human rating to compare it with the given human ratings from the original study. The used dataset consists of 2,088 email responses to a domain-specific problem-solving task that were manually labeled for their professional communication style. With the novel prompt-based learning approach, we achieved a Cohen’s kappa of .40, while the fine-tuning approach yields a kappa of .59, and the new human rating achieved a kappa of .58 with the original human ratings. However, the classifications from the machine learning models have the advantage that each prediction is provided with a reliability estimate allowing us to identify responses that are difficult to score. We, therefore, argue that response ratings should be based on a reciprocal workflow of machine raters and human raters, where the machine rates easy-to-classify responses and the human raters focus and agree on the responses that are difficult to classify. Further, we believe that this new, more intuitive, prompt-based learning approach will enable more people to use artificial intelligence.","PeriodicalId":47444,"journal":{"name":"Journal of Research on Technology in Education","volume":" 10","pages":"125 - 141"},"PeriodicalIF":5.1000,"publicationDate":"2022-11-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"5","resultStr":"{\"title\":\"Prompt text classifications with transformer models! An exemplary introduction to prompt-based learning with large language models\",\"authors\":\"Christian W. F. Mayer, Sabrina Ludwig, Steffen Brandt\",\"doi\":\"10.1080/15391523.2022.2142872\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Abstract This study investigates the potential of automated classification using prompt-based learning approaches with transformer models (large language models trained in an unsupervised manner) for a domain-specific classification task. Prompt-based learning with zero or few shots has the potential to (1) make use of artificial intelligence without sophisticated programming skills and (2) make use of artificial intelligence without fine-tuning models with large amounts of labeled training data. We apply this novel method to perform an experiment using so-called zero-shot classification as a baseline model and a few-shot approach for classification. For comparison, we also fine-tune a language model on the given classification task and conducted a second independent human rating to compare it with the given human ratings from the original study. 
The used dataset consists of 2,088 email responses to a domain-specific problem-solving task that were manually labeled for their professional communication style. With the novel prompt-based learning approach, we achieved a Cohen’s kappa of .40, while the fine-tuning approach yields a kappa of .59, and the new human rating achieved a kappa of .58 with the original human ratings. However, the classifications from the machine learning models have the advantage that each prediction is provided with a reliability estimate allowing us to identify responses that are difficult to score. We, therefore, argue that response ratings should be based on a reciprocal workflow of machine raters and human raters, where the machine rates easy-to-classify responses and the human raters focus and agree on the responses that are difficult to classify. Further, we believe that this new, more intuitive, prompt-based learning approach will enable more people to use artificial intelligence.\",\"PeriodicalId\":47444,\"journal\":{\"name\":\"Journal of Research on Technology in Education\",\"volume\":\" 10\",\"pages\":\"125 - 141\"},\"PeriodicalIF\":5.1000,\"publicationDate\":\"2022-11-22\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"5\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Journal of Research on Technology in Education\",\"FirstCategoryId\":\"95\",\"ListUrlMain\":\"https://doi.org/10.1080/15391523.2022.2142872\",\"RegionNum\":2,\"RegionCategory\":\"教育学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q1\",\"JCRName\":\"Social Sciences\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Journal of Research on Technology in Education","FirstCategoryId":"95","ListUrlMain":"https://doi.org/10.1080/15391523.2022.2142872","RegionNum":2,"RegionCategory":"教育学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"Social Sciences","Score":null,"Total":0}
Prompt text classifications with transformer models! An exemplary introduction to prompt-based learning with large language models
Abstract This study investigates the potential of automated classification using prompt-based learning approaches with transformer models (large language models trained in an unsupervised manner) for a domain-specific classification task. Prompt-based learning with zero or few shots has the potential to (1) make use of artificial intelligence without sophisticated programming skills and (2) make use of artificial intelligence without fine-tuning models on large amounts of labeled training data. We apply this novel method in an experiment that uses zero-shot classification as a baseline and a few-shot approach for classification. For comparison, we also fine-tune a language model on the given classification task and conduct a second independent human rating, which we compare with the human ratings from the original study. The dataset consists of 2,088 email responses to a domain-specific problem-solving task that were manually labeled for their professional communication style. With the novel prompt-based learning approach, we achieve a Cohen’s kappa of .40, while the fine-tuning approach yields a kappa of .59 and the new human rating reaches a kappa of .58 against the original human ratings. However, the classifications from the machine learning models have the advantage that each prediction comes with a reliability estimate, allowing us to identify responses that are difficult to score. We therefore argue that response ratings should be based on a reciprocal workflow of machine and human raters, in which the machine rates the easy-to-classify responses and the human raters focus and agree on the responses that are difficult to classify. Further, we believe that this new, more intuitive, prompt-based learning approach will enable more people to use artificial intelligence.
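To illustrate how a zero-shot baseline of this kind can be set up without any model fine-tuning, the minimal sketch below uses the Hugging Face transformers zero-shot-classification pipeline together with scikit-learn's Cohen's kappa. The model name, candidate labels, example responses, human ratings, and confidence threshold are illustrative assumptions, not the authors' exact setup; the study's few-shot and fine-tuning conditions are not shown here.

```python
# Sketch of zero-shot prompt-based classification with a reliability estimate
# per prediction, plus agreement with human ratings via Cohen's kappa.
# Assumptions: model choice, labels, threshold, and toy data are hypothetical.
from transformers import pipeline
from sklearn.metrics import cohen_kappa_score

# Hypothetical label set for the "professional communication style" rating task.
CANDIDATE_LABELS = [
    "professional communication style",
    "unprofessional communication style",
]

# Any zero-shot-capable NLI model works; this choice is an assumption.
classifier = pipeline("zero-shot-classification", model="facebook/bart-large-mnli")

def rate_response(email_text: str):
    """Return the top predicted label and its score (used as a reliability estimate)."""
    result = classifier(email_text, candidate_labels=CANDIDATE_LABELS)
    # The pipeline returns labels sorted by score, highest first.
    return result["labels"][0], result["scores"][0]

# Toy responses and made-up human ratings for demonstration only.
responses = [
    "Dear Ms. Smith, thank you for your inquiry. I will look into the issue and reply by Friday.",
    "yo, no idea, ask someone else lol",
]
human_ratings = [
    "professional communication style",
    "unprofessional communication style",
]

machine_ratings, confidences = zip(*(rate_response(r) for r in responses))

# Agreement between machine and human raters (the metric reported in the study).
print("Cohen's kappa:", cohen_kappa_score(human_ratings, list(machine_ratings)))

# Low-confidence predictions can be routed to human raters, in line with the
# reciprocal machine-human workflow the authors propose; 0.7 is an arbitrary cutoff.
for resp, conf in zip(responses, confidences):
    if conf < 0.7:
        print("Needs human review:", resp[:40], "...")
```

In such a workflow, the score attached to each prediction is what allows easy-to-classify responses to be scored automatically while uncertain cases are forwarded to human raters.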
Journal introduction:
The Journal of Research on Technology in Education (JRTE) is a premier source for high-quality, peer-reviewed research that defines the state of the art, and future horizons, of teaching and learning with technology. The terms "education" and "technology" are broadly defined. Education is inclusive of formal educational environments ranging from PK-12 to higher education, and informal learning environments, such as museums, community centers, and after-school programs. Technology refers to both software and hardware innovations, and more broadly, the application of technological processes to education.