Comparing and Improving Active Learning Uncertainty Measures for Transformer Models by Discarding Outliers

Information Systems Frontiers | Pub Date: 2024-06-26 | DOI: 10.1007/s10796-024-10503-z | Impact Factor 6.9 | JCR Q1 (Computer Science, Information Systems) | CAS Tier 3 (Management)
Julius Gonsior, Christian Falkenberg, Silvio Magino, Anja Reusch, Claudio Hartmann, Maik Thiele, Wolfgang Lehner
{"title":"Comparing and Improving Active Learning Uncertainty Measures for Transformer Models by Discarding Outliers","authors":"Julius Gonsior, Christian Falkenberg, Silvio Magino, Anja Reusch, Claudio Hartmann, Maik Thiele, Wolfgang Lehner","doi":"10.1007/s10796-024-10503-z","DOIUrl":null,"url":null,"abstract":"<p>Despite achieving state-of-the-art results in nearly all Natural Language Processing applications, fine-tuning Transformer-encoder based language models still requires a significant amount of labeled data to achieve satisfying work. A well known technique to reduce the amount of human effort in acquiring a labeled dataset is <i>Active Learning</i> (AL): an iterative process in which only the minimal amount of samples is labeled. AL strategies require access to a quantified confidence measure of the model predictions. A common choice is the softmax activation function for the final Neural Network layer. In this paper, we compare eight alternatives on seven datasets and show that the softmax function provides misleading probabilities. Our finding is that most of the methods primarily identify hard-to-learn-from samples (commonly called outliers), resulting in worse than random performance, instead of samples, which actually reduce the uncertainty of the learned language model. As a solution, this paper proposes Uncertainty-Clipping, a heuristic to systematically exclude samples, which results in improvements for most methods compared to the softmax function.</p>","PeriodicalId":13610,"journal":{"name":"Information Systems Frontiers","volume":"27 1","pages":""},"PeriodicalIF":6.9000,"publicationDate":"2024-06-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Information Systems Frontiers","FirstCategoryId":"94","ListUrlMain":"https://doi.org/10.1007/s10796-024-10503-z","RegionNum":3,"RegionCategory":"管理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, INFORMATION SYSTEMS","Score":null,"Total":0}
Citations: 0

Abstract

Despite achieving state-of-the-art results in nearly all Natural Language Processing applications, fine-tuning Transformer-encoder-based language models still requires a significant amount of labeled data to achieve satisfying results. A well-known technique for reducing the human effort of acquiring a labeled dataset is Active Learning (AL): an iterative process in which only a minimal amount of samples is labeled. AL strategies require access to a quantified confidence measure of the model predictions; a common choice is the softmax activation function of the final Neural Network layer. In this paper, we compare eight alternatives on seven datasets and show that the softmax function provides misleading probabilities. Our finding is that most of the methods primarily identify hard-to-learn-from samples (commonly called outliers), resulting in worse-than-random performance, instead of samples that actually reduce the uncertainty of the learned language model. As a solution, this paper proposes Uncertainty-Clipping, a heuristic that systematically excludes such samples and yields improvements for most methods compared to the softmax function.
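To make the idea concrete, below is a minimal, hypothetical sketch (not the authors' implementation) of softmax-based uncertainty sampling with an Uncertainty-Clipping step: the samples with the very highest uncertainty are treated as likely outliers and discarded before the next labeling batch is selected. The function names and the `clip_fraction` parameter are illustrative assumptions, not values taken from the paper.

```python
import numpy as np

def softmax(logits):
    """Convert raw model logits into a probability distribution per sample."""
    z = logits - logits.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def least_confidence(probs):
    """Uncertainty = 1 - probability of the predicted (most likely) class."""
    return 1.0 - probs.max(axis=-1)

def select_with_uncertainty_clipping(logits, batch_size, clip_fraction=0.05):
    """Rank unlabeled samples by softmax uncertainty, but drop the
    top `clip_fraction` most uncertain ones as presumed outliers
    before choosing the next batch to label."""
    uncertainty = least_confidence(softmax(logits))
    order = np.argsort(-uncertainty)          # most uncertain first
    n_clip = int(len(order) * clip_fraction)  # assumed clipping ratio
    kept = order[n_clip:]                     # discard presumed outliers
    return kept[:batch_size]                  # indices of samples to label next
```

Under this reading, the clipping fraction trades off robustness against outliers versus the risk of discarding genuinely informative samples; the paper evaluates the heuristic across eight uncertainty measures and seven datasets.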


Source Journal: Information Systems Frontiers (Engineering Technology / Computer Science: Theory & Methods)
CiteScore: 13.30
Self-citation rate: 18.60%
Articles published per year: 127
Review time: 9 months
Journal description: The interdisciplinary interfaces of Information Systems (IS) are fast emerging as defining areas of research and development in IS. These developments are largely due to the transformation of Information Technology (IT) towards networked worlds and its effects on global communications and economies. While these developments are shaping the way information is used in all forms of human enterprise, they are also setting the tone and pace of information systems of the future. The major advances in IT such as client/server systems, the Internet and the desktop/multimedia computing revolution, for example, have led to numerous important vistas of research and development with considerable practical impact and academic significance. While the industry seeks to develop high-performance IS/IT solutions to a variety of contemporary information support needs, academia looks to extend the reach of IS technology into new application domains. Information Systems Frontiers (ISF) aims to provide a common forum for the dissemination of frontline industrial developments of substantial academic value and pioneering academic research of significant practical impact.