Label Errors in BANKING77

First Workshop on Insights from Negative Results in NLP Pub Date : 1900-01-01 DOI:10.18653/v1/2022.insights-1.19

Cecilia Ying, Stephen Thomas

Citations: 4

Abstract

We investigate potential label errors present in the popular BANKING77 dataset and the associated negative impacts on intent classification methods. Motivated by our own negative results when constructing an intent classifier, we applied two automated approaches to identify potential label errors in the dataset. We found that over 1,400 (14%) of the 10,003 training utterances may have been incorrectly labelled. In a simple experiment, we found that by removing the utterances with potential errors, our intent classifier saw an increase of 4.5% and 8% for the F1-Score and Adjusted Rand Index, respectively, in supervised and unsupervised classification. This paper serves as a warning of the potential of noisy labels in popular NLP datasets. Further study is needed to fully identify the breadth and depth of label errors in BANKING77 and other datasets.
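The abstract does not name the two automated approaches, so the following is only an illustrative sketch of one common way to flag potential label errors: a confident-learning-style heuristic over out-of-sample predicted probabilities. The function name `flag_label_issues`, the toy probabilities, and the thresholding rule are assumptions made for this example, not the authors' method.

```python
from collections import defaultdict


def flag_label_issues(labels, pred_probs):
    """Return indices of examples whose given label looks suspicious.

    labels: list of int class ids (the given, possibly noisy labels).
    pred_probs: list of per-class probability lists, assumed to be
        out-of-sample predictions from some classifier (hypothetical input).
    """
    # Per-class threshold: mean predicted probability of class k over
    # the examples that carry label k.
    sums = defaultdict(float)
    counts = defaultdict(int)
    for y, p in zip(labels, pred_probs):
        sums[y] += p[y]
        counts[y] += 1
    thresholds = {k: sums[k] / counts[k] for k in counts}

    flagged = []
    for i, (y, p) in enumerate(zip(labels, pred_probs)):
        # Classes whose probability clears that class's own threshold.
        confident = [k for k in thresholds if p[k] >= thresholds[k]]
        if not confident:
            continue  # the model is not confident about any class
        best = max(confident, key=lambda k: p[k])
        if best != y:
            flagged.append(i)  # confident prediction disagrees with label
    return flagged


# Toy data: two intents, six utterances; index 2 is labelled 0 but the
# (made-up) probabilities point strongly to class 1.
labels = [0, 0, 0, 1, 1, 1]
pred_probs = [
    [0.90, 0.10],
    [0.80, 0.20],
    [0.10, 0.90],
    [0.15, 0.85],
    [0.10, 0.90],
    [0.30, 0.70],
]
suspect = flag_label_issues(labels, pred_probs)
# Drop the flagged utterances before retraining, as in the paper's experiment.
clean_indices = [i for i in range(len(labels)) if i not in suspect]
```

Retraining on `clean_indices` and comparing F1-Score or Adjusted Rand Index against the full training set would mirror the kind of experiment the abstract describes, though the actual figures depend on the classifier and data used.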


