CML: A Contrastive Meta Learning Method to Estimate Human Label Confidence Scores and Reduce Data Collection Cost

Proceedings of The Fifth Workshop on e-Commerce and NLP (ECNLP 5). DOI: 10.18653/v1/2022.ecnlp-1.5

B. Dong, Yiyi Wang, Hanbo Sun, Yunji Wang, Alireza Hashemi, Zheng Du

Citations: 7

Abstract

Deep neural network models are especially susceptible to noise in annotated labels. In the real world, annotated data typically contains noise caused by a variety of factors such as task difficulty, annotator experience, and annotator bias. Label quality is critical for label validation tasks; however, correcting for noise by collecting more data is often costly. In this paper, we propose a contrastive meta-learning framework (CML) to address the challenges introduced by noisy annotated data, specifically in the context of natural language processing. CML combines contrastive and meta learning to improve the quality of text feature representations. Meta-learning is also used to generate confidence scores to assess label quality. We demonstrate that a model built on CML-filtered data outperforms a model built on clean data. Furthermore, we perform experiments on deidentified commercial voice assistant datasets and demonstrate that our model outperforms several SOTA approaches.
