Decoding class dynamics in learning with noisy labels

Pattern Recognition Letters, Volume 184, Pages 239-245. Pub Date: 2024-08-01. DOI: 10.1016/j.patrec.2024.04.012
Impact factor: 3.9. Region 3 (Computer Science). JCR: Q2 (COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE).
Article page: https://www.sciencedirect.com/science/article/pii/S0167865524001132

Citation count: 0

Abstract

The creation of large-scale datasets annotated by humans inevitably introduces noisy labels, leading to reduced generalization in deep-learning models. Sample selection-based learning with noisy labels is a recent approach that shows promising performance improvements. Separating clean samples from noisy ones is a key step in the learning process of these models. In this work, we delve deeper into the clean-noise split decision and highlight that effective demarcation of samples leads to better performance. We identify the Global Noise Conundrum in existing models, where the distribution of samples is treated globally. We propose a per-class, local distribution of samples and demonstrate that this approach yields a better clean-noise split. We validate our proposal on several benchmarks, both real and synthetic, and show substantial improvements over different state-of-the-art algorithms. We further propose a new metric, classiness, to extend our analysis and highlight the effectiveness of the proposed method. Source code and instructions to reproduce this paper are available at https://github.com/aldakata/CCLM/
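
The contrast between a global and a per-class clean-noise split can be illustrated with a small sketch. This is not the authors' implementation (see their repository for that). It assumes each sample has a scalar training loss, that low loss suggests a clean label, and it uses a simple 1-D two-means threshold as a stand-in for whatever distribution model the paper fits. The function names and the loss values are hypothetical.

```python
from collections import defaultdict


def two_means_threshold(values, iters=50):
    """1-D two-means: return the midpoint between the low and high centroids."""
    lo, hi = min(values), max(values)
    if lo == hi:
        return hi  # all losses equal: everything falls on the clean side
    for _ in range(iters):
        mid = (lo + hi) / 2
        low = [v for v in values if v <= mid]
        high = [v for v in values if v > mid]
        new_lo, new_hi = sum(low) / len(low), sum(high) / len(high)
        if (new_lo, new_hi) == (lo, hi):
            break  # centroids stopped moving
        lo, hi = new_lo, new_hi
    return (lo + hi) / 2


def global_split(losses):
    """One threshold over all samples, regardless of class."""
    t = two_means_threshold(losses)
    return [loss <= t for loss in losses]


def per_class_split(losses, labels):
    """One threshold per class, fitted only on that class's losses."""
    by_class = defaultdict(list)
    for loss, label in zip(losses, labels):
        by_class[label].append(loss)
    thresholds = {c: two_means_threshold(v) for c, v in by_class.items()}
    return [loss <= thresholds[label] for loss, label in zip(losses, labels)]


# Made-up losses: class 0 is "easy" (low losses overall), class 1 is "hard".
# The last two samples of each class play the role of mislabeled samples.
losses = [0.1, 0.1, 0.2, 0.5, 0.6, 0.9, 1.0, 1.1, 2.0, 2.2]
labels = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]

global_clean = global_split(losses)
local_clean = per_class_split(losses, labels)
```

On this toy data the single global threshold lands between the clean and noisy samples of the hard class, so the two high-loss samples of the easy class are marked clean. The per-class thresholds separate the high-loss samples within each class.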


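Source journal

Pattern Recognition Letters (Engineering & Technology, Computer Science: Artificial Intelligence)

CiteScore: 12.40
Self-citation rate: 5.90%
Articles published: 287
Review time: 9.1 months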

Journal description: Pattern Recognition Letters aims at rapid publication of concise articles of broad interest in pattern recognition. Subject areas include all the current fields of interest represented by the Technical Committees of the International Association for Pattern Recognition, and other developing themes involving learning and recognition.