TBC-MI: Suppressing noise labels by maximizing cleaning samples for robust image classification

IF 6.9 | CAS Zone 1 (Management Science) | JCR Q1, COMPUTER SCIENCE, INFORMATION SYSTEMS | Information Processing & Management, Volume 61, Issue 5, Article 103801 | Pub Date: 2024-09-01 | Epub Date: 2024-06-12 | DOI: 10.1016/j.ipm.2024.103801

Yanhong Li, Zhiqing Guo, Liejun Wang, Lianghui Xu

Citations: 0

Abstract

In classification tasks with noisy labels, eliminating the interference of noisily labeled samples is key to improving network performance. However, the distributions of some noisy and clean samples overlap, which makes them hard to tell apart. Clean-label samples within the overlapping region often carry highly representative feature information, which is extremely valuable for deep learning. We propose a new method, twin binary classification-mixed input (TBC-MI), to tackle this challenge. Specifically, TBC-MI uses a twin classification network to partition the samples and converts the original complex classification problem into a binary classification problem. It then picks out clean-label samples from the hard-label region using a simple multilayer binary classification network. TBC-MI uses noise from the dataset itself in the partitioning process to better reflect real-world scenarios. After maximizing the number of clean-label samples, TBC-MI adopts a hybrid online and offline input method to expand the forms in which the samples are subsequently input. The method is evaluated on CIFAR-10 and CIFAR-100 with artificially synthesized noise and on Clothing1M, ANIMAL-10N, CIFAR-10N, and CHAOYANG with real-world noise. Extensive experiments show that it achieves the best test accuracy on most datasets, with a maximum improvement of 2% over previous methods for learning with noisy labels.
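The abstract gives no implementation details, so the following is only a minimal sketch of the general idea it describes: partition samples with two networks, then treat "clean vs. noisy" as a binary classification problem to recover clean samples from the hard region. Everything specific here is an assumption made for illustration and is not taken from the paper: the use of per-sample losses from two networks as features, the `low` and `high` thresholds, the function names `partition_by_agreement`, `fit_logistic`, and `rescue_clean`, and the use of logistic regression as a stand-in for the paper's "simple multilayer binary classification network".

```python
import math


def partition_by_agreement(loss_a, loss_b, low, high):
    """Split sample indices into confident-clean, confident-noisy, and hard.

    Assumption: a sample is confident-clean when BOTH networks give it a loss
    below `low`, confident-noisy when both give a loss above `high`, and hard
    otherwise. The thresholds are hypothetical, not from the paper.
    """
    clean, noisy, hard = [], [], []
    for i, (la, lb) in enumerate(zip(loss_a, loss_b)):
        if la < low and lb < low:
            clean.append(i)
        elif la > high and lb > high:
            noisy.append(i)
        else:
            hard.append(i)
    return clean, noisy, hard


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(xs, ys, lr=0.5, steps=2000):
    """Fit a logistic-regression classifier with plain gradient descent.

    Stand-in for the binary classification network mentioned in the abstract.
    """
    dim, n = len(xs[0]), len(xs)
    w, b = [0.0] * dim, 0.0
    for _ in range(steps):
        grad_w, grad_b = [0.0] * dim, 0.0
        for x, y in zip(xs, ys):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b) - y
            grad_b += err
            for j in range(dim):
                grad_w[j] += err * x[j]
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def rescue_clean(features, clean, noisy, hard, threshold=0.5):
    """Train on the confident sets (clean=1, noisy=0), then keep the hard
    samples whose predicted probability of being clean reaches `threshold`."""
    xs = [features[i] for i in clean + noisy]
    ys = [1.0] * len(clean) + [0.0] * len(noisy)
    w, b = fit_logistic(xs, ys)
    return [
        i for i in hard
        if sigmoid(sum(wj * xj for wj, xj in zip(w, features[i])) + b) >= threshold
    ]


# Toy per-sample losses from two hypothetical networks (made-up numbers).
loss_a = [0.1, 0.2, 0.15, 2.5, 3.0, 2.8, 0.6, 2.1]
loss_b = [0.2, 0.1, 0.25, 2.7, 2.9, 3.1, 0.3, 3.0]
features = list(zip(loss_a, loss_b))

clean, noisy, hard = partition_by_agreement(loss_a, loss_b, low=0.5, high=2.2)
rescued = rescue_clean(features, clean, noisy, hard)
print("clean:", clean, "noisy:", noisy, "hard:", hard, "rescued:", rescued)
```

In this toy data, samples 6 and 7 land in the hard region because the two networks do not agree on them under the chosen thresholds. The binary classifier then keeps sample 6, whose losses sit close to the confident-clean group, and leaves sample 7 out. In a real pipeline the per-sample losses would come from trained networks rather than hand-written lists.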

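Source journal

Information Processing & Management (Engineering & Technology - Computer Science: Information Systems)

CiteScore: 17.00
Self-citation rate: 11.60%
Articles published: 276
Review time: 39 days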

About the journal: Information Processing and Management is dedicated to publishing cutting-edge original research at the convergence of computing and information science. Our scope encompasses theory, methods, and applications across various domains, including advertising, business, health, information science, information technology marketing, and social computing. We aim to cater to the interests of both primary researchers and practitioners by offering an effective platform for the timely dissemination of advanced and topical issues in this interdisciplinary field. The journal places particular emphasis on original research articles, research survey articles, research method articles, and articles addressing critical applications of research. Join us in advancing knowledge and innovation at the intersection of computing and information science.