MgCNL: A Sample Separation Approach via Multi-Granularity Balls for Fault Diagnosis With the Interference of Noisy Labels

IF 6.4 · Zone 2 (Computer Science) · JCR Q1 (Automation & Control Systems) · IEEE Transactions on Automation Science and Engineering · Pub Date: 2024-10-07 · DOI: 10.1109/TASE.2024.3469000

Fir Dunkin;Xinde Li;Heqing Li;Guoliang Wu;Chuanfei Hu;Shuzhi Sam Ge

IEEE Transactions on Automation Science and Engineering, vol. 22, pp. 7748-7761. https://ieeexplore.ieee.org/document/10706599/

Citations: 0

Abstract

Fault diagnosis based on supervised learning has achieved remarkable results in intelligent manufacturing, making it an important guarantee of long-term safe and stable operation in modern industry. However, its accuracy relies heavily on high-quality annotated labels, which are expensive to obtain, and this limits the applicability of diagnosis models in many scenarios. Although obtaining automatically annotated samples from annotators is a promising solution, the resulting dataset always contains incorrect labels (noisy labels) because of perceptual limitations, which lowers or even invalidates the model's accuracy. To address this challenge, a diagnostic approach based on multi-granularity information fusion to combat noisy labels, called MgCNL, is proposed; it trains a high-accuracy model without knowing the specific noise ratio. Specifically, inspired by granular-ball computing, a label confidence evaluation method is designed so that samples with high-confidence labels can be selected from a dataset with noisy labels for supervised learning, thus avoiding the negative impact of incorrect labels on model performance. Finally, the efficacy of the approach was demonstrated on three datasets using different backbones: MgCNL reduced the adverse impact of noisy labels and achieved significantly better results than other advanced methods in various noisy scenarios. It therefore offers a competitive model training strategy for practitioners in intelligent manufacturing or industrial fault diagnosis who are hampered by the costs of sample labeling.
Note to Practitioners: In modern industry, manual or expert annotation of high-quality data is prohibitively expensive, and data annotated by automatic annotators often contains noisy labels that seriously damage model accuracy. As a result, many data-driven diagnosis models are constrained by their training data and are difficult to put into practice, which poses an urgent challenge to the automation and intelligence of the manufacturing industry. To address this challenge, this article proposes a robust training strategy called MgCNL, aimed at offsetting the negative impact of noisy labels, in the hope that lower-cost automatic annotation strategies can be more widely applied to model training tasks in industrial practice. MgCNL, based on multi-granularity information, can effectively select high-confidence samples from datasets for supervised learning, even when the proportion of noisy labels is unknown, thus reducing the misleading impact of noisy labels on diagnostic models. As a result, MgCNL can robustly train high-accuracy diagnostic models on data with noisy labels, which makes replacing experts with automatic annotators in dataset construction a more economical and efficient potential technical approach. MgCNL also brings value to datasets with uncertain labels, making them usable without investing significant human resources to verify label reliability.
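The abstract does not give the algorithm itself, so the following is only a minimal sketch of the general idea (score each label by how well it agrees with the pure regions around it, at several granularities, then train only on the high-scoring samples). It is not the authors' MgCNL implementation. The function names `split_into_balls` and `label_confidence`, the purity thresholds, the class-centroid splitting rule, and the 0.5 selection cutoff are all assumptions made for illustration.

```python
import numpy as np


def split_into_balls(X, y, idx, purity_min, min_size=4):
    """Recursively split the samples in `idx` until each ball is pure enough.

    Returns a list of (indices, majority_label, purity) tuples.
    """
    labels, counts = np.unique(y[idx], return_counts=True)
    majority = labels[counts.argmax()]
    purity = counts.max() / len(idx)
    if purity >= purity_min or len(idx) <= min_size:
        return [(idx, majority, purity)]
    # Assumed splitting rule: use the centroids of the two most frequent
    # labels in this ball as centers and assign each sample to the nearest.
    top2 = labels[np.argsort(counts)[-2:]]
    centers = np.stack([X[idx][y[idx] == c].mean(axis=0) for c in top2])
    dists = np.linalg.norm(X[idx][:, None, :] - centers[None, :, :], axis=2)
    side = dists.argmin(axis=1)
    if side.min() == side.max():  # degenerate split: keep the ball as is
        return [(idx, majority, purity)]
    balls = []
    for s in (0, 1):
        balls += split_into_balls(X, y, idx[side == s], purity_min, min_size)
    return balls


def label_confidence(X, y, purities=(0.6, 0.7, 0.8)):
    """Average, over several granularities, of ball purity for samples whose
    label matches the majority label of their ball (0 otherwise)."""
    conf = np.zeros(len(y))
    for p in purities:
        for idx, majority, purity in split_into_balls(X, y, np.arange(len(y)), p):
            conf[idx] += (y[idx] == majority) * purity
    return conf / len(purities)


# Toy data: two well-separated groups of 10 samples, with one label flipped
# in each group (indices 3 and 15) to imitate noisy annotation.
x = np.concatenate([0.1 * np.arange(10), 10 + 0.1 * np.arange(10)])
X = np.stack([x, np.zeros_like(x)], axis=1)
y_true = np.repeat([0, 1], 10)
y_noisy = y_true.copy()
y_noisy[[3, 15]] = 1 - y_noisy[[3, 15]]

conf = label_confidence(X, y_noisy)
selected = conf >= 0.5  # assumed cutoff; no noise ratio is supplied
print("left out of training:", np.flatnonzero(~selected).tolist())
```

On this toy data the two flipped samples receive zero confidence and are left out, while the 18 correctly labeled samples are kept. Real fault-diagnosis features would come from a trained backbone rather than raw coordinates, and the thresholds would need tuning.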




Source journal

IEEE Transactions on Automation Science and Engineering (Engineering and Technology: Automation & Control Systems)

CiteScore: 12.50

Self-citation rate: 14.30%

Articles published: 404

Review time: 3.0 months

Journal description: The IEEE Transactions on Automation Science and Engineering (T-ASE) publishes fundamental papers on Automation, emphasizing scientific results that advance efficiency, quality, productivity, and reliability. T-ASE encourages interdisciplinary approaches from computer science, control systems, electrical engineering, mathematics, mechanical engineering, operations research, and other fields. T-ASE welcomes results relevant to industries such as agriculture, biotechnology, healthcare, home automation, maintenance, manufacturing, pharmaceuticals, retail, security, service, supply chains, and transportation. T-ASE addresses a research community willing to integrate knowledge across disciplines and industries. For this purpose, each paper includes a Note to Practitioners that summarizes how its results can be applied or how they might be extended to apply in practice.