A multi-scale dual-decoder autoencoder model for domain-shift machine sound anomaly detection

IF 2.9 | Zone 3, Engineering & Technology | Q2 ENGINEERING, ELECTRICAL & ELECTRONIC | Digital Signal Processing | Pub Date: 2024-10-11 | DOI: 10.1016/j.dsp.2024.104813

Shengbing Chen, Yong Sun, Junjie Wang, Mengyuan Wan, Mengyuan Liu, Xiaofan Li

Digital Signal Processing, vol. 156, Article 104813 (journal article, not open access). https://www.sciencedirect.com/science/article/pii/S105120042400438X

Citations: 0

Abstract

Machine sound anomaly detection plays a crucial role in industrial automation because it is flexible and responds in real time. In real-world scenarios, however, machine anomalies occur infrequently, which makes it difficult to collect anomalous sound data under varied operating conditions. Moreover, operating conditions and environmental noise can change the distribution of the collected sound data, leading to domain shift. To address these problems, we propose an unsupervised multi-scale dual-decoder autoencoder (MS-D2AE) network for anomalous sound detection. The MS-D2AE model consists of residual layers, an encoder, and two decoders. A Multi-scale Feature Fusion Module (MTSFFM) fuses fine-grained information in the sound features, enabling the model to learn from features at multiple scales. A residual layer composed of a single MTSFFM connects the encoder's input directly to the intermediate results, further improving information flow. In addition to the reconstruction error, the dual-decoder structure uses a similarity error computed between the outputs of the two decoders, which encourages the model to reconstruct the features more accurately during training and thus to learn a more comprehensive representation of normal data. To mitigate the impact of domain shift on model performance, we also design a feature domain mixing method that blends sound features from the source and target domains to increase the diversity and generalization of the sound features. Finally, we verify the effectiveness of the method on the DCASE 2023 Challenge Task 2 and DCASE 2022 Challenge Task 2 datasets.
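The abstract names two mechanisms without giving formulas: a loss that combines reconstruction error with a similarity error between the two decoder outputs, and a mixing of source-domain and target-domain features. The following is a minimal sketch of how such terms could be computed, not the authors' implementation. The mean-squared-error form, the `sim_weight` parameter, the convex blend with coefficient `lam`, and all function names are assumptions introduced only for illustration; plain Python lists stand in for feature vectors.

```python
def mse(a, b):
    """Mean squared error between two equal-length feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def dual_decoder_loss(x, out1, out2, sim_weight=1.0):
    """Reconstruction error of both decoders plus a similarity term.

    Assumption: both terms are MSE and are combined by a weighted sum.
    sim_weight is hypothetical; the abstract does not state a weighting.
    """
    reconstruction = mse(x, out1) + mse(x, out2)
    similarity = mse(out1, out2)
    return reconstruction + sim_weight * similarity


def mix_domains(source_feat, target_feat, lam):
    """Blend a source-domain and a target-domain feature vector.

    Assumption: a mixup-style convex combination with lam in [0, 1].
    The paper's actual mixing rule is not given in the abstract.
    """
    return [lam * s + (1.0 - lam) * t for s, t in zip(source_feat, target_feat)]


if __name__ == "__main__":
    x = [0.0, 0.0]
    # Two decoders that agree with each other but both miss the input.
    print(dual_decoder_loss(x, [1.0, 1.0], [1.0, 1.0]))
    # Two decoders that miss the input and also disagree with each other.
    print(dual_decoder_loss(x, [1.0, 1.0], [-1.0, -1.0]))
    # Equal blend of a source feature and a target feature.
    print(mix_domains([0.0, 2.0], [4.0, 6.0], 0.5))
```

Under these assumptions, the similarity term raises the loss when the decoders disagree, even if each is equally far from the input, which is one way a loss could push both decoders toward the same reconstruction of normal data.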




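Source journal

Digital Signal Processing (Engineering & Technology: Engineering, Electrical & Electronic)

CiteScore: 5.30

Self-citation rate: 17.20%

Articles published: 435

Review time: 66 days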

Journal description: Digital Signal Processing: A Review Journal is one of the oldest and most established journals in the field of signal processing, yet it aims to be the most innovative. The Journal invites top-quality research articles at the frontiers of research in all aspects of signal processing. Our objective is to provide a platform for the publication of ground-breaking research in signal processing with both academic and industrial appeal. The journal has a special emphasis on statistical signal processing methodology such as Bayesian signal processing, and encourages articles on emerging applications of signal processing such as:
• big data
• machine learning
• internet of things
• information security
• systems biology and computational biology
• financial time series analysis
• autonomous vehicles
• quantum computing
• neuromorphic engineering
• human-computer interaction and intelligent user interfaces
• environmental signal processing
• geophysical signal processing including seismic signal processing
• chemioinformatics and bioinformatics
• audio, visual and performance arts
• disaster management and prevention
• renewable energy

Latest articles from this journal

Editorial Board
Research on ZYNQ neural network acceleration method for aluminum surface microdefects
Cross-scale informative priors network for medical image segmentation
An improved digital predistortion scheme for nonlinear transmitters with limited bandwidth