A multi-scale dual-decoder autoencoder model for domain-shift machine sound anomaly detection

IF 2.9 | Region 3, Engineering & Technology | JCR Q2, ENGINEERING, ELECTRICAL & ELECTRONIC | Digital Signal Processing | Pub Date: 2024-10-11 | DOI: 10.1016/j.dsp.2024.104813
Shengbing Chen, Yong Sun, Junjie Wang, Mengyuan Wan, Mengyuan Liu, Xiaofan Li
Citations: 0

Abstract

Anomaly detection from machine sounds plays a crucial role in industrial automation because of its flexibility and real-time response. In real-world scenarios, however, machine anomalies occur infrequently, making it difficult to collect anomalous sound data under all operating conditions. Moreover, operating conditions and environmental noise can cause the distribution of the collected sound data to vary, which leads to domain shift. To address these problems, we propose an unsupervised multi-scale dual-decoder autoencoder (MS-D2AE) network for anomalous sound detection. The MS-D2AE model consists of residual layers, an encoder, and two decoders. The model fuses fine-grained information from sound features through the Multi-scale Feature Fusion Module (MTSFFM), enabling it to learn features effectively at multiple scales. A residual layer composed of a single MTSFFM connects the encoder's input directly to the intermediate results, further improving information flow. In addition to the reconstruction error, the dual-decoder autoencoder structure uses a similarity error computed between the outputs of the two decoders; this encourages the model to reconstruct the feature data more accurately during training and thus to learn a more comprehensive feature representation of normal data. Additionally, to mitigate the impact of domain shift on model performance, we design a feature domain mixing method that blends sound features from the source and target domains to increase the diversity of the sound features and improve generalization. Finally, we verify the effectiveness of the method on the DCASE 2023 Challenge Task 2 and DCASE 2022 Challenge Task 2 datasets.
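The abstract names two ingredients that a small example may make more concrete: a training objective that combines reconstruction error with a similarity error between the two decoder outputs, and a mixing step that blends source-domain and target-domain features. The sketch below is only an illustration under stated assumptions, not the authors' implementation. The function names, the use of mean squared error for both terms, the weight `alpha`, and the mixup-style convex combination with coefficient `lam` are all hypothetical choices; the abstract does not specify any of them.

```python
from typing import List, Sequence


def mse(a: Sequence[float], b: Sequence[float]) -> float:
    """Mean squared error between two equal-length feature vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def mix_domains(
    source: Sequence[float], target: Sequence[float], lam: float
) -> List[float]:
    """Blend a source-domain and a target-domain feature vector.

    ASSUMPTION: a mixup-style convex combination. The abstract only says
    that features from both domains are blended, not how.
    """
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1]")
    if len(source) != len(target):
        raise ValueError("vectors must have the same length")
    return [lam * s + (1.0 - lam) * t for s, t in zip(source, target)]


def dual_decoder_loss(
    x: Sequence[float],
    out_a: Sequence[float],
    out_b: Sequence[float],
    alpha: float = 1.0,
) -> float:
    """Reconstruction error of both decoders plus a decoder-agreement term.

    ASSUMPTION: both terms are MSE and are combined with a hypothetical
    weight `alpha`; the paper's exact loss may differ.
    """
    reconstruction = mse(x, out_a) + mse(x, out_b)
    similarity = mse(out_a, out_b)
    return reconstruction + alpha * similarity


if __name__ == "__main__":
    # Toy feature vectors standing in for sound features of the two domains.
    mixed = mix_domains([1.0, 2.0, 3.0], [3.0, 2.0, 1.0], lam=0.5)
    print("mixed features:", mixed)

    # Toy decoder outputs for a two-dimensional input.
    loss = dual_decoder_loss([0.0, 0.0], [1.0, 1.0], [1.0, -1.0], alpha=0.5)
    print("loss:", loss)
```

In a sketch like this, the same quantity could also serve as an anomaly score at test time, since a model trained only on normal sounds would be expected to reconstruct anomalous inputs poorly; whether MS-D2AE scores anomalies this way is not stated in the abstract.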
Source journal
Digital Signal Processing
Category: Engineering & Technology, Engineering: Electrical & Electronic
CiteScore: 5.30
Self-citation rate: 17.20%
Articles published: 435
Review time: 66 days
Journal description: Digital Signal Processing: A Review Journal is one of the oldest and most established journals in the field of signal processing, yet it aims to be the most innovative. The Journal invites top quality research articles at the frontiers of research in all aspects of signal processing. Our objective is to provide a platform for the publication of ground-breaking research in signal processing with both academic and industrial appeal. The journal has a special emphasis on statistical signal processing methodology such as Bayesian signal processing, and encourages articles on emerging applications of signal processing such as:
• big data
• machine learning
• internet of things
• information security
• systems biology and computational biology
• financial time series analysis
• autonomous vehicles
• quantum computing
• neuromorphic engineering
• human-computer interaction and intelligent user interfaces
• environmental signal processing
• geophysical signal processing including seismic signal processing
• chemioinformatics and bioinformatics
• audio, visual and performance arts
• disaster management and prevention
• renewable energy
Latest articles from this journal
• Adaptive polarimetric persymmetric detection for distributed subspace targets in lognormal texture clutter
• MFFR-net: Multi-scale feature fusion and attentive recalibration network for deep neural speech enhancement
• PV-YOLO: A lightweight pedestrian and vehicle detection model based on improved YOLOv8
• Efficient recurrent real video restoration
• IGGCN: Individual-guided graph convolution network for pedestrian trajectory prediction