Joint Analysis of Acoustic Scenes and Sound Events in Multitask Learning Based on Cross_MMoE Model and Class-Balanced Loss

IEEE Sensors Journal (IF 4.3; Zone 2, comprehensive journals; JCR Q1, ENGINEERING, ELECTRICAL & ELECTRONIC). Pub Date: 2024-04-29. DOI: 10.1109/JSEN.2024.3390231

Lin Zhang;Menglong Wu;Xichang Cai;Yundong Li;Wenkai Liu

IEEE Xplore: https://ieeexplore.ieee.org/document/10510225/

Citations: 0

Abstract

Acoustic scene classification (ASC) and sound event detection (SED) are two closely related research directions in acoustics. Previous works have analyzed acoustic scenes and events jointly using multitask learning (MTL). However, traditional MTL models are often sensitive to the proportions used to partition the dataset, and multitask analysis is not as effective as single-task analysis. In addition, the performance of traditional MTL models depends heavily on the weights of the loss function, and adjusting those weights manually is costly. In response to these issues, we suggest improvements to both the model and the loss function formulation, so that additional sound event information helps improve ASC performance. First, the multigate mixture-of-experts (MMoE) model is introduced into the field of acoustics. Experimental results obtained using the TUT Sound Events 2016/2017 and TUT Acoustic Scenes 2016 datasets indicate that the mixture-of-experts model achieves an optimal performance of 98.74% in terms of $F1$-score, which is 1.43% higher than traditional MTL models. Second, we improve the mixture-of-experts model and propose the Cross_MMoE model, which increases the information interaction between different task branches, and the $F1$-score is further improved to 99.04%. Finally, to address the issue of imbalanced sample categories in the dataset, we evaluate the class-balanced loss formulation as a replacement for the traditional multitask loss function. With this loss, the performance of the traditional multitask model, the MMoE model, and the Cross_MMoE model all improve; specifically, the $F1$-score of the Cross_MMoE model increases to 99.31%.
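The two building blocks named in the abstract can be illustrated with a minimal, dependency-free sketch. It is not the authors' implementation: the layer sizes, class counts, random weights, and all function names are hypothetical, and the class-balanced weights assume the commonly used effective-number form (1 - beta) / (1 - beta^n), which the abstract does not confirm. The cross-branch interaction of Cross_MMoE is not shown, since the abstract does not describe how it is wired.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def linear(weights, x):
    """Multiply a (rows x len(x)) weight matrix by the vector x."""
    return [sum(w * v for w, v in zip(row, x)) for row in weights]


def random_matrix(rows, cols, rng):
    """Small random weight matrix; stands in for trained parameters."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


def mmoe_forward(x, experts, gates, towers):
    """One MMoE pass: shared experts, one softmax gate and one tower per task."""
    # Every expert sees the same input (ReLU-activated linear experts here).
    expert_outs = [[max(0.0, h) for h in linear(w, x)] for w in experts]
    outputs, mixes = [], []
    for gate_w, tower_w in zip(gates, towers):
        mix = softmax(linear(gate_w, x))  # task-specific weights over experts
        fused = [sum(m * out[j] for m, out in zip(mix, expert_outs))
                 for j in range(len(expert_outs[0]))]
        outputs.append(linear(tower_w, fused))  # task-specific logits
        mixes.append(mix)
    return outputs, mixes


def class_balanced_weights(samples_per_class, beta=0.999):
    """Assumed effective-number weighting: (1 - beta) / (1 - beta**n).

    Weights are rescaled so that they sum to the number of classes.
    """
    raw = [(1.0 - beta) / (1.0 - beta ** n) for n in samples_per_class]
    scale = len(raw) / sum(raw)
    return [r * scale for r in raw]


rng = random.Random(0)
feature_dim, hidden_dim, num_experts = 8, 4, 3
num_scenes, num_events = 4, 6  # hypothetical class counts, not from the paper

experts = [random_matrix(hidden_dim, feature_dim, rng) for _ in range(num_experts)]
gates = [random_matrix(num_experts, feature_dim, rng) for _ in range(2)]
towers = [random_matrix(num_scenes, hidden_dim, rng),   # ASC head
          random_matrix(num_events, hidden_dim, rng)]   # SED head

x = [rng.uniform(-1.0, 1.0) for _ in range(feature_dim)]
(scene_logits, event_logits), gate_mixes = mmoe_forward(x, experts, gates, towers)
cb_weights = class_balanced_weights([1000, 100, 10])  # rarer class, larger weight
```

Each task reads the same expert outputs but mixes them with its own gate, which is what lets the scene and event branches share information without being forced onto one shared representation. The class-balanced weights would multiply the per-class loss terms, so that classes with few samples contribute more per example.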

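Source journal

IEEE Sensors Journal (Engineering Technology - Engineering: Electrical & Electronic)

CiteScore: 7.70
Self-citation rate: 14.00%
Articles published: 2058
Review time: 5.2 months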

Journal description: The fields of interest of the IEEE Sensors Journal are the theory, design, fabrication, manufacturing and applications of devices for sensing and transducing physical, chemical and biological phenomena, with emphasis on the electronics and physics aspect of sensors and integrated sensors-actuators. IEEE Sensors Journal deals with the following:
- Sensor Phenomenology, Modelling, and Evaluation
- Sensor Materials, Processing, and Fabrication
- Chemical and Gas Sensors
- Microfluidics and Biosensors
- Optical Sensors
- Physical Sensors: Temperature, Mechanical, Magnetic, and others
- Acoustic and Ultrasonic Sensors
- Sensor Packaging
- Sensor Networks
- Sensor Applications
- Sensor Systems: Signals, Processing, and Interfaces
- Actuators and Sensor Power Systems
- Sensor Signal Processing for high precision and stability (amplification, filtering, linearization, modulation/demodulation) and under harsh conditions (EMC, radiation, humidity, temperature); energy consumption/harvesting
- Sensor Data Processing (soft computing with sensor data, e.g., pattern recognition, machine learning, evolutionary computation; sensor data fusion; processing of wave, e.g., electromagnetic and acoustic, and non-wave, e.g., chemical, gravity, particle, thermal, radiative and non-radiative sensor data; detection, estimation and classification based on sensor data)
- Sensors in Industrial Practice