Joint Analysis of Acoustic Scenes and Sound Events in Multitask Learning Based on Cross_MMoE Model and Class-Balanced Loss
Lin Zhang; Menglong Wu; Xichang Cai; Yundong Li; Wenkai Liu
IEEE Sensors Journal, published 2024-04-29
DOI: 10.1109/JSEN.2024.3390231
https://ieeexplore.ieee.org/document/10510225/
Abstract
Acoustic scene classification (ASC) and sound event detection (SED) are two closely related research directions in acoustics. Previous works have jointly analyzed acoustic scenes and events using multitask learning (MTL). However, traditional MTL models are often sensitive to the proportions used to partition the dataset, and multitask analysis is not as effective as single-task analysis. In addition, the performance of traditional MTL models depends heavily on the weights of the loss function, and adjusting these weights manually is costly. In response to these issues, we propose improvements to both the model and the loss function formulation, using additional sound event information to improve ASC performance. First, the multigate mixture-of-experts (MMoE) model is introduced into the field of acoustics. Experimental results on the TUT Sound Events 2016/2017 and TUT Acoustic Scenes 2016 datasets indicate that the MMoE model achieves a best $F1$-score of 98.74%, which is 1.43% higher than traditional MTL models. Second, we improve the MMoE model and propose the Cross_MMoE model, which increases the information interaction between different task branches and further improves the $F1$-score to 99.04%. Finally, to address the imbalance of sample categories in the dataset, we evaluate a class-balanced loss formulation as a replacement for the traditional multitask loss function. With this loss, the performance of the traditional multitask model, the MMoE model, and the Cross_MMoE model all improve; specifically, the $F1$-score of the Cross_MMoE model increases to 99.31%.
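The two building blocks named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation. It shows a generic MMoE forward pass, in which shared experts are mixed by one softmax gate per task, and a class-weighting rule that assumes the "effective number of samples" formulation of Cui et al. (2019), since the abstract does not say which class-balanced formulation is used. The cross-branch interaction of Cross_MMoE is not described in the abstract and is not modeled here. All function names, parameter names, and the task labels `scene` and `event` are hypothetical.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def linear(weights, x):
    """Multiply a matrix (list of rows) by the vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]


def mmoe_forward(x, expert_weights, gate_weights):
    """Generic MMoE mixing step (hypothetical helper, not the paper's code).

    x: input feature vector.
    expert_weights: one weight matrix per shared expert.
    gate_weights: dict mapping task name -> matrix with one row per expert.
    Returns a dict mapping task name -> (mixed expert output, gate weights).
    """
    # Every expert sees the same input; ReLU is an assumed activation.
    expert_outs = [[max(0.0, v) for v in linear(W, x)] for W in expert_weights]
    dim = len(expert_outs[0])
    result = {}
    for task, G in gate_weights.items():
        gate = softmax(linear(G, x))  # one mixing weight per expert
        mixed = [
            sum(g * out[d] for g, out in zip(gate, expert_outs))
            for d in range(dim)
        ]
        result[task] = (mixed, gate)
    return result


def class_balanced_weights(counts, beta=0.999):
    """Per-class weights (1 - beta) / (1 - beta**n), assuming the
    effective-number formulation; normalized to sum to the class count."""
    raw = [(1.0 - beta) / (1.0 - beta ** n) for n in counts]
    scale = len(counts) / sum(raw)
    return [w * scale for w in raw]


# Toy usage: two experts, two tasks, a 3-dimensional input.
x = [0.5, -1.0, 2.0]
experts = [
    [[0.1, 0.2, 0.3], [0.0, -0.1, 0.4]],
    [[-0.3, 0.2, 0.1], [0.5, 0.5, -0.2]],
]
gates = {
    "scene": [[0.1, 0.0, 0.2], [0.3, -0.1, 0.0]],
    "event": [[0.0, 0.2, -0.2], [0.1, 0.1, 0.1]],
}
for task, (mixed, gate) in mmoe_forward(x, experts, gates).items():
    print(task, mixed, gate)
print(class_balanced_weights([1000, 10]))
```

In a full model, each task's mixed vector would feed a task-specific output layer, and the class weights would scale the per-class terms of that task's loss so that rare classes contribute more.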
About the journal:
The fields of interest of the IEEE Sensors Journal are the theory, design, fabrication, manufacturing and applications of devices for sensing and transducing physical, chemical and biological phenomena, with emphasis on the electronics and physics aspects of sensors and integrated sensors-actuators. IEEE Sensors Journal deals with the following:
-Sensor Phenomenology, Modelling, and Evaluation
-Sensor Materials, Processing, and Fabrication
-Chemical and Gas Sensors
-Microfluidics and Biosensors
-Optical Sensors
-Physical Sensors: Temperature, Mechanical, Magnetic, and others
-Acoustic and Ultrasonic Sensors
-Sensor Packaging
-Sensor Networks
-Sensor Applications
-Sensor Systems: Signals, Processing, and Interfaces
-Actuators and Sensor Power Systems
-Sensor Signal Processing for high precision and stability (amplification, filtering, linearization, modulation/demodulation) and under harsh conditions (EMC, radiation, humidity, temperature); energy consumption/harvesting
-Sensor Data Processing (soft computing with sensor data, e.g., pattern recognition, machine learning, evolutionary computation; sensor data fusion; processing of wave (e.g., electromagnetic and acoustic) and non-wave (e.g., chemical, gravity, particle, thermal, radiative and non-radiative) sensor data; detection, estimation and classification based on sensor data)
-Sensors in Industrial Practice