A sound event detection support system for smart home based on “two-to-one” teacher–student learning

Applied Soft Computing · IF 7.2 · Region 1 (Computer Science) · JCR Q1 (Computer Science, Artificial Intelligence) · Pub Date: 2024-09-10 · DOI: 10.1016/j.asoc.2024.112224

Rongyan Wang, Yan Leng, Jian Zhuang, Chengli Sun

Source: https://www.sciencedirect.com/science/article/pii/S1568494624009980

Citations: 0

Abstract

Sound event detection (SED) is a core technology in smart home projects, which rely on detected sound events to trigger specific actions. SED systems face two major challenges: high labeling costs and complex acoustic environments. To reduce labeling costs, some semi-supervised systems extract both global and local features for classification. However, these methods weight global and local features equally and do not account for their varying importance when recognizing different types of sound events. To cope with complex acoustic environments, some studies use multitask learning frameworks that introduce SED-related tasks as auxiliaries to improve detection performance. However, these methods do not align the tasks within the framework, which leads to conflicting outputs that may limit system performance. To address both issues, we propose a semi-supervised SED system based on “two-to-one” teacher–student learning. The system employs a gating mechanism to selectively enhance global and local features, improving adaptability to different types of sound events, and incorporates a cross-task alignment module that lets SED interact with related tasks, reducing the risk of performance degradation caused by conflicting outputs. Experimental results on two datasets show that our system achieves the best performance on all metrics, with EB-F1 scores of 48.1% and 64.7%, improvements of 15.3% and 10.6%, respectively, over the baseline ConformerSED system. Our work offers an effective SED solution for smart home projects: a semi-supervised system that performs well while reducing labeling costs.
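The abstract does not specify what form the gating mechanism takes. As a rough illustration only, the sketch below shows one common way to gate two feature streams: a sigmoid gate computed from both streams decides, per dimension, how much of the global versus the local feature to keep. The function name `gated_fusion`, the weight layout, and the toy values are all assumptions made for this example, not the authors' implementation.

```python
import math
import random


def sigmoid(x):
    """Logistic function mapping a real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def gated_fusion(global_feat, local_feat, weights, bias):
    """Blend two equal-length feature vectors with a per-dimension gate.

    Hypothetical layout: weights[i] holds 2 * dim coefficients applied to the
    concatenation [global_feat, local_feat]; bias[i] is the gate bias for
    dimension i. A gate near 1 favours the global feature, near 0 the local one.
    """
    concat = list(global_feat) + list(local_feat)
    fused = []
    for i, (g, l) in enumerate(zip(global_feat, local_feat)):
        pre_activation = sum(w * x for w, x in zip(weights[i], concat)) + bias[i]
        gate = sigmoid(pre_activation)
        fused.append(gate * g + (1.0 - gate) * l)
    return fused


# Toy inputs standing in for one frame of global and local features.
random.seed(0)
dim = 4
global_feat = [random.gauss(0, 1) for _ in range(dim)]
local_feat = [random.gauss(0, 1) for _ in range(dim)]
weights = [[random.gauss(0, 0.1) for _ in range(2 * dim)] for _ in range(dim)]
bias = [0.0] * dim

fused = gated_fusion(global_feat, local_feat, weights, bias)
print(fused)
```

Because the gate lies strictly between 0 and 1, each fused value is a convex combination of the corresponding global and local values. In a trained model the weights would be learned, so the balance could differ between event types.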




Source journal

Applied Soft Computing (Engineering & Technology; Computer Science, Interdisciplinary Applications)

CiteScore: 15.80

Self-citation rate: 6.90%

Articles published: 874

Review time: 10.9 months

Journal description: Applied Soft Computing is an international journal promoting an integrated view of soft computing to solve real-life problems. Its focus is to publish the highest-quality research on the application and convergence of Fuzzy Logic, Neural Networks, Evolutionary Computing, Rough Sets, and similar techniques to address real-world complexities. Applied Soft Computing is a rolling publication: articles are published as soon as the editor-in-chief has accepted them. The website is therefore updated continuously with new articles, and publication times are short.