3MT Competition (EUSIPCO2024): A peek into the black box: Insights into the functionality of complex-valued neural networks for multichannel speech enhancement

Science Talks, Vol. 13, Article 100430. Published: 2025-01-30. DOI: 10.1016/j.sctalk.2025.100430

Annika Briegleb

Abstract

Artificial neural networks (ANNs) have become an important part of signal processing research. While ANNs outperform model-based signal processing methods in many applications, their internal processing often remains unclear. This thesis proposes a framework for analyzing the signal processing performed by ANN-based filters for multichannel speech enhancement. Specific training and test scenarios are designed so that each time frame can be associated with known information, e.g., spatial cues. Combined with low-cost analysis tools such as clustering, this allows interpretable information to be extracted from the hidden features of the ANN. The framework makes it possible to assess whether and where spatial information is represented inside the ANN, and thus to answer whether these ANNs exploit spatial cues in addition to spectral information. Furthermore, the thesis considers how the choice of training target affects the functionality and interpretability of the ANN. Applying the proposed analysis tools to two conceptually different speech enhancement frameworks shows that the amount of spatial information extracted inside the ANN varies with the training target and the test scenario. These insights help to assess the signal processing capabilities of ANNs and enable informed decisions when configuring, training, and deploying ANNs.
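
The abstract describes the analysis only at a high level, so the following is a minimal sketch of the general clustering idea under assumptions that are not taken from the thesis. It assumes that the hidden features of one layer are available as one vector per time frame and that the test scenario tags each frame with a known source direction. k-means (with a simple farthest-point initialization) stands in for "a low-cost clustering tool", and cluster purity is used as one possible score of how well the clusters line up with the direction labels. The functions `kmeans`, `cluster_purity`, and `synthetic_features`, as well as the layer names, are hypothetical, and the features are synthetic stand-ins rather than real ANN activations.

```python
import random
from collections import Counter


def kmeans(points, k, iters=100):
    """Lloyd's k-means with a deterministic farthest-point initialization."""
    def dist2(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    # Farthest-point init: start at the first frame, then repeatedly add the
    # frame that is farthest from all centroids chosen so far.
    centroids = [list(points[0])]
    while len(centroids) < k:
        far = max(points, key=lambda p: min(dist2(p, c) for c in centroids))
        centroids.append(list(far))

    assign = None
    for _ in range(iters):
        # Assignment step: each frame goes to its nearest centroid.
        new_assign = [min(range(k), key=lambda c: dist2(p, centroids[c])) for p in points]
        if new_assign == assign:
            break  # converged
        assign = new_assign
        # Update step: move each centroid to the mean of its assigned frames.
        for c in range(k):
            members = [p for p, a in zip(points, assign) if a == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return assign


def cluster_purity(assign, labels):
    """Share of frames whose label equals the majority label of their cluster."""
    hits = 0
    for c in set(assign):
        members = [lab for a, lab in zip(assign, labels) if a == c]
        hits += Counter(members).most_common(1)[0][1]
    return hits / len(labels)


def synthetic_features(labels, encodes_direction, dim=8, seed=0):
    """Hypothetical per-frame hidden features (stand-in for real ANN activations)."""
    rng = random.Random(seed)
    directions = sorted(set(labels))
    feats = []
    for lab in labels:
        base = [0.0] * dim
        if encodes_direction:
            base[directions.index(lab)] = 6.0  # direction-dependent offset
        feats.append([b + rng.gauss(0.0, 0.5) for b in base])
    return feats


# Hypothetical test scenario: every frame is tagged with its source direction (degrees).
labels = [0, 90, 180] * 100
layers = {
    "layer_spatial": synthetic_features(labels, encodes_direction=True),
    "layer_spectral_only": synthetic_features(labels, encodes_direction=False),
}
scores = {name: cluster_purity(kmeans(feats, k=3), labels) for name, feats in layers.items()}
for name, score in scores.items():
    print(f"{name}: purity = {score:.2f}")
```

In this toy setup, a purity near 1 suggests that direction is separable in that layer's features, while a purity near chance (about 1/3 for three equally frequent directions) suggests it is not. Repeating the score layer by layer is one way to ask where such information appears; other clustering methods or agreement measures could be substituted.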