3MT Competition (EUSIPCO2024): A peek into the black box: Insights into the functionality of complex-valued neural networks for multichannel speech enhancement
Author: Annika Briegleb
Journal: Science Talks, vol. 13, Article 100430
Publication type: Journal Article
Published: 2025-01-30
DOI: 10.1016/j.sctalk.2025.100430
URL: https://www.sciencedirect.com/science/article/pii/S277256932500012X
Citations: 0
Abstract
Artificial neural networks (ANNs) have become an important part of signal processing research. While ANNs outperform model-based signal processing methods in many applications, their internal processing often remains unclear. This thesis proposes a framework for analyzing the signal processing performed by ANN-based filters for multichannel speech enhancement. By designing specific training and test scenarios in which each time frame can be associated with known information, e.g., spatial cues, and by using low-cost analysis tools such as clustering, interpretable information can be extracted from the hidden features of the ANN. The proposed framework makes it possible to assess whether and where spatial information is represented inside the ANN, and thus to answer the question of whether these ANNs exploit spatial cues in addition to spectral information. Furthermore, the impact of the choice of training target on the functionality and interpretability of the ANN is considered. Applying the proposed analysis tools to two conceptually different speech enhancement frameworks shows that the amount of spatial information extracted inside the ANN varies depending on the training target and the test scenario. The insights from this thesis help to assess the signal processing capabilities of ANNs and support informed decisions when configuring, training, and deploying ANNs.
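The clustering idea described in the abstract can be illustrated with a small sketch. This is not the method of the thesis: the hidden features are synthetic stand-ins for real network activations, the two "layers", the function names `kmeans` and `cluster_purity`, and the use of cluster purity as a score are all assumptions made for illustration. The sketch clusters per-frame features without using the labels, then checks how well the clusters line up with the known source position of each frame.

```python
import numpy as np


def kmeans(features, k, n_iter=50, seed=0):
    """Plain k-means on per-frame feature vectors (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    start = rng.choice(len(features), size=k, replace=False)
    centroids = features[start].copy()
    labels = np.zeros(len(features), dtype=int)
    for _ in range(n_iter):
        # Distance of every frame to every centroid: shape (frames, k).
        dists = np.linalg.norm(
            features[:, None, :] - centroids[None, :, :], axis=2
        )
        labels = dists.argmin(axis=1)
        for j in range(k):
            members = features[labels == j]
            if len(members) > 0:
                centroids[j] = members.mean(axis=0)
    return labels


def cluster_purity(cluster_ids, frame_labels):
    """Fraction of frames that carry the majority label of their cluster."""
    correct = 0
    for c in np.unique(cluster_ids):
        _, counts = np.unique(frame_labels[cluster_ids == c], return_counts=True)
        correct += counts.max()
    return correct / len(frame_labels)


# Synthetic test scenario: every time frame has a known source position (0 or 1).
rng = np.random.default_rng(0)
n_frames, dim = 200, 16
position = np.repeat([0, 1], n_frames)

# Assumed "layer" whose features depend on the source position.
offset = np.where(position[:, None] == 0, -2.0, 2.0)
spatial_layer = offset + rng.normal(size=(2 * n_frames, dim))

# Assumed "layer" whose features carry no position information.
noise_layer = rng.normal(size=(2 * n_frames, dim))

purity_spatial = cluster_purity(kmeans(spatial_layer, 2), position)
purity_noise = cluster_purity(kmeans(noise_layer, 2), position)
print(f"purity, position-dependent features: {purity_spatial:.2f}")
print(f"purity, position-independent features: {purity_noise:.2f}")
```

In this toy setting, a purity close to 1 suggests that the clusters follow the source position, whereas a purity close to 0.5 (chance level for two balanced positions) suggests that the features do not separate the positions. With real networks, the features would be taken from hidden layers and the labels from the designed test scenario.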