On the Generalization Ability of Complex-Valued Variational U-Networks for Single-Channel Speech Enhancement
Authors: Eike J. Nustede; Jörn Anemüller
Journal: IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 32, pp. 3838-3849
Published: 2024-08-15
DOI: 10.1109/TASLP.2024.3444492
URL: https://ieeexplore.ieee.org/document/10637717/
Impact factor: 4.1 (JCR Q1, Acoustics)
Citations: 0
Abstract
The ability to generalize well to different environments is important for audio de-noising systems in real-world scenarios. Single-channel signals in particular require efficient noise filtering that does not degrade speech intelligibility. Our previous work has shown that a probabilistic latent space model combined with a U-Network architecture increases performance and generalization ability to some extent. Here, we further evaluate magnitude-only and complex-valued U-Network models on two different datasets and in a train-test mismatch scenario. Model adaptability is evaluated with a newly introduced curve-based score similar to area-under-the-curve metrics. The proposed probabilistic latent space models outperform their ablated variants in most conditions, as well as well-known comparison methods, while the increase in network size is negligible. Improvements of up to 0.97 dB SI-SDR in matched and 2.72 dB SI-SDR in mismatched conditions are observed, with highest total SI-SDR scores of 20.21 dB and 18.71 dB, respectively. The proposed stability score aligns well with observed performance behaviour, further validating the probabilistic latent space model.
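The results above are reported in SI-SDR (scale-invariant signal-to-distortion ratio). As a reading aid, the following is a minimal sketch of how that metric is commonly computed, assuming the standard definition in which the reference is optimally rescaled before measuring the residual. The function name `si_sdr`, the `eps` stabiliser, and the mean-removal step are illustrative choices and are not taken from the paper, whose exact evaluation code is not described here.

```python
import math


def si_sdr(estimate, reference, eps=1e-12):
    """Scale-invariant SDR in dB between two equal-length 1-D signals.

    Illustrative sketch of the commonly used definition; not the
    authors' evaluation code.
    """
    if len(estimate) != len(reference):
        raise ValueError("signals must have the same length")

    # Remove the mean so a constant offset does not affect the score
    # (a common convention, assumed here).
    mean_e = sum(estimate) / len(estimate)
    mean_r = sum(reference) / len(reference)
    est = [x - mean_e for x in estimate]
    ref = [x - mean_r for x in reference]

    # Project the estimate onto the reference: alpha is the optimal scale.
    dot = sum(e * r for e, r in zip(est, ref))
    ref_energy = sum(r * r for r in ref)
    alpha = dot / (ref_energy + eps)

    # Scaled reference is the "target"; everything else counts as distortion.
    target = [alpha * r for r in ref]
    residual = [e - t for e, t in zip(est, target)]

    target_energy = sum(t * t for t in target)
    residual_energy = sum(n * n for n in residual)
    return 10.0 * math.log10((target_energy + eps) / (residual_energy + eps))


if __name__ == "__main__":
    clean = [1.0, -1.0, 1.0, -1.0]
    # Clean signal plus a small disturbance orthogonal to it.
    noisy = [1.1, -0.9, 0.9, -1.1]
    print(f"SI-SDR: {si_sdr(noisy, clean):.2f} dB")
    # Rescaling the estimate leaves the score unchanged.
    print(f"SI-SDR (estimate x2): {si_sdr([2 * x for x in noisy], clean):.2f} dB")
```

In this toy case the disturbance has one hundredth of the clean signal's energy, so both calls give about 20 dB. The second call shows the scale invariance that distinguishes SI-SDR from plain SDR.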
About the journal:
The IEEE/ACM Transactions on Audio, Speech, and Language Processing covers audio, speech and language processing and the sciences that support them. In audio processing: transducers, room acoustics, active sound control, human audition, analysis/synthesis/coding of music, and consumer audio. In speech processing: areas such as speech analysis, synthesis, coding, speech and speaker recognition, speech production and perception, and speech enhancement. In language processing: speech and text analysis, understanding, generation, dialog management, translation, summarization, question answering and document indexing and retrieval, as well as general language modeling.