Title: A T-F Masking based Monaural Speech Enhancement using U-Net Architecture
Authors: Khadija Akter, Nursadul Mamun, Md.Azad Hossain
Published in: 2023 International Conference on Electrical, Computer and Communication Engineering (ECCE), 2023-02-23
DOI: 10.1109/ECCE57851.2023.10101608 (https://doi.org/10.1109/ECCE57851.2023.10101608)
Citations: 0
Abstract
In real-world environments, noise inevitably degrades the intelligibility and quality of speech. Speech enhancement aims to reconstruct clean speech by suppressing unwanted ambient noise. Numerous studies have addressed this enhancement task; some use a spectral mapping technique but fail in certain real-life conditions. This study proposes a time-frequency (T-F) masking-based speech enhancement approach that resembles the human auditory peripheral subsystem: a U-Net model estimates a mask defined by the ratio of the clean and noisy signal magnitudes. The proposed approach is evaluated in several seen and unseen noise conditions at several SNR values. To assess its performance, speech intelligibility and quality are evaluated using four objective scores. The proposed network improved the objective scores relative to a spectral mapping-based method and state-of-the-art networks.
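The ratio-based T-F mask mentioned in the abstract can be illustrated with a minimal NumPy sketch. It computes an oracle mask from a known clean signal, which is the kind of target a mask-estimating network would be trained to predict; it does not reproduce the authors' U-Net or their exact mask definition. The frame length, hop size, clipping of the mask to [0, 1], the reuse of the noisy phase, and all function names (`stft_frames`, `ratio_mask`, `apply_mask`) are assumptions made for illustration.

```python
import numpy as np


def stft_frames(signal, frame_len=256, hop=128):
    """Hann-windowed short-time spectra, shape (num_frames, frame_len // 2 + 1)."""
    window = np.hanning(frame_len)
    num_frames = 1 + (len(signal) - frame_len) // hop
    frames = np.stack([
        signal[i * hop: i * hop + frame_len] * window
        for i in range(num_frames)
    ])
    return np.fft.rfft(frames, axis=1)


def ratio_mask(clean_spec, noisy_spec, eps=1e-8, max_value=1.0):
    """Ratio of clean to noisy magnitudes per T-F bin.

    Clipping to [0, max_value] is an assumption of this sketch,
    not something stated in the abstract.
    """
    mask = np.abs(clean_spec) / (np.abs(noisy_spec) + eps)
    return np.clip(mask, 0.0, max_value)


def apply_mask(mask, noisy_spec):
    """Scale the noisy magnitude by the mask and keep the noisy phase (assumed)."""
    return mask * np.abs(noisy_spec) * np.exp(1j * np.angle(noisy_spec))


# Synthetic example: a 440 Hz tone at 16 kHz corrupted by white noise.
rng = np.random.default_rng(0)
t = np.arange(16000) / 16000.0
clean = np.sin(2 * np.pi * 440.0 * t)
noisy = clean + 0.5 * rng.standard_normal(t.shape)

clean_spec = stft_frames(clean)
noisy_spec = stft_frames(noisy)
mask = ratio_mask(clean_spec, noisy_spec)       # oracle training target
enhanced_spec = apply_mask(mask, noisy_spec)    # masked noisy spectrum
```

In a trained system the mask would come from the network's output given only the noisy spectrogram; the oracle version here only shows what the target looks like and how it is applied.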