Speech Enhancement based on Deep Convolutional Neural Network
Ramesh Nuthakki, Payel Masanta, Yukta T N
2021 Fifth International Conference on I-SMAC (IoT in Social, Mobile, Analytics and Cloud) (I-SMAC), 2021-11-11
DOI: 10.1109/I-SMAC52330.2021.9640736
Speech enhancement is the process of treating noisy speech signals to improve both human perception and machine understanding of the signal. For speech signals with medium or high signal-to-noise ratio (SNR), the aim is to produce a subjectively natural signal; for signals with low SNR, the aim is to reduce the noise while maintaining intelligibility. Many noise reduction algorithms improve overall speech quality, but little progress has been made in improving overall speech intelligibility. This paper proposes a deep convolutional neural network (DCNN) speech enhancement method built on improved loss functions, namely extended short-time objective intelligibility (ESTOI) and mean square error (MSE). These loss functions are improved using Harris Hawks Optimization (HHO). The enhanced speech signal is obtained by separating the clean speech signal from the noisy speech signal. The efficacy of speech enhancement is evaluated using predictive measures of objective speech intelligibility: short-time objective intelligibility (STOI), source-to-artifact ratio (SAR), coherence speech intelligibility index (CSII), and source-to-distortion ratio (SDR). The quality of the enhanced speech signal is assessed using speech distortion (SD) and perceptual evaluation of speech quality (PESQ).
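As a rough illustration of two of the quantities named in the abstract, the sketch below computes MSE and SNR on a toy signal. It is not the paper's method: the function names `mse` and `snr_db` are hypothetical, the "enhancer" is a stand-in that simply halves the noise, and the perceptual measures (ESTOI, STOI, CSII, PESQ) and the HHO step are not reproduced here.

```python
import math


def mse(clean, estimate):
    """Mean square error between two equal-length signals."""
    if len(clean) != len(estimate):
        raise ValueError("signals must have the same length")
    return sum((c - e) ** 2 for c, e in zip(clean, estimate)) / len(clean)


def snr_db(clean, observed):
    """SNR in dB, treating (observed - clean) as the noise.

    Assumes the clean signal is not silent (non-zero energy).
    """
    signal_power = sum(c * c for c in clean)
    noise_power = sum((o - c) ** 2 for c, o in zip(clean, observed))
    if noise_power == 0:
        return math.inf  # observed equals clean exactly
    return 10.0 * math.log10(signal_power / noise_power)


# Toy data (assumption): a 5-cycle sine as "clean speech" plus a
# deterministic 17-cycle cosine as "noise".
n = 100
clean = [math.sin(2 * math.pi * 5 * t / n) for t in range(n)]
noise = [0.1 * math.cos(2 * math.pi * 17 * t / n) for t in range(n)]
noisy = [c + v for c, v in zip(clean, noise)]

# Stand-in for an enhancer (assumption): it removes half of the noise amplitude.
enhanced = [c + 0.5 * v for c, v in zip(clean, noise)]

snr_gain = snr_db(clean, enhanced) - snr_db(clean, noisy)
print(f"SNR gain: {snr_gain:.2f} dB")
print(f"MSE noisy: {mse(clean, noisy):.6f}, MSE enhanced: {mse(clean, enhanced):.6f}")
```

Halving the noise amplitude quarters the MSE and raises the SNR by 20·log10(2) dB, which shows why an MSE loss alone tracks noise energy rather than intelligibility.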