K. Nathwani, Faizal M. F. Hafiz, A. Swain, R. Biswas
Speech Intelligibility Enhancement using an Optimal Formant Shifting Approach
2021 12th International Symposium on Image and Signal Processing and Analysis (ISPA), pp. 353-358. Published 2021-09-13.
DOI: https://doi.org/10.1109/ISPA52656.2021.9552080
Citations: 2
Abstract
This study proposes a novel delta function-based optimal formant shift for enhancing near-end speech intelligibility. The delta function used here is trapezoidal in shape. Its shaping parameters are determined using comprehensive learning particle swarm optimization (CLPSO), which maximizes the short-time objective intelligibility (STOI) of speech sequences. The proposed method does not require knowledge of the noise statistics to design the delta function, nor does it require post-processing to smooth the shifted formants. Its performance is illustrated using speech signals from the French Hearing In Noise Test (HINT) database, mixed with engine noise from a car running at 130 km/h. The results, at various SNRs, demonstrate that the optimal delta function (the function with the optimized parameters) can significantly improve speech intelligibility at very low SNRs while preserving the quality and naturalness of the sound.
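To make the idea of a trapezoidal delta function concrete, the following is a minimal sketch of one way such a shift profile could be parameterized and applied to formant frequencies. The abstract does not give the parameterization, so the four breakpoints, the plateau height `max_shift_hz`, and the function names `trapezoidal_delta` and `shift_formants` are all assumptions made for illustration, not the authors' definitions. In a setup like the one described, these five values would form the search space of the optimizer, with STOI as the objective; the optimizer itself is not shown here.

```python
import numpy as np


def trapezoidal_delta(freq_hz, f_start, f_rise_end, f_fall_start, f_end, max_shift_hz):
    """Hypothetical trapezoidal shift profile (illustrative parameterization).

    Returns 0 Hz outside [f_start, f_end], rises linearly to max_shift_hz
    between f_start and f_rise_end, stays flat until f_fall_start, then
    falls linearly back to 0 Hz at f_end.
    """
    freq_hz = np.asarray(freq_hz, dtype=float)
    breakpoints = [f_start, f_rise_end, f_fall_start, f_end]
    heights = [0.0, max_shift_hz, max_shift_hz, 0.0]
    # np.interp is piecewise linear and clamps to the end values (0 here)
    # for inputs outside the breakpoint range.
    return np.interp(freq_hz, breakpoints, heights)


def shift_formants(formants_hz, **shape):
    """Add the frequency-dependent shift to each formant frequency."""
    formants_hz = np.asarray(formants_hz, dtype=float)
    return formants_hz + trapezoidal_delta(formants_hz, **shape)


# Assumed example values, chosen only to show the three regions of the trapezoid.
shape = dict(f_start=200.0, f_rise_end=600.0, f_fall_start=2000.0,
             f_end=3000.0, max_shift_hz=100.0)
shifted = shift_formants([500.0, 1500.0, 2500.0], **shape)
print(shifted)
```

With these assumed values, the formant at 500 Hz lies on the rising edge, the one at 1500 Hz on the plateau, and the one at 2500 Hz on the falling edge, so each receives a different shift. Because the profile is continuous in frequency, nearby formant values receive similar shifts, which is one plausible reading of why no separate smoothing step would be needed.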