Non-intrusive objective speech quality evaluation using multiple time-scale estimates of multi-resolution auditory model (MRAM) features
R. Dubey, Arun Kumar
2016 Second International Innovative Applications of Computational Intelligence on Power, Energy and Controls with their Impact on Humanity (CIPECH), 2016-11-01
DOI: https://doi.org/10.1109/CIPECH.2016.7918776
Short-time transients of additive noise that occur only in specific active regions of a speech utterance cannot be captured by features computed over the entire utterance. This work therefore uses multiple time-scale estimates of auditory features for non-intrusive speech quality evaluation. The approach captures the time-localized information of short-time transient distortions and distinguishes them from the plosive sounds of speech. The features are computed frame by frame from combinations of different active speech regions of an utterance using a multi-resolution auditory model (MRAM). A voice activity detection (VAD) algorithm selects the active speech regions and rejects the silence regions of the utterance. For each combination of active speech regions, the multiple time-scale MRAM features are modelled probabilistically with a Gaussian mixture model (GMM) and mapped to a mean opinion score (MOS). The average of these multiple time-scale MOS estimates over the different combinations of active speech regions gives the overall objective MOS of a degraded speech utterance. Results are reported as the correlation coefficient between the subjective MOS and the overall objective MOS, and are compared with ITU-T Recommendation P.563, the standard for non-intrusive speech quality assessment of telephone-band speech.
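The overall flow described in the abstract (select active regions, score each combination of regions, average the scores, then correlate with subjective MOS) can be sketched roughly as follows. This is only an illustrative outline under several assumptions, not the authors' implementation: the energy-threshold VAD, the reading of "combinations" as all non-empty subsets of active regions, and the `estimate_mos` callback (a hypothetical stand-in for the MRAM feature extraction plus GMM mapping, which the abstract does not specify) are all placeholders.

```python
from itertools import combinations
from math import sqrt
from statistics import mean


def frame_energies(signal, frame_len):
    """Mean squared amplitude of each non-overlapping frame."""
    n_frames = len(signal) // frame_len
    return [
        mean(x * x for x in signal[i * frame_len:(i + 1) * frame_len])
        for i in range(n_frames)
    ]


def active_regions(energies, threshold):
    """Toy energy-threshold VAD (an assumption, not the paper's VAD).

    Returns (start, end) frame-index pairs, with end exclusive.
    """
    regions, start = [], None
    for i, energy in enumerate(energies):
        if energy > threshold and start is None:
            start = i
        elif energy <= threshold and start is not None:
            regions.append((start, i))
            start = None
    if start is not None:
        regions.append((start, len(energies)))
    return regions


def region_combinations(regions):
    """All non-empty subsets of the active regions (assumed interpretation)."""
    for k in range(1, len(regions) + 1):
        yield from combinations(regions, k)


def overall_mos(regions, frame_values, estimate_mos):
    """Average the per-combination MOS estimates into one objective MOS.

    estimate_mos is a hypothetical callback standing in for the
    MRAM-feature + GMM mapping; it receives the frames of one combination.
    """
    scores = []
    for combo in region_combinations(regions):
        frames = [frame_values[i] for start, end in combo for i in range(start, end)]
        scores.append(estimate_mos(frames))
    return mean(scores)


def pearson(xs, ys):
    """Pearson correlation between subjective and objective MOS lists."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)
```

Note that enumerating every subset grows exponentially with the number of active regions, so a practical system would presumably restrict which combinations are scored; the abstract does not say how.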