{"title":"再次播放Sam:抖动缓冲播放调整仍然是语音质量预测模型的问题?","authors":"Yusuf Cinar, P. Počta, Andrew Hines","doi":"10.1109/ISSC49989.2020.9180163","DOIUrl":null,"url":null,"abstract":"Objective speech quality assessment techniques, which use the perceptual models to emulate the human listening perception, have seen several revisions in the recent years. This study investigates the evolution of POLQA and ViSQOL models and scrutinise their latest versions. Prior work had identified weaknesses in both prediction models when presented with speech containing imperceptible playout adjustments. This study follows up the experiments to evaluate the progress and report the progress and the current issues, benchmarked against subjective listening quality scores. The assessment is conducted for all published versions of the POLQA and ViSQOL models and the evolution and improvement offered is analysed. We can conclude that the models have been improved in terms of imperceptible jitter buffer adjustments highlighted in prior work. This study also explores the performance of objective quality models and intelligibility (STOI and POLQA Intelligibility) models for a data set produced with realistic but extreme WebRTC scenarios using a standard and novel WebRTC jitter buffer strategy. An expert listening test was conducted to subjectively evaluate the WebRTC data set. It is observed that the standard WebRTC jitter buffer strategy produces more natural speech while the novel approach offers better intelligibility. The subjective and objective quality results suggest that the speech quality for standard jitter buffer were lower but more consistent than for the novel jitter buffer. The objective intelligibility results were conflicting. A followup study will conduct independent subjective evaluations of quality and intelligibility to further explore the relationship between the objective intelligibility and quality results.","PeriodicalId":351013,"journal":{"name":"2020 31st Irish Signals and Systems Conference (ISSC)","volume":"27 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2020-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"1","resultStr":"{\"title\":\"Playout again Sam: Jitter Buffer Playout Adjustments Still an Issue for Speech Quality Prediction Models?\",\"authors\":\"Yusuf Cinar, P. Počta, Andrew Hines\",\"doi\":\"10.1109/ISSC49989.2020.9180163\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Objective speech quality assessment techniques, which use the perceptual models to emulate the human listening perception, have seen several revisions in the recent years. This study investigates the evolution of POLQA and ViSQOL models and scrutinise their latest versions. Prior work had identified weaknesses in both prediction models when presented with speech containing imperceptible playout adjustments. This study follows up the experiments to evaluate the progress and report the progress and the current issues, benchmarked against subjective listening quality scores. The assessment is conducted for all published versions of the POLQA and ViSQOL models and the evolution and improvement offered is analysed. We can conclude that the models have been improved in terms of imperceptible jitter buffer adjustments highlighted in prior work. 
This study also explores the performance of objective quality models and intelligibility (STOI and POLQA Intelligibility) models for a data set produced with realistic but extreme WebRTC scenarios using a standard and novel WebRTC jitter buffer strategy. An expert listening test was conducted to subjectively evaluate the WebRTC data set. It is observed that the standard WebRTC jitter buffer strategy produces more natural speech while the novel approach offers better intelligibility. The subjective and objective quality results suggest that the speech quality for standard jitter buffer were lower but more consistent than for the novel jitter buffer. The objective intelligibility results were conflicting. A followup study will conduct independent subjective evaluations of quality and intelligibility to further explore the relationship between the objective intelligibility and quality results.\",\"PeriodicalId\":351013,\"journal\":{\"name\":\"2020 31st Irish Signals and Systems Conference (ISSC)\",\"volume\":\"27 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2020-06-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"1\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2020 31st Irish Signals and Systems Conference (ISSC)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/ISSC49989.2020.9180163\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2020 31st Irish Signals and Systems Conference (ISSC)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ISSC49989.2020.9180163","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Objective speech quality assessment techniques, which use perceptual models to emulate human listening perception, have seen several revisions in recent years. This study investigates the evolution of the POLQA and ViSQOL models and scrutinises their latest versions. Prior work identified weaknesses in both prediction models when they were presented with speech containing imperceptible playout adjustments. This study follows up those experiments to evaluate the progress made and report the remaining issues, benchmarked against subjective listening quality scores. The assessment covers all published versions of the POLQA and ViSQOL models, and the evolution and improvement they offer is analysed. We conclude that the models have been improved with respect to the imperceptible jitter buffer adjustments highlighted in prior work. This study also explores the performance of objective quality models and intelligibility models (STOI and POLQA Intelligibility) on a data set produced from realistic but extreme WebRTC scenarios using a standard and a novel WebRTC jitter buffer strategy. An expert listening test was conducted to subjectively evaluate the WebRTC data set. The standard WebRTC jitter buffer strategy was observed to produce more natural speech, while the novel approach offered better intelligibility. The subjective and objective quality results suggest that speech quality for the standard jitter buffer was lower but more consistent than for the novel jitter buffer. The objective intelligibility results were conflicting. A follow-up study will conduct independent subjective evaluations of quality and intelligibility to further explore the relationship between the objective intelligibility and quality results.
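
For readers unfamiliar with the underlying mechanism, the sketch below illustrates the simplest form of jitter buffer playout adjustment discussed above: stretching a silent gap between words so the playout schedule can absorb network delay without an audible artefact. This is a toy Python illustration only, not WebRTC's NetEQ nor either of the strategies evaluated in the paper; the frame length, energy threshold and padding amount are arbitrary assumptions made for the example.

```python
import numpy as np

def insert_playout_padding(signal, sample_rate, frame_ms=20, pad_frames=2,
                           energy_threshold=1e-4):
    """Simulate a simple jitter-buffer playout adjustment.

    Finds the lowest-energy frame (likely a pause between words) and inserts
    `pad_frames` frames of silence there, mimicking a playout delay increase
    that a listener is unlikely to notice. Toy model, not WebRTC's NetEQ.
    """
    frame_len = int(sample_rate * frame_ms / 1000)
    n_frames = len(signal) // frame_len
    frames = signal[:n_frames * frame_len].reshape(n_frames, frame_len)

    # Per-frame energy; the quietest frame is the least disruptive place
    # to stretch the playout schedule.
    energies = np.mean(frames ** 2, axis=1)
    quietest = int(np.argmin(energies))

    if energies[quietest] > energy_threshold:
        # No sufficiently quiet frame: the adjustment would likely be
        # audible, so leave the signal untouched in this toy model.
        return signal

    padding = np.zeros(pad_frames * frame_len, dtype=signal.dtype)
    cut = (quietest + 1) * frame_len
    return np.concatenate([signal[:cut], padding, signal[cut:]])

if __name__ == "__main__":
    fs = 16000
    t = np.arange(fs) / fs
    # 1 s of synthetic "speech": a tone with a 200 ms pause in the middle.
    speech = np.sin(2 * np.pi * 220 * t)
    speech[int(0.4 * fs):int(0.6 * fs)] = 0.0
    adjusted = insert_playout_padding(speech, fs)
    print(f"original {len(speech)} samples, adjusted {len(adjusted)} samples")
```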
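
Objective intelligibility measures such as STOI are reference-based: the degraded signal is compared against the clean original. The minimal example below shows how such a score could be computed with the third-party pystoi package; the package choice and the synthetic test signal are assumptions for illustration, and the paper does not state which STOI implementation was used. Note that STOI expects time-aligned signals of equal length, so playout-adjusted output would need to be aligned with the reference before scoring.

```python
# pip install pystoi numpy
import numpy as np
from pystoi import stoi

fs = 16000                  # sample rate in Hz (assumed for this example)
t = np.arange(2 * fs) / fs  # 2 s of synthetic test signal
# Crude amplitude-modulated tone standing in for speech, plus mild noise.
clean = np.sin(2 * np.pi * 220 * t) * np.sin(2 * np.pi * 3 * t)
degraded = clean + 0.05 * np.random.randn(len(clean))

# stoi() takes clean and degraded signals of equal length at the same sample
# rate; extended=False selects the original (non-extended) STOI measure.
score = stoi(clean, degraded, fs, extended=False)
print(f"STOI score: {score:.3f}")
```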