Playout again Sam: Jitter Buffer Playout Adjustments Still an Issue for Speech Quality Prediction Models?

2020 31st Irish Signals and Systems Conference (ISSC) Pub Date : 2020-06-01 DOI:10.1109/ISSC49989.2020.9180163

Yusuf Cinar, P. Počta, Andrew Hines


Citations: 1

Abstract

Objective speech quality assessment techniques, which use perceptual models to emulate human listening perception, have seen several revisions in recent years. This study investigates the evolution of the POLQA and ViSQOL models and scrutinises their latest versions. Prior work identified weaknesses in both prediction models when presented with speech containing imperceptible playout adjustments. This study follows up on those experiments to evaluate the progress made and to report the remaining issues, benchmarked against subjective listening quality scores. The assessment covers all published versions of the POLQA and ViSQOL models, and the evolution and improvement they offer are analysed. We conclude that the models have improved in their handling of the imperceptible jitter buffer adjustments highlighted in prior work. This study also explores the performance of objective quality models and intelligibility models (STOI and POLQA Intelligibility) on a data set produced with realistic but extreme WebRTC scenarios using a standard and a novel WebRTC jitter buffer strategy. An expert listening test was conducted to subjectively evaluate the WebRTC data set. The standard WebRTC jitter buffer strategy was observed to produce more natural speech, while the novel approach offers better intelligibility. The subjective and objective quality results suggest that speech quality for the standard jitter buffer was lower but more consistent than for the novel jitter buffer. The objective intelligibility results were conflicting. A follow-up study will conduct independent subjective evaluations of quality and intelligibility to further explore the relationship between the objective intelligibility and quality results.
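To make concrete what benchmarking an objective model against subjective listening quality scores can involve, the following is a minimal sketch, not the authors' evaluation procedure. It assumes the comparison is summarised by Pearson correlation and root-mean-square error between per-condition subjective scores and model predictions; the function names and the score values are hypothetical placeholders and are not taken from the paper.

```python
import math


def pearson(xs, ys):
    """Pearson correlation between two equal-length score lists."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two lists of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant scores")
    return cov / math.sqrt(var_x * var_y)


def rmse(xs, ys):
    """Root-mean-square error between two equal-length score lists."""
    if len(xs) != len(ys) or not xs:
        raise ValueError("need two non-empty lists of equal length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(xs, ys)) / len(xs))


if __name__ == "__main__":
    # Hypothetical placeholder scores on a 1-5 scale, one per test condition.
    # These are NOT results from the paper.
    subjective_scores = [4.2, 3.8, 2.9, 2.1, 3.5]
    predicted_scores = [4.0, 3.9, 2.5, 2.4, 3.3]
    print("Pearson r:", round(pearson(subjective_scores, predicted_scores), 3))
    print("RMSE:", round(rmse(subjective_scores, predicted_scores), 3))
```

A high correlation with a low error would indicate that a model tracks listener opinion well across conditions. A model that penalises imperceptible playout adjustments would tend to show a larger error on exactly those conditions.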


