Should WebRTC Prioritise Intelligibility over Speech Quality?

2020 31st Irish Signals and Systems Conference (ISSC). Pub Date: 2020-06-01. DOI: 10.1109/ISSC49989.2020.9180210

P. Sun, Andrew Hines


Abstract

Network delay remains a challenge for real-time voice communication on the web. Jitter buffer algorithms have been widely deployed in popular platforms such as WebRTC to reduce the impact of delay through playout adjustments. A trade-off must be made between speech loss and voice degradation, as adjustments can either drop segments, resulting in a loss of speech intelligibility, or change the rate of playout and affect the pitch or natural sound of the speech. Both options can negatively influence a listener's quality of experience (QoE). Optimising this trade-off requires knowledge of how intelligibility and quality are perceived and prioritised when a listener synthesises both factors into a fused QoE judgement. This study conducted two subjective experiments to evaluate intelligibility and quality independently, along with a short descriptive analysis to address the interplay between the two factors. The study uses a dataset that simulated listener-end speech under extreme but realistic network delay conditions using WebRTC's standard jitter buffer and a variation that prioritised minimisation of packet loss. The results show that intelligibility is a key dimension in quality judgement for the scenarios tested. As a result, this study urges caution when comparing quality scores, as overlooked non-traditional quality attributes are shown to contribute actively to the overall QoE. The descriptive analysis also indicates inconsistency in the interpretation of ‘quality’ among the assessors. This finding calls into question the methodology used in standard QoE subjective experiment designs, and the study proposes adopting a more flexible approach to measuring subjective QoE.
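The trade-off described in the abstract can be illustrated with a toy model. The sketch below is not WebRTC's jitter buffer and is not taken from the paper; the names `PlayoutResult` and `simulate_playout`, the per-packet delay list, and the fixed starting buffer size are all hypothetical assumptions made for illustration. It contrasts a policy that discards late packets (speech loss) with one that extends playout to wait for them (no loss, but stretched playout).

```python
from dataclasses import dataclass


@dataclass
class PlayoutResult:
    lost_packets: int   # packets discarded because they arrived too late
    stretch_ms: float   # total extra playout time added to wait for late packets


def simulate_playout(delays_ms, buffer_ms, prefer_no_loss):
    """Toy jitter-buffer model (illustrative assumption, not WebRTC's algorithm).

    delays_ms: network delay of each packet, in milliseconds.
    buffer_ms: initial playout delay the buffer tolerates.
    prefer_no_loss: if True, stretch playout instead of dropping late packets.
    """
    lost = 0
    stretch = 0.0
    for delay in delays_ms:
        late_by = delay - buffer_ms
        if late_by <= 0:
            continue  # packet arrived in time; nothing to adjust
        if prefer_no_loss:
            # Slow playout down to wait for the packet; the buffer grows.
            stretch += late_by
            buffer_ms = delay
        else:
            # Keep the playout schedule and discard the late packet.
            lost += 1
    return PlayoutResult(lost_packets=lost, stretch_ms=stretch)


delays = [20, 30, 90, 40, 120]  # hypothetical per-packet delays in ms
drop_policy = simulate_playout(delays, buffer_ms=50, prefer_no_loss=False)
keep_policy = simulate_playout(delays, buffer_ms=50, prefer_no_loss=True)
print(drop_policy)  # two packets lost, no stretching
print(keep_policy)  # no packets lost, 70 ms of added playout time
```

In this simplified picture, the first policy sacrifices intelligibility to keep timing intact, while the second preserves every packet at the cost of altered playout. That is the kind of choice the study's two jitter buffer conditions expose to listeners.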
