Should WebRTC Prioritise Intelligibility over Speech Quality?

P. Sun, Andrew Hines

2020 31st Irish Signals and Systems Conference (ISSC), published 2020-06-01. DOI: 10.1109/ISSC49989.2020.9180210 (https://doi.org/10.1109/ISSC49989.2020.9180210)

Abstract

Network delay remains a challenge for real-time voice communication on the web. Jitter buffer algorithms have been widely deployed in popular platforms such as WebRTC to reduce the impact of delay through playout adjustments. A trade-off must be made between speech loss and voice degradation, as adjustments can either drop segments, resulting in a loss of speech intelligibility, or change the rate of playout and affect the pitch or natural sound of the speech. Both options can negatively influence a listener's quality of experience (QoE). Optimising this trade-off requires knowledge of how intelligibility and quality are perceived and prioritised when a listener synthesises both factors into a fused QoE judgement. This study conducted two subjective experiments to evaluate intelligibility and quality independently, along with a short descriptive analysis to address the interplay between the two factors. The study uses a dataset that simulated listener-end speech under extreme but realistic network delay conditions using WebRTC's standard jitter buffer and a variation that prioritised minimisation of packet loss. The results show that intelligibility is a key dimension in quality judgement for the scenarios tested. As a result, this study calls for care when comparing quality scores, because overlooked non-traditional quality attributes are shown to contribute actively to the overall QoE. The descriptive analysis also indicates that assessors interpret 'quality' inconsistently. This finding questions the methodology used in standard QoE subjective experiment designs and suggests adopting a more flexible approach to measuring subjective QoE.
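
The trade-off described in the abstract can be illustrated with a toy simulation. The sketch below is not WebRTC's jitter buffer and is not taken from the paper; the names `Packet`, `playout_drop`, `playout_stretch`, the 20 ms frame duration, and the arrival times are all assumptions chosen for illustration. It contrasts a fixed playout schedule, which discards late frames (speech loss), with an elastic schedule, which waits for late frames and must cover the wait by stretching audio (altered playout rate).

```python
from dataclasses import dataclass

FRAME_MS = 20  # assumed frame duration; not a value from the paper


@dataclass
class Packet:
    seq: int         # sequence number of the audio frame
    arrival_ms: int  # time the packet reached the receiver


def playout_drop(packets, buffer_ms):
    """Fixed schedule: a frame that misses its playout deadline is discarded."""
    played, lost = [], []
    for p in packets:
        deadline = p.seq * FRAME_MS + buffer_ms
        if p.arrival_ms <= deadline:
            played.append(p.seq)
        else:
            lost.append(p.seq)  # speech loss: this frame is never heard
    return played, lost


def playout_stretch(packets, buffer_ms):
    """Elastic schedule: playout waits for late frames, so none are lost,
    but every millisecond of waiting must be filled by stretched audio."""
    clock = buffer_ms  # time at which the next frame is due to play
    stretch_ms = 0
    for p in sorted(packets, key=lambda p: p.seq):
        if p.arrival_ms > clock:
            stretch_ms += p.arrival_ms - clock  # gap covered by time-stretching
            clock = p.arrival_ms
        clock += FRAME_MS
    return stretch_ms


# Hypothetical arrivals: frames 2 and 3 are hit by a delay spike.
packets = [Packet(0, 10), Packet(1, 30), Packet(2, 200), Packet(3, 210)]
played, lost = playout_drop(packets, buffer_ms=40)
stretched = playout_stretch(packets, buffer_ms=40)
print(played, lost, stretched)
```

With these made-up arrivals, the fixed schedule loses half of the frames, while the elastic schedule keeps all of them at the cost of 120 ms of stretched playout. Which outcome listeners penalise more is the question the study examines.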