AMBIQUAL - a full reference objective quality metric for ambisonic spatial audio

M. Narbutt, Andrew Allen, J. Skoglund, Michael Chinen, Andrew Hines
DOI: 10.1109/QoMEX.2018.8463408 (https://doi.org/10.1109/QoMEX.2018.8463408)
Published in: 2018 Tenth International Conference on Quality of Multimedia Experience (QoMEX), pp. 1-6
Publication date: 2018-09-13
Citations: 20

Abstract

Streaming spatial audio over networks requires efficient encoding techniques that compress the raw audio content without compromising quality of experience. Streaming service providers such as YouTube need a perceptually relevant objective audio quality metric to monitor users' perceived quality and spatial localization accuracy. In this paper we introduce a full-reference objective spatial audio quality metric, AMBIQUAL, which assesses both Listening Quality and Localization Accuracy. In our solution both metrics are derived directly from the B-format Ambisonic audio. The metric extends and adapts the algorithm used in ViSQOLAudio, a full-reference objective metric designed for assessing speech and audio quality. In particular, Listening Quality is derived from the omnidirectional channel and Localization Accuracy is derived from a weighted sum of similarity scores from the B-format directional channels. This paper evaluates whether AMBIQUAL can predict these two factors by comparing its predictions with results from MUSHRA subjective listening tests. Specifically, we evaluated the Listening Quality and Localization Accuracy of First-Order and Third-Order Ambisonic audio compressed with the OPUS 1.2 codec at 32 and 128 kbps and at 256 and 512 kbps, respectively. The sample set for the tests comprised both recorded and synthetic audio clips with a wide range of time-frequency characteristics. To evaluate the Localization Accuracy of compressed audio, a number of fixed and dynamic (moving vertically and horizontally) source positions were selected for the test samples. Results showed a strong correlation between the objective quality scores derived from the B-format Ambisonic audio using AMBIQUAL and the subjective scores obtained in the MUSHRA listening tests (Listening Quality: PCC=0.919, Spearman=0.882; Localization Accuracy: PCC=0.854, Spearman=0.842). AMBIQUAL displays very promising quality assessment predictions for spatial audio. Future work will optimise the algorithm in order to generalise and validate it for Higher Order Ambisonic formats.
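
The agreement figures quoted above are Pearson (PCC) and Spearman rank correlations between objective and subjective scores. As a minimal sketch of how such figures can be computed, the Python below implements both coefficients from their standard definitions. The function names and the score values are hypothetical illustrations and are not taken from the paper or from any AMBIQUAL implementation.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient (PCC) of two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def average_ranks(values):
    """Rank values from 1 to n; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman rank correlation: the PCC of the rank-transformed data."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical per-condition scores (NOT data from the paper):
# objective similarity scores and MUSHRA-style subjective scores.
objective = [0.35, 0.52, 0.61, 0.78, 0.90, 0.97]
subjective = [22.0, 41.0, 38.0, 70.0, 85.0, 93.0]

print("PCC:", round(pearson(objective, subjective), 3))
print("Spearman:", round(spearman(objective, subjective), 3))
```

PCC measures how linear the relationship between the two score sets is, while Spearman only checks whether they order the test conditions in the same way, which is presumably why both are reported.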