Towards Improved Objective Perceptual Audio Quality Assessment - Part 1: A Novel Data-Driven Cognitive Model

IF 4.1 2区 计算机科学 Q1 ACOUSTICS IEEE/ACM Transactions on Audio, Speech, and Language Processing Pub Date : 2024-10-09 DOI:10.1109/TASLP.2024.3477291
Pablo M. Delgado;Jürgen Herre
{"title":"Towards Improved Objective Perceptual Audio Quality Assessment - Part 1: A Novel Data-Driven Cognitive Model","authors":"Pablo M. Delgado;Jürgen Herre","doi":"10.1109/TASLP.2024.3477291","DOIUrl":null,"url":null,"abstract":"Efficient audioquality assessment is vital for streamlining audio codec development. Objective assessment tools have been developed over time to algorithmically predict quality ratings from subjective assessments, the gold standard for quality judgment. Many of these tools use perceptual auditory models to extract audio features that are mapped to a basic audio quality score prediction using machine learning algorithms and subjective scores as training data. However, existing tools struggle with generalization in quality prediction, especially when faced with unknown signal and distortion types. This is particularly evident in the presence of signals coded using non-waveform-preserving parametric techniques. Addressing these challenges, this two-part work proposes extensions to the Perceptual Evaluation of Audio Quality (PEAQ - ITU-R BS.1387-1) recommendation. Part 1 focuses on increasing generalization, while Part 2 targets accurate spatial audio quality measurement in audio coding. To enhance prediction generalization, this paper (Part 1) introduces a novel machine learning approach that uses subjective data to model cognitive aspects of audio quality perception. The proposed method models the perceived severity of audible distortions by adaptively weighting different distortion metrics. The weights are determined using an interaction cost function that captures relationships between distortion salience and cognitive effects. Compared to other machine learning methods and established tools, the proposed architecture achieves higher prediction accuracy on large databases of previously unseen subjective quality scores. The perceptually-motivated model offers a more manageable alternative to general-purpose machine learning algorithms, allowing potential extensions and improvements to multi-dimensional quality measurement without complete retraining.","PeriodicalId":13332,"journal":{"name":"IEEE/ACM Transactions on Audio, Speech, and Language Processing","volume":"32 ","pages":"4661-4675"},"PeriodicalIF":4.1000,"publicationDate":"2024-10-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE/ACM Transactions on Audio, Speech, and Language Processing","FirstCategoryId":"94","ListUrlMain":"https://ieeexplore.ieee.org/document/10711267/","RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"ACOUSTICS","Score":null,"Total":0}
引用次数: 0

Abstract

Efficient audioquality assessment is vital for streamlining audio codec development. Objective assessment tools have been developed over time to algorithmically predict quality ratings from subjective assessments, the gold standard for quality judgment. Many of these tools use perceptual auditory models to extract audio features that are mapped to a basic audio quality score prediction using machine learning algorithms and subjective scores as training data. However, existing tools struggle with generalization in quality prediction, especially when faced with unknown signal and distortion types. This is particularly evident in the presence of signals coded using non-waveform-preserving parametric techniques. Addressing these challenges, this two-part work proposes extensions to the Perceptual Evaluation of Audio Quality (PEAQ - ITU-R BS.1387-1) recommendation. Part 1 focuses on increasing generalization, while Part 2 targets accurate spatial audio quality measurement in audio coding. To enhance prediction generalization, this paper (Part 1) introduces a novel machine learning approach that uses subjective data to model cognitive aspects of audio quality perception. The proposed method models the perceived severity of audible distortions by adaptively weighting different distortion metrics. The weights are determined using an interaction cost function that captures relationships between distortion salience and cognitive effects. Compared to other machine learning methods and established tools, the proposed architecture achieves higher prediction accuracy on large databases of previously unseen subjective quality scores. The perceptually-motivated model offers a more manageable alternative to general-purpose machine learning algorithms, allowing potential extensions and improvements to multi-dimensional quality measurement without complete retraining.
查看原文
分享 分享
微信好友 朋友圈 QQ好友 复制链接
本刊更多论文
改进客观感知音频质量评估 - 第一部分:新颖的数据驱动认知模型
高效的音频质量评估对于简化音频编解码器的开发至关重要。随着时间的推移,人们已经开发出客观的评估工具,通过算法从主观评估(质量判断的黄金标准)中预测质量评级。其中许多工具使用感知听觉模型来提取音频特征,然后使用机器学习算法和主观评分作为训练数据,将这些特征映射到基本的音频质量评分预测中。然而,现有工具在质量预测的通用性方面存在困难,尤其是在面对未知信号和失真类型时。这一点在使用非波形保留参数技术编码的信号中尤为明显。为应对这些挑战,本工作分为两部分,对音频质量感知评估(PEAQ - ITU-R BS.1387-1)建议进行了扩展。第 1 部分侧重于提高通用性,而第 2 部分则针对音频编码中精确的空间音频质量测量。为了提高预测的通用性,本文(第 1 部分)介绍了一种新颖的机器学习方法,该方法利用主观数据对音频质量感知的认知方面进行建模。所提出的方法通过对不同失真指标进行自适应加权,对可感知的听觉失真严重程度进行建模。权重使用交互成本函数确定,该函数可捕捉失真显著性与认知效果之间的关系。与其他机器学习方法和成熟的工具相比,所提出的架构在以前未见过的主观质量评分的大型数据库上实现了更高的预测准确性。以感知为导向的模型为通用机器学习算法提供了更易于管理的替代方案,使多维质量测量的潜在扩展和改进无需完全重新训练。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 去求助
来源期刊
IEEE/ACM Transactions on Audio, Speech, and Language Processing
IEEE/ACM Transactions on Audio, Speech, and Language Processing ACOUSTICS-ENGINEERING, ELECTRICAL & ELECTRONIC
CiteScore
11.30
自引率
11.10%
发文量
217
期刊介绍: The IEEE/ACM Transactions on Audio, Speech, and Language Processing covers audio, speech and language processing and the sciences that support them. In audio processing: transducers, room acoustics, active sound control, human audition, analysis/synthesis/coding of music, and consumer audio. In speech processing: areas such as speech analysis, synthesis, coding, speech and speaker recognition, speech production and perception, and speech enhancement. In language processing: speech and text analysis, understanding, generation, dialog management, translation, summarization, question answering and document indexing and retrieval, as well as general language modeling.
期刊最新文献
Enhancing Robustness of Speech Watermarking Using a Transformer-Based Framework Exploiting Acoustic Features FxLMS/F Based Tap Decomposed Adaptive Filter for Decentralized Active Noise Control System MRC-PASCL: A Few-Shot Machine Reading Comprehension Approach via Post-Training and Answer Span-Oriented Contrastive Learning Knowledge-Guided Transformer for Joint Theme and Emotion Classification of Chinese Classical Poetry WEDA: Exploring Copyright Protection for Large Language Model Downstream Alignment
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
现在去查看 取消
×
提示
确定
0
微信
客服QQ
Book学术公众号 扫码关注我们
反馈
×
意见反馈
请填写您的意见或建议
请填写您的手机或邮箱
已复制链接
已复制链接
快去分享给好友吧!
我知道了
×
扫码分享
扫码分享
Book学术官方微信
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术
文献互助 智能选刊 最新文献 互助须知 联系我们:info@booksci.cn
Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。
Copyright © 2023 Book学术 All rights reserved.
ghs 京公网安备 11010802042870号 京ICP备2023020795号-1