Binaural cue coding-Part I: psychoacoustic fundamentals and design principles

F. Baumgarte, C. Faller
IEEE Trans. Speech Audio Process., pp. 509-519, published 2003-11-01
DOI: 10.1109/TSA.2003.818109 (https://doi.org/10.1109/TSA.2003.818109)
Citations: 175

Abstract

Binaural Cue Coding (BCC) is a method for multichannel spatial rendering based on one down-mixed audio channel and BCC side information. The BCC side information has a low data rate and it is derived from the multichannel encoder input signal. A natural application of BCC is multichannel audio data rate reduction since only a single down-mixed audio channel needs to be transmitted. An alternative BCC scheme for efficient joint transmission of independent source signals supports flexible spatial rendering at the decoder. This paper (Part I) discusses the most relevant binaural perception phenomena exploited by BCC. Based on that, it presents a psychoacoustically motivated approach for designing a BCC analyzer and synthesizer. This leads to a reference implementation for analysis and synthesis of stereophonic audio signals based on a Cochlear Filter Bank. BCC synthesizer implementations based on the FFT are presented as low-complexity alternatives. A subjective audio quality assessment of these implementations shows the robust performance of BCC for critical speech and audio material. Moreover, the results suggest that the performance given by the reference synthesizer is not significantly compromised when using a low-complexity FFT-based synthesizer. The companion paper (Part II) generalizes BCC analysis and synthesis for multichannel audio and proposes complete BCC schemes including quantization and coding. Part II also describes an alternative BCC scheme with flexible rendering capability at the decoder and proposes several applications for both BCC schemes.
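
The core idea in the abstract, one down-mixed channel plus low-rate side information from which the decoder re-creates the spatial image, can be illustrated with a minimal sketch. It is not the paper's reference implementation: it assumes a single short frame, a naive DFT instead of the Cochlear Filter Bank or a windowed FFT, hand-picked uniform bands, and a per-band left/right level difference as the only cue. The names `bcc_analyze`, `bcc_synthesize`, `band_power`, and the `bands` layout are hypothetical and chosen only for illustration.

```python
import cmath
import math


def dft(x):
    """Naive O(N^2) DFT; adequate for a short illustrative frame."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft_real(spec):
    """Inverse DFT, returning the real part of each sample."""
    n = len(spec)
    return [sum(spec[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)).real / n
            for t in range(n)]


def band_power(spec, lo, hi):
    """Energy of bins lo..hi-1, with a small floor to avoid division by zero."""
    return sum(abs(c) ** 2 for c in spec[lo:hi]) + 1e-12


def bcc_analyze(left, right, bands):
    """Return the mono downmix and one left/right level difference (dB) per band."""
    mono = [0.5 * (a + b) for a, b in zip(left, right)]
    spec_l, spec_r = dft(left), dft(right)
    level_diff_db = [
        10.0 * math.log10(band_power(spec_l, lo, hi) / band_power(spec_r, lo, hi))
        for lo, hi in bands
    ]
    return mono, level_diff_db


def bcc_synthesize(mono, level_diff_db, bands):
    """Rebuild a stereo pair by scaling each band of the mono spectrum."""
    n = len(mono)
    spec_m = dft(mono)
    spec_l, spec_r = [0j] * n, [0j] * n
    for (lo, hi), diff in zip(bands, level_diff_db):
        ratio = 10.0 ** (diff / 20.0)   # left/right amplitude ratio
        gain_r = 2.0 / (1.0 + ratio)    # chosen so 0.5 * (gain_l + gain_r) == 1
        gain_l = ratio * gain_r
        for k in range(lo, hi):
            for b in {k, (n - k) % n}:  # also fill the mirrored negative-frequency bin
                spec_l[b] = gain_l * spec_m[b]
                spec_r[b] = gain_r * spec_m[b]
    return idft_real(spec_l), idft_real(spec_r)
```

Here `bands` is a list of `(lo, hi)` bin ranges that together cover bins 0 through N/2 of an N-sample frame, for example `[(0, 3), (3, 6), (6, 9)]` for N = 16. Under these assumptions a source that is simply amplitude-panned between the two channels is recovered from the mono signal and the per-band level differences; general stereo content would only be approximated, which is where the further binaural cues and the filter bank design discussed in the paper come in.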