A General Compression Approach to Multi-Channel Three-Dimensional Audio

IEEE Transactions on Audio, Speech, and Language Processing. Pub Date: 2013-08-01. DOI: 10.1109/TASL.2013.2260156

B. Cheng, C. Ritz, I. Burnett, Xiguang Zheng

Citations: 16

Abstract

This paper presents a technique for low bit rate compression of three-dimensional (3D) audio produced by multiple loudspeaker channels. The approach is based on the time-frequency analysis of the localization of spatial sound sources within the 3D space as rendered by a multi-channel audio signal (in this case 16 channels). This analysis results in the derivation of a stereo downmix signal representing the original 16 channels. Alternatively, a mono-downmix signal with side information representing the location of sound sources within the 3D spatial scene can also be derived. The resulting downmix signals are then compressed with a traditional audio coder, resulting in a representation of the 3D soundfield at bit rates comparable with existing stereo audio coders while maintaining the perceptual quality produced from separate encoding of each channel.
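The abstract's idea of a mono downmix plus location side information can be illustrated with a minimal sketch. This is not the authors' algorithm: the energy-weighted vector sum used to estimate a direction per frequency bin, the two-ring 16-loudspeaker layout, and the names `two_ring_layout` and `analyze_frame` are all assumptions made for illustration only.

```python
import numpy as np


def two_ring_layout():
    """Hypothetical 16-speaker layout: 8 speakers at 0 degrees elevation
    and 8 at 45 degrees, spaced every 45 degrees in azimuth.
    Returns unit direction vectors with shape (16, 3)."""
    az = np.radians(np.arange(8) * 45.0)
    rings = []
    for elev_deg in (0.0, 45.0):
        el = np.radians(elev_deg)
        ring = np.stack(
            [np.cos(el) * np.cos(az),
             np.cos(el) * np.sin(az),
             np.full(8, np.sin(el))],
            axis=1,
        )
        rings.append(ring)
    return np.vstack(rings)


def analyze_frame(frame, speaker_dirs):
    """Analyze one block of multi-channel audio.

    frame:        (channels, samples) array of time-domain samples
    speaker_dirs: (channels, 3) unit vectors pointing at each loudspeaker

    Returns the mono downmix spectrum and, per frequency bin, an
    estimated source azimuth and elevation in degrees (the side info).
    """
    window = np.hanning(frame.shape[1])
    spectra = np.fft.rfft(frame * window, axis=1)   # (channels, bins)
    energy = np.abs(spectra) ** 2                   # energy per channel and bin

    # Assumed localization rule: energy-weighted sum of speaker directions.
    vec = energy.T @ speaker_dirs                   # (bins, 3)
    norm = np.linalg.norm(vec, axis=1, keepdims=True)
    direction = vec / np.maximum(norm, 1e-12)       # avoid division by zero

    azimuth = np.degrees(np.arctan2(direction[:, 1], direction[:, 0]))
    elevation = np.degrees(np.arcsin(np.clip(direction[:, 2], -1.0, 1.0)))

    mono = spectra.sum(axis=0)                      # simple sum downmix
    return mono, azimuth, elevation
```

In a sketch like this, only `mono` would be passed to a conventional audio coder, while the per-bin angles would be quantized and sent as side information. How the paper actually derives, quantizes, and transmits these cues is not stated in the abstract.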


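Source journal

IEEE Transactions on Audio Speech and Language Processing (Engineering and Technology - Engineering: Electrical and Electronic)

Self-citation rate

0.00%

Number of articles published

Review time

24.0 months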

Journal description: The IEEE Transactions on Audio, Speech and Language Processing covers the sciences, technologies and applications relating to the analysis, coding, enhancement, recognition and synthesis of audio, music, speech and language. In particular, audio processing also covers auditory modeling, acoustic modeling and source separation. Speech processing also covers speech production and perception, adaptation, lexical modeling and speaker recognition. Language processing also covers spoken language understanding, translation, summarization, mining, general language modeling, as well as spoken dialog systems.