
Latest publications from the 2010 IEEE International Workshop on Multimedia Signal Processing

Enhancing stereophonic teleconferencing with microphone arrays through sound field warping
Pub Date : 2010-12-10 DOI: 10.1109/MMSP.2010.5661989
Weig-Ge Chen, Zhengyou Zhang
It has been proven that spatial audio enhances the realism of sound for teleconferencing. Previously, solutions have been proposed for multiparty conferencing, where each remote participant is assumed to have his/her own microphone, and for conferencing between two rooms, where the microphones in one room are connected to an equal number of loudspeakers in the other room. Either approach has its limitations. Hence, we propose a new scheme to improve the stereophonic conferencing experience through an innovative use of microphone arrays. Instead of operating in the default mode, where a single channel is produced using spatial filtering, we propose to transmit all channels, forming a collection of spatial samples of the sound field. Those samples are warped appropriately at the remote site and spatialized together with audio streams from other remote sites, if any, to produce the perception of a virtual sound field. Real-world audio samples are provided to showcase the proposed technique. An informal listening test shows that the majority of users prefer the new experience.
Citations: 1
Robust head pose estimation by fusing time-of-flight depth and color
Pub Date : 2010-12-10 DOI: 10.1109/MMSP.2010.5662004
Amit Bleiweiss, M. Werman
We present a new solution for real-time head pose estimation. The key to our method is a model-based approach built on the fusion of color and time-of-flight depth data. Our method has several advantages over existing head-pose estimation solutions. It requires no initial setup and no knowledge of a pre-built model or training data. The use of additional depth data leads to a robust solution while maintaining real-time performance. The method outperforms the state of the art in several experiments involving extreme situations such as sudden changes in lighting, large rotations, and fast motion.
Citations: 20
A subjective experiment for 3D-mesh segmentation evaluation
Pub Date : 2010-12-10 DOI: 10.1109/MMSP.2010.5662046
H. Benhabiles, G. Lavoué, Jean-Philippe Vandeborre, M. Daoudi
In this paper we present a subjective quality assessment experiment for 3D-mesh segmentation. To this end, we carefully designed a protocol with respect to several factors, namely the rendering conditions, the possible interactions, the rating range, and the number of human subjects. To carry out the subjective experiment, more than 40 human observers rated a set of 250 segmentation results produced by various algorithms. The obtained Mean Opinion Scores, which represent the human subjects' view of the quality of each segmentation, were then used to evaluate both the quality of automatic segmentation algorithms and the quality of the similarity metrics used in recent mesh segmentation benchmarking systems.
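The Mean Opinion Score aggregation described above is simple to state concretely. A minimal sketch, with hypothetical ratings on a hypothetical 1-5 scale (the abstract does not give the exact rating range):

```python
def mean_opinion_score(ratings):
    """Average the ratings all observers gave to one segmentation."""
    if not ratings:
        raise ValueError("need at least one rating")
    return sum(ratings) / len(ratings)

# Hypothetical ratings from five observers on a 1-5 scale.
print(mean_opinion_score([4, 5, 3, 4, 4]))  # 4.0
```

One such score per segmentation result then serves as the ground truth against which automatic similarity metrics are compared.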
Citations: 6
Motion vector coding algorithm based on adaptive template matching
Pub Date : 2010-12-10 DOI: 10.1109/MMSP.2010.5662023
Wen Yang, O. Au, Jingjing Dai, Feng Zou, Chao Pang, Yu Liu
Motion estimation, together with the corresponding motion compensation, is a core part of modern video coding standards and greatly improves compression efficiency. On the other hand, motion information takes up a considerable portion of the compressed bit stream, especially in low-bit-rate situations. In this paper, an efficient motion vector prediction algorithm is proposed to minimize the bits used for coding the motion information. First, a candidate set (CS) of possible motion vector predictors (MVPs), including several scaled spatial and temporal predictors, is defined. To increase the diversity of the predictors, the spatial predictor is adaptively changed based on the current distribution of neighboring motion vectors. After that, an adaptive template matching technique is applied to remove non-effective predictors from the CS, so that the bits used for the MVP index can be significantly reduced. As the final MVP is chosen based on a minimum motion vector difference criterion, a guessing strategy is further introduced so that, in some situations, the bits consumed by signaling the MVP index to the decoder can be omitted entirely. The experimental results indicate that the proposed method achieves an average bit rate reduction of 5.9% compared with the H.264 standard.
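The pruning step can be sketched as follows. The cost function and the slack-based pruning rule here are illustrative stand-ins, not the paper's exact definitions; `prune_candidates` and `slack` are hypothetical names:

```python
def prune_candidates(candidates, template_cost, slack=2):
    """Keep only predictors whose template-matching cost is within
    `slack` of the best candidate, ordered best-first, so that fewer
    index bits are needed to signal the chosen predictor."""
    costs = {mv: template_cost(mv) for mv in candidates}
    best = min(costs.values())
    return sorted((mv for mv in candidates if costs[mv] <= best + slack),
                  key=lambda mv: costs[mv])

# Toy template cost: distance from a hypothetical true motion (2, -1).
cost = lambda mv: abs(mv[0] - 2) + abs(mv[1] + 1)
print(prune_candidates([(0, 0), (2, -1), (2, 0), (8, 8)], cost))
# [(2, -1), (2, 0)]
```

Because the decoder can evaluate the same template costs on already-decoded pixels, it reproduces the same pruned list without extra signaling.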
Citations: 2
Sigmoid shrinkage for BM3D denoising algorithm
Pub Date : 2010-12-10 DOI: 10.1109/MMSP.2010.5662058
M. Poderico, S. Parrilli, G. Poggi, L. Verdoliva
In this work we propose a modified version of the BM3D algorithm recently introduced by Dabov et al. [1] for the denoising of images corrupted by additive white Gaussian noise. The original technique performs multipoint filtering, where the nonlocal approach is combined with the wavelet shrinkage of a 3D cube composed of similar patches collected by means of block matching. Our improvement concerns the thresholding of the wavelet coefficients, which are subject to a different shrinkage depending on their level of sparsity. The modified algorithm is more robust to block-matching errors, especially when the noise is high, as proved by experimental results on a large set of natural images.
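A generic sigmoid shrinkage of wavelet coefficients can be sketched as below; the exact parametrization, and how `threshold` and `slope` depend on the sparsity level, may differ from the paper's rule:

```python
import numpy as np

def sigmoid_shrink(coeffs, threshold, slope):
    """Attenuate wavelet coefficients with a sigmoid gain: magnitudes
    well below `threshold` are suppressed, those well above it pass
    through nearly unchanged."""
    gain = 1.0 / (1.0 + np.exp(-slope * (np.abs(coeffs) - threshold)))
    return coeffs * gain

c = np.array([0.1, -0.2, 5.0, -8.0])
print(sigmoid_shrink(c, threshold=1.0, slope=4.0))
```

Unlike hard thresholding, the transition around `threshold` is smooth, which is what makes the scheme less sensitive to patches that were mismatched during block matching.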
Citations: 13
Recovering the output of an OFB in the case of instantaneous erasures in sub-band domain
Pub Date : 2010-12-10 DOI: 10.1109/MMSP.2010.5662032
Mohsen Akbari, F. Labeau
In this paper, we propose a method for reconstructing the output of an Oversampled Filter Bank (OFB) when instantaneous erasures happen in the sub-band domain. An instantaneous erasure is defined as a situation where the erasure pattern changes at each time instant. This definition differs from the type of erasure usually considered in the literature, where e erasures means that e channels of the OFB are off and do not work at all. The new definition is more realistic and increases the flexibility and resilience of the OFB in combating erasures. Additionally, similar to puncturing, the same idea can be used in an erasure-free channel to reconstruct the output when sub-band samples are discarded intentionally in order to change the code rate. We also derive the sufficient conditions that an OFB should meet in order for the proposed reconstruction method to work. Based on that, we finally suggest a general form for OFBs that are robust to this type of erasure.
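The redundancy argument can be illustrated on a plain oversampled matrix: as long as the rows surviving an erasure pattern still span the signal space, the input is recoverable by least squares. This is only an illustration of the principle on a single matrix, not the paper's filter-bank construction:

```python
import numpy as np

# An oversampled analysis operator: 6 sub-band samples for a 4-dim input.
rng = np.random.default_rng(0)
n, m = 4, 6
A = rng.standard_normal((m, n))
x = rng.standard_normal(n)
y = A @ x  # sub-band samples

# Instantaneous erasure pattern: samples 1 and 4 are lost at this instant.
erased = {1, 4}
keep = [i for i in range(m) if i not in erased]

# The surviving rows still have rank n, so x is recoverable exactly.
x_hat, *_ = np.linalg.lstsq(A[keep], y[keep], rcond=None)
print(np.allclose(x_hat, x))
```

At the next time instant a different `erased` set can apply; recovery only requires that each instant's surviving rows keep full rank, which is the kind of sufficient condition the paper formalizes for full filter banks.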
Citations: 7
Hierarchical Hole-Filling (HHF): Depth image based rendering without depth map filtering for 3D-TV
Pub Date : 2010-12-10 DOI: 10.1109/MMSP.2010.5661999
Mashhour Solh, G. Al-Regib
In this paper we propose a new approach for disocclusion removal in depth image-based rendering (DIBR) for 3D-TV. The new approach, Hierarchical Hole-Filling (HHF), eliminates the need for any preprocessing of the depth map. HHF uses a pyramid-like approach to estimate the hole pixels from lower-resolution estimates of the 3D warped image. The lower-resolution estimates involve pseudo zero canceling plus Gaussian filtering of the warped image. Then, starting backwards from the lowest-resolution hole-free estimate in the pyramid, we interpolate and use the pixel values to fill in the holes in the higher-resolution images. The procedure is repeated until the estimated image is hole-free. Experimental results show that HHF yields virtual images free of the geometric distortions that appear with algorithms that preprocess the depth map. Experiments have also shown that, unlike previous DIBR techniques, HHF is not sensitive to depth maps with a high percentage of bad matching pixels.
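The pyramid idea can be sketched as follows, using plain 2x2 valid-pixel averaging in place of the paper's pseudo zero-canceling Gaussian filter; `fill_holes_pyramid` is a hypothetical name:

```python
import numpy as np

def fill_holes_pyramid(img, hole_mask):
    """Fill hole pixels from coarser-resolution estimates of the image."""
    if not hole_mask.any():
        return img
    h, w = img.shape
    small = np.zeros((h // 2, w // 2))
    small_mask = np.zeros((h // 2, w // 2), dtype=bool)
    for i in range(h // 2):
        for j in range(w // 2):
            block = img[2 * i:2 * i + 2, 2 * j:2 * j + 2]
            valid = ~hole_mask[2 * i:2 * i + 2, 2 * j:2 * j + 2]
            if valid.any():
                # Average only the valid (non-hole) pixels of the block.
                small[i, j] = block[valid].mean()
            else:
                small_mask[i, j] = True  # block is entirely a hole
    # Recurse until the coarse estimate is hole-free.
    small = fill_holes_pyramid(small, small_mask)
    out = img.copy()
    for i, j in zip(*np.nonzero(hole_mask)):
        out[i, j] = small[min(i // 2, small.shape[0] - 1),
                          min(j // 2, small.shape[1] - 1)]
    return out

img = np.array([[0.0, 1.0, 2.0, 2.0],
                [1.0, 1.0, 2.0, 2.0],
                [1.0, 1.0, 2.0, 2.0],
                [1.0, 1.0, 2.0, 2.0]])
holes = np.zeros_like(img, dtype=bool)
holes[0, 0] = True  # a disoccluded pixel
print(fill_holes_pyramid(img, holes)[0, 0])  # 1.0
```

Because the coarse estimates are built only from valid pixels, no depth-map smoothing is needed to make the holes disappear, which is the point of the approach.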
Citations: 32
Integrating a HRTF-based sound synthesis system into Mumble
Pub Date : 2010-12-10 DOI: 10.1109/MMSP.2010.5661988
Martin Rothbucher, Tim Habigt, Johannes Feldmaier, K. Diepold
This paper describes the integration of a Head-Related Transfer Function (HRTF)-based 3D sound convolution engine into the open-source VoIP conferencing software Mumble. Our system makes it possible to virtually place the audio contributions of conference participants at different positions around a listener, which helps to overcome the problem of identifying active speakers in an audio conference. Furthermore, by using HRTFs to generate 3D sound in a virtual 3D space, the listener is able to exploit the cocktail party effect in order to differentiate between several simultaneously active speakers. As a result, the intelligibility of communication is increased.
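Binaural rendering with HRTFs reduces to a per-ear convolution with the head-related impulse responses (HRIRs) for each participant's virtual position. A minimal sketch with placeholder impulse responses instead of a measured HRTF database (`spatialize` and `mix_conference` are hypothetical names):

```python
import numpy as np

def spatialize(mono, hrir_left, hrir_right):
    """Render a mono stream binaurally by convolving it with the left
    and right HRIRs for one virtual position."""
    left = np.convolve(mono, hrir_left)
    right = np.convolve(mono, hrir_right)
    n = max(len(left), len(right))
    out = np.zeros((2, n))
    out[0, :len(left)] = left
    out[1, :len(right)] = right
    return out

def mix_conference(participants):
    """Sum the binaural renderings of all participants, each placed at
    their own virtual position, into one stereo stream."""
    rendered = [spatialize(s, hl, hr) for s, hl, hr in participants]
    n = max(r.shape[1] for r in rendered)
    out = np.zeros((2, n))
    for r in rendered:
        out[:, :r.shape[1]] += r
    return out

# Placeholder HRIRs; a real system looks up measured pairs per position.
stereo = spatialize(np.array([1.0, 0.0]),
                    np.array([0.5]), np.array([0.25, 0.25]))
print(stereo.shape)  # (2, 3)
```

The interaural level and time differences encoded in the two HRIRs are what let the listener localize each speaker and benefit from the cocktail party effect.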
Citations: 8
Overcoming asynchrony in Audio-Visual Speech Recognition
Pub Date : 2010-12-10 DOI: 10.1109/MMSP.2010.5662066
V. Estellers, J. Thiran
In this paper we propose two alternatives for overcoming the natural asynchrony of modalities in Audio-Visual Speech Recognition. We first investigate the use of asynchronous statistical models based on Dynamic Bayesian Networks with different levels of asynchrony. We show that audio-visual models should allow for asynchrony within word boundaries, not at the phoneme level. The second approach adds a processing stage for the features before they are used for recognition. The proposed technique aligns the temporal evolution of the audio and video streams in terms of a speech-recognition system and enables the use of simpler statistical models for classification. In both cases we report experiments on the CUAVE database, showing the improvements obtained with the proposed asynchronous model and feature processing technique compared to traditional systems.
Citations: 3
Hybrid Compressed Sensing of images
Pub Date : 2010-12-10 DOI: 10.1109/MMSP.2010.5662001
A. A. Moghadam, H. Radha
We consider the problem of recovering a signal/image x with a k-sparse representation from hybrid (complex and real), noiseless linear samples y, using a mixture of complex-valued sparse and real-valued dense projections within a single matrix. The proposed Hybrid Compressed Sensing (HCS) employs the complex-sparse part of the projection matrix to divide the n-dimensional signal x into subsets. In turn, each subset of the signal coefficients is mapped onto a complex sample of the measurement vector y. Under a worst-case scenario of such sparsity-induced mapping, when the number of complex sparse measurements is sufficiently large, this mapping isolates a significant fraction of the k non-zero coefficients into different complex measurement samples of y. Using a simple property of complex numbers (namely complex phases), one can identify the isolated non-zeros of x. After reducing the effect of the identified non-zero coefficients from the compressive samples, we utilize the real-valued dense submatrix to form a full-rank system of equations that recovers the signal values at the remaining indices (those not recovered by the sparse complex projection part). We show that the proposed hybrid approach can recover a k-sparse signal (with high probability) while requiring only m ≈ 3√n/2k real measurements (where each complex sample is counted as two real measurements). We also derive expressions for the optimal mix of complex-sparse and real-dense rows within an HCS projection matrix. Further, in a practical range of sparsity ratios (k/n) suitable for images, the hybrid approach outperforms even the most complex compressed sensing frameworks (namely, basis pursuit with dense Gaussian matrices). The theoretical complexity of HCS is less than that of solving a full-rank system of m linear equations; in practice, the complexity can be lower than this bound.
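The phase trick can be demonstrated on a toy example: tag each signal index with a distinct known phase in a complex-sparse measurement row, and an isolated non-zero is then identified by the phase of the resulting sample. This sketch assumes, for simplicity, a non-negative coefficient; the phase spacing and variable names are illustrative, not the paper's construction:

```python
import numpy as np

n = 8
theta = np.pi / (2 * n)                  # a distinct phase per index
row = np.exp(1j * theta * np.arange(n))  # one complex-sparse row

x = np.zeros(n)
x[5] = 2.3           # a single non-zero isolated into this measurement
y = row @ x          # the complex measurement sample

j = int(round(np.angle(y) / theta))  # the phase identifies the index
value = abs(y)                       # the magnitude recovers the value
print(j, value)
```

Once such isolated non-zeros are identified and subtracted, the remaining unknowns are handled by the real-valued dense rows through an ordinary full-rank solve.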
Citations: 6