
Latest publications from the 2010 IEEE International Workshop on Multimedia Signal Processing

Enhancing stereophonic teleconferencing with microphone arrays through sound field warping
Pub Date : 2010-12-10 DOI: 10.1109/MMSP.2010.5661989
Weig-Ge Chen, Zhengyou Zhang
It has been proven that spatial audio enhances the realism of sound for teleconferencing. Previously, solutions have been proposed for multiparty conferencing, where each remote participant is assumed to have his/her own microphone, and for conferencing between two rooms, where the microphones in one room are connected to an equal number of loudspeakers in the other room. Either approach has its limitations. Hence, we propose a new scheme to improve the stereophonic conferencing experience through an innovative use of microphone arrays. Instead of operating in the default mode, where a single channel is produced using spatial filtering, we propose to transmit all channels, forming a collection of spatial samples of the sound field. Those samples are warped appropriately at the remote site and spatialized together with audio streams from other remote sites, if any, to produce the perception of a virtual sound field. Real-world audio samples are provided to showcase the proposed technique. An informal listening test shows that the majority of users prefer the new experience.
Citations: 1
Robust head pose estimation by fusing time-of-flight depth and color
Pub Date : 2010-12-10 DOI: 10.1109/MMSP.2010.5662004
Amit Bleiweiss, M. Werman
We present a new solution for real-time head pose estimation. The key to our method is a model-based approach built on the fusion of color and time-of-flight depth data. Our method has several advantages over existing head-pose estimation solutions. It requires no initial setup and no knowledge of a pre-built model or training data. The use of additional depth data leads to a robust solution while maintaining real-time performance. The method outperforms the state of the art in several experiments involving extreme situations such as sudden changes in lighting, large rotations, and fast motion.
Citations: 20
A subjective experiment for 3D-mesh segmentation evaluation
Pub Date : 2010-12-10 DOI: 10.1109/MMSP.2010.5662046
H. Benhabiles, G. Lavoué, Jean-Philippe Vandeborre, M. Daoudi
In this paper we present a subjective quality assessment experiment for 3D-mesh segmentation. To this end, we carefully designed a protocol with respect to several factors, namely the rendering conditions, the possible interactions, the rating range, and the number of human subjects. To carry out the subjective experiment, more than 40 human observers rated a set of 250 segmentation results produced by various algorithms. The obtained Mean Opinion Scores, which represent the human subjects' view of the quality of each segmentation, were then used to evaluate both the quality of automatic segmentation algorithms and the quality of the similarity metrics used in recent mesh segmentation benchmarking systems.
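The Mean Opinion Score aggregation described above is simple to state concretely. A minimal sketch, with hypothetical ratings on a hypothetical 1-5 scale (the abstract does not give the exact rating range):

```python
def mean_opinion_score(ratings):
    """Average the ratings all observers gave to one segmentation."""
    if not ratings:
        raise ValueError("need at least one rating")
    return sum(ratings) / len(ratings)

# Hypothetical ratings from five observers on a 1-5 scale.
print(mean_opinion_score([4, 5, 3, 4, 4]))  # 4.0
```

One such score per segmentation result then serves as the ground truth against which automatic similarity metrics are compared.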
Citations: 6
Motion vector coding algorithm based on adaptive template matching
Pub Date : 2010-12-10 DOI: 10.1109/MMSP.2010.5662023
Wen Yang, O. Au, Jingjing Dai, Feng Zou, Chao Pang, Yu Liu
Motion estimation, together with the corresponding motion compensation, is a core part of modern video coding standards and greatly improves compression efficiency. On the other hand, motion information takes up a considerable portion of the compressed bit stream, especially in low-bit-rate situations. In this paper, an efficient motion vector prediction algorithm is proposed to minimize the bits used for coding the motion information. First, a candidate set (CS) of possible motion vector predictors (MVPs), including several scaled spatial and temporal predictors, is defined. To increase the diversity of the predictors, the spatial predictor is adaptively changed based on the current distribution of neighboring motion vectors. After that, an adaptive template matching technique is applied to remove non-effective predictors from the CS, so that the bits used for the MVP index can be significantly reduced. As the final MVP is chosen based on a minimum motion vector difference criterion, a guessing strategy is further introduced so that, in some situations, the bits consumed by signaling the MVP index to the decoder can be omitted entirely. The experimental results indicate that the proposed method achieves an average bit rate reduction of 5.9% compared with the H.264 standard.
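The pruning step can be sketched as follows. The cost function and the slack-based pruning rule here are illustrative stand-ins, not the paper's exact definitions; `prune_candidates` and `slack` are hypothetical names:

```python
def prune_candidates(candidates, template_cost, slack=2):
    """Keep only predictors whose template-matching cost is within
    `slack` of the best candidate, ordered best-first, so that fewer
    index bits are needed to signal the chosen predictor."""
    costs = {mv: template_cost(mv) for mv in candidates}
    best = min(costs.values())
    return sorted((mv for mv in candidates if costs[mv] <= best + slack),
                  key=lambda mv: costs[mv])

# Toy template cost: distance from a hypothetical true motion (2, -1).
cost = lambda mv: abs(mv[0] - 2) + abs(mv[1] + 1)
print(prune_candidates([(0, 0), (2, -1), (2, 0), (8, 8)], cost))
# [(2, -1), (2, 0)]
```

Because the decoder can evaluate the same template costs on already-decoded pixels, it reproduces the same pruned list without extra signaling.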
Citations: 2
Sigmoid shrinkage for BM3D denoising algorithm
Pub Date : 2010-12-10 DOI: 10.1109/MMSP.2010.5662058
M. Poderico, S. Parrilli, G. Poggi, L. Verdoliva
In this work we propose a modified version of the BM3D algorithm recently introduced by Dabov et al. [1] for the denoising of images corrupted by additive white Gaussian noise. The original technique performs multipoint filtering, where the nonlocal approach is combined with the wavelet shrinkage of a 3D cube composed of similar patches collected by means of block matching. Our improvement concerns the thresholding of the wavelet coefficients, which are subject to a different shrinkage depending on their level of sparsity. The modified algorithm is more robust to block-matching errors, especially when the noise is high, as proved by experimental results on a large set of natural images.
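A generic sigmoid shrinkage of wavelet coefficients can be sketched as below; the exact parametrization, and how `threshold` and `slope` depend on the sparsity level, may differ from the paper's rule:

```python
import numpy as np

def sigmoid_shrink(coeffs, threshold, slope):
    """Attenuate wavelet coefficients with a sigmoid gain: magnitudes
    well below `threshold` are suppressed, those well above it pass
    through nearly unchanged."""
    gain = 1.0 / (1.0 + np.exp(-slope * (np.abs(coeffs) - threshold)))
    return coeffs * gain

c = np.array([0.1, -0.2, 5.0, -8.0])
print(sigmoid_shrink(c, threshold=1.0, slope=4.0))
```

Unlike hard thresholding, the transition around `threshold` is smooth, which is what makes the scheme less sensitive to patches that were mismatched during block matching.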
Citations: 13
Recovering the output of an OFB in the case of instantaneous erasures in sub-band domain
Pub Date : 2010-12-10 DOI: 10.1109/MMSP.2010.5662032
Mohsen Akbari, F. Labeau
In this paper, we propose a method for reconstructing the output of an Oversampled Filter Bank (OFB) when instantaneous erasures happen in the sub-band domain. An instantaneous erasure is defined as a situation where the erasure pattern changes at each time instant. This definition differs from the type of erasure usually considered in the literature, where e erasures means that e channels of the OFB are off and do not work at all. The new definition is more realistic and increases the flexibility and resilience of the OFB in combating erasures. Additionally, similar to puncturing, the same idea can be used in an erasure-free channel to reconstruct the output when sub-band samples are discarded intentionally in order to change the code rate. We also derive the sufficient conditions that an OFB should meet in order for the proposed reconstruction method to work. Based on that, we finally suggest a general form for OFBs that are robust to this type of erasure.
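The redundancy argument can be illustrated on a plain oversampled matrix: as long as the rows surviving an erasure pattern still span the signal space, the input is recoverable by least squares. This is only an illustration of the principle on a single matrix, not the paper's filter-bank construction:

```python
import numpy as np

# An oversampled analysis operator: 6 sub-band samples for a 4-dim input.
rng = np.random.default_rng(0)
n, m = 4, 6
A = rng.standard_normal((m, n))
x = rng.standard_normal(n)
y = A @ x  # sub-band samples

# Instantaneous erasure pattern: samples 1 and 4 are lost at this instant.
erased = {1, 4}
keep = [i for i in range(m) if i not in erased]

# The surviving rows still have rank n, so x is recoverable exactly.
x_hat, *_ = np.linalg.lstsq(A[keep], y[keep], rcond=None)
print(np.allclose(x_hat, x))
```

At the next time instant a different `erased` set can apply; recovery only requires that each instant's surviving rows keep full rank, which is the kind of sufficient condition the paper formalizes for full filter banks.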
Citations: 7
Hierarchical Hole-Filling (HHF): Depth image based rendering without depth map filtering for 3D-TV
Pub Date : 2010-12-10 DOI: 10.1109/MMSP.2010.5661999
Mashhour Solh, G. Al-Regib
In this paper we propose a new approach for disocclusion removal in depth image-based rendering (DIBR) for 3D-TV. The new approach, Hierarchical Hole-Filling (HHF), eliminates the need for any preprocessing of the depth map. HHF uses a pyramid-like approach to estimate the hole pixels from lower-resolution estimates of the 3D warped image. The lower-resolution estimates involve pseudo zero canceling plus Gaussian filtering of the warped image. Then, starting backwards from the lowest-resolution hole-free estimate in the pyramid, we interpolate and use the pixel values to fill in the holes in the higher-resolution images. The procedure is repeated until the estimated image is hole-free. Experimental results show that HHF yields virtual images free of the geometric distortions that appear with algorithms that preprocess the depth map. Experiments have also shown that, unlike previous DIBR techniques, HHF is not sensitive to depth maps with a high percentage of bad matching pixels.
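The pyramid idea can be sketched as follows, using plain 2x2 valid-pixel averaging in place of the paper's pseudo zero-canceling Gaussian filter; `fill_holes_pyramid` is a hypothetical name:

```python
import numpy as np

def fill_holes_pyramid(img, hole_mask):
    """Fill hole pixels from coarser-resolution estimates of the image."""
    if not hole_mask.any():
        return img
    h, w = img.shape
    small = np.zeros((h // 2, w // 2))
    small_mask = np.zeros((h // 2, w // 2), dtype=bool)
    for i in range(h // 2):
        for j in range(w // 2):
            block = img[2 * i:2 * i + 2, 2 * j:2 * j + 2]
            valid = ~hole_mask[2 * i:2 * i + 2, 2 * j:2 * j + 2]
            if valid.any():
                # Average only the valid (non-hole) pixels of the block.
                small[i, j] = block[valid].mean()
            else:
                small_mask[i, j] = True  # block is entirely a hole
    # Recurse until the coarse estimate is hole-free.
    small = fill_holes_pyramid(small, small_mask)
    out = img.copy()
    for i, j in zip(*np.nonzero(hole_mask)):
        out[i, j] = small[min(i // 2, small.shape[0] - 1),
                          min(j // 2, small.shape[1] - 1)]
    return out

img = np.array([[0.0, 1.0, 2.0, 2.0],
                [1.0, 1.0, 2.0, 2.0],
                [1.0, 1.0, 2.0, 2.0],
                [1.0, 1.0, 2.0, 2.0]])
holes = np.zeros_like(img, dtype=bool)
holes[0, 0] = True  # a disoccluded pixel
print(fill_holes_pyramid(img, holes)[0, 0])  # 1.0
```

Because the coarse estimates are built only from valid pixels, no depth-map smoothing is needed to make the holes disappear, which is the point of the approach.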
Citations: 32
Integrating a HRTF-based sound synthesis system into Mumble
Pub Date : 2010-12-10 DOI: 10.1109/MMSP.2010.5661988
Martin Rothbucher, Tim Habigt, Johannes Feldmaier, K. Diepold
This paper describes the integration of a Head-Related Transfer Function (HRTF)-based 3D sound convolution engine into the open-source VoIP conferencing software Mumble. Our system makes it possible to virtually place the audio contributions of conference participants at different positions around a listener, which helps to overcome the problem of identifying active speakers in an audio conference. Furthermore, by using HRTFs to generate 3D sound in a virtual 3D space, the listener is able to exploit the cocktail party effect in order to differentiate between several simultaneously active speakers. As a result, the intelligibility of communication is increased.
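Binaural rendering with HRTFs reduces to a per-ear convolution with the head-related impulse responses (HRIRs) for each participant's virtual position. A minimal sketch with placeholder impulse responses instead of a measured HRTF database (`spatialize` and `mix_conference` are hypothetical names):

```python
import numpy as np

def spatialize(mono, hrir_left, hrir_right):
    """Render a mono stream binaurally by convolving it with the left
    and right HRIRs for one virtual position."""
    left = np.convolve(mono, hrir_left)
    right = np.convolve(mono, hrir_right)
    n = max(len(left), len(right))
    out = np.zeros((2, n))
    out[0, :len(left)] = left
    out[1, :len(right)] = right
    return out

def mix_conference(participants):
    """Sum the binaural renderings of all participants, each placed at
    their own virtual position, into one stereo stream."""
    rendered = [spatialize(s, hl, hr) for s, hl, hr in participants]
    n = max(r.shape[1] for r in rendered)
    out = np.zeros((2, n))
    for r in rendered:
        out[:, :r.shape[1]] += r
    return out

# Placeholder HRIRs; a real system looks up measured pairs per position.
stereo = spatialize(np.array([1.0, 0.0]),
                    np.array([0.5]), np.array([0.25, 0.25]))
print(stereo.shape)  # (2, 3)
```

The interaural level and time differences encoded in the two HRIRs are what let the listener localize each speaker and benefit from the cocktail party effect.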
Citations: 8
Overcoming asynchrony in Audio-Visual Speech Recognition
Pub Date : 2010-12-10 DOI: 10.1109/MMSP.2010.5662066
V. Estellers, J. Thiran
In this paper we propose two alternatives for overcoming the natural asynchrony of modalities in Audio-Visual Speech Recognition. We first investigate the use of asynchronous statistical models based on Dynamic Bayesian Networks with different levels of asynchrony. We show that audio-visual models should allow for asynchrony within word boundaries, not at the phoneme level. The second approach adds a processing stage for the features before they are used for recognition. The proposed technique aligns the temporal evolution of the audio and video streams in terms of a speech-recognition system and enables the use of simpler statistical models for classification. In both cases we report experiments on the CUAVE database, showing the improvements obtained with the proposed asynchronous model and feature processing technique compared to traditional systems.
Citations: 3
Hybrid Compressed Sensing of images
Pub Date : 2010-12-10 DOI: 10.1109/MMSP.2010.5662001
A. A. Moghadam, H. Radha
We consider the problem of recovering a signal/image x with a k-sparse representation from hybrid (complex and real), noiseless linear samples y, using a mixture of complex-valued sparse and real-valued dense projections within a single matrix. The proposed Hybrid Compressed Sensing (HCS) employs the complex-sparse part of the projection matrix to divide the n-dimensional signal x into subsets. In turn, each subset of the signal coefficients is mapped onto a complex sample of the measurement vector y. Under a worst-case scenario of such sparsity-induced mapping, when the number of complex sparse measurements is sufficiently large, this mapping isolates a significant fraction of the k non-zero coefficients into different complex measurement samples of y. Using a simple property of complex numbers (namely complex phases), one can identify the isolated non-zeros of x. After reducing the effect of the identified non-zero coefficients from the compressive samples, we utilize the real-valued dense submatrix to form a full-rank system of equations that recovers the signal values at the remaining indices (those not recovered by the sparse complex projection part). We show that the proposed hybrid approach can recover a k-sparse signal (with high probability) while requiring only m ≈ 3√n/2k real measurements (where each complex sample is counted as two real measurements). We also derive expressions for the optimal mix of complex-sparse and real-dense rows within an HCS projection matrix. Further, in a practical range of sparsity ratios (k/n) suitable for images, the hybrid approach outperforms even the most complex compressed sensing frameworks (namely, basis pursuit with dense Gaussian matrices). The theoretical complexity of HCS is less than that of solving a full-rank system of m linear equations; in practice, the complexity can be lower than this bound.
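The phase trick can be demonstrated on a toy example: tag each signal index with a distinct known phase in a complex-sparse measurement row, and an isolated non-zero is then identified by the phase of the resulting sample. This sketch assumes, for simplicity, a non-negative coefficient; the phase spacing and variable names are illustrative, not the paper's construction:

```python
import numpy as np

n = 8
theta = np.pi / (2 * n)                  # a distinct phase per index
row = np.exp(1j * theta * np.arange(n))  # one complex-sparse row

x = np.zeros(n)
x[5] = 2.3           # a single non-zero isolated into this measurement
y = row @ x          # the complex measurement sample

j = int(round(np.angle(y) / theta))  # the phase identifies the index
value = abs(y)                       # the magnitude recovers the value
print(j, value)
```

Once such isolated non-zeros are identified and subtracted, the remaining unknowns are handled by the real-valued dense rows through an ordinary full-rank solve.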
Citations: 6