
Latest publications: 2010 IEEE International Workshop on Multimedia Signal Processing

A weighted approach of missing data technique in cepstra domain based on S-function
Pub Date : 2010-12-10 DOI: 10.1109/MMSP.2010.5661987
Pei Yi, Yubo Ge
The application of the Missing Data Technique (MDT) has been shown to improve the performance of speech recognition. To apply MDT in the cepstral domain, this paper presents a weighted approach to computing the reliability of cepstral features based on a sigmoid function and introduces a weighted distance algorithm. The reliability compensates the Gaussian variance in the hidden Markov model (HMM) frame by frame to reduce the mismatch between the clean-trained model and corrupted speech. Experimental evaluation on the Aurora2 database demonstrates a distinct reduction in digit error rate. The main advantages of the approach are simple system implementation, low computational cost, and easy integration into other robust recognition algorithms.
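The weighting idea can be sketched as follows. The sigmoid slope/threshold values, the per-dimension SNR input, and the diagonal-covariance distance are illustrative assumptions, not the paper's actual parameters:

```python
import numpy as np

def reliability(snr_db, alpha=0.5, theta=0.0):
    """Sigmoid (S-function) mapping from a local SNR estimate to a
    reliability weight in (0, 1). alpha (slope) and theta (threshold)
    are illustrative values, not the paper's."""
    return 1.0 / (1.0 + np.exp(-alpha * (snr_db - theta)))

def weighted_distance(obs, mean, var, w):
    """Per-dimension weighted Mahalanobis-style distance: reliable
    dimensions (w near 1) contribute fully, unreliable ones are
    down-weighted, mimicking frame-by-frame variance compensation."""
    return np.sum(w * (obs - mean) ** 2 / var)

# Example: a badly corrupted dimension barely affects the distance.
obs  = np.array([1.0, 5.0])
mean = np.array([1.0, 0.0])
var  = np.array([1.0, 1.0])
w = reliability(np.array([20.0, -20.0]))  # dim 0 clean, dim 1 noisy
d = weighted_distance(obs, mean, var, w)
```

With an unweighted distance the corrupted second dimension would dominate (contributing 25); under the sigmoid weighting it is almost ignored.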
Citations: 3
Improving multiple-F0 estimation by onset detection for polyphonic music transcription
Pub Date : 2010-12-10 DOI: 10.1109/MMSP.2010.5661985
F. Canadas-Quesada, F. J. Rodríguez-Serrano, P. Vera-Candeas, N. Ruiz-Reyes, J. Carabias-Orti
In a monaural polyphonic context, music transcription and, specifically, multiple-F0 estimation systems have achieved promising results in the last decade. However, most of these systems present intermittent pitch misses within a note, or inaccurate onset and offset definitions, due to frame-by-frame analysis. In this paper, we propose a multiple-F0 estimation system which extracts a set of active pitches at each analysis frame, while note tracking is performed over temporal intervals defined by an accurate onset detector. Our system shows promising results, in terms of both onset and multiple-F0 estimation, evaluated using real-world and synthesized polyphonic music recordings taken from the MAPS music database.
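The paper does not specify its onset detector here; a basic spectral-flux detector, shown below purely as an illustrative stand-in with assumed frame, hop, and threshold values, conveys the idea of marking the temporal intervals over which notes are tracked:

```python
import numpy as np

def spectral_flux_onsets(x, frame=512, hop=256, thresh=0.3):
    """Basic spectral-flux onset detector (an illustrative stand-in
    for the paper's detector): frames whose positive spectral
    magnitude increase exceeds a threshold are flagged as onsets."""
    n_frames = 1 + (len(x) - frame) // hop
    win = np.hanning(frame)
    mags = np.array([np.abs(np.fft.rfft(win * x[i*hop:i*hop+frame]))
                     for i in range(n_frames)])
    # Positive first difference of the magnitude spectrum, per frame.
    flux = np.sum(np.maximum(mags[1:] - mags[:-1], 0.0), axis=1)
    if flux.max() > 0:
        flux /= flux.max()
    return np.where(flux > thresh)[0] + 1  # frame indices of onsets

# Example: half a second of silence followed by a 440 Hz tone
# produces one onset cluster near the boundary (around frame 15).
sr = 8000
x = np.concatenate([np.zeros(sr // 2),
                    0.5 * np.sin(2 * np.pi * 440 * np.arange(sr // 2) / sr)])
onsets = spectral_flux_onsets(x)
```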
Citations: 3
Object tracking under illumination variations using 2D-cepstrum characteristics of the target
Pub Date : 2010-12-10 DOI: 10.1109/MMSP.2010.5662076
Fuat Çogun, A. Cetin
Most video processing applications require object tracking, as it is the base operation for real-time implementations such as surveillance, monitoring and video compression. Therefore, accurate tracking of an object under varying scene conditions is crucial for robustness. It is well known that illumination variations on the observed scene and target are an obstacle to robust object tracking, causing the tracker to lose the target. In this paper, a 2D-cepstrum based approach is proposed to overcome this problem. Cepstral domain features extracted from the target region are introduced into the covariance tracking algorithm, and it is experimentally observed that 2D-cepstrum analysis of the target object provides robustness to varying illumination conditions. Another contribution of the paper is the development of co-difference matrix based object tracking in place of the recently introduced covariance matrix based method.
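A minimal sketch of why 2D-cepstral features resist illumination changes, assuming a simple multiplicative (global gain) illumination model: a gain change shifts only the (0, 0) cepstral coefficient and leaves the rest essentially untouched.

```python
import numpy as np

def cepstrum_2d(block, eps=1e-8):
    """Real 2D cepstrum of an image block: inverse 2D FFT of the
    log-magnitude spectrum. A global gain g multiplies every spectral
    magnitude, so log|g X| = log g + log|X|, and the constant log g
    lands entirely on the (0, 0) cepstral coefficient."""
    spec = np.fft.fft2(block)
    return np.real(np.fft.ifft2(np.log(np.abs(spec) + eps)))

# Example: doubling the block's intensity changes only c[0, 0].
rng = np.random.default_rng(0)
block = rng.random((16, 16)) + 0.5
c1 = cepstrum_2d(block)
c2 = cepstrum_2d(2.0 * block)
diff = c2 - c1   # ~log(2) at [0, 0], ~0 everywhere else
```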
Citations: 8
H.264-based multiple description coding using motion compensated temporal interpolation
Pub Date : 2010-12-10 DOI: 10.1109/MMSP.2010.5662026
C. Greco, Marco Cagnazzo, B. Pesquet-Popescu
Multiple description coding is a framework adapted to noisy transmission environments. In this work, we use H.264 to create two descriptions of a video sequence, each assuring a minimum quality level. If both are received, a suitable algorithm is used to produce a sequence of improved quality. The key technique is temporal image interpolation using motion compensation, inspired by the distributed video coding context. The interpolated image blocks are weighted with the received blocks obtained from the other description. The optimal weights are computed at the encoder and efficiently sent to the decoder as side information. The proposed technique shows a remarkable gain in central decoding with respect to similar state-of-the-art methods.
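The central-decoder blending can be sketched as below. The scalar per-block weight and the least-squares criterion are assumptions for illustration; the paper's actual weight optimisation is not specified here.

```python
import numpy as np

def central_reconstruction(received, interpolated, w):
    """Pixel blend at the central decoder: `received` comes from the
    other description, `interpolated` from motion compensated temporal
    interpolation; w is the encoder-optimised weight sent as side
    information (a single scalar here for simplicity)."""
    return w * received + (1.0 - w) * interpolated

def optimal_weight(original, received, interpolated):
    """Least-squares weight minimising ||original - blend||^2, the
    kind of per-block optimisation only the encoder can afford,
    since it alone sees the original."""
    r = (received - interpolated).ravel()
    t = (original - interpolated).ravel()
    denom = float(r @ r)
    return float(r @ t) / denom if denom > 0 else 0.5

rng = np.random.default_rng(1)
original = rng.random((8, 8))
received = original + 0.05 * rng.standard_normal((8, 8))   # good copy
interp   = original + 0.20 * rng.standard_normal((8, 8))   # rougher copy
w = optimal_weight(original, received, interp)
rec = central_reconstruction(received, interp, w)
err_blend  = np.mean((rec - original) ** 2)
err_interp = np.mean((interp - original) ** 2)
```

By construction the least-squares blend is never worse than using the interpolated block alone (w = 0 recovers that case).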
Citations: 11
Human emotion recognition using real 3D visual features from Gabor library
Pub Date : 2010-12-10 DOI: 10.1109/MMSP.2010.5662073
Tie Yun, L. Guan
Emotional state recognition is an important component of efficient human-computer interaction. Most existing works address this problem using 2D features, but these are sensitive to head pose, clutter, and variations in lighting conditions. General 3D-based methods only consider geometric information for feature extraction. In this paper, we present a method for human emotion recognition based on real 3D visual features. 3D geometric information plus colour/density information of the facial expressions is extracted by a 3D Gabor library to construct visual feature vectors. The scale, orientation, and shape of the library's filters are specified according to the appearance patterns of the 3D facial expressions. An improved kernel canonical correlation analysis (IKCCA) algorithm is proposed for the final decision. From training samples, IKCCA computes the semantic ratings that describe the different facial expressions, generating a seven-dimensional semantic expression vector that is used to learn the correlation with testing samples. According to this correlation, we estimate the associated expression vector and perform expression classification. Experimental results show that the proposed method achieves impressive performance.
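The paper's library is built over 3D face data; the 2D sketch below only illustrates the scale/orientation parametrisation of a Gabor bank and the phase-insensitive response magnitude. Kernel size, wavelengths, and the sigma-to-wavelength ratio are assumed values.

```python
import numpy as np

def gabor_kernel(size, wavelength, theta, sigma):
    """Complex Gabor kernel: Gaussian envelope times a complex
    exponential along the rotated axis. Taking the response magnitude
    makes the feature insensitive to the pattern's phase."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    xr = x * np.cos(theta) + y * np.sin(theta)
    yr = -x * np.sin(theta) + y * np.cos(theta)
    env = np.exp(-(xr ** 2 + yr ** 2) / (2.0 * sigma ** 2))
    return env * np.exp(2j * np.pi * xr / wavelength)

def gabor_features(patch, wavelengths=(4, 8), thetas=(0.0, np.pi / 2)):
    """Response-energy feature vector over a small filter bank
    (one inner product per filter keeps the sketch short)."""
    feats = []
    for lam in wavelengths:
        for th in thetas:
            k = gabor_kernel(15, lam, th, sigma=lam / 2.0)
            feats.append(np.abs(np.sum(patch * k)))
    return np.array(feats)

# A vertical grating of wavelength 4 responds most strongly to the
# filter with matching scale and orientation (index 0 in the bank).
cols = np.arange(15)
patch = np.cos(2 * np.pi * cols / 4.0)[None, :].repeat(15, axis=0)
f = gabor_features(patch)
```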
Citations: 18
Spatial intra-prediction based on mixtures of sparse representations
Pub Date : 2010-12-10 DOI: 10.1109/MMSP.2010.5662044
Angélique Dremeau, Mehmet Türkan, C. Herzet, C. Guillemot, J. Fuchs
In this paper, we consider the problem of spatial prediction based on sparse representations. Several algorithms dealing with this problem can be found in the literature. We propose a novel method involving a mixture of sparse representations. We first place this approach in a probabilistic framework and then derive a practical procedure to solve it. Comparisons of rate-distortion performance show the superiority of the proposed algorithm over other state-of-the-art algorithms.
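The mixture model itself is not detailed here, but its building block, a single sparse representation, is standard. A minimal Orthogonal Matching Pursuit sketch (dictionary size, sparsity, and coefficients are arbitrary test values, not anything from the paper):

```python
import numpy as np

def omp(D, y, k):
    """Orthogonal Matching Pursuit: greedy k-sparse approximation of
    y over the columns of dictionary D. At each step, pick the atom
    most correlated with the residual, then refit by least squares."""
    residual, support = y.copy(), []
    for _ in range(k):
        j = int(np.argmax(np.abs(D.T @ residual)))
        if j not in support:
            support.append(j)
        coef, *_ = np.linalg.lstsq(D[:, support], y, rcond=None)
        residual = y - D[:, support] @ coef
    x = np.zeros(D.shape[1])
    x[support] = coef
    return x

# Example: recover a 2-sparse code over a random normalised dictionary.
rng = np.random.default_rng(2)
D = rng.standard_normal((64, 96))
D /= np.linalg.norm(D, axis=0)
x_true = np.zeros(96)
x_true[3], x_true[20] = 4.0, -3.0
y = D @ x_true
x_hat = omp(D, y, k=2)
err = np.linalg.norm(D @ x_hat - y)
```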
Citations: 4
Side information refinement for long duration GOPs in DVC
Pub Date : 2010-12-10 DOI: 10.1109/MMSP.2010.5662038
G. Petrazzuoli, Thomas Maugey, Marco Cagnazzo, B. Pesquet-Popescu
Side information generation is a critical step in distributed video coding systems. It is performed by motion compensated temporal interpolation between two or more key frames (KFs). However, when the temporal distance between key frames increases (i.e. when the GOP size becomes large), linear interpolation becomes less effective. In a previous work, we showed that this problem can be mitigated by using higher order interpolation. For long duration GOPs, state-of-the-art algorithms propose a hierarchical procedure for side information generation; with this procedure, the quality of the central interpolated image in a GOP is consistently worse than that of images closer to the KFs. In this paper, we propose a refinement of the central WZFs by higher order interpolation of the already decoded WZFs that are closer to the WZF to be estimated. This reduces the fluctuation of side information quality, with a beneficial impact on the final rate-distortion characteristics of the system. The experimental results show an improvement of the SI of up to 2.71 dB with respect to the state of the art, a global improvement of the PSNR on the decoded frames of up to 0.71 dB, and a bit rate reduction of up to 15%.
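One common higher-order scheme, shown here as an assumed example of the idea rather than the authors' exact interpolator, is four-frame cubic (Catmull-Rom) interpolation: it follows accelerating pixel trajectories that two-frame linear interpolation misses.

```python
import numpy as np

def linear_interp(f0, f1, t):
    """Two-frame linear interpolation at normalised time t in [0, 1]."""
    return (1 - t) * f0 + t * f1

def catmull_rom(fm1, f0, f1, f2, t):
    """Four-frame cubic (Catmull-Rom) interpolation between f0 and f1,
    using the neighbouring frames fm1 and f2 to capture curvature."""
    return 0.5 * ((2 * f0)
                  + (-fm1 + f1) * t
                  + (2 * fm1 - 5 * f0 + 4 * f1 - f2) * t ** 2
                  + (-fm1 + 3 * f0 - 3 * f1 + f2) * t ** 3)

# A pixel on a quadratic (accelerating) trajectory, sampled at four
# key instants: cubic interpolation reproduces the midpoint value,
# linear interpolation does not.
times = np.array([-1.0, 0.0, 1.0, 2.0])
traj = times ** 2           # intensity samples at the 4 key instants
t = 0.5
lin = linear_interp(traj[1], traj[2], t)   # 0.5, off by 0.25
cub = catmull_rom(*traj, t)                # 0.25, exact
```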
Citations: 7
Fitting pinna-related transfer functions to anthropometry for binaural sound rendering
Pub Date : 2010-12-10 DOI: 10.1109/MMSP.2010.5662018
Simone Spagnol, M. Geronazzo, F. Avanzini
This paper addresses the general problem of modeling pinna-related transfer functions (PRTFs) for 3-D sound rendering. Following a structural approach, we aim at constructing a model for PRTF synthesis which allows the evolution of ear resonances and spectral notches to be controlled separately through the design of two distinct filter blocks. Taking such a model as the endpoint, we propose a method based on the McAulay-Quatieri partial tracking algorithm to extract the frequencies of the most important spectral notches. Ray-tracing analysis performed on the tracks so obtained reveals a convincing correspondence between the extracted frequencies and the pinna geometry of a group of subjects.
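A much-simplified stand-in for the notch-extraction step (the actual method tracks partials across frames with the McAulay-Quatieri algorithm): pick local minima of a magnitude response that dip sufficiently below the surrounding response. The depth threshold and the synthetic response are assumptions for illustration.

```python
import numpy as np

def notch_frequencies(mag_db, freqs, depth_db=3.0):
    """Return frequencies of strict local minima that lie at least
    depth_db below the response maxima on both sides; a simplified
    single-frame stand-in for partial-tracking-based notch extraction."""
    notches = []
    for i in range(1, len(mag_db) - 1):
        if mag_db[i] < mag_db[i - 1] and mag_db[i] < mag_db[i + 1]:
            left_max = mag_db[:i].max()
            right_max = mag_db[i + 1:].max()
            if min(left_max, right_max) - mag_db[i] >= depth_db:
                notches.append(freqs[i])
    return np.array(notches)

# Synthetic PRTF-like response: flat at 0 dB with one 12 dB Gaussian
# notch centred near 8 kHz.
freqs = np.linspace(1000, 16000, 301)
mag = -12.0 * np.exp(-0.5 * ((freqs - 8000.0) / 300.0) ** 2)
found = notch_frequencies(mag, freqs)
```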
Citations: 30
Joint source-channel coding/decoding of 3D-ESCOT bitstreams
Pub Date : 2010-12-10 DOI: 10.1109/MMSP.2010.5662034
M. Abid, M. Kieffer, B. Pesquet-Popescu
Joint source-channel decoding (JSCD) exploits residual redundancy in compressed bitstreams to improve the robustness of multimedia coding schemes to transmission errors. This paper proposes an architecture that introduces some additional side information into compressed streams to help JSCD. The architecture exploits a reference decoder already present, or introduced, at the encoder side. An application to the robust decoding of 3D-ESCOT encoded bitstreams generated within the Vidwav video coder is presented. The layered bitstream generated by this encoder allows SNR scalability and, moreover, when processed by a JSCD, provides increased robustness to transmission errors compared with a single-layer bitstream.
Citations: 2
Parametric stereo extension of ITU-T G.722 based on a new downmixing scheme
Pub Date : 2010-12-10 DOI: 10.1109/MMSP.2010.5662017
Thi Minh Nguyet Hoang, S. Ragot, Balázs Kövesi, P. Scalart
In this paper, we present a novel frequency-domain stereo-to-mono downmixing scheme which preserves the energy of spectral components and avoids setting the left or right channel as a phase reference. Based on this downmixing technique, a parametric stereo analysis-synthesis model is described in which the subband stereo parameters consist of interchannel level differences and phase differences between the mono signal and one of the stereo channels (left or right). This model is applied to the stereo extension of ITU-T G.722 at 56+8 and 64+16 kbit/s with a frame length of 5 ms. AB test results are provided to assess the quality of the proposed downmixing technique. In addition, the quality of the proposed G.722-based stereo coder is compared against reference coders (G.722.1 at 24 and 32 kbit/s dual mono, and G.722 at 64 kbit/s dual mono) for clean speech, noisy speech and music.
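The motivation for an energy-preserving downmix can be sketched per frequency bin as below. This is a generic sketch of the kind of downmix described, not the exact G.722-extension formula: magnitude keeps the channel energy, and phase is taken from the sum L+R rather than from one channel.

```python
import numpy as np

def downmix(L, R):
    """Per-bin energy-preserving mono downmix of two complex spectra:
    |M|^2 = (|L|^2 + |R|^2) / 2, with the phase of L + R (falling back
    to zero phase where L + R vanishes)."""
    mag = np.sqrt((np.abs(L) ** 2 + np.abs(R) ** 2) / 2.0)
    s = L + R
    phase = np.angle(np.where(np.abs(s) > 0, s, 1.0))
    return mag * np.exp(1j * phase)

# Anti-phase bins do not cancel: the naive average (L + R) / 2 loses
# all energy in bin 1, the energy-preserving downmix keeps it.
L = np.array([1.0 + 0j, 1.0 + 0j])
R = np.array([1.0 + 0j, -1.0 + 0j])   # second bin in anti-phase
naive = (L + R) / 2.0
M = downmix(L, R)
```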
Citations: 4