
Latest publications — 2010 IEEE International Workshop on Multimedia Signal Processing

Probabilistic framework for template-based chord recognition
Pub Date : 2010-12-10 DOI: 10.1109/MMSP.2010.5662016
L. Oudre, C. Févotte, Y. Grenier
This paper describes a method for chord recognition from audio signals. Our method provides a coherent and relevant probabilistic framework for template-based transcription. The only information needed for the transcription is the definition of the chords: in particular, neither annotated audio data nor music-theory knowledge is required. We extract from the signal a succession of chroma vectors, which are our model observations. We propose a generative model for these observations based on chord distribution probabilities and fixed chord templates. The parameters are estimated with an EM algorithm. In order to capture temporal structure, we apply post-processing filtering methods before detecting the chords. Our method is evaluated on two audio corpora. Results show that it outperforms state-of-the-art chord recognition methods and also gives more relevant chord transcriptions.
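A simplified, deterministic sketch of the template-matching idea behind such systems (the paper's actual model is probabilistic, with parameters estimated by EM; the binary triad templates and correlation scoring below are illustrative assumptions, not the authors' code):

```python
import numpy as np

NOTES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]

def chord_templates():
    """Build 12-dimensional binary chroma templates for the 24 major and
    minor triads by rotating a root-position template to every root."""
    major = np.array([1, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0], dtype=float)
    minor = np.array([1, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0], dtype=float)
    templates = {}
    for root in range(12):
        templates[f"{NOTES[root]}:maj"] = np.roll(major, root)
        templates[f"{NOTES[root]}:min"] = np.roll(minor, root)
    return templates

def recognize(chroma):
    """Return the chord label whose template best correlates with the
    observed chroma vector (both L2-normalized)."""
    chroma = chroma / (np.linalg.norm(chroma) + 1e-12)
    scores = {name: float(chroma @ (t / np.linalg.norm(t)))
              for name, t in chord_templates().items()}
    return max(scores, key=scores.get)
```

A chroma vector with energy concentrated on pitch classes C, E, and G would match the C-major template.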
Citations: 5
Multimodal speech recognition of a person with articulation disorders using AAM and MAF
Pub Date : 2010-12-10 DOI: 10.1109/MMSP.2010.5662075
Chikoto Miyamoto, Yuto Komai, T. Takiguchi, Y. Ariki, I. Li
We investigated speech recognition for a person with articulation disorders resulting from athetoid cerebral palsy. The articulation of speech tends to become unstable due to strain on speech-related muscles, which degrades speech recognition. Therefore, we use multiple acoustic frames (MAF) as an acoustic feature to address this problem. Further, in real environments, current speech recognition systems do not perform sufficiently well because of noise. In addition to acoustic features, visual features are used to increase noise robustness in a real environment. However, recognition problems arise from the tendency of those suffering from cerebral palsy to move their head erratically. We investigate a pose-robust audio-visual speech recognition method using an Active Appearance Model (AAM) to solve this problem for people with articulation disorders resulting from athetoid cerebral palsy. AAMs are used for face tracking to extract pose-robust facial feature points. Its effectiveness is confirmed by word recognition experiments on noisy speech of a person with articulation disorders.
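The multiple-acoustic-frames idea can be sketched as concatenating each frame with its temporal context; the context width (`context=2`) and edge-padding strategy below are assumptions, not details from the paper:

```python
import numpy as np

def stack_frames(features, context=2):
    """Concatenate each acoustic frame with `context` frames on either
    side (padding the edges by repetition), yielding one long
    multiple-acoustic-frame feature vector per original frame."""
    T, D = features.shape
    padded = np.pad(features, ((context, context), (0, 0)), mode="edge")
    return np.stack([padded[t:t + 2 * context + 1].reshape(-1)
                     for t in range(T)])
```

With a 5-frame, 3-dimensional input and `context=2`, each output vector has 15 dimensions.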
Citations: 29
Movement recognition exploiting multi-view information
Pub Date : 2010-12-10 DOI: 10.1109/MMSP.2010.5662059
Alexandros Iosifidis, N. Nikolaidis, I. Pitas
In this paper, a novel view-invariant movement recognition method is presented. A multi-camera setup is used to capture the movement from different observation angles. The position of each camera with respect to the subject's body is identified by a procedure based on morphological operations and the proportions of the human body. Binary body masks from the frames of all cameras, consistently arranged through the previous procedure, are concatenated to produce the so-called multi-view binary mask. These masks are rescaled and vectorized to create feature vectors in the input space. Fuzzy vector quantization is performed to associate input feature vectors with movement representations, and linear discriminant analysis is used to map movements into a low-dimensional discriminant feature space. Experimental results show that the method achieves very satisfactory recognition rates.
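The fuzzy vector quantization step can be sketched as a soft assignment of a feature vector to codebook vectors; the inverse-distance membership formula with fuzzifier `m` is the standard fuzzy-VQ form and an assumption about the paper's exact variant:

```python
import numpy as np

def fuzzy_memberships(x, codebook, m=2.0):
    """Soft-assign feature vector x to codebook vectors: membership is
    inversely proportional to distance raised to 2/(m-1), normalized so
    the memberships sum to one."""
    d = np.linalg.norm(codebook - x, axis=1) + 1e-12
    u = d ** (-2.0 / (m - 1.0))
    return u / u.sum()
```

A point near one codebook vector receives most of the membership mass, but never all of it, which is what makes the representation robust to quantization boundaries.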
Citations: 28
Content identification based on digital fingerprint: What can be done if ML decoding fails?
Pub Date : 2010-12-10 DOI: 10.1109/MMSP.2010.5661995
F. Farhadzadeh, S. Voloshynovskiy, O. Koval
In this paper, the performance of content identification based on digital fingerprinting and order-statistic list decoding is analyzed by evaluating the probabilities of correct identification and false acceptance, and the probability mass function of the queried binary fingerprint's position on the list of candidates. Particular attention is dedicated to cases where the traditional maximum-likelihood decoder fails to produce reliable content identification. Maximum-likelihood decoding is shown to be a particular case of order-statistic list decoding with list size equal to 1. We demonstrate the efficiency of the proposed content identification system by investigating the behavior of the probability mass function and imposing a constraint on the cardinality of the list.
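The relation between ML decoding and list decoding can be sketched as follows — ranking all database fingerprints by a correlation score and keeping the top L candidates, with L = 1 reducing to the maximum-score (ML-style) decision. The correlation score is an assumed decoding statistic, not necessarily the paper's:

```python
import numpy as np

def list_decode(query, database, L=5):
    """Order-statistic list decoding sketch: rank database fingerprints by
    correlation with the query and return the indices of the top L
    candidates; L == 1 is the single best-match (ML-style) decision."""
    scores = database @ query
    order = np.argsort(scores)[::-1]  # descending score
    return order[:L].tolist()
```

When the true entry's score is not the maximum (so the L = 1 decision fails), it may still appear further down the returned list, which is the regime the paper studies.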
Citations: 1
Bit allocation and encoded view selection for optimal multiview image representation
Pub Date : 2010-12-10 DOI: 10.1109/MMSP.2010.5662025
Gene Cheung, V. Velisavljevic
Novel coding tools have been proposed recently to encode the texture and depth maps of multiview images, exploiting inter-view correlations, for depth-image-based rendering (DIBR). However, the important associated bit allocation problem for DIBR remains open: for chosen view coding and synthesis tools, how should bits be allocated among texture and depth maps across encoded views so that the fidelity of a set of V views reconstructed at the decoder is maximized for a fixed bitrate budget? In this paper, we present an optimization strategy that selects a subset of texture and depth maps of the original V views for encoding at appropriate quantization levels, so that at the decoder the combined quality of decoded views (using encoded texture maps) and synthesized views (using encoded texture and depth maps of neighboring views) is maximized. We show that, using a monotonicity property, the complexity of our strategy can be greatly reduced. Experiments show that our strategy achieves up to a 0.83 dB PSNR gain over a heuristic scheme that encodes only the texture maps of all V views at constant quantization levels. Further, computation can be reduced by up to 66% relative to a full parameter search.
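The underlying search problem can be sketched as a constrained selection over per-view rate-distortion operating points. The brute-force baseline below is the "full parameter search" the paper improves on; the `(rate, distortion)` tuples are illustrative, and the paper's monotonicity-based pruning is not implemented here:

```python
from itertools import product

def best_allocation(options, budget):
    """Exhaustive bit-allocation baseline: `options` holds, per view, a
    list of (rate, distortion) operating points (one per quantization
    level); pick one point per view minimizing total distortion subject
    to the total-rate budget."""
    best = (float("inf"), None)
    for choice in product(*options):
        rate = sum(r for r, _ in choice)
        dist = sum(d for _, d in choice)
        if rate <= budget and dist < best[0]:
            best = (dist, choice)
    return best
```

The cost grows as the product of the per-view option counts, which is why pruning the search (as the paper does via monotonicity) matters for realistic numbers of views.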
Citations: 1
Considering security and robustness constraints for watermark-based Tardos fingerprinting
Pub Date : 2010-12-10 DOI: 10.1109/MMSP.2010.5661992
B. Mathon, P. Bas, François Cayre, B. Macq
This article is a theoretical study of binary Tardos fingerprinting codes embedded using watermarking schemes. Our approach is derived from [1] and encompasses both security and robustness constraints. We assume that the coalition has estimated the symbols of the fingerprinting code by means of a security attack, the quality of the estimation depending on the security of the watermarking scheme. Taking into account the fact that the coalition can make estimation errors, we update the Worst Case Attack, which minimises the mutual information between the sequence of one colluder and the pirated sequence forged by the coalition. After comparing the achievable rates of the previous and proposed Worst Case Attacks as a function of the estimation error, we conclude this analysis by comparing the robustness of non-secure embedding schemes versus secure ones. We show that, for low probabilities of error during the decoding stage (e.g. highly robust watermarking schemes), security makes it possible to increase the achievable rate of the fingerprinting scheme.
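For context, generating a binary Tardos code follows a well-known recipe: each code position draws a bias from an arcsine density, and every user's symbol at that position is Bernoulli with that bias. The sketch below omits Tardos' cutoff parameter on the bias range, which a production code would include:

```python
import numpy as np

def tardos_code(n_users, code_length, rng=None):
    """Generate a binary Tardos fingerprinting code: column biases p_i
    follow the arcsine density on (0,1) (p = sin^2(theta), theta uniform
    on (0, pi/2)); user symbols are Bernoulli(p_i) per column."""
    rng = np.random.default_rng(rng)
    theta = rng.uniform(0.0, np.pi / 2.0, size=code_length)
    p = np.sin(theta) ** 2
    X = (rng.random((n_users, code_length)) < p).astype(np.uint8)
    return X, p
```

Each row of `X` is one user's fingerprint; the biases `p` are kept secret and reused at accusation time.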
Citations: 1
Efficient MV prediction for zonal search in video transcoding
Pub Date : 2010-12-10 DOI: 10.1109/MMSP.2010.5662024
S. Marcelino, S. Faria, P. Assunção, S. Moiron, M. Ghanbari
This paper proposes a method to efficiently find motion vector predictions for zonal search motion re-estimation in fast video transcoders. The motion information extracted from the incoming video stream is processed to generate accurate motion vector predictions for transcoding with reduced complexity. Our results demonstrate that motion vector predictions computed by the proposed method outperform those generated by the highly efficient EPZS (Enhanced Predictive Zonal Search) algorithm in H.264/AVC transcoders. The computational complexity is reduced up to 59.6% at negligible cost in R-D performance. The proposed method can be useful in multimedia systems and applications using any type of transcoder, such as transrating and/or spatial resolution downsizing.
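For reference, the classic seed for zonal search in H.264/AVC-style codecs is the component-wise median of neighboring motion vectors; this is the conventional predictor that schemes like EPZS start from, sketched here, not the paper's proposed predictor:

```python
def median_mv_predictor(mv_left, mv_top, mv_topright):
    """Component-wise median of the three neighboring motion vectors
    (left, top, top-right), as used to seed zonal motion search."""
    med = lambda a, b, c: sorted((a, b, c))[1]
    return (med(mv_left[0], mv_top[0], mv_topright[0]),
            med(mv_left[1], mv_top[1], mv_topright[1]))
```

A transcoder can additionally reuse the incoming stream's motion vectors as extra predictors, which is the direction the paper pursues.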
Citations: 4
Generalized multiscale seam carving
Pub Date : 2010-12-10 DOI: 10.1109/MMSP.2010.5662048
David D. Conger, Mrityunjay Kumar, H. Radha
With the abundance and variety of display devices, novel image resizing techniques have become more desirable. Content-aware image resizing (retargeting) techniques have been proposed that improve on traditional techniques such as cropping and resampling. In particular, seam carving has gained attention as an effective solution, using simple filters to detect and preserve the high-energy areas of an image. Yet it could be made more robust to a variety of image types. To facilitate such improvement, we recast seam carving in a more general framework and in the context of filter banks. This enables improved filter design and leads to a multiscale model that addresses the problem of the scale of image features. We have found that our generalized multiscale model improves on the existing seam carving method for a variety of images.
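The baseline seam-carving step that the paper generalizes is a dynamic-programming search for the minimum-energy vertical seam; a minimal sketch of that standard algorithm (the energy function and the paper's filter-bank extensions are not modeled here):

```python
import numpy as np

def min_vertical_seam(energy):
    """Return, for each row, the column of the minimum-cumulative-energy
    vertical seam, via the standard seam-carving dynamic program."""
    H, W = energy.shape
    cost = energy.astype(float).copy()
    for y in range(1, H):
        left = np.r_[np.inf, cost[y - 1, :-1]]
        up = cost[y - 1]
        right = np.r_[cost[y - 1, 1:], np.inf]
        cost[y] += np.minimum(np.minimum(left, up), right)
    # Backtrack from the cheapest bottom-row cell.
    seam = [int(np.argmin(cost[-1]))]
    for y in range(H - 2, -1, -1):
        x = seam[-1]
        lo, hi = max(0, x - 1), min(W, x + 2)
        seam.append(lo + int(np.argmin(cost[y, lo:hi])))
    return seam[::-1]
```

Removing that seam from every row shrinks the image by one column while avoiding high-energy content.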
Citations: 11
Optimal mode switching for multi-hypothesis motion compensated prediction
Pub Date : 2010-12-10 DOI: 10.1109/MMSP.2010.5662021
Ramdas Satyan, F. Labeau, K. Rose
Transmission of compressed video over unreliable networks is vulnerable to errors and error propagation. Multi-hypothesis motion compensated prediction (MHMCP), originally developed to improve compression efficiency, has been shown to have good error resilience properties. In this paper, we improve the overall performance of MHMCP in packet-loss scenarios by performing optimal mode switching within a rate-distortion framework. The approach builds on the recursive optimal per-pixel estimate (ROPE), which is extended by re-deriving the recursion formulas for the more complex MHMCP setting, so as to achieve an accurate estimate of the end-to-end distortion. Simulation results show significant performance gains over the standard MHMCP scheme and demonstrate the importance of effective mode decisions. We also show results in comparison with conventional ROPE.
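The flavor of a ROPE-style recursion can be sketched for a single pixel: the decoder-side first and second moments are mixtures of the "packet received" and "packet lost, concealed" cases. This is a simplified single-hypothesis update with illustrative variable names, not the paper's MHMCP derivation:

```python
def rope_moments(pred_m1, pred_m2, residual, conceal_m1, conceal_m2, p):
    """One ROPE-style update of a decoded pixel's expected first and
    second moments, given loss probability p: with probability 1-p the
    pixel is prediction + residual; with probability p it is concealed
    using the co-located pixel's moments."""
    recv_m1 = pred_m1 + residual
    # E[(pred + r)^2] = E[pred^2] + 2 r E[pred] + r^2
    recv_m2 = pred_m2 + 2 * residual * pred_m1 + residual ** 2
    m1 = (1 - p) * recv_m1 + p * conceal_m1
    m2 = (1 - p) * recv_m2 + p * conceal_m2
    return m1, m2
```

The expected end-to-end distortion of the pixel then follows as `x**2 - 2*x*m1 + m2` for original value `x`, which is what the mode decision minimizes.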
Citations: 0
Unsupervised detection of multimodal clusters in edited recordings
Pub Date : 2010-12-10 DOI: 10.1109/MMSP.2010.5662015
Alfred Dielmann
Edited video recordings, such as talk shows and sitcoms, often include audio-visual clusters: frequent repetitions of closely related acoustic and visual content. For example, during a political debate, every time a given participant holds the conversational floor, her/his voice tends to co-occur with camera views (i.e., shots) showing her/his portrait. Unlike previous audio-visual clustering works, this paper proposes an unsupervised approach that detects audio-visual clusters without making assumptions about the recording content, such as the presence of specific participants' voices or faces. Sequences of audio and shot clusters are automatically identified using unsupervised audio diarization and shot segmentation techniques. Audio-visual clusters are then formed by ranking the co-occurrences between these two segmentations and selecting those which significantly exceed chance. Numerical experiments performed on a collection of 70 political debates, comprising more than 43 hours of live edited recordings, showed that the automatically extracted audio-visual clusters match the ground-truth annotation well, achieving high purity.
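The co-occurrence ranking step can be sketched as counting frame-level (audio cluster, shot cluster) pairs and scoring each pair by how far its joint count exceeds the expectation under independence; the exact significance measure below is an assumption, not necessarily the paper's statistic:

```python
from collections import Counter

def rank_av_cooccurrences(audio_labels, shot_labels):
    """Count frame-level co-occurrences between audio-cluster and
    shot-cluster labels and rank the pairs by the amount their joint
    count exceeds the chance expectation under independence."""
    n = len(audio_labels)
    joint = Counter(zip(audio_labels, shot_labels))
    a_cnt = Counter(audio_labels)
    s_cnt = Counter(shot_labels)
    scored = {pair: cnt - a_cnt[pair[0]] * s_cnt[pair[1]] / n
              for pair, cnt in joint.items()}
    return sorted(scored, key=scored.get, reverse=True)
```

Top-ranked pairs are the candidate audio-visual clusters (e.g. one speaker's voice with her/his close-up shot).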
Citations: 9