
Latest publications: 2011 IEEE 13th International Workshop on Multimedia Signal Processing

Recognizing actions using salient features
Pub Date : 2011-12-01 DOI: 10.1109/MMSP.2011.6093832
Liang Wang, Debin Zhao
Towards a compact video feature representation, we propose a novel feature selection methodology for action recognition based on the saliency maps of videos. Since saliency maps measure the perceptual importance of the pixels and regions in videos, selecting features using saliency maps enables us to find a feature representation that covers the informative parts of a video. Because saliency detection is a bottom-up procedure, some appearance changes or motions that are irrelevant to actions may also be detected as salient regions. To further improve the purity of the feature representation, we prune these irrelevant salient regions using the distribution of saliency values and the spatial-temporal distribution of the salient regions. Extensive experiments demonstrate that the proposed feature selection method largely improves the performance of the bag-of-video-words model on action recognition, based on three different attention models: a static attention model, a motion attention model, and their combination.
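The core selection step described above can be sketched very simply: keep only the feature points that land on sufficiently salient pixels. The function name, the percentile-based threshold, and the toy saliency map below are illustrative assumptions, not the paper's actual procedure.

```python
import numpy as np

def select_salient_features(keypoints, saliency_map, percentile=95.0):
    """Keep only keypoints that fall on sufficiently salient pixels.

    keypoints    : (N, 2) integer array of (row, col) positions.
    saliency_map : 2-D array of per-pixel saliency values.
    percentile   : saliency percentile used as the pruning threshold.
    """
    vals = saliency_map[keypoints[:, 0], keypoints[:, 1]]
    thresh = np.percentile(saliency_map, percentile)
    return keypoints[vals >= thresh]

# toy saliency map: a bright 4x4 blob on a dark background
sal = np.zeros((16, 16))
sal[6:10, 6:10] = 1.0
kps = np.array([[7, 7], [8, 9], [0, 0], [15, 3]])
kept = select_salient_features(kps, sal, percentile=95.0)
```

A real system would additionally prune whole salient regions by their saliency-value and spatio-temporal distributions, as the abstract describes.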
Citations: 6
A system for dynamic playlist generation driven by multimodal control signals and descriptors
Pub Date : 2011-12-01 DOI: 10.1109/MMSP.2011.6093850
Luca Chiarandini, M. Zanoni, A. Sarti
This work describes a general approach to multimedia playlist generation and description, and an application of the approach to music information retrieval. The example system we implemented updates a music playlist on the fly based on prior information (musical preferences); current descriptors of the song being played; and fine-grained, semantically rich descriptors (of the user's gestures, environmental conditions, etc.). The system incorporates a learning component that infers the user's preferences. Subjective tests were conducted on the usability and quality of the recommendation system.
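One plausible way to fuse the three control signals mentioned above is a weighted similarity score per candidate track. The abstract does not specify the fusion rule, so the function name, the cosine-similarity choice, and the weights below are all assumptions for illustration.

```python
import numpy as np

def score_tracks(track_feats, preference, current, context, w=(0.5, 0.3, 0.2)):
    """Rank candidate tracks by weighted cosine similarity to three control
    signals: the user's long-term preference profile, the descriptors of the
    currently playing song, and a context vector (gesture / environment
    descriptors). Returns candidate indices, best first."""
    def cos(a, b):
        return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))
    scores = [w[0] * cos(f, preference) + w[1] * cos(f, current)
              + w[2] * cos(f, context) for f in track_feats]
    return np.argsort(scores)[::-1]

# two toy candidate tracks described by 2-D feature vectors
tracks = np.array([[1.0, 0.0], [0.0, 1.0]])
order = score_tracks(tracks,
                     preference=np.array([1.0, 0.0]),
                     current=np.array([0.8, 0.6]),
                     context=np.array([1.0, 0.0]))
```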
Citations: 5
Age estimation based on extended non-negative matrix factorization
Pub Date : 2011-12-01 DOI: 10.1109/MMSP.2011.6093779
Ce Zhan, W. Li, P. Ogunbona
Previous studies suggested that local appearance-based methods are more efficient than geometric-based and holistic methods for age estimation. This is mainly because age information is usually encoded by local features such as wrinkles and skin texture on the forehead or at the eye corners. However, the variations of these features caused by other factors such as identity, expression, pose and lighting may be larger than those caused by aging. Thus, one of the key challenges of age estimation lies in constructing a feature space that recovers age information while ignoring other sources of variation. In this paper, non-negative matrix factorization (NMF) is extended to learn a localized, non-overlapping subspace representation for age estimation. To emphasize the appearance variation due to aging, one extended NMF subspace is learned for each age or age group. The age or age group of a given face image is then estimated from its reconstruction error after projection into each learned age subspace. Furthermore, a coarse-to-fine scheme is employed for exact age estimation, so that the age is estimated within the pre-classified age group. Cross-database tests are conducted using the FG-NET and MORPH databases to evaluate the proposed method. Experimental results have demonstrated the efficacy of the method.
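The classification-by-reconstruction-error idea above can be sketched with plain multiplicative-update NMF: learn one basis per age group, then assign a probe face to the group whose basis reconstructs it best. This is a minimal sketch of the principle, not the paper's extended NMF; the toy data and all names are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def nmf_basis(V, r, iters=500):
    """Basic multiplicative-update NMF: V (d x n) ~ W (d x r) @ H (r x n).
    Returns the learned basis W for one age group."""
    d, n = V.shape
    W = rng.random((d, r)) + 0.1
    H = rng.random((r, n)) + 0.1
    for _ in range(iters):
        H *= (W.T @ V) / (W.T @ W @ H + 1e-9)
        W *= (V @ H.T) / (W @ H @ H.T + 1e-9)
    return W

def recon_error(v, W, iters=500):
    """Non-negatively code v in the subspace spanned by W and return the
    reconstruction error used to pick the age group."""
    h = np.full(W.shape[1], 0.5)
    for _ in range(iters):
        h *= (W.T @ v) / (W.T @ W @ h + 1e-9)
    return np.linalg.norm(v - W @ h)

# toy data: "young" faces live in dims 0-4, "old" faces in dims 5-9
W_true_young = np.vstack([rng.random((5, 3)), np.zeros((5, 3))])
W_true_old = np.vstack([np.zeros((5, 3)), rng.random((5, 3))])
young = W_true_young @ rng.random((3, 40))
old = W_true_old @ rng.random((3, 40))
W_young, W_old = nmf_basis(young, 3), nmf_basis(old, 3)

probe = W_true_young @ rng.random(3)  # an unseen "young" face
err_y, err_o = recon_error(probe, W_young), recon_error(probe, W_old)
estimate = "young" if err_y < err_o else "old"
```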
Citations: 13
Compression of compound images by combining several strategies
Pub Date : 2011-12-01 DOI: 10.1109/MMSP.2011.6093824
Cuiling Lan, Jizheng Xu, Feng Wu
Compound images are combinations of text, graphics and natural images. They possess characteristics different from those of natural images, such as a strong anisotropy, sparse color histograms and repeated patterns. Former research on compressing them has mainly focused on developing certain strategies based on some of these characteristics but has failed so far to fully exploit them simultaneously. In this paper, we investigate the combination of four up-to-date strategies to construct a comprehensive scheme for compound image compression. We have implemented these strategies as four types of modes with variable block sizes. Experimental results show that the proposed scheme achieves significant coding gains for compound image compression at all bitrates.
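Combining several coding strategies as selectable per-block modes, as described above, amounts to a per-block mode decision. A minimal sketch of such a decision with a Lagrangian distortion + rate cost follows; the two toy modes ("flat" and "exact"), the lambda value, and all names are assumptions, not the paper's actual modes.

```python
import numpy as np

def choose_mode(block, modes, lam=0.1):
    """Pick, per block, the coding mode with the lowest Lagrangian cost
    (mean squared distortion + lam * rate). `modes` maps a mode name to a
    function returning (approximation, rate)."""
    best, best_cost = None, np.inf
    for name, fn in modes.items():
        approx, rate = fn(block)
        cost = np.mean((block - approx) ** 2) + lam * rate
        if cost < best_cost:
            best, best_cost = name, cost
    return best

# two illustrative modes: "flat" (DC only, cheap) vs "exact" (lossless, costly)
modes = {
    "flat":  lambda b: (np.full_like(b, b.mean()), 1.0),
    "exact": lambda b: (b.copy(), float(b.size)),
}
smooth = np.full((8, 8), 7.0)                   # flat background block
texty = np.zeros((8, 8)); texty[::2] = 255.0    # high-contrast text-like block
```

With such a cost, smooth background blocks fall to the cheap mode while text-like blocks justify the expensive one.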
Citations: 7
A flexible markerless registration method for video augmented reality
Pub Date : 2011-12-01 DOI: 10.1109/MMSP.2011.6093790
L. Ling, I. Burnett, E. Cheng
This paper proposes a flexible, markerless registration method that addresses the problem of realistic virtual object placement at any position in a video sequence. The registration consists of two steps: first, four points are specified by the user to build the world coordinate system in which the virtual object is rendered; a self-calibration camera tracking algorithm is then proposed to recover the camera viewpoint frame by frame, such that the virtual object can be dynamically and correctly rendered according to camera movement. The proposed registration method needs no reference fiducials and no knowledge of the camera parameters or the user environment, so the virtual object can be placed in any environment, even one without distinct features. Experimental evaluations demonstrate low errors of the self-calibration algorithm for several camera rotations around the X and Y axes. Finally, virtual object rendering applications in different user environments are evaluated.
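One standard way (not necessarily the authors') to relate four user-specified points to a reference plane is a Direct Linear Transform homography fit, sketched below with numpy. All names are illustrative.

```python
import numpy as np

def homography(src, dst):
    """Direct Linear Transform: fit H (3x3) with dst ~ H @ src from four or
    more point correspondences, each given as (x, y)."""
    A = []
    for (x, y), (u, v) in zip(src, dst):
        A.append([-x, -y, -1, 0, 0, 0, u * x, u * y, u])
        A.append([0, 0, 0, -x, -y, -1, v * x, v * y, v])
    _, _, Vt = np.linalg.svd(np.asarray(A, float))
    H = Vt[-1].reshape(3, 3)      # null-space vector of the constraint matrix
    return H / H[2, 2]

def apply_h(H, p):
    """Apply a homography to a 2-D point."""
    q = H @ np.array([p[0], p[1], 1.0])
    return q[:2] / q[2]

# four user-specified corners of a reference square and their images
H_true = np.array([[1.0, 0.2, 3.0], [0.0, 1.5, -1.0], [0.001, 0.0, 1.0]])
src = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
dst = [tuple(apply_h(H_true, p)) for p in src]
H_est = homography(src, dst)
```

With exactly four generic correspondences the constraint matrix has a one-dimensional null space, so the fit is exact up to numerical precision.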
Citations: 2
Multi-dimensional correlation steganalysis
Pub Date : 2011-12-01 DOI: 10.1109/MMSP.2011.6093791
F. Farhat, A. Diyanat, S. Ghaemmaghami, M. Aref
Multi-dimensional spatial analysis of image pixels has not been much investigated for the steganalysis of LSB steganographic methods. As reported in several papers, pixel-distribution-based steganalysis methods can be thwarted by intelligently compensating the statistical characteristics of image pixels. Simple LSB replacement has been improved by smarter LSB embedding approaches, e.g. the LSB matching and LSB+ methods, but these are basically the same in the sense of LSB alteration. A new analytical method to detect LSB stego images is proposed in this paper. Our approach is based on the relative locations of the image pixels that are essentially changed by an LSB embedding system. Furthermore, we introduce some new statistical features, including the “local entropies sum” and the “clouds min sum”, to achieve higher performance. Simulation results show that our proposed approach outperforms some well-known LSB steganalysis methods in terms of detection accuracy and embedding rate estimation.
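The flavor of a "local entropies sum" feature on the LSB plane can be sketched as follows. The abstract does not define the feature precisely, so the block size and the per-block binary entropy below are assumptions standing in for the paper's actual definition.

```python
import numpy as np

rng = np.random.default_rng(0)

def lsb_plane(img):
    """Least-significant-bit plane of an 8-bit image."""
    return (img & 1).astype(np.float64)

def local_entropy_sum(bits, block=4):
    """Sum of per-block binary entropies of the LSB plane. LSB embedding
    tends to randomize the plane, which this feature picks up."""
    h, w = bits.shape
    total = 0.0
    for i in range(0, h - block + 1, block):
        for j in range(0, w - block + 1, block):
            p = bits[i:i + block, j:j + block].mean()
            if 0 < p < 1:
                total += -p * np.log2(p) - (1 - p) * np.log2(1 - p)
    return total

clean = np.full((8, 8), 128, dtype=np.uint8)            # constant cover image
stego = clean | rng.integers(0, 2, (8, 8), dtype=np.uint8)  # randomized LSBs
```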
Citations: 3
ObjectBook construction for large-scale semantic-aware image retrieval
Pub Date : 2011-12-01 DOI: 10.1109/MMSP.2011.6093776
Shiliang Zhang, Q. Tian, Qingming Huang, Wen Gao
Automatic image annotation assigns semantic labels to images and thus has great potential for semantic-aware image retrieval. However, existing annotation algorithms do not scale to this emerging need, in terms of both computational efficiency and the number of tags they can deal with. Facilitated by the recent development of large-scale image category recognition data such as ImageNet, we extrapolate from it a model for scalable image annotation and semantic-aware image retrieval, namely the ObjectBook. An element of the ObjectBook, called an ObjectWord, is defined as a collection of discriminative image patches annotated with the corresponding objects. We take the ObjectBook as a high-level, semantics-preserving visual vocabulary, and are hence able to easily develop efficient image annotation and inverted-file indexing strategies for large-scale image collections. The proposed retrieval strategy is compared with state-of-the-art algorithms. Experimental results show that the ObjectBook is both discriminative and scalable for large-scale semantic-aware image retrieval.
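The inverted-file indexing mentioned above is the classic text-retrieval structure applied to ObjectWords: map each term to the set of images containing it, so a semantic query touches only the posting lists of its terms. A minimal sketch (names are illustrative):

```python
from collections import defaultdict

def build_inverted_index(annotations):
    """Build an inverted file from image -> detected ObjectWords, so that a
    semantic query retrieves matching images without scanning the collection."""
    index = defaultdict(set)
    for image_id, words in annotations.items():
        for w in words:
            index[w].add(image_id)
    return index

def query(index, terms):
    """Return the images containing every queried ObjectWord."""
    sets = [index.get(t, set()) for t in terms]
    return set.intersection(*sets) if sets else set()

annotations = {"img1": {"dog", "car"}, "img2": {"dog"}, "img3": {"car"}}
index = build_inverted_index(annotations)
```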
Citations: 4
Transform-domain temporal prediction in video coding with spatially adaptive spectral correlations
Pub Date : 2011-12-01 DOI: 10.1109/MMSP.2011.6093815
Jingning Han, Vinay Melkote, K. Rose
Temporal prediction in standard video coding is performed in the spatial domain, where each pixel block is predicted from a motion-compensated pixel block in a previously reconstructed frame. Such prediction treats each pixel independently and ignores underlying spatial correlations. In contrast, this paper proposes a paradigm for motion-compensated prediction in the transform domain that eliminates much of the spatial correlation before the individual frequency components along a motion trajectory are independently predicted. The proposed scheme exploits the true temporal correlations, which emerge only after signal decomposition and vary considerably from low to high frequency. The scheme adapts spatially and temporally to the evolving source statistics via a recursive procedure that obtains the cross-correlation between transform coefficients on the same motion trajectory. This recursion involves only already reconstructed data and precludes the need for any additional side information in the bit-stream. Experiments demonstrate substantial performance gains in comparison with the standard codec that employs conventional pixel-domain motion-compensated prediction.
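The mechanism above can be sketched as per-coefficient scaling in the DCT domain, with each frequency's correlation factor updated recursively from already reconstructed blocks so that no side information is needed. This is a minimal sketch under assumed update rules (exponential forgetting), not the paper's exact recursion.

```python
import numpy as np

def dct_mat(n):
    """Orthonormal DCT-II matrix (rows = frequencies)."""
    k = np.arange(n)
    C = np.cos(np.pi * (2 * k[None, :] + 1) * k[:, None] / (2 * n))
    C[0] *= 1 / np.sqrt(2)
    return C * np.sqrt(2 / n)

class TransformPredictor:
    """Per-coefficient temporal prediction in the DCT domain. Each frequency
    keeps its own correlation estimate rho = E[cur*ref]/E[ref*ref], updated
    recursively from reconstructed blocks only."""
    def __init__(self, n=4, forget=0.9):
        self.C = dct_mat(n)
        self.num = np.zeros((n, n))          # running E[cur * ref]
        self.den = np.full((n, n), 1e-9)     # running E[ref * ref]
        self.forget = forget

    def predict(self, ref_block):
        R = self.C @ ref_block @ self.C.T    # forward 2-D DCT
        return self.C.T @ ((self.num / self.den) * R) @ self.C  # inverse DCT

    def update(self, cur_block, ref_block):
        X = self.C @ cur_block @ self.C.T
        R = self.C @ ref_block @ self.C.T
        self.num = self.forget * self.num + X * R
        self.den = self.forget * self.den + R * R

# toy sequence whose coefficients all decay by a factor 0.8 per frame
rng = np.random.default_rng(3)
tp = TransformPredictor(n=4)
for _ in range(3):
    ref = rng.standard_normal((4, 4))
    tp.update(0.8 * ref, ref)
ref = rng.standard_normal((4, 4))
pred = tp.predict(ref)
```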
Citations: 10
Super-resolution reconstruction with prior manifold on primitive patches for video compression
Pub Date : 2011-12-01 DOI: 10.1109/MMSP.2011.6093849
Jingtao Chen, H. Xiong
This paper proposes a generic video compression framework built on low-quality video data and a learning-based approach, rooted in sparse representation, for the ill-posed problem of video super-resolution reconstruction. The reconstruction is regularized by a prior manifold only on the “primitive patches”, and each primitive patch is modeled by a sparse representation over an over-complete dictionary learned from a training set. Due to the low intrinsic dimensionality of primitives, the number of samples in the dictionary can be greatly reduced. Considering the similar geometry of the feature-space manifolds of the low-frequency and high-frequency primitives, we hypothesize that a low-frequency primitive patch and its corresponding high-frequency patch share the same sparse representation structure. In this sense, high-resolution frame primitives are divided into low-frequency and high-frequency components, and high-frequency primitive patches can be synthesized from the high-frequency primitive patch dictionary together with the sparse structure of the corresponding low-frequency primitive patches. The framework involves no explicit motion estimation or assistance information, and decomposes the original video sequence into key frames and low-resolution frames with low entropy. The corresponding high-resolution frames are reconstructed by combining the high-frequency and low-frequency patches under smoothness constraints with a back-projection process. Experimental results demonstrate the objective and subjective efficiency in comparison with H.264/AVC and existing super-resolution reconstruction approaches.
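The shared-sparse-code hypothesis above can be sketched with coupled dictionaries: code a low-frequency patch over the LF dictionary, then apply the same sparse code to the HF dictionary to synthesize the high-frequency patch. The orthogonal-matching-pursuit coder and the toy dictionaries below are assumptions for the sketch, not the paper's training procedure.

```python
import numpy as np

rng = np.random.default_rng(1)

def omp(D, y, k):
    """Orthogonal matching pursuit: greedy k-sparse code of y over D."""
    resid, support = y.copy(), []
    for _ in range(k):
        support.append(int(np.argmax(np.abs(D.T @ resid))))
        coef, *_ = np.linalg.lstsq(D[:, support], y, rcond=None)
        resid = y - D[:, support] @ coef
    x = np.zeros(D.shape[1])
    x[support] = coef
    return x

# coupled dictionaries sharing one sparse code: D_lf for low-frequency
# patches, D_hf for the high-frequency patches to be synthesized
D_lf, _ = np.linalg.qr(rng.standard_normal((16, 16)))  # orthonormal atoms
D_hf = rng.standard_normal((16, 16))

code = np.zeros(16)
code[[2, 7]] = [1.5, -0.8]                 # a 2-sparse code
lf_patch = D_lf @ code                     # observed low-frequency patch
hf_patch = D_hf @ omp(D_lf, lf_patch, k=2) # synthesized high-frequency patch
```

Because the LF atoms here are orthonormal and the data is noiseless, OMP recovers the code exactly, so the synthesized HF patch matches the ground truth.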
Citations: 1
Adaptive in-loop noise-filtered prediction for High Efficiency Video Coding
Pub Date : 2011-12-01 DOI: 10.1109/MMSP.2011.6093773
Eugen Wige, Gilbert Yammine, P. Amon, A. Hutter, André Kaup
Compression of noisy image sequences is a hard challenge in video coding. For high-quality compression in particular, pre-filtering the videos is not an option, as it decreases their objective quality. To overcome this problem, this paper presents an in-loop denoising framework for efficient medium- to high-fidelity compression of noisy video data. It is shown that low-complexity in-loop noise estimation and noise filtering, together with adaptive selection of the denoised inter-frame predictors, can improve the compression performance. The proposed algorithm for adaptive selection of the denoised predictor is based on the current HEVC reference model. The different inter-frame prediction modes within the HEVC reference model are exploited for adaptive selection of the denoised prediction, by transmitting some side information in combination with decoder-side estimation of the denoised prediction. The simulation results show considerable gains using the proposed in-loop denoising framework with adaptive selection. In addition, the simulation results show the theoretical bounds on compression efficiency that would hold if the adaptive selection of the denoised prediction could be estimated perfectly in the decoder.
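The adaptive selection described above boils down to a per-block choice between the plain and the denoised reference predictor, e.g. by smaller SAD against the current block. The box-filter stand-in and the SAD criterion below are assumptions for a minimal sketch, not the paper's actual filter or decision rule.

```python
import numpy as np

def denoise(block):
    """Toy 3x3 box filter standing in for a low-complexity in-loop
    noise filter."""
    p = np.pad(block, 1, mode="edge")
    h, w = block.shape
    return sum(p[i:i + h, j:j + w] for i in range(3) for j in range(3)) / 9.0

def select_predictor(current, reference):
    """One-bit adaptive choice between the plain and the denoised inter
    predictor: whichever has the smaller SAD against the current block wins
    (a decision the encoder could signal or the decoder could estimate)."""
    filtered = denoise(reference)
    sad_plain = np.abs(current - reference).sum()
    sad_filt = np.abs(current - filtered).sum()
    return ("denoised", filtered) if sad_filt < sad_plain else ("plain", reference)

base = np.full((8, 8), 100.0)                        # clean current block
checker = np.indices((8, 8)).sum(0) % 2 * 2 - 1      # +/-1 noise pattern
mode, pred = select_predictor(base, base + 10.0 * checker)
```

On the noisy checkerboard reference, the box filter suppresses most of the noise, so the denoised predictor is selected.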
Citations: 4