
2014 IEEE 16th International Workshop on Multimedia Signal Processing (MMSP): Latest Publications

Embedded coding of optical flow fields for scalable video compression
Pub Date: 2014-11-20 DOI: 10.1109/MMSP.2014.6958817
Sean I. Young, R. Mathew, D. Taubman
An embedded coding scheme for dense motion (optical flow) fields is proposed. Such a scheme is particularly useful in scalable video compression, where inter-frame motion must be compensated at various visual qualities and resolutions; however, the high cost of coding such fields has often made this option prohibitive. Using our previously developed 'breakpoint'-adaptive wavelet transform, we show that dense motion fields can be coded efficiently while simultaneously endowing the coded motion representation with embedded resolution and quality scalability attributes. Performance comparisons with the traditional non-scalable block-based model are also made and presented with the aid of a modified H.264/AVC JM reference encoder.
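To make the embedding idea concrete, here is a minimal Python sketch of bitplane coding of the wavelet coefficients of one motion component. A plain one-level Haar transform substitutes for the paper's breakpoint-adaptive wavelet, and all names (haar2d, bitplanes) are illustrative rather than the authors' code: decoding any prefix of the emitted bitplanes yields a coarser-quality field, which is the embedded-scalability property the abstract refers to.

```python
# A one-level Haar transform stands in for the paper's breakpoint-adaptive
# wavelet; bitplane scanning of the coefficients gives the embedded property.
import numpy as np

def haar2d(x):
    """One-level 2D Haar transform of an array with even dimensions."""
    a = (x[0::2, :] + x[1::2, :]) / 2.0          # vertical average
    d = (x[0::2, :] - x[1::2, :]) / 2.0          # vertical detail
    ll = (a[:, 0::2] + a[:, 1::2]) / 2.0         # low-low subband
    hl = (a[:, 0::2] - a[:, 1::2]) / 2.0
    lh = (d[:, 0::2] + d[:, 1::2]) / 2.0
    hh = (d[:, 0::2] - d[:, 1::2]) / 2.0
    return ll, hl, lh, hh

def bitplanes(coeffs, n_planes=8):
    """Emit sign and magnitude bitplanes, most significant first; decoding
    any prefix of the list reconstructs a coarser version of the field."""
    q = np.round(np.abs(coeffs) * (1 << (n_planes - 1))).astype(np.int64)
    sign = np.signbit(coeffs)
    return sign, [(q >> p) & 1 for p in range(n_planes - 1, -1, -1)]

flow_x = np.random.randn(64, 64)                 # horizontal motion component
ll, hl, lh, hh = haar2d(flow_x)
sign, planes = bitplanes(ll / np.abs(ll).max())  # normalise, then scan bitplanes
print(len(planes), "bitplanes; any prefix decodes to a lower-quality field")
```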
Citations: 4
A new error-mapping scheme for scalable audio coding
Pub Date: 2014-11-20 DOI: 10.1109/MMSP.2014.6958815
Haibin Huang, S. Rahardja
In scalable audio coders such as MPEG-4 SLS, error-mapping is used to map quantization errors in the core coder to an error signal before bit-plane coding. In this paper, we propose a new error-mapping scheme derived by observing the statistical properties of the error signal. Compared with the error-mapping in SLS, the proposed scheme improves the coding efficiency and reduces the computational complexity of the coder. In subjective listening tests, the proposed scheme achieved an average improvement of 9 points in MUSHRA score. The proposed error-mapping adds a useful new tool to the existing toolset for constructing next-generation scalable audio coders.
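As a hedged illustration of where the error signal comes from, the following Python sketch forms the enhancement-layer residual between the original spectral coefficients and the dequantized output of a core layer. A plain uniform quantizer stands in for the AAC core of MPEG-4 SLS; the step size and frame length are arbitrary assumptions.

```python
# Error-mapping sketch: the enhancement layer codes the residual between the
# original coefficients and the core coder's dequantized output.
import numpy as np

def core_quantize(coeffs, step):
    return np.round(coeffs / step).astype(np.int32)

def core_dequantize(indices, step):
    return indices.astype(np.float64) * step

coeffs = np.random.randn(1024)          # one frame of spectral coefficients
step = 0.5                              # coarse core-layer step size (assumed)
idx = core_quantize(coeffs, step)
core_rec = core_dequantize(idx, step)

# The signal handed to bit-plane coding is the mapped error. A better
# statistical model of this error (the paper's contribution) shortens the
# enhancement bitstream.
error = coeffs - core_rec               # bounded by half the step size
print("max |error|:", np.abs(error).max(), "<= step/2 =", step / 2)
```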
Citations: 0
Multi-view action recognition by cross-domain learning
Pub Date: 2014-11-20 DOI: 10.1109/MMSP.2014.6958811
Weizhi Nie, Anan Liu, Jing Yu, Yuting Su, L. Chaisorn, Yongkang Wang, M. Kankanhalli
This paper proposes a novel multi-view human action recognition method that discovers and shares common knowledge among video sets captured from multiple viewpoints. To our knowledge, we are the first to treat a specific view as the target domain and the others as source domains, and consequently to formulate multi-view action recognition within a cross-domain learning framework. First, the classic bag-of-visual-words framework is used for visual feature extraction in individual viewpoints. Then, we propose a cross-domain learning method with a block-wise weighted kernel function matrix to highlight the saliency components and consequently augment the discriminative ability of the model. Extensive experiments are conducted on IXMAS, the popular multi-view action dataset. The experimental results demonstrate that the proposed method consistently outperforms the state of the art.
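The following Python sketch shows one plausible reading of a block-wise weighted kernel matrix: the Gram matrix over pooled source and target samples is partitioned into within-source, within-target and cross-domain blocks, each scaled by its own weight. The RBF kernel, the weight values and the data are assumptions for illustration, not the paper's exact formulation, and in practice the weights must be chosen so the matrix stays positive semi-definite.

```python
# Block-wise weighted kernel matrix over source and target domains (a sketch).
import numpy as np

def rbf(X, Y, gamma=0.5):
    d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

Xs = np.random.randn(40, 16)   # bag-of-visual-words histograms, source views
Xt = np.random.randn(10, 16)   # target view
X = np.vstack([Xs, Xt])
K = rbf(X, X)

ns = len(Xs)
w_ss, w_st, w_tt = 1.0, 0.6, 1.2     # assumed block weights
K[:ns, :ns] *= w_ss                  # source-source block
K[:ns, ns:] *= w_st                  # source-target block
K[ns:, :ns] *= w_st
K[ns:, ns:] *= w_tt                  # target-target block
# K can now be passed to a kernel classifier as a precomputed kernel,
# e.g. sklearn's SVC(kernel='precomputed').
```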
Citations: 4
Bidirectional hierarchical anchoring of motion fields for scalable video coding
Pub Date: 2014-11-20 DOI: 10.1109/MMSP.2014.6958816
Dominic Rüfenacht, R. Mathew, D. Taubman
The ability to predict motion fields at finer temporal scales from coarser ones is a very desirable property for temporal scalability. This is very difficult in current state-of-the-art video codecs (i.e., H.264, HEVC), where motion fields are anchored in the frame that is to be predicted (the target frame). In this paper, we propose to anchor motion fields in the reference frames instead. We show how, from only one fully coded motion field at the coarsest temporal level, together with breakpoints that signal discontinuities in the motion field, we can reliably predict the motion fields used at finer temporal levels. This significantly reduces the cost of coding the motion fields. Results on synthetic data show improved rate-distortion (R-D) performance and superior scalability compared to the traditional way of anchoring motion fields.
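A minimal sketch of the prediction step, assuming constant velocity between frames: a coded motion field anchored at reference frame 0 and pointing to frame 2 is simply halved to predict the finer-level field from frame 0 to frame 1, so only the residual needs coding. The paper's breakpoint-driven handling of motion discontinuities is omitted here.

```python
# Temporal bisection prediction of a finer-level motion field (a sketch).
import numpy as np

M_0_to_2 = np.random.randn(36, 64, 2)       # dense flow, frame 0 -> frame 2
M_0_to_1_pred = 0.5 * M_0_to_2              # constant-velocity prediction

# Only the prediction residual needs coding at the finer level; if motion is
# nearly linear the residual is small and cheap to code.
M_0_to_1_true = 0.5 * M_0_to_2 + 0.05 * np.random.randn(36, 64, 2)
residual = M_0_to_1_true - M_0_to_1_pred
print("residual energy:", float((residual ** 2).mean()))
```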
Citations: 7
A novel video coding scheme using a scene adaptive non-parametric background model
Pub Date: 2014-11-14 DOI: 10.1109/MMSP.2014.6958823
Subrata Chakraborty, M. Paul, M. Murshed, Mortuza Ali
Video coding techniques utilising background frames provide better rate-distortion performance than the latest video coding standard by exploiting coding efficiency in uncovered background areas. Parametric approaches such as mixture-of-Gaussians (MoG) background modeling have been widely used; however, they require prior knowledge about the test videos for parameter estimation. Recently introduced non-parametric (NP) background modeling techniques have successfully improved video coding performance through an HEVC-integrated coding scheme. The inherent nature of the NP technique naturally yields superior performance in dynamic background scenarios compared to the MoG-based technique, without a priori knowledge of the video data distribution. Although NP-based coding schemes have shown promising coding performance, they face a number of key challenges: (a) determining the optimal subset of training frames for generating a suitable background that can be used as a reference frame during coding; (b) incorporating dynamic changes in the background effectively after the initial background frame is generated; (c) managing frequent scene changes that lead to performance degradation; and (d) optimizing the coding quality ratio between an I-frame and other frames under bit-rate constraints. In this study we develop a new scene-adaptive coding scheme using the NP-based technique, capable of addressing these challenges by incorporating a new, continuously updating background generation process. Extensive experimental results are provided to validate the effectiveness of the new scheme.
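As a rough sketch of a non-parametric, continuously updating background model, the following Python snippet maintains a per-pixel median over a sliding window of recent frames and returns it as a candidate long-term reference frame. The window size and the median estimator are assumptions; the paper's scheme additionally handles scene changes and integrates with HEVC.

```python
# Continuously updated non-parametric background via a sliding-window median.
import numpy as np
from collections import deque

class MedianBackground:
    def __init__(self, window=30):
        self.frames = deque(maxlen=window)   # training subset, oldest dropped first

    def update(self, frame):
        self.frames.append(frame.astype(np.uint8))
        # The per-pixel median is robust to transient foreground objects.
        return np.median(np.stack(self.frames), axis=0).astype(np.uint8)

bg = MedianBackground(window=30)
for _ in range(40):                          # feed incoming frames
    frame = np.random.randint(0, 256, (36, 64), dtype=np.uint8)
    reference = bg.update(frame)             # candidate long-term reference frame
```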
Citations: 11
Social image search exploiting joint visual-textual information within a fuzzy hypergraph framework
Pub Date: 2014-09-01 DOI: 10.1109/MMSP.2014.6958809
Konstantinos Pliakos, Constantine Kotropoulos
The unremitting growth of social media popularity is manifested by the vast volume of images uploaded to the web. Despite extensive research efforts, accurate and efficient image search remains an open problem. Most existing image search methods treat the visual content of an image and the semantic information captured by its social tags separately, or in a sequential manner. Here, a novel and efficient method is proposed that exploits visual and textual information simultaneously. The joint visual-textual information is captured by a fuzzy hypergraph powered by the term-frequency/inverse-document-frequency (tf-idf) weighting scheme. Experimental results on two datasets substantiate the merits of the proposed method. Indicatively, an average precision of 77% is measured at 1% recall for image-based queries.
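To illustrate the textual side, here is a minimal tf-idf computation over toy tag lists; the tags are invented, and the fuzzy-hypergraph construction built on top of these weights is not shown.

```python
# tf-idf weighting over social image tags (toy corpus, a sketch).
import math
from collections import Counter

docs = [["beach", "sunset", "sea"],
        ["sunset", "mountain"],
        ["sea", "boat", "beach"]]

df = Counter(tag for d in docs for tag in set(d))     # document frequency
N = len(docs)

def tfidf(doc):
    tf = Counter(doc)
    return {t: (tf[t] / len(doc)) * math.log(N / df[t]) for t in tf}

for d in docs:
    print(tfidf(d))   # rare tags like 'boat' receive the largest weights
```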
Citations: 1
SVM is not always confident: Telling whether the output from multiclass SVM is true or false by analysing its confidence values
Pub Date: 2014-09-01 DOI: 10.1109/MMSP.2014.6958800
T. Yamasaki, Takaki Maeda, K. Aizawa
This paper presents an algorithm to determine whether the output label yielded by a multiclass support vector machine (SVM) is true or false without knowing the ground truth. The judgment is made purely by confidence analysis, based on pre-training/testing using the training data. Such true/false judgment is useful for refining the output labels. We experimentally demonstrate that the difference in decision value between the top candidate and the second candidate is a good measure, and that a proper threshold can be determined by pre-training/testing using only the training data. Experimental results on three standard image datasets demonstrate that the proposed algorithm improves the Matthews correlation coefficient (MCC) much more than simply thresholding the decision value of the top candidate.
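A sketch of the accept/reject rule on a standard dataset, assuming a one-vs-rest SVM: the gap between the largest and second-largest decision values is thresholded, and only predictions whose gap exceeds the threshold are treated as true. The dataset, threshold value and classifier settings are illustrative; in the paper the threshold is chosen by pre-training/testing on the training data.

```python
# Confidence gap between the top two one-vs-rest decision values (a sketch).
import numpy as np
from sklearn import datasets, svm
from sklearn.model_selection import train_test_split

X, y = datasets.load_digits(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

clf = svm.SVC(decision_function_shape='ovr').fit(X_tr, y_tr)
dec = clf.decision_function(X_te)             # (n_samples, n_classes)

top2 = np.sort(dec, axis=1)[:, -2:]           # second-best and best values
margin = top2[:, 1] - top2[:, 0]              # the confidence measure

tau = 0.3                                     # assumed pre-trained threshold
pred = clf.predict(X_te)
trusted = margin > tau                        # labels flagged as 'true'
print("accepted:", trusted.mean(),
      "accuracy on accepted:", (pred[trusted] == y_te[trusted]).mean())
```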
Citations: 0
Highly optimized implementation of HEVC decoder for general processors
Pub Date: 2014-09-01 DOI: 10.1109/MMSP.2014.6958819
Shengbin Meng, Y. Duan, Jun Sun, Zongming Guo
In this paper, we propose a novel design and optimized implementation of an HEVC decoder. First, a decoder prototype with a refined decoding workflow and efficient memory management is designed. On this basis, a series of single-instruction-multiple-data (SIMD) based algorithms is used to speed up several time-consuming modules in HEVC decoding. Finally, a frame-based parallel framework is applied to exploit multi-threading on multicore processors. With these optimizations, our experiments achieve a decoding speed of 246 fps for 1080p video on an Intel i7-2400 3.4 GHz quad-core processor and 52 fps for 720p video on an ARM Cortex-A9 1.2 GHz dual-core processor.
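The kind of data-level parallelism that SIMD exposes can be mimicked in Python with numpy's wide vectorized operations; the sketch below contrasts a scalar per-pixel reconstruction loop (prediction plus residual, clipped to 8 bits) with its vectorized equivalent. Real decoders implement this with SSE/NEON intrinsics in C, so this is only an analogy, and the block contents are random placeholders.

```python
# Scalar vs. data-parallel reconstruction of one block (a SIMD analogy).
import numpy as np

pred = np.random.randint(0, 256, (64, 64), dtype=np.int16)    # predicted block
resid = np.random.randint(-32, 32, (64, 64), dtype=np.int16)  # decoded residual

# Scalar version: one pixel per iteration (the loop SIMD eliminates).
rec_scalar = np.empty((64, 64), dtype=np.uint8)
for i in range(64):
    for j in range(64):
        rec_scalar[i, j] = min(max(int(pred[i, j]) + int(resid[i, j]), 0), 255)

# Vectorized version: the whole block in a few wide operations.
rec_simd = np.clip(pred + resid, 0, 255).astype(np.uint8)
assert np.array_equal(rec_scalar, rec_simd)
```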
Citations: 7
Free-viewpoint video sequences: A new challenge for objective quality metrics
Pub Date: 2014-09-01 DOI: 10.1109/MMSP.2014.6958832
Philippe Hanhart, Emilie Bosc, P. Callet, T. Ebrahimi
Free-viewpoint television is expected to create a more natural and interactive viewing experience by providing the ability to interactively change the viewpoint to enjoy a 3D scene. To render new virtual viewpoints, free-viewpoint systems rely on view synthesis. However, it is known that most objective metrics fail at predicting perceived quality of synthesized views. Therefore, it is legitimate to question the reliability of commonly used objective metrics to assess the quality of free-viewpoint video (FVV) sequences. In this paper, we analyze the performance of several commonly used objective quality metrics on FVV sequences, which were synthesized from decompressed depth data, using subjective scores as ground truth. Statistical analyses showed that commonly used metrics were not reliable predictors of perceived image quality when different contents and distortions were considered. However, the correlation improved when considering individual conditions, which indicates that the artifacts produced by some view synthesis algorithms might not be correctly handled by current metrics.
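The evaluation methodology boils down to correlating objective metric outputs against subjective scores; a minimal sketch with synthetic placeholder scores (not the paper's data) computes the usual Pearson and Spearman coefficients.

```python
# Reliability of an objective metric against subjective ground truth (a sketch).
import numpy as np
from scipy import stats

mos = np.array([4.5, 3.8, 2.1, 3.3, 1.9, 4.1])        # subjective scores (placeholder)
psnr = np.array([38.2, 35.1, 29.4, 33.0, 30.2, 36.8])  # objective metric values

pcc, _ = stats.pearsonr(psnr, mos)     # prediction accuracy (linearity)
srocc, _ = stats.spearmanr(psnr, mos)  # prediction monotonicity
print(f"PCC={pcc:.3f}  SROCC={srocc:.3f}")
```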
Citations: 8
Vision-based tracking in large image database for real-time mobile augmented reality
Pub Date: 2014-09-01 DOI: 10.1109/MMSP.2014.6958790
Madjid Maidi, M. Preda, Yassine Lehiani, T. Lavric
This paper presents an approach for tracking natural objects in augmented reality applications. Targets are detected and identified using a markerless approach that relies on the extraction of salient image features and descriptors. The method handles large image databases using a novel strategy for feature retrieval and pairwise matching. Furthermore, it integrates a real-time solution for 3D pose estimation using an analytical technique based on camera perspective transformations. The algorithm associates 2D feature samples coming from the identification stage with 3D mapped points in object space; a sampling scheme for ordering correspondences is then carried out to establish the 2D/3D projective relationship. The tracker performs localization using the feature images and 3D models, and enhances the scene view with overlaid graphics by computing the camera motion parameters. The modules built within this architecture are deployed on a mobile platform to provide an intuitive interface for interacting with the surrounding real world. The system is evaluated on a challenging, scalable image dataset, and the results demonstrate the effectiveness of the approach for versatile augmented reality applications.
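For the 2D/3D pose step, the sketch below recovers the camera pose from matched 2D image points and 3D object-space points with OpenCV's perspective-n-point solver; solvePnP stands in for the paper's analytical technique, and the point sets and intrinsic matrix are synthetic assumptions.

```python
# Camera pose from 2D/3D correspondences via PnP (a sketch).
import numpy as np
import cv2

obj_pts = np.array([[0, 0, 0], [1, 0, 0], [1, 1, 0], [0, 1, 0],
                    [0.5, 0.5, 0.5]], dtype=np.float32)        # 3D mapped points
img_pts = np.array([[320, 240], [420, 238], [424, 338],
                    [318, 342], [372, 280]], dtype=np.float32)  # 2D features

K = np.array([[800, 0, 320],
              [0, 800, 240],
              [0, 0, 1]], dtype=np.float32)   # assumed camera intrinsics

ok, rvec, tvec = cv2.solvePnP(obj_pts, img_pts, K, None)
if ok:
    R, _ = cv2.Rodrigues(rvec)    # rotation matrix for overlaying graphics
    print("camera translation:", tvec.ravel())
```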
Citations: 2