
Latest publications from 2013 Visual Communications and Image Processing (VCIP)

A light-weight HEVC encoder for image coding
Pub Date : 2013-11-01 DOI: 10.1109/VCIP.2013.6706448
Fei Liang, Xiulian Peng, Jizheng Xu
High Efficiency Video Coding (HEVC) not only provides much better coding efficiency than previous video coding standards, but also shows significantly superior performance to other image coding schemes when applied to image coding. However, this improvement comes at the cost of a significant increase in encoding complexity. In this paper, we focus on retaining the high coding efficiency provided by HEVC while largely reducing its encoding complexity for image coding. By applying various techniques, including optimized coding structure parameters, coding unit early termination, fast intra prediction, and transform skip mode decision, we significantly reduce the complexity of HEVC intra coding while keeping most of its coding efficiency. Experimental results show that our light-weight HEVC encoder saves about 82% of coding time compared with the original HEVC encoder. With only a slight loss relative to the HEVC reference software, the proposed scheme still gains about 19% in BD-BR compared with H.264/AVC.
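The coding-unit early termination mentioned above can be illustrated with a toy decision rule: stop before evaluating the four sub-CUs when the unsplit rate-distortion cost is already close to a cheap estimate of the split cost. The function name, threshold, and cost inputs below are hypothetical stand-ins, not the paper's actual criterion.

```python
def choose_partition(cost_unsplit, sub_cost_estimates, early_stop_ratio=0.9):
    """Decide whether to split a coding unit.

    cost_unsplit       -- full RD cost of coding the CU as one block
    sub_cost_estimates -- cheap per-sub-CU cost estimates (not full RDO)
    Returns 'no_split' early when the unsplit cost is already below a
    fraction of the estimated split cost, skipping the expensive recursion.
    """
    quick_estimate = sum(sub_cost_estimates)
    if cost_unsplit <= early_stop_ratio * quick_estimate:
        return 'no_split'  # early termination: children never evaluated
    return 'split' if quick_estimate < cost_unsplit else 'no_split'
```

A real encoder would derive the estimates from neighboring CU statistics; the point of the sketch is only that the expensive recursive evaluation is skipped whenever the cheap test already favors the unsplit mode.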
Citations: 4
Face recogntion in open world environment
Pub Date : 2013-11-01 DOI: 10.1109/VCIP.2013.6706423
Jielin Qiu, Ya Zhang, Jun Sun
Face recognition in an open-world environment is a very challenging task due to the varying appearances of the target persons and the large scale of unregistered probe faces. In this paper we combine two parallel classifiers, one based on the Local Binary Pattern (LBP) feature and the other on Gabor features, to build a specific face recognizer for each target person. The faces used for training are borderline patterns obtained through a morphing procedure combining target faces with random non-target ones. Grid search is applied to find an optimal morphing-degree pair. By using an AND operator to integrate the predictions of the two complementary parallel classifiers, many false positives are eliminated from the final results. The proposed algorithm is compared with the Robust Sparse Coding method, using selected celebrities as the target persons and images from FERET as the non-target faces. Experimental results suggest that the proposed approach better tolerates distortion of the target person's appearance and has a lower false alarm rate.
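The Local Binary Pattern feature that the first classifier relies on is a standard operator: each pixel is encoded by thresholding its 8 neighbors against the center value. A minimal sketch for a single 3x3 patch:

```python
def lbp_code(patch):
    """8-bit LBP code of a 3x3 patch (list of 3 lists of 3 intensities).

    Each neighbor >= center contributes one bit, walking the ring
    clockwise from the top-left corner.
    """
    center = patch[1][1]
    neighbors = [patch[0][0], patch[0][1], patch[0][2], patch[1][2],
                 patch[2][2], patch[2][1], patch[2][0], patch[1][0]]
    code = 0
    for bit, value in enumerate(neighbors):
        if value >= center:
            code |= 1 << bit
    return code
```

A full LBP descriptor then histograms these codes over image regions; the bit ordering chosen here is one common convention.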
Citations: 6
Saliency detection for stereoscopic images
Pub Date : 2013-11-01 DOI: 10.1109/VCIP.2013.6706346
Yuming Fang, Junle Wang, Manish Narwaria, P. Callet, Weisi Lin
Saliency detection techniques have been widely used in various 2D multimedia processing applications. Emerging stereoscopic display applications now require new saliency detection models for stereoscopic images. Unlike saliency detection for 2D images, saliency detection for stereoscopic images must take depth features into account. In this paper, we propose a new stereoscopic saliency detection framework based on the feature contrast of color, intensity, texture, and depth. Four types of features, namely color, luminance, texture, and depth, are extracted from DCT coefficients to represent the energy of image patches. A Gaussian model of the spatial distance between image patches is adopted to account for both local and global contrast. A new fusion method is designed to combine the feature maps into the final saliency map for stereoscopic images. Experimental results on a recent eye-tracking database show that the proposed method outperforms existing methods in saliency estimation for 3D images.
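The Gaussian-weighted contrast idea can be sketched as follows: a patch is salient when its feature differs from other patches, with nearby patches weighted more heavily. The scalar per-patch features, positions, and `sigma` below are illustrative simplifications of the paper's multi-channel formulation.

```python
import math

def patch_saliency(features, positions, sigma=1.0):
    """Per-patch saliency: feature contrast to every other patch,
    weighted by a Gaussian of the spatial distance between patches.

    features  -- one scalar feature per patch (e.g. mean depth)
    positions -- (x, y) center of each patch
    """
    n = len(features)
    saliency = []
    for i in range(n):
        s = 0.0
        for j in range(n):
            if i == j:
                continue
            d_feat = abs(features[i] - features[j])
            d_spatial = math.dist(positions[i], positions[j])
            s += d_feat * math.exp(-d_spatial ** 2 / (2 * sigma ** 2))
        saliency.append(s)
    return saliency
```

Small `sigma` emphasizes local contrast, large `sigma` approaches a global-contrast measure; the framework's fusion step would then combine such maps across feature channels.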
Citations: 159
Tree-based Shape Descriptor for scalable logo detection
Pub Date : 2013-11-01 DOI: 10.1109/VCIP.2013.6706326
Chengde Wan, Zhicheng Zhao, Xin Guo, A. Cai
Detecting logos in real-world images is a highly challenging task due to the variety of viewpoint and lighting changes and the real-time requirements in practice. Conventional object detection methods, e.g., the part-based model, may incur prohibitive computational cost if applied directly to this task. A promising alternative, the triangle structural descriptor with an associated matching strategy, offers an efficient way of recognizing logos. However, that descriptor fails under the rotations of logo images that often occur when the viewpoint changes. To overcome this shortcoming, we propose a new Tree-based Shape Descriptor (TSD), which is strictly invariant to affine transformations in real-world images. The core of the proposed descriptor is to encode the shape of logos by depicting both the appearance and the spatial information of four local key-points. In the training stage, an efficient algorithm is introduced to mine a discriminative subset of four-tuples from all possible key-point combinations. Moreover, a root indexing scheme is designed to detect multiple logos simultaneously. Extensive experiments on three benchmarks demonstrate the superiority of the proposed approach over state-of-the-art methods.
Citations: 7
A novel R-Q model based rate control scheme in HEVC
Pub Date : 2013-11-01 DOI: 10.1109/VCIP.2013.6706343
Xiaochuan Liang, Qiang Wang, Yinhe Zhou, Binji Luo, Aidong Men
The High Efficiency Video Coding (HEVC) standard, published as ITU-T H.265 | ISO/IEC 23008-2, is the latest video coding standard of the ITU-T and the ISO/IEC. The main goal of HEVC standardization is to improve compression performance significantly, with about a 50% bit-rate reduction at equal perceptual video quality compared to the H.264/AVC standard. For any practically deployed video coding standard, rate control is an integral part. This paper proposes a novel rate-quantization (R-Q) model based rate control scheme to further reduce the bitrate error. Experimental results show that the proposed algorithm outperforms existing algorithms: its bitrate error is much lower, while its Y-PSNR loss is smaller.
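For context on what an R-Q model does: the classic quadratic model R = a/Q + b/Q^2 (a textbook construction, not necessarily the paper's refined model) can be inverted in closed form to pick the quantization step that meets a target rate.

```python
def target_qstep(rate_target, a, b):
    """Solve R = a/Q + b/Q^2 for Q given a target rate.

    a, b -- model parameters, in practice fit from previously coded frames.
    Multiplying through by Q^2 gives R*Q^2 - a*Q - b = 0; we take the
    positive root of this quadratic.
    """
    disc = a * a + 4.0 * rate_target * b
    return (a + disc ** 0.5) / (2.0 * rate_target)
```

The rate controller would recompute `a` and `b` as frames are coded, which is where schemes differ; the inversion itself is this simple quadratic root.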
Citations: 28
Multi-scale face hallucination based on frequency bands analysis
Pub Date : 2013-11-01 DOI: 10.1109/VCIP.2013.6706411
Xiaodan Du, F. Jiang, Debin Zhao
In this paper, a multi-scale face hallucination method is proposed to produce high-resolution (HR) face images from low-resolution (LR) ones, exploiting specific face characteristics and priors based on frequency-band analysis. In the first scale, middle-resolution (MR) images are generated by a patch-based learning method in the DCT domain; at this scale, the DC and AC coefficients are estimated separately. In the second scale, DCT upsampling for low-frequency-band restoration is combined with high-frequency-band restoration to generate the final high-resolution face images. Extensive experiments show that the proposed algorithm achieves significant improvement.
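DCT upsampling of the kind used for the low-frequency-band restoration step can be sketched in 1-D: zero-pad the DCT spectrum, so the low band is kept and the new high band stays empty. The unnormalized DCT pair and the `factor` gain below are a generic textbook construction, not the paper's exact procedure.

```python
import math

def dct2(x):
    """Unnormalized DCT-II of a 1-D signal."""
    N = len(x)
    return [sum(x[n] * math.cos(math.pi * (n + 0.5) * k / N) for n in range(N))
            for k in range(N)]

def idct2(X):
    """Inverse of dct2 (a scaled DCT-III): x_n = X_0/N + (2/N) * sum_k>=1 ..."""
    N = len(X)
    return [(X[0] / 2 + sum(X[k] * math.cos(math.pi * (n + 0.5) * k / N)
                            for k in range(1, N))) * 2 / N
            for n in range(N)]

def dct_upsample(x, factor=2):
    """Upsample by zero-padding the DCT spectrum.

    The gain of `factor` compensates for the longer inverse transform so
    that amplitudes (e.g. the DC level) are preserved.
    """
    X = dct2(x) + [0.0] * ((factor - 1) * len(x))
    return [v * factor for v in idct2(X)]
```

A 2-D version applies the same padding separably along rows and columns; the paper's second scale would then add the separately restored high-frequency band on top of this interpolation.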
Citations: 1
Distributed soft video broadcast with variable block size motion estimation
Pub Date : 2013-11-01 DOI: 10.1109/VCIP.2013.6706380
Ailing Zhang, Xiaopeng Fan, Ruiqin Xiong, Debin Zhao
In recent years, video broadcast has become a popular application, but the traditional layered design requires the source to pick a bitrate and video resolution for encoding before transmission, so it cannot efficiently accommodate users with different channel qualities. The DCAST scheme solves this problem: it uses motion estimation (ME) and motion compensation (MC) to generate a predicted frame that helps coset-code the current frame. However, in its ME process DCAST uses a fixed block size. In this paper, we replace the fixed-block-size motion estimation with variable-block-size motion estimation, which effectively reduces blocking artifacts and improves the quality of both the MC-reconstructed frame and the predicted frame. DCAST with variable-block-size motion estimation performs 0.5 dB better than DCAST with fixed-block-size motion estimation.
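Block-matching motion estimation of the kind DCAST performs can be sketched as an exhaustive SAD search for one block; a variable-block-size scheme would run this at several block dimensions and keep the best cost trade-off. All names below are illustrative, and frames are plain lists of rows for clarity.

```python
def sad(block_a, block_b):
    """Sum of absolute differences between two equally sized blocks."""
    return sum(abs(p - q) for row_a, row_b in zip(block_a, block_b)
               for p, q in zip(row_a, row_b))

def full_search(cur_block, ref_frame, top, left, search_range=2):
    """Exhaustive block matching: return the displacement (dy, dx) within
    +/- search_range that minimizes SAD against the reference frame.

    top, left -- position of cur_block in the current frame.
    """
    h, w = len(cur_block), len(cur_block[0])
    best_mv, best_cost = (0, 0), float('inf')
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            y, x = top + dy, left + dx
            if y < 0 or x < 0 or y + h > len(ref_frame) or x + w > len(ref_frame[0]):
                continue  # candidate block would fall outside the frame
            candidate = [row[x:x + w] for row in ref_frame[y:y + h]]
            cost = sad(cur_block, candidate)
            if cost < best_cost:
                best_cost, best_mv = cost, (dy, dx)
    return best_mv, best_cost
```

Production encoders replace the full search with fast patterns (diamond, TZ search), but the cost criterion and the motion-vector output are the same.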
Citations: 12
Multiple target performance evaluation model for HD video encoder VLSI architecture design
Pub Date : 2013-11-01 DOI: 10.1109/VCIP.2013.6706350
H. Yin, Shizhong Li, Hongqi Hu
FPGA and ASIC are suitable platforms for implementing high-definition video encoders. Efficient video encoder VLSI architecture design faces several challenges and must trade off multiple target performance parameters, so the algorithm and the hardware architecture should be designed jointly. How to evaluate performance while accounting for multiple target performance parameters is an important problem in such joint design. In this paper, we propose measurement methods for multiple target performance parameters in VLSI architecture design, and then propose a novel multiple-target performance evaluation model. The performance of prevalent H.264/AVC encoder architectures is evaluated with the proposed model. This work is meaningful for joint algorithm-architecture optimization.
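One generic way to fold multiple target parameters (throughput, area, power, ...) into a single figure of merit is a weighted geometric mean over normalized metrics. This is a common illustration of multi-target trade-off, not the evaluation model proposed in the paper.

```python
def composite_score(normalized_metrics, weights):
    """Weighted geometric mean of normalized 'higher is better' metrics.

    normalized_metrics -- dict of metric name -> value normalized against a
                          reference architecture (1.0 = reference)
    weights            -- dict of metric name -> priority, summing to 1
    """
    assert abs(sum(weights.values()) - 1.0) < 1e-9, "weights must sum to 1"
    score = 1.0
    for name, w in weights.items():
        score *= normalized_metrics[name] ** w
    return score
```

A geometric mean is often preferred here because it is scale-free: doubling one metric's normalization reference rescales the score uniformly rather than distorting the ranking.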
Citations: 2
What color is an object?
Pub Date : 2013-11-01 DOI: 10.1109/VCIP.2013.6706433
Xiaofan Zhang, Zengchang Qin, X. Liu, T. Wan
Color perception is one of the major cognitive abilities of human beings. Color information is also one of the most important features in various computer vision tasks, including object recognition, tracking, and scene classification. In this paper, we propose a simple and effective method for learning the color composition of objects from large annotated datasets. The proposed model is based on a region-based bag-of-colors model and saliency detection. Its effectiveness is empirically verified on manually labelled datasets with single or multiple tags. The significance of this research is that the color information of an object provides useful prior knowledge that can help improve existing computer vision models for image segmentation, object recognition, and tracking.
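A bag of colors is essentially a coarse quantized color histogram over a region; a minimal sketch (the bin count and flat index layout are illustrative choices):

```python
def bag_of_colors(pixels, bins_per_channel=4):
    """Normalized coarse color histogram of a region.

    pixels -- iterable of (r, g, b) tuples with 0..255 values
    Each channel is quantized into bins_per_channel bins, giving a
    bins_per_channel**3-dimensional descriptor that sums to 1.
    """
    step = 256 // bins_per_channel
    hist = [0] * bins_per_channel ** 3
    for r, g, b in pixels:
        idx = ((r // step) * bins_per_channel ** 2
               + (g // step) * bins_per_channel
               + (b // step))
        hist[idx] += 1
    total = len(pixels)
    return [count / total for count in hist]
```

In a region-based model, such histograms would be computed per salient region rather than over the whole image, so background colors do not dominate the object's descriptor.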
Citations: 0
A novel approach for combined rotational and translational motion estimation using Frame Projection Warping
Pub Date : 2013-11-01 DOI: 10.1109/VCIP.2013.6706396
Deepika Shukla, R. K. Jha, K. Aizawa
This paper introduces a novel video stabilization technique for combined rotational and translational motion using integral frame projections. In the proposed Frame Projection Warping (FPW) method, the normalized intensity projection curves of two consecutive frames are aligned using dynamic time warping to obtain the relative shift between them. Rotational and vertical motion estimation involves partitioning the frame into two halves; the motion estimated for each half is then used to estimate the rotation angle and vertical shift, respectively. The technique mimics human perception by analyzing rotation in terms of the vertical displacement of the two frame halves. The proposed technique is tested on various hand-held recorded videos. The results show that FPW outperforms various existing intensity-based techniques. It also remains more accurate under frame blurring, a serious cause of erroneous motion estimation. Performance is measured in terms of interframe transformation fidelity and processing time.
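The core alignment step, dynamic time warping of two 1-D projection curves, can be sketched with the standard DP recurrence. Note this sketch returns the accumulated alignment cost; the relative shift the method needs would be read off the optimal warping path, which is omitted here for brevity.

```python
def dtw_distance(curve_a, curve_b):
    """Dynamic time warping cost between two 1-D curves.

    D[i][j] = local cost of matching a[i-1] with b[j-1], plus the best of
    the three allowed predecessor cells (insertion, deletion, match).
    """
    INF = float('inf')
    n, m = len(curve_a), len(curve_b)
    D = [[INF] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(curve_a[i - 1] - curve_b[j - 1])
            D[i][j] = cost + min(D[i - 1][j], D[i][j - 1], D[i - 1][j - 1])
    return D[n][m]
```

Because DTW tolerates local stretching, two projection profiles that differ by a small shift or mild blur still align with low cost, which is what makes it a robust matcher for this task.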
Citations: 0