A light-weight HEVC encoder for image coding
Pub Date: 2013-11-01 | DOI: 10.1109/VCIP.2013.6706448
Fei Liang, Xiulian Peng, Jizheng Xu
High Efficiency Video Coding (HEVC) not only provides much better coding efficiency than previous video coding standards, but also significantly outperforms other image coding schemes when applied to image coding. However, this improvement comes at the cost of a significant increase in encoding complexity. In this paper, we focus on retaining the high coding efficiency provided by HEVC while largely reducing its encoding complexity for image coding. By applying several techniques, including optimized coding structure parameters, coding unit early termination, fast intra prediction, and transform skip mode decision, we significantly reduce the complexity of HEVC intra coding while keeping most of its coding efficiency. Experimental results show that our light-weight HEVC encoder saves about 82% of coding time compared with the original HEVC encoder. Despite a slight loss relative to the HEVC reference software, the proposed scheme still gains about 19% in BD-BR over H.264/AVC.
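To make the complexity saving concrete, the sketch below shows the general shape of a coding-unit early-termination rule: stop the recursive CU split search as soon as the unsplit rate-distortion cost is already cheap. The cost model, threshold, and CU interface here are illustrative assumptions, not the paper's actual decision criteria.

```python
# Illustrative sketch of coding-unit (CU) early termination for intra coding.
# The cost model and threshold are hypothetical; the paper's actual rules
# are not reproduced here. `cu` is assumed to expose width/height and a
# split_quad() method returning its four sub-CUs.

def encode_cu(cu, depth, max_depth, rd_cost_fn, skip_threshold=1.2):
    """Recursively decide whether to split a CU into four sub-CUs.

    rd_cost_fn(cu) -> rate-distortion cost of coding `cu` as a whole.
    If the un-split cost is already low relative to an assumed per-pixel
    budget, further splitting is skipped, saving the recursive RD search.
    """
    whole_cost = rd_cost_fn(cu)

    # Early termination: if the CU codes cheaply as-is, do not try splits.
    pixels = cu.width * cu.height
    if depth == max_depth or whole_cost < skip_threshold * pixels:
        return whole_cost, [cu]

    # Otherwise compare against the cost of coding the four sub-CUs.
    split_cost, split_parts = 0.0, []
    for sub in cu.split_quad():
        c, parts = encode_cu(sub, depth + 1, max_depth,
                             rd_cost_fn, skip_threshold)
        split_cost += c
        split_parts += parts

    if split_cost < whole_cost:
        return split_cost, split_parts
    return whole_cost, [cu]
```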
{"title":"A light-weight HEVC encoder for image coding","authors":"Fei Liang, Xiulian Peng, Jizheng Xu","doi":"10.1109/VCIP.2013.6706448","DOIUrl":"https://doi.org/10.1109/VCIP.2013.6706448","url":null,"abstract":"High Efficiency Video Coding (HEVC), not only provides a much better coding efficiency than previous video coding standards, but also shows significantly superior performance than other image coding schemes when applied to image coding. However, the improvement is at the cost of significant increase of encoding complexity. In this paper, we focus on retaining the high coding efficiency provided by HEVC while largely reducing its encoding complexity for image coding. By applying various techniques including optimized coding structure parameters, coding unit early termination, fast intra prediction and transform skip mode decision, we significantly reduce the complexity of HEVC intra coding while keeping most of its coding efficiency. Experimental results show that our light-weight HEVC encoder can save about 82% coding time compared with original HEVC encoder. With a slight loss to the HEVC reference software, the proposed scheme still gains about 19% in BD-BR compared with H.264/AVC.","PeriodicalId":407080,"journal":{"name":"2013 Visual Communications and Image Processing (VCIP)","volume":"62 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128378488","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Face recognition in open world environment
Pub Date: 2013-11-01 | DOI: 10.1109/VCIP.2013.6706423
Jielin Qiu, Ya Zhang, Jun Sun
Face recognition in an open-world environment is a very challenging task due to the varying appearance of target persons and the large number of unregistered probe faces. In this paper we combine two parallel classifiers, one based on the Local Binary Pattern (LBP) feature and the other based on Gabor features, to build a specific face recognizer for each target person. The faces used for training are borderline patterns obtained through a morphing procedure combining target faces and random non-target ones. A grid search is applied to find an optimal morphing-degree pair. By using an AND operator to integrate the predictions of the two complementary parallel classifiers, many false positives are eliminated from the final results. The proposed algorithm is compared with the Robust Sparse Coding method, using selected celebrities as the target persons and images from FERET as the non-target faces. Experimental results suggest that the proposed approach better tolerates distortion of the target person's appearance and has a lower false alarm rate.
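As a rough illustration of the AND fusion, the sketch below accepts a probe face only when both feature-specific classifiers accept it; the classifier and feature-extractor interfaces are assumptions.

```python
# Minimal sketch of AND-fusion of two parallel per-target classifiers.
# `lbp_clf` and `gabor_clf` stand in for classifiers trained on LBP and
# Gabor features respectively; their interfaces are assumptions.

def is_target(face_image, lbp_clf, gabor_clf, lbp_feat, gabor_feat):
    """Accept a face only if BOTH feature-specific classifiers accept it.

    The AND operator trades a little recall for a much lower false-alarm
    rate, which suits the open-world setting where most probes are
    unregistered non-targets.
    """
    pred_lbp = lbp_clf.predict(lbp_feat(face_image))
    pred_gabor = gabor_clf.predict(gabor_feat(face_image))
    return bool(pred_lbp) and bool(pred_gabor)
```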
{"title":"Face recogntion in open world environment","authors":"Jielin Qiu, Ya Zhang, Jun Sun","doi":"10.1109/VCIP.2013.6706423","DOIUrl":"https://doi.org/10.1109/VCIP.2013.6706423","url":null,"abstract":"Face recognition in open world environment is a very challenging task due to variant appearances of the target persons and a large scale of unregistered probe faces. In this paper we combine two parallel classifiers, one based on the Local Binary Pattern (LBP) feature and the other based on the Gabor features, to build a specific face recognizer for each target person. Faces used for training are borderline patterns obtained through a morphing procedure combing target faces and random non-target ones. Grid-search is applied to find an optimal morphing-degree-pair. By using an AND operator to integrate the prediction of the two complementary parallel classifiers, many false positives are eliminated in the final results. The proposed algorithm is compared with the Robust Sparse Coding method, using selected celebrities as the target persons and the images from FERET as the non-target faces. Experimental results suggest that the proposed approach is better at tolerating the distortion of the target person's appearance and has a lower false alarm rate.","PeriodicalId":407080,"journal":{"name":"2013 Visual Communications and Image Processing (VCIP)","volume":"40 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128216286","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Saliency detection for stereoscopic images
Pub Date: 2013-11-01 | DOI: 10.1109/VCIP.2013.6706346
Yuming Fang, Junle Wang, Manish Narwaria, P. Callet, Weisi Lin
Saliency detection techniques have been widely used in various 2D multimedia processing applications. The emerging applications of stereoscopic display now require new saliency detection models for stereoscopic images. Unlike saliency detection for 2D images, depth features have to be taken into account for stereoscopic images. In this paper, we propose a new stereoscopic saliency detection framework based on the feature contrast of color, intensity, texture, and depth. Four types of features (color, luminance, texture, and depth) are extracted from DCT coefficients to represent the energy of image patches. A Gaussian model of the spatial distance between image patches is adopted to account for both local and global contrast. A new fusion method combines the feature maps into the final saliency map for stereoscopic images. Experimental results on a recent eye-tracking database show that the proposed method outperforms existing ones in saliency estimation for 3D images.
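The contrast computation can be pictured as follows: each patch's saliency is the sum of its feature differences to all other patches, weighted by a Gaussian of their spatial distance. This is a minimal sketch under assumed inputs (per-patch feature vectors and normalized patch centers), not the paper's exact formulation.

```python
import numpy as np

# Sketch of contrast-based patch saliency with Gaussian spatial weighting.
# `features[i]` is the feature vector (e.g., color/luminance/texture/depth
# energies) of patch i and `centers[i]` its spatial center; sigma is an
# assumed bandwidth, not the paper's tuned value.

def patch_saliency(features, centers, sigma=0.25):
    features = np.asarray(features, dtype=float)   # (N, D)
    centers = np.asarray(centers, dtype=float)     # (N, 2), normalized coords

    # Pairwise feature differences and spatial distances.
    feat_dist = np.linalg.norm(features[:, None, :] - features[None, :, :], axis=2)
    spat_dist = np.linalg.norm(centers[:, None, :] - centers[None, :, :], axis=2)

    # Nearby patches contribute more to a patch's contrast than distant ones.
    weights = np.exp(-(spat_dist ** 2) / (2.0 * sigma ** 2))
    np.fill_diagonal(weights, 0.0)

    saliency = (weights * feat_dist).sum(axis=1)
    return saliency / (saliency.max() + 1e-12)     # normalize to [0, 1]
```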
{"title":"Saliency detection for stereoscopic images","authors":"Yuming Fang, Junle Wang, Manish Narwaria, P. Callet, Weisi Lin","doi":"10.1109/VCIP.2013.6706346","DOIUrl":"https://doi.org/10.1109/VCIP.2013.6706346","url":null,"abstract":"Saliency detection techniques have been widely used in various 2D multimedia processing applications. Currently, the emerging applications of stereoscopic display require new saliency detection models for stereoscopic images. Different from saliency detection for 2D images, depth features have to be taken into account in saliency detection for stereoscopic images. In this paper, we propose a new stereoscopic saliency detection framework based on the feature contrast of color, intensity, texture, and depth. Four types of features including color, luminance, texture, and depth are extracted from DC-T coefficients to represent the energy for image patches. A Gaussian model of the spatial distance between image patches is adopted for the consideration of local and global contrast calculation. A new fusion method is designed to combine the feature maps for computing the final saliency map for stereoscopic images. Experimental results on a recent eye tracking database show the superior performance of the proposed method over other existing ones in saliency estimation for 3D images.","PeriodicalId":407080,"journal":{"name":"2013 Visual Communications and Image Processing (VCIP)","volume":"29 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132661633","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Tree-based Shape Descriptor for scalable logo detection
Pub Date: 2013-11-01 | DOI: 10.1109/VCIP.2013.6706326
Chengde Wan, Zhicheng Zhao, Xin Guo, A. Cai
Detecting logos in real-world images is a highly challenging task due to viewpoint and lighting changes and to real-time requirements in practice. Conventional object detection methods, e.g., part-based models, may incur prohibitive computational cost if applied directly to this task. A promising alternative, the triangle structural descriptor combined with a matching strategy, offers an efficient way of recognizing logos. However, that descriptor fails under the rotations of logo images that often occur when the viewpoint changes. To overcome this shortcoming, we propose a new Tree-based Shape Descriptor (TSD), which is strictly invariant to affine transformations in real-world images. The core of the proposed descriptor is to encode the shape of a logo by capturing both the appearance and the spatial layout of four local key-points. In the training stage, an efficient algorithm mines a discriminative subset of four-point tuples from all possible key-point combinations. Moreover, a root indexing scheme is designed to detect multiple logos simultaneously. Extensive experiments on three benchmarks demonstrate the superiority of the proposed approach over state-of-the-art methods.
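The affine invariance of four-point tuples rests on a standard fact: an affine map scales every triangle area by the same factor |det A|, so ratios of areas built from the four points are unchanged. The sketch below computes one such invariant; the actual TSD encoding in the paper is richer than this illustration.

```python
# Sketch of a strict affine invariant of a four-point tuple: the ratio of
# two triangle areas. Any affine map x -> A x + t multiplies both areas by
# |det A|, so their ratio survives the transformation unchanged.

def tri_area(p, q, r):
    """Signed area of triangle (p, q, r); points are (x, y) pairs."""
    return 0.5 * ((q[0] - p[0]) * (r[1] - p[1])
                  - (r[0] - p[0]) * (q[1] - p[1]))

def affine_invariant(p1, p2, p3, p4):
    """Area ratio of triangles (p1,p2,p3) and (p1,p2,p4)."""
    a123 = tri_area(p1, p2, p3)
    a124 = tri_area(p1, p2, p4)
    return a123 / a124 if abs(a124) > 1e-12 else float("inf")
```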
{"title":"Tree-based Shape Descriptor for scalable logo detection","authors":"Chengde Wan, Zhicheng Zhao, Xin Guo, A. Cai","doi":"10.1109/VCIP.2013.6706326","DOIUrl":"https://doi.org/10.1109/VCIP.2013.6706326","url":null,"abstract":"Detecting logos in real-world images is a great challenging task due to a variety of viewpoint or light condition changes and real-time requirements in practice. Conventional object detection methods, e.g., part-based model, may suffer from expensively computational cost if it was directly applied to this task. A promising alternative, triangle structural descriptor associated with matching strategy, offers an efficient way of recognizing logos. However, the descriptor fails to the rotation of logo images that often occurs when viewpoint changes. To overcome this shortcoming, we propose a new Tree-based Shape Descriptor (TSD) in this paper, which is strictly invariant to affine transformation in real-world images. The core of proposed descriptor is to encode the shape of logos by depicting both appearance and spatial information of four local key-points. In the training stage, an efficient algorithm is introduced to mine a discriminate subset of four tuples from all possible key-point combinations. Moreover, a root indexing scheme is designed to enable to detect multiple logos simultaneously. Extensive experiments on three benchmarks demonstrate the superiority of proposed approach over state-of-the-art methods.","PeriodicalId":407080,"journal":{"name":"2013 Visual Communications and Image Processing (VCIP)","volume":"29 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134278629","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A novel R-Q model based rate control scheme in HEVC
Pub Date: 2013-11-01 | DOI: 10.1109/VCIP.2013.6706343
Xiaochuan Liang, Qiang Wang, Yinhe Zhou, Binji Luo, Aidong Men
The High Efficiency Video Coding (HEVC) standard, published as ITU-T H.265 / ISO/IEC 23008-2, is the latest video coding standard of the ITU-T and the ISO/IEC. The main goal of HEVC standardization is to improve compression performance significantly, with about a 50% bit-rate reduction at equal perceptual video quality compared to the H.264/AVC standard. For any practically deployed video coding standard, rate control is an integral part. This paper proposes a novel rate-quantization (R-Q) model based rate control scheme to further reduce the bitrate error. Experimental results show that the proposed algorithm outperforms existing algorithms: its bitrate error is much lower, while its Y-PSNR loss is smaller.
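For context, a classical R-Q model predicts the bits produced at a given quantization step from a complexity measure such as the mean absolute difference (MAD); the encoder then inverts the model to hit a bit budget. The quadratic form and parameter names below are generic textbook choices, not the paper's proposed model.

```python
# Sketch of a generic quadratic rate-quantization (R-Q) model in the spirit
# of classical rate control; the paper's own model and parameters are not
# reproduced here.

def predicted_bits(q, mad, a, b):
    """Quadratic R-Q model: R(Q) = a*MAD/Q + b*MAD/Q^2."""
    return a * mad / q + b * mad / (q * q)

def choose_q(target_bits, mad, a, b, q_min=1.0, q_max=51.0, step=0.5):
    """Pick the quantization step whose predicted rate best matches the
    bit budget, by a simple scan over the allowed range."""
    best_q, best_err = q_min, float("inf")
    q = q_min
    while q <= q_max:
        err = abs(predicted_bits(q, mad, a, b) - target_bits)
        if err < best_err:
            best_q, best_err = q, err
        q += step
    return best_q
```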
{"title":"A novel R-Q model based rate control scheme in HEVC","authors":"Xiaochuan Liang, Qiang Wang, Yinhe Zhou, Binji Luo, Aidong Men","doi":"10.1109/VCIP.2013.6706343","DOIUrl":"https://doi.org/10.1109/VCIP.2013.6706343","url":null,"abstract":"High Efficiency Video Coding (HEVC) standard, which has been published as ITU-T H.265-ISO/IEC 23008-2, is the latest video coding standard of the ITU-T and the ISO/IEC. The main goal of the HEVC standardization is to improve the compression performance significantly, about 50% bit rate reduction for equal perceptual video quality, compared to the H.264/AVC standard. For any practically applied video coding standard, rate control is always an integral part. This paper proposed a novel rate-quantization (R-Q) model based rate control scheme to further reduce the bitrate error. The experimental results show that the proposed algorithm has better performance compared with the existing algorithm. The bitrate error of the proposed algorithm is much lower than the existing algorithms, while the Y-PSNR loss is less.","PeriodicalId":407080,"journal":{"name":"2013 Visual Communications and Image Processing (VCIP)","volume":"193 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114863137","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Multi-scale face hallucination based on frequency bands analysis
Pub Date: 2013-11-01 | DOI: 10.1109/VCIP.2013.6706411
Xiaodan Du, F. Jiang, Debin Zhao
In this paper, a multi-scale face hallucination method is proposed to produce high-resolution (HR) face images from low-resolution (LR) ones, exploiting face-specific characteristics and priors derived from frequency-band analysis. In the first scale, middle-resolution (MR) images are generated by a patch-based learning method in the DCT domain, with the DC and AC coefficients estimated separately. In the second scale, DCT upsampling for low-frequency-band restoration is combined with high-frequency-band restoration to generate the final high-resolution face images. Extensive experiments show that the proposed algorithm achieves significant improvement.
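One plausible reading of the DCT upsampling step is zero-padding in the coefficient domain: embed the DCT of the low-resolution image in a larger all-zero coefficient array and invert. The sketch below follows that reading; the scaling convention is an assumption.

```python
import numpy as np
from scipy.fft import dctn, idctn

# Sketch of DCT-domain upsampling by zero-padding, one plausible reading
# of the "DCT upsampling for low-frequency-band restoration" step; the
# normalization choice is an assumption.

def dct_upsample(img, factor=2):
    """Upsample a grayscale image by embedding its DCT coefficients into
    a larger, zero-padded coefficient array and inverting."""
    h, w = img.shape
    coeffs = dctn(img, norm="ortho")

    big = np.zeros((h * factor, w * factor))
    big[:h, :w] = coeffs                     # keep the low-frequency band
    up = idctn(big, norm="ortho")

    # With the orthonormal DCT, enlarging the array shrinks amplitudes by
    # 1/sqrt(factor) per dimension, so compensate by `factor` in 2D.
    return up * factor
```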
{"title":"Multi-scale face hallucination based on frequency bands analysis","authors":"Xiaodan Du, F. Jiang, Debin Zhao","doi":"10.1109/VCIP.2013.6706411","DOIUrl":"https://doi.org/10.1109/VCIP.2013.6706411","url":null,"abstract":"In this paper, a multi-scale face hallucination method is proposed to produce high-resolution (HR) face images from low-resolution (LR) ones according to the specific face characteristics and priors based on frequency bands analysis. In the first scale, the middle-resolution (MR) images are generated based on a patch-based learning method in DCT domain. In this scale, the DC coefficients and AC coefficients are estimated separately. In the second scale, a DCT upsampling for low frequency band restoration and a high frequency band restoration are combined to generate the final high-resolution face images. Extensive experiments show that the proposed algorithm achieves significant improvement.","PeriodicalId":407080,"journal":{"name":"2013 Visual Communications and Image Processing (VCIP)","volume":"43 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133155046","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Distributed soft video broadcast with variable block size motion estimation
Pub Date: 2013-11-01 | DOI: 10.1109/VCIP.2013.6706380
Ailing Zhang, Xiaopeng Fan, Ruiqin Xiong, Debin Zhao
In recent years, video broadcast has become a popular application, but the traditional hierarchical design requires the source to pick a bitrate and video resolution for encoding before transmission, so it cannot efficiently accommodate users with different channel qualities. The DCAST scheme solves this problem. DCAST uses motion estimation (ME) and motion compensation (MC) to generate a predicted frame that helps coset-code the current frame. However, in the ME process DCAST uses a fixed block size. In this paper, we replace the fixed block size motion estimation with variable block size motion estimation, which effectively reduces blocking effects and improves the quality of both the MC-reconstructed frame and the predicted frame. DCAST with variable block size motion estimation is 0.5 dB better than DCAST with fixed block size motion estimation.
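A minimal sketch of variable block size motion estimation is given below: a block is split into four sub-blocks whenever matching them independently lowers the total SAD enough. The full-search matcher, threshold, and split rule are illustrative assumptions rather than DCAST's actual configuration.

```python
import numpy as np

# Sketch of SAD-based motion estimation with variable block sizes. A block
# is split when the four sub-blocks together match noticeably better than
# the whole block; split_gain is an assumed tuning constant.

def sad(a, b):
    return float(np.abs(a.astype(int) - b.astype(int)).sum())

def best_match(ref, cur, y, x, size, radius=8):
    """Full search in [-radius, radius]^2 for the lowest-SAD candidate."""
    block = cur[y:y + size, x:x + size]
    best = (float("inf"), (0, 0))
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            ry, rx = y + dy, x + dx
            if (0 <= ry and ry + size <= ref.shape[0]
                    and 0 <= rx and rx + size <= ref.shape[1]):
                cost = sad(ref[ry:ry + size, rx:rx + size], block)
                if cost < best[0]:
                    best = (cost, (dy, dx))
    return best

def motion_estimate(ref, cur, y, x, size, min_size=4, split_gain=0.9):
    """Return a list of (y, x, size, (dy, dx)) motion vectors."""
    cost, mv = best_match(ref, cur, y, x, size)
    if size // 2 >= min_size:
        half = size // 2
        quads = [(y, x), (y, x + half), (y + half, x), (y + half, x + half)]
        sub_cost = sum(best_match(ref, cur, sy, sx, half)[0]
                       for sy, sx in quads)
        if sub_cost < split_gain * cost:       # splitting pays off: recurse
            out = []
            for sy, sx in quads:
                out += motion_estimate(ref, cur, sy, sx, half,
                                       min_size, split_gain)
            return out
    return [(y, x, size, mv)]
```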
{"title":"Distributed soft video broadcast with variable block size motion estimation","authors":"Ailing Zhang, Xiaopeng Fan, Ruiqin Xiong, Debin Zhao","doi":"10.1109/VCIP.2013.6706380","DOIUrl":"https://doi.org/10.1109/VCIP.2013.6706380","url":null,"abstract":"In recent years, video broadcast has become a popular application, but the traditional hierarchical design requires the source to pick a bitrate and video resolution for encoding before transmission, it cannot be efficient to accommodate users with different channel quality. The proposed DCAST scheme can solve this problem. DCAST uses the ME and MC technology to generate a predicted frame which helps current frame to do coset coding. However, in the ME process, DCAST uses fixed block size. In this paper, we use the variable block size motion estimation to replace the fixed block size motion estimation, it can effectively reduce the block effect and improve the quality of the reconstructed frame generated by MC and predicted frame. The DCAST with variable block size motion estimation is 0.5dB better than DCAST with fixed block size motion estimation.","PeriodicalId":407080,"journal":{"name":"2013 Visual Communications and Image Processing (VCIP)","volume":"29 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116081000","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Multiple target performance evaluation model for HD video encoder VLSI architecture design
Pub Date: 2013-11-01 | DOI: 10.1109/VCIP.2013.6706350
H. Yin, Shizhong Li, Hongqi Hu
FPGA and ASIC are suitable platforms for implementing high-definition video encoders. Efficient video encoder VLSI architecture design faces several challenges and must trade off multiple performance targets, so the algorithm and the hardware architecture should be designed jointly. How to evaluate performance while accounting for multiple target performance parameters is an important problem for such joint design. In this paper, we propose measurement methods for multiple target performance parameters of VLSI architecture designs, and then propose a novel multiple-target performance evaluation model. The performance of prevalent H.264/AVC encoder architectures is evaluated with the proposed model. This work supports joint algorithm-architecture optimization.
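As a generic illustration of folding several target parameters into one figure of merit, the sketch below normalizes each metric against a baseline and combines them with a weighted geometric mean; the metric set, weights, and normalization are assumptions, not the paper's model.

```python
# Generic sketch of combining several architecture metrics into one score;
# the weights, metric set, and normalization are assumptions, not the
# paper's actual evaluation model.

def architecture_score(metrics, reference, weights):
    """Weighted geometric-mean figure of merit over normalized metrics.

    metrics / reference: dicts with e.g. throughput (higher is better),
    area, power (lower is better). weights: relative importance, summing
    to 1 so the score of the reference design is exactly 1.0.
    """
    score = 1.0
    for name, w in weights.items():
        ratio = metrics[name] / reference[name]
        if name == "throughput":              # higher is better
            score *= ratio ** w
        else:                                 # lower is better
            score *= (1.0 / ratio) ** w
    return score

# Example: compare a hypothetical candidate encoder to a baseline design.
baseline = {"throughput": 30.0, "area": 1.0, "power": 1.0}
candidate = {"throughput": 60.0, "area": 1.4, "power": 1.2}
print(architecture_score(candidate, baseline,
                         {"throughput": 0.5, "area": 0.25, "power": 0.25}))
```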
{"title":"Multiple target performance evaluation model for HD video encoder VLSI architecture design","authors":"H. Yin, Shizhong Li, Hongqi Hu","doi":"10.1109/VCIP.2013.6706350","DOIUrl":"https://doi.org/10.1109/VCIP.2013.6706350","url":null,"abstract":"FPGA and ASIC are suitable platforms for high definition video encoder implementation. Efficient video encoder VLSI architecture design suffers from several challenges and multiple target performance trade-off. Algorithm and hardware architecture are supposed to be jointly designed for multiple target performance trade-off. How to evaluate the performance, accounting for multiple target performance parameters, is one important problem for algorithm and architecture joint design. In this paper, we propose measure methods for multiple target performance parameters for VLSI architecture design, and then propose a novel multiple-target performance evaluation model. The performances of the prevalent H.264/AVC encoder architectures are evaluated with the proposed model. This work is meaningful for algorithm and architecture joint optimization.","PeriodicalId":407080,"journal":{"name":"2013 Visual Communications and Image Processing (VCIP)","volume":"39 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123763255","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
What color is an object?
Pub Date: 2013-11-01 | DOI: 10.1109/VCIP.2013.6706433
Xiaofan Zhang, Zengchang Qin, X. Liu, T. Wan
Color perception is one of the major cognitive abilities of human beings, and color information is one of the most important features in various computer vision tasks, including object recognition, tracking, and scene classification. In this paper, we propose a simple and effective method for learning the color composition of objects from large annotated datasets. The proposed model is based on a region-based bag-of-colors model and saliency detection. Its effectiveness is empirically verified on manually labelled datasets with single or multiple tags. The significance of this research is that the color information of an object provides useful prior knowledge that can improve existing computer vision models for image segmentation, object recognition, and tracking.
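A region-based bag-of-colors can be sketched as nearest-palette quantization followed by a normalized histogram, as below; the toy palette is an assumption, and in practice the color vocabulary would be learned from data (e.g., by k-means over many images).

```python
import numpy as np

# Sketch of a region-based bag-of-colors: quantize each pixel of a region
# to its nearest palette color and histogram the assignments. The palette
# here is a toy assumption, not a learned vocabulary.

PALETTE = np.array([
    [255, 0, 0], [0, 255, 0], [0, 0, 255],      # red, green, blue
    [255, 255, 0], [255, 255, 255], [0, 0, 0],  # yellow, white, black
], dtype=float)

def bag_of_colors(region_pixels):
    """region_pixels: (N, 3) RGB array for a (salient) region."""
    px = np.asarray(region_pixels, dtype=float)
    # Distance of every pixel to every palette color, then the nearest bin.
    dists = np.linalg.norm(px[:, None, :] - PALETTE[None, :, :], axis=2)
    bins = np.argmin(dists, axis=1)
    hist = np.bincount(bins, minlength=len(PALETTE)).astype(float)
    return hist / hist.sum()                    # normalized color composition
```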
{"title":"What color is an object?","authors":"Xiaofan Zhang, Zengchang Qin, X. Liu, T. Wan","doi":"10.1109/VCIP.2013.6706433","DOIUrl":"https://doi.org/10.1109/VCIP.2013.6706433","url":null,"abstract":"Color perception is one of the major cognitive abilities of human being. Color information is also one of the most important features in various computer vision tasks including object recognition, tracking, scene classification and so on. In this paper, we proposed a simple and effective method for learning color composition of objects from large annotated datasets. The new proposed model is based on a region-based bag-of-colors model and saliency detection. The effectiveness of the model is empirically verified on manually labelled datasets with single or multiple tags. The significance of this research is that the color information of an object can provide useful prior knowledge to help improving the existing computer vision models in image segmentation, object recognition and tracking.","PeriodicalId":407080,"journal":{"name":"2013 Visual Communications and Image Processing (VCIP)","volume":"41 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130132661","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A novel approach for combined rotational and translational motion estimation using Frame Projection Warping
Pub Date: 2013-11-01 | DOI: 10.1109/VCIP.2013.6706396
Deepika Shukla, R. K. Jha, K. Aizawa
This paper introduces a novel video stabilization technique for combined rotational and translational motion using integral frame projections. In the proposed Frame Projection Warping (FPW) method, the normalized intensity projection curves of two consecutive frames are aligned using dynamic time warping to obtain the relative shift between them. For rotational and vertical motion estimation, the frame is partitioned into two halves, and the motions estimated for the halves are used to derive the rotation angle and vertical shift; this follows human perception, which reads rotation as a vertical displacement of the two frame halves. The proposed technique is tested on various hand-recorded videos, and the results show that FPW outperforms various existing intensity-based techniques. It also remains accurate under frame blurring, a serious cause of erroneous motion estimation. Performance is measured in terms of interframe transformation fidelity and processing time.
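The projection idea can be sketched as follows: averaging a frame along rows and columns yields two 1D curves, and aligning consecutive frames' curves reduces 2D translation estimation to two 1D alignments. The paper warps the curves with dynamic time warping; the plain cross-correlation search below is a simpler stand-in used only to illustrate the projection step.

```python
import numpy as np

# Sketch of integral frame projections for global shift estimation. The
# paper aligns the normalized projection curves with dynamic time warping;
# this cross-correlation search is a simplified stand-in.

def projections(frame):
    """Row and column intensity projections, normalized to zero mean."""
    rows = frame.mean(axis=1)
    cols = frame.mean(axis=0)
    return rows - rows.mean(), cols - cols.mean()

def shift_1d(a, b, max_shift=20):
    """Shift of b relative to a that maximizes overlap correlation."""
    best, best_s = -np.inf, 0
    for s in range(-max_shift, max_shift + 1):
        if s >= 0:
            corr = float(np.dot(a[s:], b[:len(b) - s]))
        else:
            corr = float(np.dot(a[:s], b[-s:]))
        if corr > best:
            best, best_s = corr, s
    return best_s

def global_shift(prev_frame, cur_frame):
    pr, pc = projections(prev_frame)
    cr, cc = projections(cur_frame)
    dy = shift_1d(pr, cr)    # vertical translation
    dx = shift_1d(pc, cc)    # horizontal translation
    return dy, dx
```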
{"title":"A novel approach for combined rotational and translational motion estimation using Frame Projection Warping","authors":"Deepika Shukla, R. K. Jha, K. Aizawa","doi":"10.1109/VCIP.2013.6706396","DOIUrl":"https://doi.org/10.1109/VCIP.2013.6706396","url":null,"abstract":"This paper introduces a novel video stabilization technique for combined rotational and translational motion using integral frame projections. In the proposed Frame Projection Warping (FPW) method, the normalized intensity projection curves of two consecutive frames are warped using dynamic time warping, to get the relative shift between them. Rotational and vertical motion estimation involves partitioning of frame in to two halves and their corresponding estimated motions are then utilized for respective rotational angle and vertical shift estimation. This technique uses the human perception for analyzing the rotation in terms of vertical displacement of two halves of frame. The proposed technique is tested over various hand recorded videos. The results show better performance of FPW over various existing intensity based techniques. This technique also gives better accuracy in case of frame blurring, which is a serious cause of wrong motion estimation. The performance of the proposed technique is measured in terms of interframe transformation fidelity and processing time.","PeriodicalId":407080,"journal":{"name":"2013 Visual Communications and Image Processing (VCIP)","volume":"8 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127885518","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}