Pub Date: 2014-12-01 DOI: 10.1109/VCIP.2014.7051573
Leiquan Wang, Zhicheng Zhao, Fei Su
In social image search, most existing hypergraph methods use visual and textual features in isolation by treating each feature term as a hyperedge. However, they neglect the correlations between visual and textual hyperedges, which represent the high-order relationships among vertices more robustly. In this paper, we propose a hypergraph with correlated hyperedges (CHH), which introduces high-order relationships among hyperedges into hypergraph learning. Based on CHH, a pairwise visual-textual correlation hypergraph (VTCH) model is used for tag-based social image search. To cope with the large number of newly generated hybrid hyperedges, a bagging-based method is adopted to balance accuracy and speed. Finally, an adaptive hyperedge learning method is used to obtain the relevance scores for social image search. Experiments conducted on MIR Flickr show the effectiveness of the proposed method.
{"title":"Tag-based social image search with hyperedges correlation","authors":"Leiquan Wang, Zhicheng Zhao, Fei Su","doi":"10.1109/VCIP.2014.7051573","DOIUrl":"https://doi.org/10.1109/VCIP.2014.7051573","url":null,"abstract":"In social image search, most existing hypergraph methods use the visual and textual features in isolation by treating each feature term as a hyperedge. Nevertheless, they neglect the correlations of visual and textual hyperedges, which are more robust to represent the high-order relationship among vertices. In this paper, we propose a hypergraph with correlated hyperedges (CHH), which introduces high-order relationship of hyperedges into hypergraph learning. Based on CHH, a pairwise visual-textual correlation hypergraph (VTCH) model is used for tag-based social image search. To overcome the large number of newly generated hybrid hyperedges, a bagging-based method is adopted to balance the accuracy and speed. Finally, adaptive hyperedges learning method is used to obtain the relevance score for social image search. The experiments conducted on MIR Flickr show the effectiveness of our proposed method.","PeriodicalId":166978,"journal":{"name":"2014 IEEE Visual Communications and Image Processing Conference","volume":"363 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121723298","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2014-12-01 DOI: 10.1109/VCIP.2014.7051599
Jia-Lin Chen, Chun-Chen Kuo, Liang-Gee Chen
In this paper, we propose the novel concept of region-of-unpredictable (ROU) to accelerate full-frame feature generation in video sequences. Owing to the high correlation between successive frames, there are only a few regions, called regions-of-unpredictable (ROU), in which features cannot be estimated accurately from the previous frame. We develop a scheme combining partial feature extraction in the ROU with feature prediction from the previous frame. The full-frame features of the current frame can then be obtained with minimal information loss. Experimental results show that the ROU determination algorithm achieves a 95.71% detection rate. The full-frame feature generation scheme using ROU determination saves 79.38% of computation time compared with full-frame feature extraction.
{"title":"Region-of-unpredictable determination for accelerated full-frame feature generation in video sequences","authors":"Jia-Lin Chen, Chun-Chen Kuo, Liang-Gee Chen","doi":"10.1109/VCIP.2014.7051599","DOIUrl":"https://doi.org/10.1109/VCIP.2014.7051599","url":null,"abstract":"In this paper, we propose a novel concept of region-of-unpredictable (ROU) to accelerate full-frame feature generation in video sequences. Due to the high correlation between successive frames, there are only few regions in which the features could not be estimated accurately from the previous frame called region-of-unpredictable (ROU). We develop a scheme combining partial feature extraction in ROU with feature prediction from the previous frame. The full-frame features of the current frame can then be obtained to minimize information loss. Experimental results show that the ROU determination algorithm supports 95.71% detection rate. The full-frame feature generation scheme using ROU determination saves 79.38% computational time compared with the full-frame feature extraction.","PeriodicalId":166978,"journal":{"name":"2014 IEEE Visual Communications and Image Processing Conference","volume":"49 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123302418","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
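As a rough illustration of the ROU idea, one plausible criterion (an assumption for illustration, not the authors' determination algorithm) flags a region as unpredictable when its inter-frame difference energy exceeds a threshold:

```python
import numpy as np

def is_rou(prev_block, curr_block, thresh=10.0):
    """Flag a region as 'unpredictable' when its mean absolute
    inter-frame difference exceeds a threshold (illustrative rule;
    the paper's actual determination algorithm is more involved)."""
    diff = np.abs(curr_block.astype(np.float64) - prev_block.astype(np.float64))
    return float(diff.mean()) > thresh

prev = np.full((8, 8), 100.0)
static = prev.copy()      # unchanged region: features predictable from prev frame
moved = prev + 50.0       # large change: features must be re-extracted
assert not is_rou(prev, static)
assert is_rou(prev, moved)
```

Regions failing the test would reuse predicted features; only the flagged ROU would undergo partial feature extraction.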
Pub Date: 2014-12-01 DOI: 10.1109/VCIP.2014.7051588
Mengmeng Zhang, Yuhui Guo, H. Bai
Since the publication of High Efficiency Video Coding (HEVC) as the newest video coding standard, several extensions have been developed; screen content coding, which is applied in many fields, is one of the most important. For coding tree unit (CTU) partitioning, screen content coding still relies on rate-distortion optimization, whose complexity hinders real-time applications. Thus, this paper proposes a fast CTU partition mode decision algorithm based on entropy and coding bits. Experimental results show that the proposed algorithm saves 32% of encoding time on average compared with the default algorithm in HM-12.1+RExt-5.1, with only a 0.8% bit-rate increase.
{"title":"Fast intra partition algorithm for HEVC screen content coding","authors":"Mengmeng Zhang, Yuhui Guo, H. Bai","doi":"10.1109/VCIP.2014.7051588","DOIUrl":"https://doi.org/10.1109/VCIP.2014.7051588","url":null,"abstract":"Since the publication of the High Efficiency Video Coding standard as the newest video coding standard, several extensions have been made. Among these, the use of the screen content coding in many fields is one of the important extensions. In terms of coding tree unit (CTU) partitioning, rate distortion optimization is still used in screen content coding. The complexity of the process has resulted in problems in relation to real-time application. Thus, this paper proposes a fast-deciding CTU partition mode algorithm based on entropy and coding bits. Experimental results show that the proposed algorithm can save 32% of encoding time on average compared with the default algorithm in HM-12.1+RExt-5.1 with only 0.8% bit rate increment in coding performance.","PeriodicalId":166978,"journal":{"name":"2014 IEEE Visual Communications and Image Processing Conference","volume":"12 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116917506","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
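The entropy cue mentioned in the abstract can be illustrated with a toy block-entropy computation; the function name and the example blocks are illustrative, and the paper's actual decision thresholds are not reproduced here:

```python
import numpy as np

def block_entropy(block):
    """Shannon entropy (bits) of a block's sample-value distribution,
    a proxy one could use to steer CTU partition decisions."""
    _, counts = np.unique(block, return_counts=True)
    p = counts / counts.sum()
    return float(-(p * np.log2(p)).sum())

flat_block = np.zeros((16, 16), dtype=np.uint8)              # e.g. blank screen area
text_block = np.arange(256, dtype=np.uint8).reshape(16, 16)  # busy screen text
assert block_entropy(flat_block) == 0.0   # uniform block: no need to split
assert block_entropy(text_block) == 8.0   # maximally mixed: likely split further
```

Low-entropy blocks could skip the exhaustive rate-distortion search, which is where the reported encoding-time savings would come from.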
In this paper, we propose a reversible data hiding (RDH) method based on a two-dimensional wavelet coefficient histogram (2D WCH) in the wavelet domain. First, a cover image is decomposed into wavelet subbands using an invertible integer-to-integer wavelet transform (I2I-WT). Then, the 2D WCH is generated by counting the occurrence frequency of wavelet coefficient pairs, where each pair consists of the two coefficients located at the same position in the two selected subbands in which the secret message is embedded. Through the 2D WCH, the correlation between the selected subbands is exploited more effectively than with a traditional 1D histogram. The proposed method embeds the secret message reversibly in the cover image by expanding the 2D WCH. To embed the secret message as efficiently as possible, an expansion rule for the 2D WCH is proposed. Moreover, coefficient pair selection (CPS), which selects the coefficients used for embedding so that only those coefficients are modified, is applied before generating the 2D WCH. In the experiments, the proposed method is compared with conventional RDH methods in terms of the capacity-distortion curve.
{"title":"Two-dimensional histogram expansion of wavelet coefficient for reversible data hiding","authors":"Kazuki Yamato, Kazuma Shinoda, Madoka Hasegawa, Shigeo Kato","doi":"10.1109/VCIP.2014.7051553","DOIUrl":"https://doi.org/10.1109/VCIP.2014.7051553","url":null,"abstract":"In this paper, we propose a reversible data hiding (RDH) method based on a two-dimensional wavelet coefficient histogram (2D WCH) in the wavelet domain. First, a cover image is decomposed into wavelet subbands using the invertible integer-to-integer wavelet transform (121-WT). Then, the 2D WCH is generated by counting the occurrence frequency of the wavelet coefficient pairs which denote two wavelet coefficients located in the same position in the selected two subbands where the secret message is embedded. By using the 2D WCH, the correlation between the selected subbands is more effectively utilized than the traditional ID histogram. The proposed method embed the secret message reversibly in the cover image by expanding the 2D WCH. In order to embed the secret message as efficient as possible, the expansion rule for 2D WCH is proposed. Moreover, the coefficient pair selection (CPS), which the coefficients embedding the data are selected in order to modify only the selected coefficients, is implemented before generating the 2D WCH. In the experiment, the proposed method is compared with the conventional RDH methods in terms of the capacity-distortion curve.","PeriodicalId":166978,"journal":{"name":"2014 IEEE Visual Communications and Image Processing Conference","volume":"36 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121153626","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
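A minimal NumPy sketch of building a 2D histogram from coefficient pairs at the same spatial position in two subbands; the function name, bin count, and coefficient range are illustrative assumptions, not the paper's parameters:

```python
import numpy as np

def build_2d_wch(subband_a, subband_b, bins=65, coeff_range=(-32, 32)):
    """Count co-occurrence frequencies of wavelet coefficient pairs
    taken from the same position in two subbands (hypothetical helper,
    not the authors' implementation)."""
    hist, _, _ = np.histogram2d(subband_a.ravel(), subband_b.ravel(),
                                bins=bins,
                                range=[coeff_range, coeff_range])
    return hist

# Two toy "subbands" of small integer wavelet coefficients
rng = np.random.default_rng(0)
a = rng.integers(-5, 6, size=(8, 8))
b = rng.integers(-5, 6, size=(8, 8))
h = build_2d_wch(a, b)
assert h.sum() == a.size        # every coefficient pair lands in one bin
assert h.shape == (65, 65)
```

Histogram expansion for embedding would then shift and split selected bins of `h`, analogous to classic 1D histogram-shifting RDH but over bin pairs.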
Pub Date: 2014-12-01 DOI: 10.1109/VCIP.2014.7051526
Wael Elloumi, Kamel Guissous, A. Chetouani, S. Treuillet
In this paper, we propose using visual saliency to improve an indoor localization system based on image matching. A learning step determines the reference trajectory by selecting key frames along the path. During the localization step, the current image is compared with these key frames to estimate the user's position. The comparison extracts primitive information through a saliency method, which improves the localization system by focusing attention on the most distinctive regions to match. Another advantage of saliency-guided detection is reduced computation time. The proposed framework has been implemented and tested on a smartphone. The results demonstrate the benefit of saliency models by comparing the numbers of features and good matches over video sequences.
{"title":"Improving a vision indoor localization system by a saliency-guided detection","authors":"Wael Elloumi, Kamel Guissous, A. Chetouani, S. Treuillet","doi":"10.1109/VCIP.2014.7051526","DOIUrl":"https://doi.org/10.1109/VCIP.2014.7051526","url":null,"abstract":"In this paper, we propose to use visual saliency to improve an indoor localization system based on image matching. A learning step permits to determinate the reference trajectory by selecting some key frames along the path. During the localization step, the current image is then compared to the obtained key frames in order to estimate the user's position. This comparison is realized by extracting primitive information through a saliency method, which aims to improve our localization system by focusing our attention on the more singular regions to match. Another advantage of the saliency-guided detection is to save computation time. The proposed framework has been developed and tested on a Smartphone. The obtained results show the interest of the use of saliency models by comparing the numbers of features and good matches in video sequence.","PeriodicalId":166978,"journal":{"name":"2014 IEEE Visual Communications and Image Processing Conference","volume":"81 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121107600","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2014-12-01 DOI: 10.1109/VCIP.2014.7051510
Yuwen He, Yan Ye, Jie Dong
Color gamut scalability (CGS) in the scalable extensions of High Efficiency Video Coding (SHVC) supports scalable coding with multiple layers in different color spaces. A base layer conveying HDTV video in the BT.709 color space and an enhancement layer conveying UHDTV video in the BT.2020 color space has been identified as a practical use case for CGS. Efficient CGS coding can be achieved using a 3D look-up table (LUT) based color conversion process. This paper proposes a robust 3D LUT parameter estimation method that estimates the 3D LUT parameters globally using the least-squares method. Problems of matrix sparsity and uneven sample distribution are carefully handled to improve the stability and accuracy of the estimation. Simulation results confirm that the proposed 3D LUT estimation method significantly improves coding performance compared with other gamut conversion methods.
{"title":"Robust 3D LUT estimation method for SHVC color gamut scalability","authors":"Yuwen He, Yan Ye, Jie Dong","doi":"10.1109/VCIP.2014.7051510","DOIUrl":"https://doi.org/10.1109/VCIP.2014.7051510","url":null,"abstract":"Color gamut scalability (CGS) in scalable extensions of High Efficiency Video Coding (SHVC) supports scalable coding with multiple layers in different color spaces. Base layer conveying HDTV video in BT.709 color space and enhancement layer conveying UHDTV video in BT.2020 color space is identified as a practical use case for CGS. Efficient CGS coding can be achieved using a 3D Look-up Table (LUT) based color conversion process. This paper proposes a robust 3D LUT parameter estimation method that estimates the 3D LUT parameters globally using the Least Square method. Problems of matrix sparsity and uneven sample distribution are carefully handled to improve the stability and accuracy of the estimation process. Simulation results confirm that the proposed 3D LUT estimation method can significantly improve coding performance compared with other gamut conversion methods.","PeriodicalId":166978,"journal":{"name":"2014 IEEE Visual Communications and Image Processing Conference","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115380796","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
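As a simplified stand-in for the paper's global least-squares estimation: a trilinearly interpolated 3D LUT is linear in its vertex values, so the same machinery applies, but the sketch below fits only a single 3x3 matrix plus offset from (base-layer, enhancement-layer) sample pairs. All names and values are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(1)
bl = rng.random((1000, 3))                   # BT.709-like base-layer RGB samples
true_M = np.array([[1.10, -0.05, 0.00],      # ground-truth conversion (toy values)
                   [0.02,  0.95, 0.03],
                   [0.00, -0.01, 1.05]])
true_off = np.array([0.01, -0.02, 0.00])
el = bl @ true_M.T + true_off                # BT.2020-like enhancement targets

# Global least-squares fit: design matrix [R G B 1] per sample
A = np.hstack([bl, np.ones((bl.shape[0], 1))])
params, *_ = np.linalg.lstsq(A, el, rcond=None)   # shape (4, 3)
M_est, off_est = params[:3].T, params[3]
assert np.allclose(M_est, true_M, atol=1e-8)
assert np.allclose(off_est, true_off, atol=1e-8)
```

In the full method, sparsity (LUT cells with no samples) and uneven sample distribution make the normal equations ill-conditioned, which is what the paper's robustness handling addresses.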
Pub Date: 2014-12-01 DOI: 10.1109/VCIP.2014.7051498
Li Song, Chen Chen, Yi Xu, Genjian Xue, Yi Zhou
A recently proposed model, the blind/referenceless image spatial quality evaluator (BRISQUE), achieves state-of-the-art performance in blind image quality assessment (IQA). This model uses a predefined generalized Gaussian distribution (GGD) to describe the regularity of natural scene statistics, which introduces fitting errors as image content varies. In this paper, a more general model is proposed to better characterize the regularity of diverse image content; it is learned from the concatenated histograms of mean-subtracted contrast-normalized (MSCN) coefficients and pairwise products of MSCN coefficients of neighbouring pixels. The new MSCN-based feature preserves the intrinsic distribution of image statistics, so support vector regression (SVR) can map it to more accurate image quality scores. Experimental results show that the proposed approach achieves a slight gain over BRISQUE, indicating that the hand-crafted GGD modelling step in BRISQUE is not essential to the final performance.
{"title":"Blind image quality assessment based on a new feature of nature scene statistics","authors":"Li Song, Chen Chen, Yi Xu, Genjian Xue, Yi Zhou","doi":"10.1109/VCIP.2014.7051498","DOIUrl":"https://doi.org/10.1109/VCIP.2014.7051498","url":null,"abstract":"A recently proposed model, known as blind/referenceless image spatial quality evaluator (BRISQUE), achieves the state-of-the-art performance in context of blind image quality assessment (IQA). This model used the predefined generalized Gaussian distribution (GGD) to describe the regularity of natural scene statistics, introducing fitting errors due to variations of image contents. In this paper, a more generalized model is proposed to better characterize the regularity of extensive image contents, which is learned from the concatenated histograms of mean subtracted contrast normalized (MSCN) coefficients and pairwise products of MSCN coefficients of neighbouring pixels. The new feature based on MSCN shows its capability of preserving intrinsic distribution of image statistics. Consequently support vector machine regression (SVR) can map it to more accurate image quality scores. Experimental results show that the proposed approach achieves a slight gain from BRISQUE, which indicates the crafted GGD modelling step in BRISQUE is not essential for final performance.","PeriodicalId":166978,"journal":{"name":"2014 IEEE Visual Communications and Image Processing Conference","volume":"6 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115904665","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
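The MSCN transform itself is standard: each pixel is normalized by a local mean and standard deviation. BRISQUE weights the local window with a Gaussian; the box-window sketch below is a simplification for illustration:

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def mscn(image, win=7, C=1.0):
    """Mean-subtracted contrast-normalized (MSCN) coefficients using a
    plain box window (BRISQUE uses a Gaussian-weighted window)."""
    img = image.astype(np.float64)
    pad = win // 2
    padded = np.pad(img, pad, mode='reflect')
    windows = sliding_window_view(padded, (win, win))
    mu = windows.mean(axis=(-1, -2))       # local mean
    sigma = windows.std(axis=(-1, -2))     # local standard deviation
    return (img - mu) / (sigma + C)

m = mscn(np.full((16, 16), 5.0))
assert np.allclose(m, 0.0)                 # flat image -> zero MSCN everywhere
pair_h = m[:, :-1] * m[:, 1:]              # horizontal neighbour products
```

The proposed feature is then the concatenated histograms of `m` and of neighbour products like `pair_h` (in four orientations), fed to SVR instead of GGD fit parameters.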
Camcorder piracy has a great impact on the movie industry. Although there are many methods to prevent recording in theatres, no recognized technology both defeats camcorder piracy and leaves the audience unaffected. This paper presents a new projector display technique to defeat camcorder piracy in the theatre using a new paradigm of information display technology called temporal psychovisual modulation (TPVM). TPVM exploits the difference between the image formation mechanisms of human eyes and imaging sensors. The image formed in human vision is a continuous integration of the light field, whereas digital video acquisition uses discrete sampling with a "blackout" period in each sampling cycle. Based on this difference, we can decompose a movie into a set of display frames and broadcast them at high speed so that the audience does not notice any disturbance, while the video frames captured by a camcorder contain highly objectionable artifacts. A prototype system built on the DLP® LightCrafter 4500™ platform serves as a proof of concept of the anti-piracy system.
{"title":"DLP based anti-piracy display system","authors":"Zhongpai Gao, Guangtao Zhai, Xiaolin Wu, Xiongkuo Min, Cheng Zhi","doi":"10.1109/VCIP.2014.7051525","DOIUrl":"https://doi.org/10.1109/VCIP.2014.7051525","url":null,"abstract":"Camcorder piracy has great impact on the movie industry. Although there are many methods to prevent recording in theatre, no recognized technology satisfies the need of defeating camcorder piracy as well as having no effect on the audience. This paper presents a new projector display technique to defeat camcorder piracy in the theatre using a new paradigm of information display technology, called temporal psychovisual modulation (TPVM). TPVM exploits the difference in image formation mechanisms of human eyes and imaging sensors. The images formed in human vision is continuous integration of the light field while discrete sampling is used in digital video acquisition which has \"blackout\" period in each sampling cycle. Based on this difference, we can decompose a movie into a set of display frames and broadcast them out at high speed so that the audience can not notice any disturbance, while the video frames captured by camcorder will contain highly objectionable artifacts. The proposed prototype system built on the platform of DLP® LightCrafter 4500™ serves as a proof-of-concept of anti-piracy system.","PeriodicalId":166978,"journal":{"name":"2014 IEEE Visual Communications and Image Processing Conference","volume":"16 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115699688","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
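The core decomposition idea can be sketched with two complementary display frames whose temporal average equals the intended movie frame, while either frame alone carries a visible pattern. This is an illustrative toy, not the paper's actual algorithm:

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.random((4, 4)) * 0.5 + 0.25          # movie frame, kept inside [0.25, 0.75]
# +/-0.2 checkerboard pattern (the "objectionable artifact")
P = 0.2 * ((np.indices((4, 4)).sum(axis=0) % 2) * 2 - 1)

F1, F2 = X + P, X - P                        # two high-speed display frames
assert np.allclose((F1 + F2) / 2, X)         # eye's temporal integration sees X
assert not np.allclose(F1, X)                # a camcorder sampling F1 sees X + P
```

Keeping `X` away from the display's intensity limits (as done above) is what leaves headroom for the pattern; a practical system must also synchronize frame timing against arbitrary camcorder shutter phases.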
Pub Date: 2014-12-01 DOI: 10.1109/VCIP.2014.7051606
Mari Miyata, K. Kodama, T. Hamamoto
Denoising is important in image processing because noise degrades not only the quality of captured images but also the performance of visual applications that use them. For example, under low light levels it is difficult to accurately estimate scene depths from noisy stereo images. Conventional denoising methods find similar regions within an image or among multiple images by block matching (BM) and integrate them to suppress noise effectively. However, such exhaustive BM is too costly for real-time applications, particularly when multi-view images (MVI) are involved. We use view-dependent plane sweeping (PS) for image reconstruction to achieve effective MVI denoising at low computational cost. PS converts the MVI into multi-focus images (MFI) whose noise is suppressed. We then find in-focus regions in the MFI solely by comparing them with the target view image. Finally, we merge these regions to obtain reconstructed images in which noise is effectively suppressed.
{"title":"Fast multiple-view denoising based on image reconstruction by plane sweeping","authors":"Mari Miyata, K. Kodama, T. Hamamoto","doi":"10.1109/VCIP.2014.7051606","DOIUrl":"https://doi.org/10.1109/VCIP.2014.7051606","url":null,"abstract":"Denoising is important in image processing because degradation by noise affects not only the quality of captured images but also the performance of visual applications that use them. For example, under low light levels, it is difficult to accurately estimate scene depths using noisy stereo images. Conventional methods for denoising find similar regions on an image or among multiple images by block matching(BM) to integrate them for suppressing noise effectively. However, such exhaustive BM incurs considerable costs for real-time applications, in particular, when multi-view images(MVI) are involved. We use view-dependent plane sweeping(PS) for image reconstruction to achieve effective MVI denoising with low computational cost. We use PS for converting MVI to multi-focus images(MFI) to suppress their noise. Then, we find regions in focus on the MFI solely by comparing them with the target view image. Finally, we simply merge the regions to obtain reconstructed images in which their noise is effectively suppressed.","PeriodicalId":166978,"journal":{"name":"2014 IEEE Visual Communications and Image Processing Conference","volume":"34 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127401476","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2014-12-01 DOI: 10.1109/VCIP.2014.7051519
Fabian Jäger, M. Wien
3D video is an emerging technology that bundles depth information with texture videos to enable view synthesis applications at the receiver. Depth discontinuities define object boundaries in both the depth maps and the collocated texture video. Therefore, depth segmentation can be utilized for a fine-grained motion field partitioning of the corresponding texture component. In this paper, depth information is used to increase coding efficiency for texture videos by deriving an arbitrarily shaped partitioning. By applying motion compensation to each partition independently and then merging the two prediction signals, highly accurate predictions can be produced that significantly reduce the remaining texture residual. Simulation results show bitrate savings of up to 2.8% for the dependent texture views and up to about 1.0% of the total bitrate.
{"title":"Simplified depth-based block partitioning and prediction merging in 3D video coding","authors":"Fabian Jäger, M. Wien","doi":"10.1109/VCIP.2014.7051519","DOIUrl":"https://doi.org/10.1109/VCIP.2014.7051519","url":null,"abstract":"3D video is an emerging technology that bundles depth information with texture videos to allow for view synthesis applications at the receiver. Depth discontinuities define object boundaries in both, depth maps and the collocated texture video. Therefore, depth segmentation can be utilized for a fine-grained motion field partitioning of the corresponding texture component. In this paper, depth information is used to increase coding efficiency for texture videos by deriving an arbitrarily shaped partitioning. By applying motion compensation to each partition independently and eventually merging the two prediction signals, highly accurate prediction signals can be produced that reduce the remaining texture residual signal significantly. Simulation results show bitrate savings of up to 2.8% for the dependent texture views and up to about 1.0% with respect to the total bitrate.","PeriodicalId":166978,"journal":{"name":"2014 IEEE Visual Communications and Image Processing Conference","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116297947","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
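The depth-driven partition-and-merge prediction can be sketched as thresholding the collocated depth block into a binary mask and merging two per-partition motion-compensated predictions. The mean-threshold rule and the constant prediction signals below are illustrative assumptions, not the codec's specification:

```python
import numpy as np

# Collocated depth block with a clear foreground/background discontinuity
depth = np.array([[10, 10, 80, 80],
                  [10, 10, 80, 80],
                  [10, 12, 82, 80],
                  [11, 10, 80, 81]], dtype=np.float64)

mask = depth > depth.mean()              # arbitrarily shaped binary partition
pred_fg = np.full_like(depth, 200.0)     # stand-in MC prediction, foreground partition
pred_bg = np.full_like(depth, 50.0)      # stand-in MC prediction, background partition
merged = np.where(mask, pred_fg, pred_bg)  # final prediction signal

assert merged[0, 0] == 50.0 and merged[0, 3] == 200.0
```

Because the mask follows the depth edge rather than a rectangular split, each partition can carry its own motion, which is how the residual reduction reported above arises.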