Endoscopic video deblurring via synthesis
Pub Date: 2017-12-01 | DOI: 10.1109/VCIP.2017.8305021
Lingbing Peng, Shuaicheng Liu, Dehua Xie, Shuyuan Zhu, B. Zeng
Endoscopic videos have been widely used for stomach diagnosis. However, endoscopic devices often capture videos with motion blur, due to the dimly lit environment and camera shake during capture, which severely disturbs diagnosis. In this paper, we present a framework that restores blurry frames by synthesizing image details from nearby sharp frames. Specifically, blurry frames and their corresponding nearby sharp frames are identified according to image gradient sharpness. To restore a blurry frame, a non-parametric mesh-based motion model is proposed to align the sharp frame to the blurry frame. The motion model leverages motions from image feature matches and optical flow, which yields high-quality alignments that overcome challenges such as noise, blur, reflections, and textureless regions. After the alignment, the deblurred frame is synthesized by matching patches locally between the blurry frame and the aligned sharp frame. Without estimating blur kernels, we show that it is possible to directly compare a blurry patch against sharp patches for nearest-neighbor matching in endoscopic images. The experiments demonstrate the effectiveness of our algorithm.
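As a rough illustration of the kernel-free synthesis step, the sketch below replaces each blurry patch with its nearest sharp patch found in a small local search window; the patch size, search radius, and plain SSD cost are illustrative assumptions, not the paper's exact settings.

```python
# Minimal sketch (not the authors' implementation): local nearest-neighbor
# patch synthesis between a blurry frame and an aligned sharp frame.
import numpy as np

def synthesize_deblurred(blurry, aligned_sharp, patch=8, radius=6):
    h, w = blurry.shape
    out = np.zeros_like(blurry, dtype=np.float64)
    weight = np.zeros_like(out)
    for y in range(0, h - patch + 1, patch // 2):      # overlapping patches
        for x in range(0, w - patch + 1, patch // 2):
            b = blurry[y:y + patch, x:x + patch].astype(np.float64)
            best, best_cost = None, np.inf
            # search sharp patches in a window around the same location
            for dy in range(-radius, radius + 1):
                for dx in range(-radius, radius + 1):
                    yy, xx = y + dy, x + dx
                    if 0 <= yy <= h - patch and 0 <= xx <= w - patch:
                        s = aligned_sharp[yy:yy + patch, xx:xx + patch].astype(np.float64)
                        cost = np.sum((b - s) ** 2)     # direct blurry-vs-sharp SSD
                        if cost < best_cost:
                            best, best_cost = s, cost
            out[y:y + patch, x:x + patch] += best       # paste best sharp patch
            weight[y:y + patch, x:x + patch] += 1.0
    return out / np.maximum(weight, 1.0)                # average overlaps
```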
{"title":"Endoscopic video deblurring via synthesis","authors":"Lingbing Peng, Shuaicheng Liu, Dehua Xie, Shuyuan Zhu, B. Zeng","doi":"10.1109/VCIP.2017.8305021","DOIUrl":"https://doi.org/10.1109/VCIP.2017.8305021","url":null,"abstract":"Endoscopic videos have been widely used for stomach diagnoses. However, endoscopic devices often capture videos with motion blurs, due to the dimly-lit environment and the camera shakiness during the capturing, which severely disturbs the diagnoses. In this paper, we present a framework that can restore blurry frames by synthesizing image details from the nearby sharp frames. Specifically, the blurry frame and their corresponding nearby sharp frames are identified according to the image gradient sharpness. To restore one blurry frame, a non-parametric mesh-based motion model is proposed to align the sharp frame to the blurry frame. The motion model leverages motions from image feature matches and optical flows, which yields high quality alignments to overcome challenges such as noisy, blurry, reflective and textureless interferences. After the alignment, the deblurred frame is synthesized by matching patches locally between the blurry frame and the aligned sharp frame. Without the estimation of blur kernels, we show that it is possible to directly compare a blurry patch against the sharp patches for the nearest neighbor matches in endoscopic images. The experiments demonstrate the effectiveness of our algorithm.","PeriodicalId":423636,"journal":{"name":"2017 IEEE Visual Communications and Image Processing (VCIP)","volume":"248 7 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129954475","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Learning solar flare forecasting model from magnetograms
Pub Date: 2017-12-01 | DOI: 10.1109/VCIP.2017.8305095
Xin Huang, Huaning Wang, Long Xu, W. Sun
A solar flare is a type of violent eruption from the Sun. Its effects reach the near-Earth environment almost immediately, so forecasting solar flares is crucial for space weather. Because the physical mechanisms of solar flares are not yet clear, we learn a solar flare forecasting model from historical observational magnetograms using deep learning. Instead of relying on feature extractors designed by solar physicists, as in traditional solar flare forecasting models, the proposed model automatically learns features from raw input data, followed by a classifier that forecasts from the learned features. The experimental results demonstrate that the proposed model achieves better solar flare forecasting performance than traditional models.
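The sketch below illustrates the general idea of learning features directly from raw magnetograms with a small convolutional network followed by a classifier; the layer sizes, input resolution, and choice of PyTorch are assumptions for illustration, not the network described in the paper.

```python
# Minimal sketch: learned feature extractor + flare/no-flare classifier
# operating on raw magnetogram images (architecture is illustrative only).
import torch
import torch.nn as nn

class FlareForecaster(nn.Module):
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(            # features learned from raw data
            nn.Conv2d(1, 16, 5, stride=2), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(4),
        )
        self.classifier = nn.Sequential(           # classifier on learned features
            nn.Flatten(), nn.Linear(32 * 4 * 4, 64), nn.ReLU(), nn.Linear(64, 2),
        )

    def forward(self, magnetogram):                # magnetogram: (N, 1, H, W)
        return self.classifier(self.features(magnetogram))

logits = FlareForecaster()(torch.randn(8, 1, 128, 128))  # 8 example magnetograms
```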
{"title":"Learning solar flare forecasting model from magnetograms","authors":"Xin Huang, Huaning Wang, Long Xu, W. Sun","doi":"10.1109/VCIP.2017.8305095","DOIUrl":"https://doi.org/10.1109/VCIP.2017.8305095","url":null,"abstract":"Solar flare is one type of violent eruptions from the Sun. Its effects almost immediately arrive to the near-Earth environment, so it is crucial to forecast solar flares in space weather. So far, the physical mechanisms of solar flares are not yet clear, hence we learn a solar flare forecasting model from the historical observational magnetograms by using the deep learning method. Instead of designing the feature extractor by the solar physicist in the traditional solar flare forecasting model, the proposed forecasting model can automatically learn features from input raw data, and followed by a classifier for foretasting from the learned features. The experimental results demonstrate that the proposed model can achieve better performance of solar flare forecasting comparing to traditional solar flare forecasting models.","PeriodicalId":423636,"journal":{"name":"2017 IEEE Visual Communications and Image Processing (VCIP)","volume":"8 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128397295","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Standardization status of 360 degree video coding and delivery
Pub Date: 2017-12-01 | DOI: 10.1109/VCIP.2017.8305083
R. Skupin, Y. Sanchez, Ye-Kui Wang, M. Hannuksela, J. Boyce, M. Wien
The emergence of consumer-level capturing and display devices for 360 degree video creates new and promising segments in entertainment, education, professional training, and other markets. In order to avoid market fragmentation and ensure interoperability of 360 degree video ecosystems, industry and academia cooperate in standardization efforts in this field. In the video coding domain, 360 degree video invalidates many established procedures, e.g., concerning evaluation of visual quality, while its specific content characteristics offer potential for compression efficiency beyond the current standards. Likewise, 360 degree video puts stricter demands on the system-level aspects of transmission but may also offer the potential to enhance existing transport schemes. The Joint Collaborative Team on Video Coding (JCT-VC) and the Joint Video Exploration Team (JVET) have already started investigations into 360 degree video coding, while numerous activities in the Systems subgroup of the Moving Picture Experts Group (MPEG) investigate application requirements and delivery aspects of 360 degree video. This paper reports on the current status of the outlined standardization efforts.
{"title":"Standardization status of 360 degree video coding and delivery","authors":"R. Skupin, Y. Sanchez, Ye-Kui Wang, M. Hannuksela, J. Boyce, M. Wien","doi":"10.1109/VCIP.2017.8305083","DOIUrl":"https://doi.org/10.1109/VCIP.2017.8305083","url":null,"abstract":"The emergence of consumer level capturing and display devices for 360 degree video creates new and promising segments in entertainment, education, professional training, and other markets. In order to avoid market fragmentation and ensure interoperability of 360 degree video ecosystems, industry and academia cooperate in standardization efforts in this field. In the video coding domain, 360 degree video invalidates many established procedures, e.g., concerning evaluation of the visual quality, while the specific content characteristics offer potential for higher compression efficiency beyond the current standards. Likewise, 360 degree video puts stricter demands on the system level aspects of transmission but may also offer the potential to enhance existing transport schemes. The Joint Collaborative Team on Video Coding (JCT-VC) as well as the Joint Video Exploration Team (JVET) already started investigations into 360 degree video coding while numerous activities in the Systems subgroup of the Moving Picture Experts Group (MPEG) started to investigate application requirements and delivery aspects of 360 degree video. This paper reports on the current status of the outlined standardization efforts.","PeriodicalId":423636,"journal":{"name":"2017 IEEE Visual Communications and Image Processing (VCIP)","volume":"14 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114686975","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Depth map super-resolution via multiclass dictionary learning with geometrical directions
Pub Date: 2017-12-01 | DOI: 10.1109/VCIP.2017.8305158
W. Xu, Jin Wang, Qing Zhu, Xi Wu, Yifei Qi
Depth cameras have gained significant popularity in recent years due to their affordable cost. However, the resolution of the depth maps captured by these cameras is rather limited, so they can hardly be used directly for visual depth perception and 3D reconstruction. To handle this problem, we propose a novel multiclass dictionary learning method, in which the depth image is divided into patches classified according to their geometrical directions and a sparse dictionary is trained within each class. Different from previous SR works, we build the correspondence between training samples and their registered color images via sparse representation. We further use an adaptive autoregressive model as a reconstruction constraint to preserve smooth regions and sharp edges. Experimental results demonstrate that our method outperforms state-of-the-art depth map super-resolution methods in terms of both subjective and objective quality.
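As a hedged illustration of the patch classification step, the sketch below groups depth patches by their dominant gradient orientation; the number of classes, the flat-patch threshold, and the omission of the actual per-class dictionary training (e.g., K-SVD) are illustrative simplifications.

```python
# Minimal sketch: assign each depth patch to a geometrical-direction class
# (dominant gradient orientation), with a separate class for smooth patches.
import numpy as np

def classify_patch(patch, n_classes=4, flat_thresh=1e-3):
    gy, gx = np.gradient(patch.astype(np.float64))
    energy = gx ** 2 + gy ** 2
    if energy.mean() < flat_thresh:           # nearly flat patch: its own class
        return n_classes
    angles = np.arctan2(gy, gx) % np.pi       # orientation in [0, pi)
    hist, _ = np.histogram(angles, bins=n_classes,
                           range=(0, np.pi), weights=energy)
    return int(np.argmax(hist))               # dominant direction class

patches = [np.random.rand(8, 8) for _ in range(100)]
classes = [classify_patch(p) for p in patches]
# A separate sparse dictionary would then be trained for each class index.
```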
{"title":"Depth map super-resolution via multiclass dictionary learning with geometrical directions","authors":"W. Xu, Jin Wang, Qing Zhu, Xi Wu, Yifei Qi","doi":"10.1109/VCIP.2017.8305158","DOIUrl":"https://doi.org/10.1109/VCIP.2017.8305158","url":null,"abstract":"Depth cameras have gained significant popularity due to their affordable cost in recent years. However, the resolution of depth map captured by these cameras is rather limited, and thus it hardly can be directly used in visual depth perception and 3D reconstruction. In order to handle this problem, we propose a novel multiclass dictionary learning method, in which depth image is divided into classified patches according to their geometrical directions and a sparse dictionary is trained within each class. Different from previous SR works, we build the correspondence between training samples and their corresponding register color image via sparse representation. We further use the adaptive autoregressive model as a reconstruction constraint to preserve smooth regions and sharp edges. Experimental results demonstrate that our method outperforms state-of-the-art methods in depth map super-resolution in terms of both subjective quality and objective quality.","PeriodicalId":423636,"journal":{"name":"2017 IEEE Visual Communications and Image Processing (VCIP)","volume":"44 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127721859","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
An LSTM method for predicting CU splitting in H.264 to HEVC transcoding
Pub Date: 2017-12-01 | DOI: 10.1109/VCIP.2017.8305079
Yanan Wei, Zulin Wang, Mai Xu, Shu-juan Qiao
For H.264 to High Efficiency Video Coding (HEVC) transcoding, this paper proposes a hierarchical Long Short-Term Memory (LSTM) method to predict coding unit (CU) splitting. Specifically, we first analyze the correlation between CU splitting patterns and H.264 features. Based on this analysis, we propose a hierarchical LSTM architecture for predicting the CU splitting of HEVC from the explored H.264 features. The H.264 features, including residual, macroblock (MB) partition, and bit allocation, are employed as the input to our LSTM method. Experimental results demonstrate that the proposed method outperforms state-of-the-art H.264 to HEVC transcoding methods in terms of both complexity reduction and PSNR performance.
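A minimal sketch of the core idea follows: an LSTM consumes a sequence of per-block H.264 features (residual, MB partition, bit allocation) and outputs a split/no-split decision for a CU. The feature dimensionality, sequence layout, and single-level prediction are assumptions; the paper's model is hierarchical over the CU quad-tree.

```python
# Minimal sketch: LSTM over H.264 block features -> CU split/no-split logit.
import torch
import torch.nn as nn

class CUSplitPredictor(nn.Module):
    def __init__(self, feat_dim=8, hidden=32):
        super().__init__()
        self.lstm = nn.LSTM(feat_dim, hidden, batch_first=True)
        self.head = nn.Linear(hidden, 1)           # split / no-split logit

    def forward(self, feats):                      # feats: (N, T, feat_dim)
        _, (h_n, _) = self.lstm(feats)             # final hidden state per CU
        return self.head(h_n[-1])

logits = CUSplitPredictor()(torch.randn(4, 16, 8))  # 4 CUs, 16 blocks each
```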
{"title":"An LSTM method for predicting CU splitting in H.264 to HEVC transcoding","authors":"Yanan Wei, Zulin Wang, Mai Xu, Shu-juan Qiao","doi":"10.1109/VCIP.2017.8305079","DOIUrl":"https://doi.org/10.1109/VCIP.2017.8305079","url":null,"abstract":"For H.264 to high efficiency video coding (HEVC) transcoding, this paper proposes a hierarchical Long Short-Term Memory (LSTM) method to predict coding unit (CU) splitting. Specifically, we first analyze the correlation between CU splitting patterns and H.264 features. Upon our analysis, we further propose a hierarchical LSTM architecture for predicting CU splitting of HEVC, with regard to the explored H.264 features. The features of H.264, including residual, macroblock (MB) partition and bit allocation, are employed as the input to our LSTM method. Experimental results demonstrate that the proposed method outperforms the state-of-the-art H.264 to HEVC transcoding methods, in terms of both complexity reduction and PSNR performance.","PeriodicalId":423636,"journal":{"name":"2017 IEEE Visual Communications and Image Processing (VCIP)","volume":"54 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134115065","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Bilateral filtering for video coding
Pub Date: 2017-12-01 | DOI: 10.1109/VCIP.2017.8305038
Per Wennersten, Jacob Ström, Y. Wang, K. Andersson, Rickard Sjöberg, Jack Enhorn
This paper proposes the use of a bilateral filter as a coding tool for video compression. The filter is applied after transform and reconstruction, and the filtered result is used both for output and for spatial and temporal prediction. The implementation is based on a look-up table (LUT), making it fast enough to give a reasonable trade-off between complexity and compression efficiency. By varying the center filter coefficient and avoiding storing zero LUT entries, it is possible to reduce the size of the LUT to 2202 bytes. It is also demonstrated that the filter can be implemented without divisions, which is important for full-custom ASIC implementations. The method has been implemented and tested according to the common test conditions in JEM version 5.0.1. For still images, or intra frames, we report a 0.4% bitrate reduction with a complexity increase of 6% in the encoder and 5% in the decoder. For video, we report a 0.5% bitrate reduction with a complexity increase of 3% in the encoder and 0% in the decoder.
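The sketch below shows a LUT-driven bilateral filter over a plus-shaped 4-neighbourhood in the same spirit: range weights are read from a precomputed table so no exponential is evaluated per sample. The table size, sigma, neighbourhood, and the explicit integer division are illustrative assumptions, not the JEM implementation (which also avoids the division).

```python
# Minimal sketch: bilateral filtering with a precomputed range-weight LUT
# indexed by absolute sample difference, over a 4-neighbourhood.
import numpy as np

SIGMA_R = 20.0
LUT = np.round(64.0 * np.exp(-(np.arange(256) ** 2) / (2 * SIGMA_R ** 2))).astype(np.int32)

def bilateral_lut(block, center_weight=96):
    h, w = block.shape
    out = block.copy()
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            c = int(block[y, x])
            acc, wsum = center_weight * c, center_weight
            for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                n = int(block[ny, nx])
                wgt = int(LUT[abs(n - c)])         # table lookup, no exp()
                acc += wgt * n
                wsum += wgt
            out[y, x] = (acc + wsum // 2) // wsum  # rounded weighted average
    return out

filtered = bilateral_lut(np.random.randint(0, 256, (8, 8)))
```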
{"title":"Bilateral filtering for video coding","authors":"Per Wennersten, Jacob Ström, Y. Wang, K. Andersson, Rickard Sjöberg, Jack Enhorn","doi":"10.1109/VCIP.2017.8305038","DOIUrl":"https://doi.org/10.1109/VCIP.2017.8305038","url":null,"abstract":"This paper proposes the use of a bilateral filter as a coding tool for video compression. The filter is applied after transform and reconstruction, and the filtered result is used both for output as well as for spatial and temporal prediction. The implementation is based on a look-up table (LUT), making it fast enough to give a reasonable trade-off between complexity and compression efficiency. By varying the center filter coefficient and avoiding storing zero LUT entries, it is possible to reduce the size of the LUT to 2202 bytes. It is also demonstrated that the filter can be implemented without divisions, which is important for full custom ASIC implementations. The method has been implemented and tested according to the common test conditions in JEM version 5.0.1. For still images, or intra frames, we report a 0.4% bitrate reduction with a complexity increase of 6% in the encoder and 5% in the decoder. For video, we report a 0.5% bitrate reduction with a complexity increase of 3% in the encoder and 0% in the decoder.","PeriodicalId":423636,"journal":{"name":"2017 IEEE Visual Communications and Image Processing (VCIP)","volume":"100 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134311843","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Illumination invariant feature based on neighboring radiance ratio
Pub Date: 2017-12-01 | DOI: 10.1109/VCIP.2017.8305111
Xi Zhang, Xiaolin Wu
In many object recognition applications, especially face recognition, varying illumination can adversely affect the robustness of the recognition system. In this paper, we propose a novel illumination-invariant feature called the Neighboring Radiance Ratio (NRR), which is insensitive to both the intensity and the direction of light. NRR is derived and analyzed based on a physical image formation model. Its computation needs no prior information or training data, and NRR is far less sensitive to shadow borders than most existing methods. An analysis of the illumination invariance of NRR is also presented. The proposed NRR feature is tested on the Extended Yale B and CMU-PIE databases and compared with several previous methods. The experimental results corroborate our analysis and demonstrate that NRR is a highly robust image feature against illumination changes.
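As a loosely hedged illustration of a radiance-ratio style feature, the sketch below describes each pixel by its ratio to the local neighbourhood mean, which cancels a slowly varying illumination factor (image = reflectance x illumination with illumination roughly constant over a small window). The exact NRR definition in the paper may differ; the window size and epsilon are assumptions.

```python
# Minimal sketch: a radiance-ratio feature that divides out locally smooth
# illumination (NOT the paper's exact NRR formulation).
import numpy as np
from scipy.ndimage import uniform_filter

def radiance_ratio_feature(image, win=3, eps=1e-6):
    image = image.astype(np.float64)
    local_mean = uniform_filter(image, size=win)   # neighbourhood radiance
    return image / (local_mean + eps)              # illumination factor cancels
```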
{"title":"Illumination invariant feature based on neighboring radiance ratio","authors":"Xi Zhang, Xiaolin Wu","doi":"10.1109/VCIP.2017.8305111","DOIUrl":"https://doi.org/10.1109/VCIP.2017.8305111","url":null,"abstract":"In many object recognition applications, especially in face recognition, varying illuminations can adversely affect the robustness of the object recognition system. In this paper, we propose a novel illumination invariant feature called Neighboring Radiance Ratio (NRR) which is insensitive to both intensity and direction of light. NRR is derived and analyzed based on a physical image formation model. The computation of NRR does not need any prior information or any training data and NRR is far less sensitive to the border of shadows than most existing methods. The analysis of the illumination invariance of NRR is also presented. The proposed NRR feature is tested on Extended Yale B and CMU-PIE databases and compared with several previous methods. The experimental results corroborate our analysis and demonstrate that NRR is highly robust image feature against illumination changes.","PeriodicalId":423636,"journal":{"name":"2017 IEEE Visual Communications and Image Processing (VCIP)","volume":"19 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129404103","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Portable information security display system via Spatial PsychoVisual Modulation
Pub Date: 2017-12-01 | DOI: 10.1109/VCIP.2017.8305058
Xiang Li, Guangtao Zhai, Jia Wang, Ke Gu
With the rapid development of visual media, people pay increasing attention to privacy protection in public settings. Most existing research on information security, such as cryptography and steganography, concerns transmission, and little has been done to keep information displayed on screens from reaching the eyes of bystanders. Moreover, in traditional meetings the presenter simply stands in front of the screen, and the limited time, screen area, and presentation forms inevitably limit the information that can be conveyed. We therefore design a portable screen that assists the presenter in showing information in a new form: on public occasions, the presenter can show private content containing important information to an authorized audience while others cannot see it. In this paper, we propose a new Spatial PsychoVisual Modulation (SPVM) based solution to this privacy problem. The system uses two synchronized projectors with linear polarization filters, polarization glasses, a camera with a linear polarization filter, and a metallic screen, and it guarantees that the private information is displayed synchronously. We have implemented the system, and experimental results demonstrate the effectiveness and robustness of the proposed information security display system.
{"title":"Portable information security display system via Spatial PsychoVisual Modulation","authors":"Xiang Li, Guangtao Zhai, Jia Wang, Ke Gu","doi":"10.1109/VCIP.2017.8305058","DOIUrl":"https://doi.org/10.1109/VCIP.2017.8305058","url":null,"abstract":"With the rapid development of visual media, people prefer to pay more attention to privacy protection in public situations. Currently, most existing researches on information security such as cryptography and steganography mainly concern transmission and yet little has been done to keep the information displayed on screens from reaching eyes of the bystanders. At the same time, the reporter just stands in front of the screen during traditional meetings. Limited time, screen area and report forms, which inevitably leads to limited information. As a result, we design a portable screen for assisting the reporter to present any information in a new form. In some public occasions, the reporter can show private content with important information to authorized audience while the others can not see that. In this paper, we propose a new Spatial PsychoVisual Modulation (SPVM) based solution to the privacy problem. This system uses two synchronized projectors with linear polarization filters and polarization glasses, and a camera with linear polarization filter the metallic screen. It can guarantee the system shows private information synchronously. We have implemented the system and experimental results demonstrate the effectiveness and robustness of the proposed information security display system.","PeriodicalId":423636,"journal":{"name":"2017 IEEE Visual Communications and Image Processing (VCIP)","volume":"36 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126562166","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Assessment and classification of singing quality based on audio-visual features
Pub Date: 2017-12-01 | DOI: 10.1109/VCIP.2017.8305078
Mangona Bokshi, Fei Tao, C. Busso, J. Hansen
The process of speech production changes between speaking and singing due to excitation, vocal tract articulatory positioning, and cognitive motor planning while singing. Singing not only deviates from typical spoken speech but also varies across singing styles, owing to different genres of music, the singing quality of the individual, and different languages and cultures. Because of this variation, it is important to establish a baseline system for differentiating between certain aspects of singing. In this study, we build a classification system that automatically estimates the singing quality of candidates from an American TV singing show based on their singing acoustics and their lip and eye movements. We employ three classifiers, Logistic Regression, Naive Bayes, and k-nearest neighbor (k-NN), and compare their performance using unimodal and multimodal features. We also compare performance across modalities (speech, lip, eye structure). The results show that audio content performs best, with modest gains when lip and eye content are fused. An interesting outcome is that lip and eye content achieve 82% on quality assessment while audio achieves 95%; the ability to assess singing quality from lip and eye content at this level is remarkable.
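A minimal sketch of the evaluation setup follows, running the three listed classifiers on unimodal features and on an early fusion (concatenation) of audio, lip, and eye features; the arrays are random placeholders and the feature dimensionalities are assumptions, not the paper's data.

```python
# Minimal sketch: compare Logistic Regression, Naive Bayes and k-NN on
# unimodal vs. fused audio-visual features (placeholder data).
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import cross_val_score

n = 200
audio = np.random.randn(n, 13)        # placeholder acoustic features
lip = np.random.randn(n, 6)           # placeholder lip-movement features
eye = np.random.randn(n, 4)           # placeholder eye-structure features
labels = np.random.randint(0, 2, n)   # good / poor singing labels

feature_sets = {"audio": audio, "lip": lip, "eye": eye,
                "fused": np.hstack([audio, lip, eye])}
classifiers = {"LogReg": LogisticRegression(max_iter=1000),
               "NaiveBayes": GaussianNB(),
               "kNN": KNeighborsClassifier(n_neighbors=5)}

for fname, feats in feature_sets.items():
    for cname, clf in classifiers.items():
        acc = cross_val_score(clf, feats, labels, cv=5).mean()
        print(f"{fname:>6s} + {cname:<10s} accuracy: {acc:.2f}")
```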
{"title":"Assessment and classification of singing quality based on audio-visual features","authors":"Mangona Bokshi, Fei Tao, C. Busso, J. Hansen","doi":"10.1109/VCIP.2017.8305078","DOIUrl":"https://doi.org/10.1109/VCIP.2017.8305078","url":null,"abstract":"The process of speech production changes between speaking and singing due to excitation, vocal tract articulatory positioning, and cognitive motor planning while singing. Singing does not only deviate from typical spoken speech, but it varies across various styles of singing. This is due to alternative genres of music, singing quality of an individual, as well as different languages and cultures. Because of this variation, it is important to establish a baseline system for differentiating between certain aspects of singing. In this study, we establish a classification system that automatically estimates singing quality of candidates from an American TV singing show based on their singing speech acoustics, lip and eye movements. We employ three classifiers that include: Logistic Regression, Naive Bayes and K-nearest neighbor (k-NN) and compare performance of each using unimodal and multimodal features. We also compare performance based on different modalities (speech, lip, eye structure). The results show that audio content performs the best, with modest gains when lip and eye content are fused. An interesting outcome is that lip and eye content achieve an 82% quality assessment while audio achieves 95%. The ability to assess singing quality from lip and eye content at this level is remarkable.","PeriodicalId":423636,"journal":{"name":"2017 IEEE Visual Communications and Image Processing (VCIP)","volume":"34 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130887857","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
SIFT-based adaptive prediction structure for light field compression
Pub Date: 2017-12-01 | DOI: 10.1109/VCIP.2017.8305107
Wei Zhang, Dong Liu, Zhiwei Xiong, Jizheng Xu
A light field consists of multiple views of a scene, which can be arranged and encoded like a pseudo sequence. Since the correlations between views are not equal and are indeed content dependent, a well-constructed coding order and an adaptive prediction structure will improve performance. In this paper, we propose an adaptive prediction structure for light field compression. While the coding order is inherited from the 2-D hierarchical coding order, the prediction structure is determined by the differences between the scale-invariant feature transform (SIFT) descriptors of the views. Experimental results show that the proposed method achieves an average 5.71% BD-rate reduction compared with a fixed prediction structure.
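As a hedged sketch of the view-difference measure, the code below scores a pair of views by the mean distance of their matched SIFT descriptors and picks the most similar already-coded view as reference; the matching scheme and selection rule are illustrative assumptions rather than the paper's exact procedure.

```python
# Minimal sketch: SIFT-descriptor-based view difference and reference selection.
import cv2
import numpy as np

def view_difference(img_a, img_b):
    sift = cv2.SIFT_create()
    _, des_a = sift.detectAndCompute(img_a, None)
    _, des_b = sift.detectAndCompute(img_b, None)
    matches = cv2.BFMatcher(cv2.NORM_L2, crossCheck=True).match(des_a, des_b)
    return np.mean([m.distance for m in matches]) if matches else np.inf

def choose_reference(current, coded_views):
    # coded_views: {view_index: image} of views already coded
    diffs = {idx: view_difference(current, img) for idx, img in coded_views.items()}
    return min(diffs, key=diffs.get)   # most similar coded view as reference
```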
{"title":"SIFT-based adaptive prediction structure for light field compression","authors":"Wei Zhang, Dong Liu, Zhiwei Xiong, Jizheng Xu","doi":"10.1109/VCIP.2017.8305107","DOIUrl":"https://doi.org/10.1109/VCIP.2017.8305107","url":null,"abstract":"A light field consists of multiple views of a scene, which can be arranged and encoded like a pseudo sequence. Since the correlations between views are not equal and indeed content dependent, a well-constructed coding order and adaptive prediction structure will improve performance. In this paper, we propose an adaptive prediction structure for light field compression. While the coding order is inherited from the 2-D hierarchical coding order, the prediction structure is determined by the differences between scale-invariant feature transform (SIFT) descriptors of the views. Experimental results show that the proposed method leads to on average 5.71% BD-rate reduction compared with fixed prediction structure.","PeriodicalId":423636,"journal":{"name":"2017 IEEE Visual Communications and Image Processing (VCIP)","volume":"28 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127629287","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}