
Latest publications from 2017 IEEE Visual Communications and Image Processing (VCIP)

Endoscopic video deblurring via synthesis
Pub Date : 2017-12-01 DOI: 10.1109/VCIP.2017.8305021
Lingbing Peng, Shuaicheng Liu, Dehua Xie, Shuyuan Zhu, B. Zeng
Endoscopic videos have been widely used for stomach diagnosis. However, endoscopic devices often capture videos with motion blur, due to the dimly lit environment and camera shake during capture, which severely disturbs diagnosis. In this paper, we present a framework that can restore blurry frames by synthesizing image details from nearby sharp frames. Specifically, blurry frames and their corresponding nearby sharp frames are identified according to image gradient sharpness. To restore a blurry frame, a non-parametric mesh-based motion model is proposed to align the sharp frame to the blurry frame. The motion model leverages motions from image feature matches and optical flow, which yields high-quality alignments that overcome challenges such as noise, blur, reflections, and textureless regions. After the alignment, the deblurred frame is synthesized by matching patches locally between the blurry frame and the aligned sharp frame. Without estimating blur kernels, we show that it is possible to directly compare a blurry patch against sharp patches for nearest-neighbor matching in endoscopic images. The experiments demonstrate the effectiveness of our algorithm.
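As a rough illustration of the frame-identification step, the sketch below ranks frames by mean gradient magnitude; the sharpness measure and the median-based threshold are illustrative assumptions, not the paper's exact criteria.

```python
# A minimal sketch of classifying frames as sharp or blurry by gradient
# sharpness; assumes grayscale frames as NumPy arrays.
import numpy as np

def gradient_sharpness(frame: np.ndarray) -> float:
    """Mean gradient magnitude, used to rank frames by sharpness."""
    gy, gx = np.gradient(frame.astype(np.float64))
    return float(np.mean(np.hypot(gx, gy)))

def split_sharp_blurry(frames, ratio=0.8):
    """Label frames whose sharpness falls below `ratio` times the median
    score as blurry; the rest serve as sharp references (assumed rule)."""
    scores = np.array([gradient_sharpness(f) for f in frames])
    threshold = ratio * np.median(scores)
    blurry = [i for i, s in enumerate(scores) if s < threshold]
    sharp = [i for i, s in enumerate(scores) if s >= threshold]
    return blurry, sharp
```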
Citations: 5
Learning solar flare forecasting model from magnetograms
Pub Date : 2017-12-01 DOI: 10.1109/VCIP.2017.8305095
Xin Huang, Huaning Wang, Long Xu, W. Sun
A solar flare is a type of violent eruption from the Sun. Its effects reach the near-Earth environment almost immediately, so forecasting solar flares is crucial for space weather. So far, the physical mechanisms of solar flares are not yet clear; hence, we learn a solar flare forecasting model from historical observational magnetograms using deep learning. Instead of relying on a feature extractor designed by solar physicists, as in traditional solar flare forecasting models, the proposed model automatically learns features from raw input data, followed by a classifier that forecasts from the learned features. The experimental results demonstrate that the proposed model achieves better solar flare forecasting performance than traditional models.
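A minimal sketch of the core idea, a convolutional feature extractor learned from raw magnetograms followed by a classifier, is given below in PyTorch; the architecture, the 1x128x128 input size, and the two-class output are assumptions for illustration, not the authors' network.

```python
# Hypothetical CNN for flare/no-flare classification from magnetograms.
import torch
import torch.nn as nn

class FlareNet(nn.Module):
    def __init__(self, num_classes: int = 2):
        super().__init__()
        self.features = nn.Sequential(              # learned feature extractor
            nn.Conv2d(1, 16, 5, stride=2, padding=2), nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(4),
        )
        self.classifier = nn.Linear(32 * 4 * 4, num_classes)

    def forward(self, x):
        f = self.features(x).flatten(1)
        return self.classifier(f)

logits = FlareNet()(torch.randn(8, 1, 128, 128))    # batch of 8 magnetograms
```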
Citations: 2
Standardization status of 360 degree video coding and delivery
Pub Date : 2017-12-01 DOI: 10.1109/VCIP.2017.8305083
R. Skupin, Y. Sanchez, Ye-Kui Wang, M. Hannuksela, J. Boyce, M. Wien
The emergence of consumer-level capturing and display devices for 360 degree video creates new and promising segments in entertainment, education, professional training, and other markets. In order to avoid market fragmentation and ensure interoperability of 360 degree video ecosystems, industry and academia cooperate in standardization efforts in this field. In the video coding domain, 360 degree video invalidates many established procedures, e.g., concerning evaluation of visual quality, while its specific content characteristics offer potential for compression efficiency beyond the current standards. Likewise, 360 degree video puts stricter demands on the system-level aspects of transmission, but may also offer the potential to enhance existing transport schemes. The Joint Collaborative Team on Video Coding (JCT-VC) and the Joint Video Exploration Team (JVET) have already started investigations into 360 degree video coding, while numerous activities in the Systems subgroup of the Moving Picture Experts Group (MPEG) investigate application requirements and delivery aspects of 360 degree video. This paper reports on the current status of these standardization efforts.
Citations: 31
Depth map super-resolution via multiclass dictionary learning with geometrical directions
Pub Date : 2017-12-01 DOI: 10.1109/VCIP.2017.8305158
W. Xu, Jin Wang, Qing Zhu, Xi Wu, Yifei Qi
Depth cameras have gained significant popularity in recent years due to their affordable cost. However, the resolution of the depth maps captured by these cameras is rather limited, so they can hardly be used directly for visual depth perception and 3D reconstruction. To handle this problem, we propose a novel multiclass dictionary learning method, in which the depth image is divided into patches classified according to their geometrical directions, and a sparse dictionary is trained within each class. Different from previous SR works, we build the correspondence between training samples and their corresponding registered color images via sparse representation. We further use an adaptive autoregressive model as a reconstruction constraint to preserve smooth regions and sharp edges. Experimental results demonstrate that our method outperforms state-of-the-art depth map super-resolution methods in terms of both subjective and objective quality.
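The direction-based patch classification could look like the following sketch, which bins patches by their dominant gradient orientation before a dictionary is trained per class; the number of orientation bins and the magnitude-weighted histogram are assumptions, not the paper's exact rule.

```python
# Hypothetical grouping of depth patches by dominant gradient direction.
import numpy as np

def dominant_direction(patch: np.ndarray, n_bins: int = 8) -> int:
    """Bin a patch by the orientation of its dominant gradient."""
    gy, gx = np.gradient(patch.astype(np.float64))
    mag = np.hypot(gx, gy)
    ang = np.mod(np.arctan2(gy, gx), np.pi)       # orientation in [0, pi)
    hist, _ = np.histogram(ang, bins=n_bins, range=(0, np.pi), weights=mag)
    return int(np.argmax(hist))

def classify_patches(patches, n_bins=8):
    """Group patches into n_bins classes; one dictionary per class follows."""
    classes = [[] for _ in range(n_bins)]
    for p in patches:
        classes[dominant_direction(p, n_bins)].append(p)
    return classes
```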
Citations: 2
An LSTM method for predicting CU splitting in H.264 to HEVC transcoding
Pub Date : 2017-12-01 DOI: 10.1109/VCIP.2017.8305079
Yanan Wei, Zulin Wang, Mai Xu, Shu-juan Qiao
For H.264 to High Efficiency Video Coding (HEVC) transcoding, this paper proposes a hierarchical Long Short-Term Memory (LSTM) method to predict coding unit (CU) splitting. Specifically, we first analyze the correlation between CU splitting patterns and H.264 features. Based on this analysis, we further propose a hierarchical LSTM architecture that predicts the CU splitting of HEVC from the explored H.264 features. The H.264 features, including residual, macroblock (MB) partition, and bit allocation, are employed as the input to our LSTM method. Experimental results demonstrate that the proposed method outperforms state-of-the-art H.264 to HEVC transcoding methods in terms of both complexity reduction and PSNR performance.
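A minimal sketch of an LSTM that maps sequences of H.264 feature vectors to per-step split decisions follows; the feature dimension, hidden size, and single-layer design are illustrative assumptions rather than the paper's hierarchical configuration.

```python
# Hypothetical LSTM predicting a split/no-split logit per CU from
# H.264-derived features (residual energy, MB partition, bit allocation).
import torch
import torch.nn as nn

class CUSplitLSTM(nn.Module):
    def __init__(self, feat_dim: int = 16, hidden: int = 64):
        super().__init__()
        self.lstm = nn.LSTM(feat_dim, hidden, batch_first=True)
        self.head = nn.Linear(hidden, 1)           # split-probability logit

    def forward(self, x):                           # x: (batch, time, feat_dim)
        out, _ = self.lstm(x)
        return self.head(out).squeeze(-1)           # one logit per time step

probs = torch.sigmoid(CUSplitLSTM()(torch.randn(4, 10, 16)))
```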
Citations: 6
Bilateral filtering for video coding
Pub Date : 2017-12-01 DOI: 10.1109/VCIP.2017.8305038
Per Wennersten, Jacob Ström, Y. Wang, K. Andersson, Rickard Sjöberg, Jack Enhorn
This paper proposes the use of a bilateral filter as a coding tool for video compression. The filter is applied after transform and reconstruction, and the filtered result is used both for output and for spatial and temporal prediction. The implementation is based on a look-up table (LUT), making it fast enough to give a reasonable trade-off between complexity and compression efficiency. By varying the center filter coefficient and avoiding storing zero LUT entries, it is possible to reduce the size of the LUT to 2202 bytes. It is also demonstrated that the filter can be implemented without divisions, which is important for full-custom ASIC implementations. The method has been implemented and tested according to the common test conditions in JEM version 5.0.1. For still images, or intra frames, we report a 0.4% bitrate reduction with a complexity increase of 6% in the encoder and 5% in the decoder. For video, we report a 0.5% bitrate reduction with a complexity increase of 3% in the encoder and 0% in the decoder.
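The LUT idea can be sketched as follows: range weights are precomputed once so that filtering reduces to table lookups over a small neighborhood. The 4-neighbor support, sigma value, and weight values below are assumptions for illustration; the paper's fixed-point, division-free design is only hinted at in the final comment.

```python
# Hypothetical LUT-based bilateral filter over the 4-neighborhood of an
# 8-bit image; range weights come from a precomputed table.
import numpy as np

def build_range_lut(sigma_r: float, max_diff: int = 255) -> np.ndarray:
    """Precompute Gaussian range weights for all 8-bit differences."""
    d = np.arange(max_diff + 1, dtype=np.float64)
    return np.exp(-(d * d) / (2.0 * sigma_r * sigma_r))

def bilateral_4n(img: np.ndarray, sigma_r: float = 12.0,
                 center_w: float = 1.0, neigh_w: float = 0.4) -> np.ndarray:
    lut = neigh_w * build_range_lut(sigma_r)
    base = img.astype(np.int64)
    pad = np.pad(base, 1, mode="edge")
    h, w = base.shape
    out = center_w * base.astype(np.float64)
    wsum = np.full(base.shape, center_w, dtype=np.float64)
    for dy, dx in ((-1, 0), (1, 0), (0, -1), (0, 1)):
        n = pad[1 + dy:1 + dy + h, 1 + dx:1 + dx + w]
        wgt = lut[np.abs(n - base)]          # range weight via table lookup
        out += wgt * n
        wsum += wgt
    return out / wsum   # a fixed-point version would avoid this division
```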
Citations: 12
Illumination invariant feature based on neighboring radiance ratio
Pub Date : 2017-12-01 DOI: 10.1109/VCIP.2017.8305111
Xi Zhang, Xiaolin Wu
In many object recognition applications, especially face recognition, varying illumination can adversely affect the robustness of the recognition system. In this paper, we propose a novel illumination-invariant feature called the Neighboring Radiance Ratio (NRR), which is insensitive to both the intensity and the direction of light. NRR is derived and analyzed based on a physical image formation model. Its computation needs no prior information or training data, and NRR is far less sensitive to shadow borders than most existing methods. An analysis of the illumination invariance of NRR is also presented. The proposed NRR feature is tested on the Extended Yale B and CMU-PIE databases and compared with several previous methods. The experimental results corroborate our analysis and demonstrate that NRR is a highly robust image feature against illumination changes.
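A ratio-of-neighboring-radiance feature in this spirit can be sketched as below, where dividing each pixel by its local mean cancels a slowly varying illumination field; the window size and epsilon are illustrative assumptions, not the paper's exact formulation.

```python
# Hypothetical neighboring-radiance-ratio style feature map.
import numpy as np
from scipy.ndimage import uniform_filter

def nrr(image: np.ndarray, win: int = 3, eps: float = 1e-6) -> np.ndarray:
    """Each pixel divided by its local mean; a multiplicative illumination
    field that varies slowly over the window roughly cancels in the ratio."""
    img = image.astype(np.float64)
    local_mean = uniform_filter(img, size=win)
    return img / (local_mean + eps)
```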
Citations: 4
Portable information security display system via Spatial PsychoVisual Modulation
Pub Date : 2017-12-01 DOI: 10.1109/VCIP.2017.8305058
Xiang Li, Guangtao Zhai, Jia Wang, Ke Gu
With the rapid development of visual media, people pay increasing attention to privacy protection in public settings. Currently, most existing research on information security, such as cryptography and steganography, mainly concerns transmission, and little has been done to keep information displayed on screens from reaching the eyes of bystanders. Moreover, in traditional meetings the presenter simply stands in front of the screen; the limited time, screen area, and report formats inevitably limit the information conveyed. We therefore design a portable screen that assists the presenter in conveying information in a new form. On public occasions, the presenter can show private content with important information to an authorized audience while others cannot see it. In this paper, we propose a new Spatial PsychoVisual Modulation (SPVM) based solution to this privacy problem. The system uses two synchronized projectors with linear polarization filters, polarization glasses, a camera with a linear polarization filter, and a metallic screen, and it guarantees that private information is shown synchronously. We have implemented the system, and experimental results demonstrate the effectiveness and robustness of the proposed information security display system.
Citations: 1
Assessment and classification of singing quality based on audio-visual features
Pub Date : 2017-12-01 DOI: 10.1109/VCIP.2017.8305078
Mangona Bokshi, Fei Tao, C. Busso, J. Hansen
The process of speech production changes between speaking and singing due to excitation, vocal tract articulatory positioning, and cognitive motor planning while singing. Singing not only deviates from typical spoken speech but also varies across singing styles, owing to different musical genres, individual singing quality, and different languages and cultures. Because of this variation, it is important to establish a baseline system for differentiating between certain aspects of singing. In this study, we establish a classification system that automatically estimates the singing quality of candidates from an American TV singing show based on their singing acoustics and their lip and eye movements. We employ three classifiers, Logistic Regression, Naive Bayes, and K-nearest neighbor (k-NN), and compare the performance of each using unimodal and multimodal features. We also compare performance across modalities (speech, lip, and eye structure). The results show that audio content performs best, with modest gains when lip and eye content are fused. An interesting outcome is that lip and eye content achieve an 82% quality assessment while audio achieves 95%; the ability to assess singing quality from lip and eye content at this level is remarkable.
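The three-classifier comparison can be sketched with scikit-learn as below, assuming precomputed audio-visual feature vectors; the random placeholder data stands in for the study's corpus and labels.

```python
# Hypothetical comparison of the three classifiers on stand-in features.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 24))           # stand-in audio-visual feature vectors
y = rng.integers(0, 2, size=200)         # stand-in binary quality labels

for name, clf in [("logreg", LogisticRegression(max_iter=1000)),
                  ("naive-bayes", GaussianNB()),
                  ("knn", KNeighborsClassifier(n_neighbors=5))]:
    acc = cross_val_score(clf, X, y, cv=5).mean()   # 5-fold accuracy
    print(f"{name}: {acc:.3f}")
```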
Citations: 1
SIFT-based adaptive prediction structure for light field compression
Pub Date : 2017-12-01 DOI: 10.1109/VCIP.2017.8305107
Wei Zhang, Dong Liu, Zhiwei Xiong, Jizheng Xu
A light field consists of multiple views of a scene, which can be arranged and encoded like a pseudo sequence. Since the correlations between views are unequal and content dependent, a well-constructed coding order and an adaptive prediction structure improve performance. In this paper, we propose an adaptive prediction structure for light field compression. While the coding order is inherited from the 2-D hierarchical coding order, the prediction structure is determined by the differences between the scale-invariant feature transform (SIFT) descriptors of the views. Experimental results show that the proposed method yields a 5.71% BD-rate reduction on average compared with a fixed prediction structure.
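A minimal sketch of choosing references by SIFT descriptor similarity: each view gets a global descriptor signature, and its reference is the nearest already-coded view in signature space. OpenCV's SIFT and the mean-descriptor distance are illustrative assumptions, not the paper's exact construction.

```python
# Hypothetical reference selection for a pseudo-sequence of light field views.
import cv2
import numpy as np

def view_signature(gray: np.ndarray) -> np.ndarray:
    """One 128-d signature per view: the mean of its SIFT descriptors."""
    sift = cv2.SIFT_create()
    _, desc = sift.detectAndCompute(gray, None)
    return desc.mean(axis=0)

def prediction_structure(views, coding_order):
    """Map each view to the already-coded view with the closest signature."""
    sigs = {i: view_signature(v) for i, v in enumerate(views)}
    refs, coded = {}, []
    for i in coding_order:
        if coded:
            refs[i] = min(coded, key=lambda j: np.linalg.norm(sigs[i] - sigs[j]))
        coded.append(i)
    return refs                              # view index -> reference view index
```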
Citations: 10