
Latest publications from the 2014 IEEE Visual Communications and Image Processing Conference

Discriminative multi-modality non-negative sparse graph model for action recognition
Pub Date : 2014-12-01 DOI: 10.1109/VCIP.2014.7051502
Yuanbo Chen, Yanyun Zhao, Bojin Zhuang, A. Cai
A discriminative multi-modality non-negative sparse (DMNS) graph model is proposed in this paper. In the model, features in each modality are first projected into the Mahalanobis space by a transformation learned for that modality; a multi-modality non-negative sparse graph is then constructed in the Mahalanobis space with coefficients shared across modalities. Both labeled and unlabeled data can be introduced into the graph, and label propagation can then be performed to predict the labels of the unlabeled samples. Extensive experiments on two benchmark datasets demonstrate the advantages of the proposed DMNS-graph method over state-of-the-art methods.
Citations: 0
Depth estimation by combining stereo matching and coded aperture
Pub Date : 2014-12-01 DOI: 10.1109/VCIP.2014.7051561
C. Wang, E. Sahin, O. Suominen, A. Gotchev
We investigate the improvements in depth estimation that can be achieved by merging coded apertures and stereo cameras. We analyze several stereo camera setups equipped with different sets of coded apertures to explore these possibilities. The results of this analysis are encouraging in the sense that coded apertures can, in some cases, provide valuable complementary information to stereo-vision-based depth estimation. In addition, we take advantage of the stereo camera arrangement to build a single-shot multiple-coded-aperture system. We show that with this system it is possible to extract depth information robustly by exploiting the inherent relation between the disparity and defocus cues, even in scene regions that are problematic for stereo matching.
Citations: 3
Fast hierarchical cost volume aggregation for stereo-matching
Pub Date : 2014-12-01 DOI: 10.1109/VCIP.2014.7051615
Sergey Smirnov, A. Gotchev
Some of the best-performing local stereo-matching approaches use cross-bilateral filters for proper cost aggregation. Recent attempts have been directed toward efficient approximations of such filters aimed at higher speed. In this paper, we suggest a simple yet efficient coarse-to-fine cost volume aggregation scheme, which employs pyramidal decomposition of the cost volume followed by edge-avoiding reconstruction and aggregation. The scheme substantially reduces the computational complexity while providing estimated disparity maps of fair quality compared to other approximated bilateral filtering schemes. In fact, the speed of the proposed technique is comparable to that of fixed-kernel aggregation implemented through integral images.
Citations: 1
Fast and viewpoint robust human detection in uncluttered environments
Pub Date : 2014-12-01 DOI: 10.1109/VCIP.2014.7051621
Paul Blondel, A. Potelle, C. Pégard, Rogelio Lozano
Human detection is a very popular field of computer vision. Few works propose a solution for detecting people regardless of the camera's viewpoint, as is needed in UAV applications, and in this context even state-of-the-art detectors can fail to detect people. We found that the Integral Channel Features (ICF) detector is ineffective in such a context. In this paper, we propose an approach that retains the assets of the ICF while considerably extending its angular robustness during detection. The main contributions of this work are: a new framework for viewpoint-robust human detection based on the Cluster Boosting Tree and the ICF detector; and a new training dataset that takes into account the human shape modifications occurring when the pitch angle of the camera changes. We show that our detector (the PRD) is superior to the ICF for detecting people from complex viewpoints in uncluttered environments and that its computation time is compatible with real-time operation.
Citations: 7
Independent uniform prediction mode for screen content video coding
Pub Date : 2014-12-01 DOI: 10.1109/VCIP.2014.7051521
Xingyu Zhang, R. Cohen, A. Vetro
Many of the existing video coding standards in use today were developed primarily with camera-captured content as test material. Today, with the more widespread use of connected devices, there is increased interest in developing video coding tools that target screen content video. Screen content video is often characterized by sharp edges, noiseless graphics-generated regions, repeated patterns, limited sets of colors, etc. This paper presents an independent uniform prediction (IUP) mode for improving the coding efficiency of screen content video. IUP chooses one color out of a small set of global colors to form a uniform prediction block. Unlike existing palette-based modes, IUP does not have to construct and signal a color index map for every coded block. Experimental results using IUP in the HEVC Range Extensions 6.0 framework are presented, along with results using complexity-reduction techniques that make the IUP-based encoder faster than the reference encoder.
Citations: 1
Full reference image quality metric for stereo images based on Cyclopean image computation and neural fusion
Pub Date : 2014-12-01 DOI: 10.1109/VCIP.2014.7051516
A. Chetouani
In this paper, we present a new Stereo Full-Reference Image Quality Metric (SFR-IQM) based on Cyclopean Image (CI) computation and 2D IQM fusion. The Cyclopean images of the reference image and its degraded version are first computed from the left and right views. 2D measures are then extracted from the obtained CIs and combined using an Artificial Neural Network (ANN) to derive a single index. The 3D LIVE Image Quality Database is used to evaluate our method and its capability to predict subjective judgments. The obtained results are compared to recent methods considered state-of-the-art. The experimental results show the relevance of our method.
Citations: 10
Efficient depth propagation in videos with GPU-acceleration
Pub Date : 2014-12-01 DOI: 10.1109/VCIP.2014.7051557
Manuel Ivancsics, N. Brosch, M. Gelautz
In this paper we propose an optimized semiautomatic approach for efficient 2D-to-3D video conversion. It is based on a conversion algorithm that leverages segmentation and filtering techniques to propagate sparse, user-provided depth information. Our GPU acceleration of the approach of Brosch et al. (2011) significantly reduces the computation time of the original algorithm. Since the limited capacity of the GPU's onboard memory hinders the parallel processing of large data such as videos, we additionally propose a temporally coherent clip-based 2D-to-3D conversion approach for long videos. Evaluations show that the proposed, optimized conversion approach is capable of generating high-quality results while significantly reducing the execution time compared to the original, un-optimized approach.
Citations: 0
Analysis and optimization of x265 encoder
Pub Date : 2014-12-01 DOI: 10.1109/VCIP.2014.7051616
Q. Hu, Xiaoyun Zhang, Zhiyong Gao, Jun Sun
x265 is an open-source encoder project that aims to deliver the world's fastest and most computationally efficient HEVC encoder. Although x265 has been developed efficiently with many optimization techniques, it is still not able to encode HD videos in real time, even at its faster presets. In this paper, we investigate the encoding framework and computational complexity of x265 in depth, and find that the RDO process is the most time-consuming part. We then propose an efficient prediction scheme that includes reducing the number of RDO passes, early skip detection, and fast intra mode decision. Experimental results show that the proposed method improves the speed of x265 from 19.86 fps to 37.76 fps on HD test sequences, i.e., a 47.44% complexity reduction, with only 1.37% BDBR coding performance loss.
Citations: 15
Demo: DLP based anti-piracy display system
Pub Date : 2014-12-01 DOI: 10.1109/VCIP.2014.7051571
Zhongpai Gao, Guangtao Zhai, Xiaolin Wu, Xiongkuo Min, Chunjia Hu
Camcorder piracy has a great impact on the movie industry. Although there are many methods to prevent recording in theatres, no recognized technology satisfies the need to defeat camcorder piracy while having no effect on the audience. To realize anti-piracy display, we use a new paradigm of information display technology called temporal psychovisual modulation (TPVM). TPVM exploits the difference in image formation mechanisms between human eyes and imaging sensors. Based on this difference, we build a prototype system on the DLP® LightCrafter 4500™ platform, which features high-speed pattern display. The display system serves as a proof of concept for anti-piracy display.
Citations: 1
Key view selection in distributed multiview coding
Pub Date : 2014-12-01 DOI: 10.1109/VCIP.2014.7051612
Thomas Maugey, G. Petrazzuoli, P. Frossard, Marco Cagnazzo, B. Pesquet-Popescu
Multiview image and video systems with a large number of views lead to new problems in data representation, transmission and user interaction. In order to reduce the data volume, most distributed multiview coding schemes exploit the inter-view redundancies at the decoder side, using view synthesis from key views. When many views are considered, the following two questions become fundamental: i) how many key views have to be chosen to keep good reconstruction quality at reasonable coding cost? ii) where should they be placed in the multiview sequences? In this paper we propose an algorithm for selecting the key views in a distributed multiview coding scheme. Based on a novel metric for the correlation between the views, we formulate an optimization problem for the positioning of the key views such that both the distortion of the reconstruction and the coding rate cost are effectively minimized. We then propose a new optimization strategy based on a shortest path algorithm that determines both the optimal number of key views and their positions in the image set. We experimentally validate our solution in a practical distributed multiview coding system and show that considering the 3D scene geometry in the key view positioning brings significant rate-distortion improvements compared to the distance-based key view selection commonly done in the literature.
Citations: 4