Tiantong Guo, Hojjat Seyed Mousavi, T. Vu, V. Monga
Recent advances have seen a surge of deep learning approaches for image super-resolution. Invariably, a network, e.g., a deep convolutional neural network (CNN) or auto-encoder, is trained to learn the relationship between low- and high-resolution image patches. Recognizing that a wavelet transform provides a "coarse" as well as a "detail" separation of image content, we design a deep CNN to predict the "missing details" of the wavelet coefficients of the low-resolution image to obtain the super-resolution (SR) result, a method we name Deep Wavelet Super-Resolution (DWSR). Our network is trained in the wavelet domain with four input and four output channels. The input comprises the four sub-bands of the low-resolution wavelet coefficients, and the outputs are the residuals (missing details) of the four sub-bands of the high-resolution wavelet coefficients. Using wavelet coefficients and wavelet residuals as the network's inputs and outputs further enhances the sparsity of the activation maps. A key benefit of this design is that it greatly reduces the burden of learning to reconstruct low-frequency details. The predicted output is added to the input to form the final SR wavelet coefficients, and the inverse 2D discrete wavelet transform is then applied to these coefficients to generate the SR result. We show that DWSR is computationally simpler and yet produces competitive and often better results than state-of-the-art alternatives.
{"title":"Deep Wavelet Prediction for Image Super-Resolution","authors":"Tiantong Guo, Hojjat Seyed Mousavi, T. Vu, V. Monga","doi":"10.1109/CVPRW.2017.148","DOIUrl":"https://doi.org/10.1109/CVPRW.2017.148","url":null,"abstract":"Recent advances have seen a surge of deep learning approaches for image super-resolution. Invariably, a network, e.g. a deep convolutional neural network (CNN) or auto-encoder is trained to learn the relationship between low and high-resolution image patches. Recognizing that a wavelet transform provides a \"coarse\" as well as \"detail\" separation of image content, we design a deep CNN to predict the \"missing details\" of wavelet coefficients of the low-resolution images to obtain the Super-Resolution (SR) results, which we name Deep Wavelet Super-Resolution (DWSR). Out network is trained in the wavelet domain with four input and output channels respectively. The input comprises of 4 sub-bands of the low-resolution wavelet coefficients and outputs are residuals (missing details) of 4 sub-bands of high-resolution wavelet coefficients. Wavelet coefficients and wavelet residuals are used as input and outputs of our network to further enhance the sparsity of activation maps. A key benefit of such a design is that it greatly reduces the training burden of learning the network that reconstructs low frequency details. The output prediction is added to the input to form the final SR wavelet coefficients. Then the inverse 2d discrete wavelet transformation is applied to transform the predicted details and generate the SR results. We show that DWSR is computationally simpler and yet produces competitive and often better results than state-of-the-art alternatives.","PeriodicalId":6668,"journal":{"name":"2017 IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW)","volume":"1 1","pages":"1100-1109"},"PeriodicalIF":0.0,"publicationDate":"2017-07-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"83353703","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
We propose AcFR, an active face recognition system that employs a convolutional neural network and acts consistently with human behaviors in common face recognition scenarios. AcFR comprises two main components: a recognition module and a controller module. The recognition module uses a pre-trained VGG-Face net to extract facial image features, together with a nearest-neighbor identity recognition algorithm. Based on the results, the controller module can make three different decisions: greet a recognized individual, disregard an unknown individual, or acquire a different viewpoint from which to reassess the subject, all of which are natural reactions when people observe passers-by. Evaluated on the PIE dataset, our recognition module yields higher accuracy on images taken at angles closer to those of the images saved in memory. This view dependence of the accuracy also provides evidence for the proper design of the controller module.
{"title":"AcFR: Active Face Recognition Using Convolutional Neural Networks","authors":"Masaki Nakada, Han Wang, Demetri Terzopoulos","doi":"10.1109/CVPRW.2017.11","DOIUrl":"https://doi.org/10.1109/CVPRW.2017.11","url":null,"abstract":"We propose AcFR, an active face recognition system that employs a convolutional neural network and acts consistently with human behaviors in common face recognition scenarios. AcFR comprises two main components—a recognition module and a controller module. The recognition module uses a pre-trained VGG-Face net to extract facial image features along with a nearest neighbor identity recognition algorithm. Based on the results, the controller module can make three different decisions—greet a recognized individual, disregard an unknown individual, or acquire a different viewpoint from which to reassess the subject, all of which are natural reactions when people observe passers-by. Evaluated on the PIE dataset, our recognition module yields higher accuracy on images under closer angles to those saved in memory. The accuracy is viewdependent and it also provides evidence for the proper design of the controller module.","PeriodicalId":6668,"journal":{"name":"2017 IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW)","volume":"22 1","pages":"35-40"},"PeriodicalIF":0.0,"publicationDate":"2017-07-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"88753667","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
D. Kollias, M. Nicolaou, I. Kotsia, Guoying Zhao, S. Zafeiriou
In this paper we utilize the first large-scale "in-the-wild" (Aff-Wild) database, which is annotated in terms of the valence-arousal dimensions, to train and test an end-to-end deep neural architecture for estimating continuous emotion dimensions from visual cues. The proposed architecture jointly trains convolutional (CNN) and recurrent (RNN) layers, thus exploiting the invariant properties of convolutional features while also modelling, through the recurrent layers, the temporal dynamics that arise in human behaviour. Various pre-trained networks are used as starting structures and are subsequently fine-tuned on the Aff-Wild database. The obtained results show promise for the use of deep architectures in the visual analysis of human behaviour in terms of continuous emotion dimensions and in the analysis of different types of affect.
{"title":"Recognition of Affect in the Wild Using Deep Neural Networks","authors":"D. Kollias, M. Nicolaou, I. Kotsia, Guoying Zhao, S. Zafeiriou","doi":"10.1109/CVPRW.2017.247","DOIUrl":"https://doi.org/10.1109/CVPRW.2017.247","url":null,"abstract":"In this paper we utilize the first large-scale \"in-the-wild\" (Aff-Wild) database, which is annotated in terms of the valence-arousal dimensions, to train and test an end-to-end deep neural architecture for the estimation of continuous emotion dimensions based on visual cues. The proposed architecture is based on jointly training convolutional (CNN) and recurrent neural network (RNN) layers, thus exploiting both the invariant properties of convolutional features, while also modelling temporal dynamics that arise in human behaviour via the recurrent layers. Various pre-trained networks are used as starting structures which are subsequently appropriately fine-tuned to the Aff-Wild database. Obtained results show premise for the utilization of deep architectures for the visual analysis of human behaviour in terms of continuous emotion dimensions and analysis of different types of affect.","PeriodicalId":6668,"journal":{"name":"2017 IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW)","volume":"84 9 1","pages":"1972-1979"},"PeriodicalIF":0.0,"publicationDate":"2017-07-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"87669190","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Traffic light detection (TLD) is a vital part of both intelligent vehicles and driving assistance systems (DAS). Most TLD methods, however, are evaluated on small, private datasets, making it hard to determine the exact performance of a given method. In this paper we apply the state-of-the-art, real-time object detection system You Only Look Once (YOLO) to the public LISA Traffic Light dataset, available through the VIVA challenge, which contains a large number of annotated traffic lights captured in varying light and weather conditions. The YOLO object detector achieves an impressive AUC of 90.49% for daySequence1, an improvement of 50.32% over the latest ACF entry in the VIVA challenge. Using the exact same training configuration as the ACF detector, the YOLO detector reaches an AUC of 58.3%, an increase of 18.13%.
{"title":"Evaluating State-of-the-Art Object Detector on Challenging Traffic Light Data","authors":"M. B. Jensen, Kamal Nasrollahi, T. Moeslund","doi":"10.1109/CVPRW.2017.122","DOIUrl":"https://doi.org/10.1109/CVPRW.2017.122","url":null,"abstract":"Traffic light detection (TLD) is a vital part of both intelligent vehicles and driving assistance systems (DAS). General for most TLDs is that they are evaluated on small and private datasets making it hard to determine the exact performance of a given method. In this paper we apply the state-of-the-art, real-time object detection system You Only Look Once, (YOLO) on the public LISA Traffic Light dataset available through the VIVA-challenge, which contain a high number of annotated traffic lights, captured in varying light and weather conditions.,,,,,,The YOLO object detector achieves an AUC of impressively 90.49% for daysequence1, which is an improvement of 50.32% compared to the latest ACF entry in the VIVAchallenge. Using the exact same training configuration as the ACF detector, the YOLO detector reaches an AUC of 58.3%, which is in an increase of 18.13%.","PeriodicalId":6668,"journal":{"name":"2017 IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW)","volume":"43 1","pages":"882-888"},"PeriodicalIF":0.0,"publicationDate":"2017-07-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"91077239","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
The explosion of video production calls for new browsing frameworks. This is especially true in sports, where TV companies have years of recorded match archives to exploit and fans are looking for replays, summaries, or collections of events. In this work, we design a new multi-resolution motion feature for video abstraction. The descriptor is based on optical flow singularities tracked along the video. We use these "singlets" to detect zooms, slow-motions, and salient moments in soccer games, and finally to produce an automatic summary of a game. We build a database for soccer video summarization composed of four HDTV soccer matches from the 2014 FIFA World Cup, annotated with goals, fouls, corners, and salient moments for summarization. We correctly detect 88.2% of salient moments on this database. To highlight the generalization of our approach, we test the system on the final of the 2015 handball world championship without any retraining, refinement, or adaptation.
{"title":"Singlets: Multi-resolution Motion Singularities for Soccer Video Abstraction","authors":"K. Blanc, D. Lingrand, F. Precioso","doi":"10.1109/CVPRW.2017.15","DOIUrl":"https://doi.org/10.1109/CVPRW.2017.15","url":null,"abstract":"The burst of video production appeals for new browsing frameworks. Chiefly in sports, TV companies have years of recorded match archives to exploit and sports fans are looking for replay, summary or collection of events. In this work, we design a new multi-resolution motion feature for video abstraction. This descriptor is based on optical flow singularities tracked along the video. We use these singlets in order to detect zooms, slow-motions and salient moments in soccer games and finally to produce an automatic summarization of a game. We produce a database for soccer video summarization composed of 4 soccer matches from HDTV games for the FIFA world cup 2014 annotated with goals, fouls, corners and salient moments to make a summary. We correctly detect 88.2% of saliant moments using this database. To highlight the generalization of our approach, we test our system on the final game of the handball world championship 2015 without any retraining, refining or adaptation.","PeriodicalId":6668,"journal":{"name":"2017 IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW)","volume":"74 1","pages":"66-75"},"PeriodicalIF":0.0,"publicationDate":"2017-07-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"80772285","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A. S. Aydin, Abhinandan Dubey, Daniel Dovrat, A. Aharoni, Roy Shilkrot
We present a method for foreground segmentation of yeast cells in the presence of high noise induced by intentionally low illumination, where traditional approaches (e.g., threshold-based methods and specialized cell-segmentation methods) fail. To deal with these harsh conditions, we use a fully convolutional semantic segmentation network based on the SegNet architecture. Our model segments patches extracted from yeast live-cell experiments with an mIoU score of 0.71 on unseen patches drawn from independent experiments. Further, we show that simultaneous multi-modal observations of bio-fluorescent markers can yield better segmentation performance than the DIC channel alone.
{"title":"CNN Based Yeast Cell Segmentation in Multi-modal Fluorescent Microscopy Data","authors":"A. S. Aydin, Abhinandan Dubey, Daniel Dovrat, A. Aharoni, Roy Shilkrot","doi":"10.1109/CVPRW.2017.105","DOIUrl":"https://doi.org/10.1109/CVPRW.2017.105","url":null,"abstract":"We present a method for foreground segmentation of yeast cells in the presence of high-noise induced by intentional low illumination, where traditional approaches (e.g., threshold-based methods, specialized cell-segmentation methods) fail. To deal with these harsh conditions, we use a fully-convolutional semantic segmentation network based on the SegNet architecture. Our model is capable of segmenting patches extracted from yeast live-cell experiments with a mIOU score of 0.71 on unseen patches drawn from independent experiments. Further, we show that simultaneous multi-modal observations of bio-fluorescent markers can result in better segmentation performance than the DIC channel alone.","PeriodicalId":6668,"journal":{"name":"2017 IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW)","volume":"23 1","pages":"753-759"},"PeriodicalIF":0.0,"publicationDate":"2017-07-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"74101419","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Human action recognition from skeletal data is a hot research topic, important in many open-domain applications of computer vision thanks to recently introduced 3D sensors. In the literature, naive methods simply transfer off-the-shelf techniques from video to the skeletal representation. However, the current state-of-the-art is contended between two different paradigms: kernel-based methods and feature learning with (recurrent) neural networks. Both approaches show strong performance, yet they exhibit heavy, but complementary, drawbacks. Motivated by this fact, our work aims to combine the best of the two paradigms by proposing an approach in which a shallow network is fed with a covariance representation. Our intuition is that, as long as the dynamics are effectively modeled, the classification network need be neither deep nor recurrent in order to score favorably. We validate this hypothesis in a broad experimental analysis over six publicly available datasets.
{"title":"When Kernel Methods Meet Feature Learning: Log-Covariance Network for Action Recognition From Skeletal Data","authors":"Jacopo Cavazza, Pietro Morerio, Vittorio Murino","doi":"10.1109/CVPRW.2017.165","DOIUrl":"https://doi.org/10.1109/CVPRW.2017.165","url":null,"abstract":"Human action recognition from skeletal data is a hot research topic and important in many open domain applications of computer vision, thanks to recently introduced 3D sensors. In the literature, naive methods simply transfer off-the-shelf techniques from video to the skeletal representation. However, the current state-of-the-art is contended between to different paradigms: kernel-based methods and feature learning with (recurrent) neural networks. Both approaches show strong performances, yet they exhibit heavy, but complementary, drawbacks. Motivated by this fact, our work aims at combining together the best of the two paradigms, by proposing an approach where a shallow network is fed with a covariance representation. Our intuition is that, as long as the dynamics is effectively modeled, there is no need for the classification network to be deep nor recurrent in order to score favorably. We validate this hypothesis in a broad experimental analysis over 6 publicly available datasets.","PeriodicalId":6668,"journal":{"name":"2017 IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW)","volume":"85 1","pages":"1251-1258"},"PeriodicalIF":0.0,"publicationDate":"2017-07-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"80141975","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
We describe an end-to-end system for explainable automatic job candidate screening from video CVs. In this application, audio, face, and scene features are first computed from an input video CV using rich feature sets. These multiple modalities are fed into modality-specific regressors to predict apparent personality traits and a variable indicating whether the subject will be invited to the interview. The base learners are stacked into an ensemble of decision trees to produce the outputs of the quantitative stage, and a single decision tree, combined with a rule-based algorithm, produces interview-decision explanations based on the quantitative results. The proposed system ranks first in both the quantitative and qualitative stages of the CVPR 2017 ChaLearn Job Candidate Screening Coopetition.
{"title":"Multi-modal Score Fusion and Decision Trees for Explainable Automatic Job Candidate Screening from Video CVs","authors":"Heysem Kaya, Furkan Gürpinar, A. A. Salah","doi":"10.1109/CVPRW.2017.210","DOIUrl":"https://doi.org/10.1109/CVPRW.2017.210","url":null,"abstract":"We describe an end-to-end system for explainable automatic job candidate screening from video CVs. In this application, audio, face and scene features are first computed from an input video CV, using rich feature sets. These multiple modalities are fed into modality-specific regressors to predict apparent personality traits and a variable that predicts whether the subject will be invited to the interview. The base learners are stacked to an ensemble of decision trees to produce the outputs of the quantitative stage, and a single decision tree, combined with a rule-based algorithm produces interview decision explanations based on the quantitative results. The proposed system in this work ranks first in both quantitative and qualitative stages of the CVPR 2017 ChaLearn Job Candidate Screening Coopetition.","PeriodicalId":6668,"journal":{"name":"2017 IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW)","volume":"28 1","pages":"1651-1659"},"PeriodicalIF":0.0,"publicationDate":"2017-07-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"91061670","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
We propose a principled approach for learning causal conditions from actions and activities taking place in the physical environment, through visual input. Causal conditions are the preconditions that must exist before a certain effect can ensue. We propose to consider diachronic and synchronic causal conditions separately for the learning of causal knowledge. The diachronic condition captures the "change" aspect of the causal relationship (what change must be present at a certain time to effect a subsequent change), while the synchronic condition is the "contextual" aspect (what "static" condition must be present to enable the causal relationship involved). This paper focuses on the learning of synchronic causal conditions and proposes a principled framework for the learning of causal knowledge, including the learning of extended cause-effect sequences and the encoding of this knowledge in the form of scripts for prediction and problem solving.
{"title":"The Role of Synchronic Causal Conditions in Visual Knowledge Learning","authors":"Seng-Beng Ho","doi":"10.1109/CVPRW.2017.8","DOIUrl":"https://doi.org/10.1109/CVPRW.2017.8","url":null,"abstract":"We propose a principled approach for the learning of causal conditions from actions and activities taking place in the physical environment through visual input. Causal conditions are the preconditions that must exist before a certain effect can ensue. We propose to consider diachronic and synchronic causal conditions separately for the learning of causal knowledge. Diachronic condition captures the \"change\" aspect of the causal relationship – what change must be present at a certain time to effect a subsequent change – while the synchronic condition is the \"contextual\" aspect – what \"static\" condition must be present to enable the causal relationship involved. This paper focuses on discussing the learning of synchronic causal conditions as well as proposing a principled framework for the learning of causal knowledge including the learning of extended sequences of cause-effect and the encoding of this knowledge in the form of scripts for prediction and problem solving.","PeriodicalId":6668,"journal":{"name":"2017 IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW)","volume":"221 1","pages":"9-16"},"PeriodicalIF":0.0,"publicationDate":"2017-07-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"83476246","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Recurrent neural networks (RNNs) are able to capture context in an image by modeling long-range semantic dependencies among image units. However, existing methods only utilize RNNs to model dependencies of a single modality (e.g., RGB) for labeling. In this work we extend single-modal RNNs to multimodal RNNs (MM-RNNs) and apply them to RGB-D scene labeling. Our MM-RNNs seamlessly model dependencies of both the RGB and depth modalities and allow 'memory' sharing across modalities. By sharing 'memory', each modality possesses multiple properties of itself and of the other modality, and becomes more discriminative in distinguishing pixels. Moreover, we analyse two simple extensions of single-modal RNNs and demonstrate that our MM-RNNs outperform both of them. Integrating with convolutional neural networks (CNNs), we build an end-to-end network for RGB-D scene labeling. Extensive experiments on NYU Depth V1 and V2 demonstrate the effectiveness of MM-RNNs.
{"title":"RGB-D Scene Labeling with Multimodal Recurrent Neural Networks","authors":"Heng Fan, Xue Mei, D. Prokhorov, Haibin Ling","doi":"10.1109/CVPRW.2017.31","DOIUrl":"https://doi.org/10.1109/CVPRW.2017.31","url":null,"abstract":"Recurrent neural networks (RNNs) are able to capture context in an image by modeling long-range semantic dependencies among image units. However, existing methods only utilize RNNs to model dependencies of a single modality (e.g., RGB) for labeling. In this work we extend this single-modal RNNs to multimodal RNNs (MM-RNNs) and apply it to RGB-D scene labeling. Our MM-RNNs are capable of seamlessly modeling dependencies of both RGB and depth modalities, and allow 'memory' sharing across modalities. By sharing 'memory', each modality possesses multiple properties of itself and other modalities, and becomes more discriminative to distinguish pixels. Moreover, we also analyse two simple extensions of single-modal RNNs and demonstrate that our MM-RNNs perform better than both of them. Integrating with convolutional neural networks (CNNs), we build an end-to-end network for RGB-D scene labeling. Extensive experiments on NYU depth V1 and V2 demonstrate the effectiveness of MM-RNNs.","PeriodicalId":6668,"journal":{"name":"2017 IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW)","volume":"41 1","pages":"203-211"},"PeriodicalIF":0.0,"publicationDate":"2017-07-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"85214678","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}