
2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR): Latest Publications

Are You Smarter Than a Sixth Grader? Textbook Question Answering for Multimodal Machine Comprehension
Pub Date : 2017-07-21 DOI: 10.1109/CVPR.2017.571
Aniruddha Kembhavi, Minjoon Seo, Dustin Schwenk, Jonghyun Choi, Ali Farhadi, Hannaneh Hajishirzi
We introduce the task of Multi-Modal Machine Comprehension (M3C), which aims at answering multimodal questions given a context of text, diagrams and images. We present the Textbook Question Answering (TQA) dataset, which includes 1,076 lessons and 26,260 multi-modal questions taken from middle school science curricula. Our analysis shows that a significant portion of the questions require complex parsing of the text and diagrams, as well as reasoning over them, indicating that our dataset is more complex than previous machine comprehension and visual question answering datasets. We extend state-of-the-art methods for textual machine comprehension and visual question answering to the TQA dataset. Our experiments show that these models do not perform well on TQA. The presented dataset opens new challenges for research in question answering and reasoning across multiple modalities.
{"title":"Are You Smarter Than a Sixth Grader? Textbook Question Answering for Multimodal Machine Comprehension","authors":"Aniruddha Kembhavi, Minjoon Seo, Dustin Schwenk, Jonghyun Choi, Ali Farhadi, Hannaneh Hajishirzi","doi":"10.1109/CVPR.2017.571","DOIUrl":"https://doi.org/10.1109/CVPR.2017.571","url":null,"abstract":"We introduce the task of Multi-Modal Machine Comprehension (M3C), which aims at answering multimodal questions given a context of text, diagrams and images. We present the Textbook Question Answering (TQA) dataset that includes 1,076 lessons and 26,260 multi-modal questions, taken from middle school science curricula. Our analysis shows that a significant portion of questions require complex parsing of the text and the diagrams and reasoning, indicating that our dataset is more complex compared to previous machine comprehension and visual question answering datasets. We extend state-of-the-art methods for textual machine comprehension and visual question answering to the TQA dataset. Our experiments show that these models do not perform well on TQA. The presented dataset opens new challenges for research in question answering and reasoning across multiple modalities.","PeriodicalId":6631,"journal":{"name":"2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)","volume":"57 1","pages":"5376-5384"},"PeriodicalIF":0.0,"publicationDate":"2017-07-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"90843414","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 202
BranchOut: Regularization for Online Ensemble Tracking with Convolutional Neural Networks
Pub Date : 2017-07-21 DOI: 10.1109/CVPR.2017.63
Bohyung Han, Jack Sim, Hartwig Adam
We propose an extremely simple but effective regularization technique for convolutional neural networks (CNNs), referred to as BranchOut, for online ensemble tracking. Our algorithm employs a CNN for target representation, which has common convolutional layers but multiple branches of fully connected layers. For better regularization, a subset of branches in the CNN is selected randomly for online learning whenever the target appearance models need to be updated. Each branch may have a different number of layers to maintain variable abstraction levels of target appearances. BranchOut with multi-level target representation allows us to learn robust target appearance models with diversity and handle the various challenges of the visual tracking problem effectively. The proposed algorithm is evaluated on standard tracking benchmarks and shows state-of-the-art performance even without additional pretraining on external tracking sequences.
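To make the branch-selection mechanism concrete, here is a minimal PyTorch sketch under assumptions of our own (the toy trunk, layer sizes, and the BranchOutTracker/online_update names are illustrative, not the authors' code): shared convolutional layers feed several fully connected branches, and each online update back-propagates through a randomly chosen subset of branches.

```python
import random
import torch
import torch.nn as nn

class BranchOutTracker(nn.Module):
    def __init__(self, num_branches=5, feat_dim=64):
        super().__init__()
        # shared convolutional layers (toy trunk; the paper builds on a pretrained CNN)
        self.trunk = nn.Sequential(
            nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, feat_dim, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten())
        # fully connected branches of varying depth -> different abstraction levels
        self.branches = nn.ModuleList()
        for i in range(num_branches):
            layers = []
            for _ in range(1 + i % 2):
                layers += [nn.Linear(feat_dim, feat_dim), nn.ReLU()]
            layers.append(nn.Linear(feat_dim, 2))   # target vs. background scores
            self.branches.append(nn.Sequential(*layers))

    def forward(self, x, active=None):
        h = self.trunk(x)
        idx = active if active is not None else range(len(self.branches))
        return [self.branches[i](h) for i in idx]

def online_update(model, optimizer, patches, labels, subset_size=2):
    """One BranchOut-style update: back-propagate through a random branch subset."""
    active = random.sample(range(len(model.branches)), subset_size)
    losses = [nn.functional.cross_entropy(s, labels) for s in model(patches, active)]
    loss = sum(losses) / len(losses)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return active, loss.item()

model = BranchOutTracker()
opt = torch.optim.SGD(model.parameters(), lr=1e-3)
patches = torch.randn(8, 3, 64, 64)         # sampled target/background patches
labels = torch.randint(0, 2, (8,))
print(online_update(model, opt, patches, labels))
```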
{"title":"BranchOut: Regularization for Online Ensemble Tracking with Convolutional Neural Networks","authors":"Bohyung Han, Jack Sim, Hartwig Adam","doi":"10.1109/CVPR.2017.63","DOIUrl":"https://doi.org/10.1109/CVPR.2017.63","url":null,"abstract":"We propose an extremely simple but effective regularization technique of convolutional neural networks (CNNs), referred to as BranchOut, for online ensemble tracking. Our algorithm employs a CNN for target representation, which has a common convolutional layers but has multiple branches of fully connected layers. For better regularization, a subset of branches in the CNN are selected randomly for online learning whenever target appearance models need to be updated. Each branch may have a different number of layers to maintain variable abstraction levels of target appearances. BranchOut with multi-level target representation allows us to learn robust target appearance models with diversity and handle various challenges in visual tracking problem effectively. The proposed algorithm is evaluated in standard tracking benchmarks and shows the state-of-the-art performance even without additional pretraining on external tracking sequences.","PeriodicalId":6631,"journal":{"name":"2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)","volume":"189 1","pages":"521-530"},"PeriodicalIF":0.0,"publicationDate":"2017-07-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"83055338","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 144
Non-local Deep Features for Salient Object Detection
Pub Date : 2017-07-21 DOI: 10.1109/CVPR.2017.698
Zhiming Luo, A. Mishra, Andrew Achkar, Justin A. Eichel, Shaozi Li, Pierre-Marc Jodoin
Saliency detection aims to highlight the most relevant objects in an image. Methods using conventional models struggle whenever salient objects appear on top of a cluttered background, while deep neural nets suffer from excess complexity and slow evaluation speeds. In this paper, we propose a simplified convolutional neural network that combines local and global information through a multi-resolution 4×5 grid structure. Instead of enforcing spatial coherence with a CRF or superpixels, as is usually the case, we implement a loss function inspired by the Mumford-Shah functional that penalizes errors on the boundary. We train our model on the MSRA-B dataset and test it on six different saliency benchmark datasets. Results show that our method is on par with the state of the art while reducing computation time by a factor of 18 to 100, enabling near real-time, high-performance saliency detection.
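As a rough illustration of a boundary-focused loss (a sketch under our own assumptions, not the paper's exact Mumford-Shah-inspired formulation), the snippet below extracts soft edges from the prediction and the ground truth with finite differences and penalizes their disagreement on top of a standard pixel-wise loss.

```python
import torch
import torch.nn.functional as F

def soft_boundary(mask):
    """Soft edge magnitude of a (B,1,H,W) map via finite differences."""
    dx = mask[:, :, :, 1:] - mask[:, :, :, :-1]
    dy = mask[:, :, 1:, :] - mask[:, :, :-1, :]
    dx = F.pad(dx, (0, 1, 0, 0))   # restore width
    dy = F.pad(dy, (0, 0, 0, 1))   # restore height
    return torch.sqrt(dx ** 2 + dy ** 2 + 1e-8)

def saliency_loss(pred, gt, boundary_weight=1.0):
    """Pixel-wise BCE plus a boundary-agreement penalty (1 - soft boundary IoU)."""
    bce = F.binary_cross_entropy(pred, gt)
    pb, gb = soft_boundary(pred), soft_boundary(gt)
    inter = (pb * gb).sum(dim=(1, 2, 3))
    union = (pb + gb - pb * gb).sum(dim=(1, 2, 3)) + 1e-8
    boundary = (1.0 - inter / union).mean()
    return bce + boundary_weight * boundary

pred = torch.rand(2, 1, 64, 64, requires_grad=True)   # network output in [0,1]
gt = (torch.rand(2, 1, 64, 64) > 0.5).float()
loss = saliency_loss(pred, gt)
loss.backward()
print(loss.item())
```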
{"title":"Non-local Deep Features for Salient Object Detection","authors":"Zhiming Luo, A. Mishra, Andrew Achkar, Justin A. Eichel, Shaozi Li, Pierre-Marc Jodoin","doi":"10.1109/CVPR.2017.698","DOIUrl":"https://doi.org/10.1109/CVPR.2017.698","url":null,"abstract":"Saliency detection aims to highlight the most relevant objects in an image. Methods using conventional models struggle whenever salient objects are pictured on top of a cluttered background while deep neural nets suffer from excess complexity and slow evaluation speeds. In this paper, we propose a simplified convolutional neural network which combines local and global information through a multi-resolution 4×5 grid structure. Instead of enforcing spacial coherence with a CRF or superpixels as is usually the case, we implemented a loss function inspired by the Mumford-Shah functional which penalizes errors on the boundary. We trained our model on the MSRA-B dataset, and tested it on six different saliency benchmark datasets. Results show that our method is on par with the state-of-the-art while reducing computation time by a factor of 18 to 100 times, enabling near real-time, high performance saliency detection.","PeriodicalId":6631,"journal":{"name":"2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)","volume":"16 1","pages":"6593-6601"},"PeriodicalIF":0.0,"publicationDate":"2017-07-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"82057473","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 430
SCC: Semantic Context Cascade for Efficient Action Detection
Pub Date : 2017-07-21 DOI: 10.1109/CVPR.2017.338
Fabian Caba Heilbron, Wayner Barrios, Victor Escorcia, Bernard Ghanem
Despite recent advances in large-scale video analysis, action detection remains one of the most challenging unsolved problems in computer vision. This snag is in part due to the large volume of data that needs to be analyzed to detect actions in videos. Existing approaches have mitigated the computational cost, but these methods still lack the rich high-level semantics that would help them localize actions quickly. In this paper, we introduce a Semantic Context Cascade (SCC) model that aims to detect actions in long video sequences. By embracing semantic priors associated with human activities, SCC produces high-quality class-specific action proposals and prunes unrelated activities in a cascade fashion. Experimental results on ActivityNet show that SCC achieves state-of-the-art performance for action detection while operating in real time.
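A toy sketch of the cascade-pruning idea (Python/NumPy; the prior matrix, cue names, and threshold are invented for illustration and do not come from the paper): proposals whose cheap semantic cues are incompatible with every action class's prior are discarded before an expensive action classifier runs.

```python
import numpy as np

# hypothetical prior: affinity between each action class (rows) and semantic cue (columns)
ACTIONS = ["long_jump", "dog_grooming", "cooking"]
CUES = ["sand_pit", "dog", "kitchen", "stove"]
prior = np.array([[0.9, 0.0, 0.0, 0.0],
                  [0.0, 0.9, 0.1, 0.0],
                  [0.0, 0.0, 0.8, 0.7]])

def prune_proposals(proposals, cue_scores, threshold=0.3):
    """proposals: list of (start, end) times; cue_scores: (num_proposals, num_cues)
    confidences of cheap semantic cues. Keep a proposal only if at least one
    action class's prior agrees with its observed cues."""
    keep = []
    for prop, cues in zip(proposals, cue_scores):
        compatibility = (prior * cues).max()      # best action/cue agreement
        if compatibility >= threshold:
            keep.append(prop)
    return keep

props = [(0.0, 4.2), (10.5, 15.0), (30.1, 33.3)]
cues = np.array([[0.8, 0.0, 0.1, 0.0],    # sand pit visible -> long_jump plausible
                 [0.0, 0.0, 0.05, 0.1],   # weak cues -> pruned
                 [0.0, 0.0, 0.9, 0.6]])   # kitchen + stove -> cooking plausible
print(prune_proposals(props, cues))       # [(0.0, 4.2), (30.1, 33.3)]
```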
{"title":"SCC: Semantic Context Cascade for Efficient Action Detection","authors":"Fabian Caba Heilbron, Wayner Barrios, Victor Escorcia, Bernard Ghanem","doi":"10.1109/CVPR.2017.338","DOIUrl":"https://doi.org/10.1109/CVPR.2017.338","url":null,"abstract":"Despite the recent advances in large-scale video analysis, action detection remains as one of the most challenging unsolved problems in computer vision. This snag is in part due to the large volume of data that needs to be analyzed to detect actions in videos. Existing approaches have mitigated the computational cost, but still, these methods lack rich high-level semantics that helps them to localize the actions quickly. In this paper, we introduce a Semantic Cascade Context (SCC) model that aims to detect action in long video sequences. By embracing semantic priors associated with human activities, SCC produces high-quality class-specific action proposals and prune unrelated activities in a cascade fashion. Experimental results in ActivityNet unveils that SCC achieves state-of-the-art performance for action detection while operating at real time.","PeriodicalId":6631,"journal":{"name":"2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)","volume":"43 1","pages":"3175-3184"},"PeriodicalIF":0.0,"publicationDate":"2017-07-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"80522790","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 98
Mimicking Very Efficient Network for Object Detection
Pub Date : 2017-07-21 DOI: 10.1109/CVPR.2017.776
Quanquan Li, Sheng Jin, Junjie Yan
Current CNN-based object detectors need to be initialized from pre-trained ImageNet classification models, which is usually time-consuming. In this paper, we present a fully convolutional feature mimic framework to train very efficient CNN-based detectors, which do not need ImageNet pre-training and achieve performance competitive with large and slow models. We add supervision from high-level features of the large networks during training to help the small network better learn object representations. More specifically, we apply a mimic method to features sampled from the entire feature map and use a transform layer to map features from the small network onto the same dimension as the large network. In training the small network, we optimize the similarity between features sampled from the same region on the feature maps of both networks. Extensive experiments are conducted on pedestrian and common object detection tasks using VGG, Inception and ResNet. On both Caltech and Pascal VOC, we show that the modified 2.5× accelerated Inception network achieves performance competitive with the full Inception network. Our faster model runs at 80 FPS on a 1000×1500 input with only a minor degradation of performance on Caltech.
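The feature-mimic objective can be sketched as follows (PyTorch; the channel sizes, the sampling mask, and the FeatureMimicLoss name are hypothetical, not the authors' implementation): a 1×1 transform layer lifts the small network's features to the large network's dimension, and an L2 penalty is applied only at sampled locations of the two feature maps.

```python
import torch
import torch.nn as nn

class FeatureMimicLoss(nn.Module):
    def __init__(self, student_ch=128, teacher_ch=512):
        super().__init__()
        # transform layer: map student features onto the teacher's dimension
        self.transform = nn.Conv2d(student_ch, teacher_ch, kernel_size=1)

    def forward(self, student_feat, teacher_feat, sample_mask):
        """student_feat: (B,Cs,H,W); teacher_feat: (B,Ct,H,W);
        sample_mask: (B,1,H,W) binary map of sampled locations (e.g. proposals)."""
        s = self.transform(student_feat)
        diff = (s - teacher_feat.detach()) ** 2          # no gradient into the teacher
        per_pos = diff.sum(dim=1, keepdim=True)          # squared error per location
        n = sample_mask.sum().clamp(min=1.0)
        return (per_pos * sample_mask).sum() / n         # average over sampled positions

mimic = FeatureMimicLoss()
student = torch.randn(2, 128, 32, 48, requires_grad=True)
teacher = torch.randn(2, 512, 32, 48)
mask = (torch.rand(2, 1, 32, 48) > 0.9).float()          # sparse sampled positions
loss = mimic(student, teacher, mask)
loss.backward()
print(loss.item())
```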
{"title":"Mimicking Very Efficient Network for Object Detection","authors":"Quanquan Li, Sheng Jin, Junjie Yan","doi":"10.1109/CVPR.2017.776","DOIUrl":"https://doi.org/10.1109/CVPR.2017.776","url":null,"abstract":"Current CNN based object detectors need initialization from pre-trained ImageNet classification models, which are usually time-consuming. In this paper, we present a fully convolutional feature mimic framework to train very efficient CNN based detectors, which do not need ImageNet pre-training and achieve competitive performance as the large and slow models. We add supervision from high-level features of the large networks in training to help the small network better learn object representation. More specifically, we conduct a mimic method for the features sampled from the entire feature map and use a transform layer to map features from the small network onto the same dimension of the large network. In training the small network, we optimize the similarity between features sampled from the same region on the feature maps of both networks. Extensive experiments are conducted on pedestrian and common object detection tasks using VGG, Inception and ResNet. On both Caltech and Pascal VOC, we show that the modified 2.5× accelerated Inception network achieves competitive performance as the full Inception Network. Our faster model runs at 80 FPS for a 1000×1500 large input with only a minor degradation of performance on Caltech.","PeriodicalId":6631,"journal":{"name":"2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)","volume":"7 1","pages":"7341-7349"},"PeriodicalIF":0.0,"publicationDate":"2017-07-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"80784058","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 256
A Non-local Low-Rank Framework for Ultrasound Speckle Reduction
Pub Date : 2017-07-21 DOI: 10.1109/CVPR.2017.60
Lei Zhu, Chi-Wing Fu, M. S. Brown, P. Heng
Speckle refers to the granular patterns that occur in ultrasound images due to wave interference. Speckle removal can greatly improve the visibility of the underlying structures in an ultrasound image and enhance subsequent post-processing. We present a novel framework for speckle removal based on low-rank non-local filtering. Our approach works by first computing a guidance image that assists in the selection of candidate patches for non-local filtering in the face of significant speckle. The candidate patches are further refined via a low-rank minimization based on a truncated weighted nuclear norm (TWNN) and structured sparsity. We show that the proposed filtering framework produces results that outperform state-of-the-art methods both qualitatively and quantitatively. The framework also provides better segmentation results when used to pre-process ultrasound images.
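To illustrate the low-rank step, here is a simplified NumPy sketch (our own reading of truncated, weighted singular-value shrinkage, not the paper's optimization): similar patches are stacked as columns of a matrix, the largest few singular values are kept intact, and the remaining ones are soft-thresholded with weights that grow as the singular values shrink.

```python
import numpy as np

def twnn_shrink(patch_matrix, rank_keep=2, lam=10.0, eps=1e-6):
    """Low-rank estimate of a (pixels x patches) matrix of similar patches."""
    U, s, Vt = np.linalg.svd(patch_matrix, full_matrices=False)
    s_new = s.copy()
    for i in range(rank_keep, len(s)):
        w = lam / (s[i] + eps)            # weight: small singular values shrink more
        s_new[i] = max(s[i] - w, 0.0)     # soft-threshold the tail of the spectrum
    return (U * s_new) @ Vt               # reconstruct with the shrunken spectrum

rng = np.random.default_rng(0)
clean = rng.standard_normal((64, 1)) @ rng.standard_normal((1, 20))   # rank-1 patch group
noisy = clean * (1.0 + 0.3 * rng.standard_normal(clean.shape))        # speckle-like multiplicative noise
denoised = twnn_shrink(noisy, rank_keep=1, lam=5.0)
print(np.linalg.norm(noisy - clean), np.linalg.norm(denoised - clean))
```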
{"title":"A Non-local Low-Rank Framework for Ultrasound Speckle Reduction","authors":"Lei Zhu, Chi-Wing Fu, M. S. Brown, P. Heng","doi":"10.1109/CVPR.2017.60","DOIUrl":"https://doi.org/10.1109/CVPR.2017.60","url":null,"abstract":"Speckle refers to the granular patterns that occur in ultrasound images due to wave interference. Speckle removal can greatly improve the visibility of the underlying structures in an ultrasound image and enhance subsequent post processing. We present a novel framework for speckle removal based on low-rank non-local filtering. Our approach works by first computing a guidance image that assists in the selection of candidate patches for non-local filtering in the face of significant speckles. The candidate patches are further refined using a low-rank minimization estimated using a truncated weighted nuclear norm (TWNN) and structured sparsity. We show that the proposed filtering framework produces results that outperform state-of-the-art methods both qualitatively and quantitatively. This framework also provides better segmentation results when used for pre-processing ultrasound images.","PeriodicalId":6631,"journal":{"name":"2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)","volume":"577 1","pages":"493-501"},"PeriodicalIF":0.0,"publicationDate":"2017-07-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"76259459","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 53
Fast Haze Removal for Nighttime Image Using Maximum Reflectance Prior
Pub Date : 2017-07-21 DOI: 10.1109/CVPR.2017.742
Jing Zhang, Yang Cao, Shuai Fang, Yu Kang, Changwen Chen
In this paper, we address the problem of haze removal from a single nighttime image, even in the presence of varicolored and non-uniform illumination. The core idea lies in a novel maximum reflectance prior. We first introduce a nighttime hazy imaging model, which includes a local ambient illumination term in both the direct attenuation and scattering terms. Then, we propose a simple but effective image prior, the maximum reflectance prior, to estimate the varying ambient illumination. The maximum reflectance prior is based on a key observation: for most daytime haze-free image patches, each color channel has very high intensity at some pixels. For a nighttime hazy image, the local maximum intensities in each color channel are mainly contributed by the ambient illumination. Therefore, we can directly estimate the ambient illumination and the transmission map, and consequently restore a high-quality haze-free image. Experimental results on various nighttime hazy images demonstrate the effectiveness of the proposed approach. In particular, our approach has the advantage of computational efficiency, being 10-100 times faster than state-of-the-art methods.
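A simplified NumPy sketch of the key step (the patch size and toy input are assumptions, and the paper's subsequent transmission estimation and restoration are omitted): the per-channel maximum inside each local window of the nighttime hazy image serves as an estimate of the local ambient illumination.

```python
import numpy as np

def local_ambient_illumination(img, patch=15):
    """img: (H,W,3) float array in [0,1]; returns per-channel local maxima."""
    H, W, _ = img.shape
    pad = patch // 2
    padded = np.pad(img, ((pad, pad), (pad, pad), (0, 0)), mode="edge")
    amb = np.empty_like(img)
    for y in range(H):
        for x in range(W):
            window = padded[y:y + patch, x:x + patch]
            amb[y, x] = window.reshape(-1, 3).max(axis=0)   # max per color channel
    return amb

hazy = np.random.rand(32, 32, 3) * 0.6 + 0.2   # stand-in nighttime hazy image
A = local_ambient_illumination(hazy, patch=7)
print(A.shape, A.min(), A.max())
```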
{"title":"Fast Haze Removal for Nighttime Image Using Maximum Reflectance Prior","authors":"Jing Zhang, Yang Cao, Shuai Fang, Yu Kang, Changwen Chen","doi":"10.1109/CVPR.2017.742","DOIUrl":"https://doi.org/10.1109/CVPR.2017.742","url":null,"abstract":"In this paper, we address a haze removal problem from a single nighttime image, even in the presence of varicolored and non-uniform illumination. The core idea lies in a novel maximum reflectance prior. We first introduce the nighttime hazy imaging model, which includes a local ambient illumination item in both direct attenuation term and scattering term. Then, we propose a simple but effective image prior, maximum reflectance prior, to estimate the varying ambient illumination. The maximum reflectance prior is based on a key observation: for most daytime haze-free image patches, each color channel has very high intensity at some pixels. For the nighttime haze image, the local maximum intensities at each color channel are mainly contributed by the ambient illumination. Therefore, we can directly estimate the ambient illumination and transmission map, and consequently restore a high quality haze-free image. Experimental results on various nighttime hazy images demonstrate the effectiveness of the proposed approach. In particular, our approach has the advantage of computational efficiency, which is 10-100 times faster than state-of-the-art methods.","PeriodicalId":6631,"journal":{"name":"2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)","volume":"45 1","pages":"7016-7024"},"PeriodicalIF":0.0,"publicationDate":"2017-07-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"90721435","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 114
Deep Crisp Boundaries
Pub Date : 2017-07-21 DOI: 10.1109/CVPR.2017.187
Yupei Wang, Xin Zhao, Kaiqi Huang
Edge detection has made significant progress with the help of deep Convolutional Networks (ConvNets). ConvNet-based edge detectors approach human-level performance on standard benchmarks. We provide a systematic study of these detector outputs and show that they fail to accurately localize edges, which can be adversarial for tasks that require crisp edge inputs. In addition, we propose a novel refinement architecture to address the challenging problem of learning a crisp edge detector using a ConvNet. Our method leverages a top-down backward refinement pathway and progressively increases the resolution of feature maps to generate crisp edges. Our results achieve promising performance on BSDS500, surpassing human accuracy when using standard criteria and largely outperforming state-of-the-art methods when using stricter criteria. We further demonstrate the benefit of crisp edge maps for estimating optical flow and generating object proposals.
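A schematic PyTorch sketch of such a top-down backward refinement pathway (the channel sizes and two-stage layout are assumptions, not the paper's architecture): each stage upsamples the running feature map, fuses it with the corresponding higher-resolution encoder feature, and refines it, so the edge prediction sharpens as resolution grows.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class RefineStage(nn.Module):
    def __init__(self, top_ch, lateral_ch, out_ch):
        super().__init__()
        self.fuse = nn.Sequential(
            nn.Conv2d(top_ch + lateral_ch, out_ch, 3, padding=1), nn.ReLU(),
            nn.Conv2d(out_ch, out_ch, 3, padding=1), nn.ReLU())

    def forward(self, top, lateral):
        # upsample the coarse feature, then fuse it with the finer encoder feature
        top = F.interpolate(top, size=lateral.shape[-2:], mode="bilinear",
                            align_corners=False)
        return self.fuse(torch.cat([top, lateral], dim=1))

# toy encoder features from fine to coarse resolution (e.g. VGG-style stages)
f1 = torch.randn(1, 64, 128, 128)
f2 = torch.randn(1, 128, 64, 64)
f3 = torch.randn(1, 256, 32, 32)
stage2 = RefineStage(256, 128, 128)
stage1 = RefineStage(128, 64, 64)
head = nn.Conv2d(64, 1, 1)
edges = torch.sigmoid(head(stage1(stage2(f3, f2), f1)))   # full-resolution edge map
print(edges.shape)   # torch.Size([1, 1, 128, 128])
```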
{"title":"Deep Crisp Boundaries","authors":"Yupei Wang, Xin Zhao, Kaiqi Huang","doi":"10.1109/CVPR.2017.187","DOIUrl":"https://doi.org/10.1109/CVPR.2017.187","url":null,"abstract":"Edge detection had made significant progress with the help of deep Convolutional Networks (ConvNet). ConvNet based edge detectors approached human level performance on standard benchmarks. We provide a systematical study of these detector outputs, and show that they failed to accurately localize edges, which can be adversarial for tasks that require crisp edge inputs. In addition, we propose a novel refinement architecture to address the challenging problem of learning a crisp edge detector using ConvNet. Our method leverages a top-down backward refinement pathway, and progressively increases the resolution of feature maps to generate crisp edges. Our results achieve promising performance on BSDS500, surpassing human accuracy when using standard criteria, and largely outperforming state-of-the-art methods when using more strict criteria. We further demonstrate the benefit of crisp edge maps for estimating optical flow and generating object proposals.","PeriodicalId":6631,"journal":{"name":"2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)","volume":"1 1","pages":"1724-1732"},"PeriodicalIF":0.0,"publicationDate":"2017-07-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"91024824","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 81
Slow Flow: Exploiting High-Speed Cameras for Accurate and Diverse Optical Flow Reference Data
Pub Date : 2017-07-21 DOI: 10.1109/CVPR.2017.154
J. Janai, F. Güney, Jonas Wulff, Michael J. Black, Andreas Geiger
Existing optical flow datasets are limited in size and variability due to the difficulty of capturing dense ground truth. In this paper, we tackle this problem by tracking pixels through densely sampled space-time volumes recorded with a high-speed video camera. Our model exploits the linearity of small motions and reasons about occlusions from multiple frames. Using our technique, we are able to establish accurate reference flow fields outside the laboratory in natural environments. In addition, we show how our predictions can be used to augment the input images with realistic motion blur. We demonstrate the quality of the produced flow fields on synthetic and real-world datasets. Finally, we collect a novel, challenging optical flow dataset by applying our technique to data from a high-speed camera and analyze the performance of the state of the art in optical flow under various levels of motion blur.
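The accumulation of small motions can be sketched as follows (NumPy; the flow convention of channel 0 = horizontal and channel 1 = vertical displacement, and the uniform toy flows, are assumptions of this sketch): consecutive high-speed flows are chained by sampling each next flow at the already-displaced positions, yielding the flow between the first and last standard-rate frames.

```python
import numpy as np

def bilinear_sample(field, xs, ys):
    """Sample a (H,W,2) field at float coordinates (xs, ys)."""
    H, W, _ = field.shape
    x0 = np.clip(np.floor(xs).astype(int), 0, W - 2)
    y0 = np.clip(np.floor(ys).astype(int), 0, H - 2)
    wx, wy = xs - x0, ys - y0
    f00, f01 = field[y0, x0], field[y0, x0 + 1]
    f10, f11 = field[y0 + 1, x0], field[y0 + 1, x0 + 1]
    top = f00 * (1 - wx)[..., None] + f01 * wx[..., None]
    bot = f10 * (1 - wx)[..., None] + f11 * wx[..., None]
    return top * (1 - wy)[..., None] + bot * wy[..., None]

def compose_flows(small_flows):
    """small_flows: list of (H,W,2) flows between consecutive high-speed frames."""
    H, W, _ = small_flows[0].shape
    ys, xs = np.mgrid[0:H, 0:W].astype(float)
    total = np.zeros((H, W, 2))
    for flow in small_flows:
        # look up the next small flow at the positions reached so far
        step = bilinear_sample(flow, xs + total[..., 0], ys + total[..., 1])
        total += step
    return total

flows = [np.full((24, 32, 2), 0.5) for _ in range(8)]   # eight tiny uniform motions
print(compose_flows(flows)[0, 0])                        # accumulated flow, approx. [4, 4]
```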
{"title":"Slow Flow: Exploiting High-Speed Cameras for Accurate and Diverse Optical Flow Reference Data","authors":"J. Janai, F. Güney, Jonas Wulff, Michael J. Black, Andreas Geiger","doi":"10.1109/CVPR.2017.154","DOIUrl":"https://doi.org/10.1109/CVPR.2017.154","url":null,"abstract":"Existing optical flow datasets are limited in size and variability due to the difficulty of capturing dense ground truth. In this paper, we tackle this problem by tracking pixels through densely sampled space-time volumes recorded with a high-speed video camera. Our model exploits the linearity of small motions and reasons about occlusions from multiple frames. Using our technique, we are able to establish accurate reference flow fields outside the laboratory in natural environments. Besides, we show how our predictions can be used to augment the input images with realistic motion blur. We demonstrate the quality of the produced flow fields on synthetic and real-world datasets. Finally, we collect a novel challenging optical flow dataset by applying our technique on data from a high-speed camera and analyze the performance of the state-of-the-art in optical flow under various levels of motion blur.","PeriodicalId":6631,"journal":{"name":"2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)","volume":"33 1","pages":"1406-1416"},"PeriodicalIF":0.0,"publicationDate":"2017-07-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"75713724","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 67
Global Context-Aware Attention LSTM Networks for 3D Action Recognition
Pub Date : 2017-07-21 DOI: 10.1109/CVPR.2017.391
Jun Liu, G. Wang, Ping Hu, Ling-yu Duan, A. Kot
Long Short-Term Memory (LSTM) networks have shown superior performance in 3D human action recognition due to their power in modeling the dynamics and dependencies of sequential data. Since not all joints are informative for action analysis, and irrelevant joints often bring in a lot of noise, we need to pay more attention to the informative ones. However, the original LSTM does not have strong attention capability. Hence, we propose a new class of LSTM network, the Global Context-Aware Attention LSTM (GCA-LSTM), for 3D action recognition, which is able to selectively focus on the informative joints in the action sequence with the assistance of global contextual information. In order to achieve a reliable attention representation for the action sequence, we further propose a recurrent attention mechanism for our GCA-LSTM network, in which the attention performance is improved iteratively. Experiments show that our end-to-end network can reliably focus on the most informative joints in each frame of the skeleton sequence. Moreover, our network yields state-of-the-art performance on three challenging datasets for 3D action recognition.
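A condensed PyTorch sketch of global context-aware attention over joints (the layer sizes and the JointAttention name are our own, and the recurrent refinement of the context is omitted): each joint's informativeness score is computed from its feature together with a global context vector, and the softmax-weighted joint features form the pooled representation fed to the LSTM.

```python
import torch
import torch.nn as nn

class JointAttention(nn.Module):
    def __init__(self, joint_dim=64, ctx_dim=128):
        super().__init__()
        # small scoring network: joint feature + global context -> informativeness
        self.score = nn.Sequential(
            nn.Linear(joint_dim + ctx_dim, 64), nn.Tanh(), nn.Linear(64, 1))

    def forward(self, joint_feats, global_ctx):
        """joint_feats: (B,J,D); global_ctx: (B,C) -> weighted sum (B,D) and weights."""
        B, J, _ = joint_feats.shape
        ctx = global_ctx.unsqueeze(1).expand(B, J, -1)
        attn = torch.softmax(self.score(torch.cat([joint_feats, ctx], dim=-1)), dim=1)
        return (attn * joint_feats).sum(dim=1), attn.squeeze(-1)

attend = JointAttention()
joints = torch.randn(4, 25, 64)          # 25 joints per frame (NTU-style skeleton)
context = torch.randn(4, 128)            # global context from a first LSTM pass
pooled, weights = attend(joints, context)
print(pooled.shape, weights.shape)       # torch.Size([4, 64]) torch.Size([4, 25])
```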
{"title":"Global Context-Aware Attention LSTM Networks for 3D Action Recognition","authors":"Jun Liu, G. Wang, Ping Hu, Ling-yu Duan, A. Kot","doi":"10.1109/CVPR.2017.391","DOIUrl":"https://doi.org/10.1109/CVPR.2017.391","url":null,"abstract":"Long Short-Term Memory (LSTM) networks have shown superior performance in 3D human action recognition due to their power in modeling the dynamics and dependencies in sequential data. Since not all joints are informative for action analysis and the irrelevant joints often bring a lot of noise, we need to pay more attention to the informative ones. However, original LSTM does not have strong attention capability. Hence we propose a new class of LSTM network, Global Context-Aware Attention LSTM (GCA-LSTM), for 3D action recognition, which is able to selectively focus on the informative joints in the action sequence with the assistance of global contextual information. In order to achieve a reliable attention representation for the action sequence, we further propose a recurrent attention mechanism for our GCA-LSTM network, in which the attention performance is improved iteratively. Experiments show that our end-to-end network can reliably focus on the most informative joints in each frame of the skeleton sequence. Moreover, our network yields state-of-the-art performance on three challenging datasets for 3D action recognition.","PeriodicalId":6631,"journal":{"name":"2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)","volume":"4 5 1","pages":"3671-3680"},"PeriodicalIF":0.0,"publicationDate":"2017-07-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"83453053","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 521