
Latest publications from the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)

Pull the Plug? Predicting If Computers or Humans Should Segment Images
Pub Date : 2016-06-27 DOI: 10.1109/CVPR.2016.48
D. Gurari, S. Jain, Margrit Betke, K. Grauman
Foreground object segmentation is a critical step for many image analysis tasks. While automated methods can produce high-quality results, their failures disappoint users in need of practical solutions. We propose a resource allocation framework for predicting how best to allocate a fixed budget of human annotation effort in order to collect higher quality segmentations for a given batch of images and automated methods. The framework is based on a proposed prediction module that estimates the quality of given algorithm-drawn segmentations. We demonstrate the value of the framework for two novel tasks related to "pulling the plug" on computer and human annotators. Specifically, we implement two systems that automatically decide, for a batch of images, when to replace 1) humans with computers to create coarse segmentations required to initialize segmentation tools and 2) computers with humans to create final, fine-grained segmentations. Experiments demonstrate the advantage of relying on a mix of human and computer efforts over relying on either resource alone for segmenting objects in three diverse datasets representing visible, phase contrast microscopy, and fluorescence microscopy images.
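A toy sketch (not the authors' code) of the budget-allocation step described above: route the images whose algorithm-drawn segmentations are predicted to be worst to human annotators. The scalar quality scores would come from the paper's prediction module; here they are made-up numbers, and `allocate_budget` is an illustrative name.

```python
import numpy as np

def allocate_budget(quality_scores, budget):
    """Return indices of images to send to human annotators.

    quality_scores : predicted quality of each algorithm segmentation (higher = better)
    budget         : number of images the human-annotation budget can cover
    """
    order = np.argsort(quality_scores)   # worst-predicted segmentations first
    return order[:budget]

# Example: 6 images, budget for 2 human annotations
scores = np.array([0.91, 0.42, 0.77, 0.30, 0.88, 0.65])
print(allocate_budget(scores, budget=2))  # -> [3 1]
```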
Pages: 382-391
Citations: 27
A Weighted Variational Model for Simultaneous Reflectance and Illumination Estimation
Pub Date : 2016-06-27 DOI: 10.1109/CVPR.2016.304
Xueyang Fu, Delu Zeng, Yue Huang, Xiao-Ping Zhang, Xinghao Ding
We propose a weighted variational model to estimate both the reflectance and the illumination from an observed image. We show that, though it is widely adopted for ease of modeling, the log-transformed image for this task is not ideal. Based on the previous investigation of the logarithmic transformation, a new weighted variational model is proposed for better prior representation, which is imposed in the regularization terms. Different from conventional variational models, the proposed model can preserve more details in the estimated reflectance. Moreover, the proposed model can suppress noise to some extent. An alternating minimization scheme is adopted to solve the proposed model. Experimental results demonstrate the effectiveness of the proposed model and its algorithm. Compared with other variational methods, the proposed method yields comparable or better results on both subjective and objective assessments.
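For readers unfamiliar with this family of models, here is a minimal sketch of the alternating-minimization idea on a generic Retinex-style energy E(R, L) = ||R·L − S||² + α||∇L||² + β||∇R||². It is an assumption-laden toy, not the paper's weighted variational model (whose weights and priors differ); it only illustrates the "update reflectance with illumination fixed, then the reverse" loop.

```python
import numpy as np

def grad_penalty_grad(X):
    """Gradient of ||grad X||^2 w.r.t. X (periodic boundary): minus twice the discrete Laplacian."""
    lap = (np.roll(X, 1, 0) + np.roll(X, -1, 0) +
           np.roll(X, 1, 1) + np.roll(X, -1, 1) - 4.0 * X)
    return -2.0 * lap

def alternating_minimization(S, alpha=0.1, beta=0.01, iters=50, step=0.1):
    R = np.clip(S.copy(), 1e-3, 1.0)      # reflectance init
    L = np.ones_like(S)                   # illumination init
    for _ in range(iters):
        # R-step: one gradient step on the energy with L fixed
        R -= step * (2.0 * (R * L - S) * L + beta * grad_penalty_grad(R))
        # L-step: one gradient step on the energy with R fixed
        L -= step * (2.0 * (R * L - S) * R + alpha * grad_penalty_grad(L))
        R, L = np.clip(R, 1e-3, 1.0), np.clip(L, 1e-3, None)
    return R, L

S = np.random.rand(32, 32)                # toy observed image
R, L = alternating_minimization(S)
print(np.abs(R * L - S).mean())           # reconstruction residual stays small
```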
Pages: 2782-2790
Citations: 642
Learning Aligned Cross-Modal Representations from Weakly Aligned Data
Pub Date : 2016-06-27 DOI: 10.1109/CVPR.2016.321
Lluís Castrejón, Y. Aytar, Carl Vondrick, H. Pirsiavash, A. Torralba
People can recognize scenes across many different modalities beyond natural images. In this paper, we investigate how to learn cross-modal scene representations that transfer across modalities. To study this problem, we introduce a new cross-modal scene dataset. While convolutional neural networks can categorize cross-modal scenes well, they also learn an intermediate representation not aligned across modalities, which is undesirable for cross-modal transfer applications. We present methods to regularize cross-modal convolutional neural networks so that they have a shared representation that is agnostic of the modality. Our experiments suggest that our scene representation can help transfer representations across modalities for retrieval. Moreover, our visualizations suggest that units emerge in the shared representation that tend to activate on consistent concepts independently of the modality.
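As a hedged illustration (not the paper's regularizer) of what "a shared representation that is agnostic of the modality" can mean in practice, the sketch below penalizes the distance between per-class mean activations computed from two modalities; the class-mean choice and the feature names are assumptions.

```python
import numpy as np

def modality_alignment_penalty(feat_a, feat_b, labels_a, labels_b):
    """feat_*: (N, D) shared-layer features from two modalities; labels_*: (N,) scene-class ids."""
    penalty = 0.0
    for c in np.intersect1d(labels_a, labels_b):
        mu_a = feat_a[labels_a == c].mean(axis=0)
        mu_b = feat_b[labels_b == c].mean(axis=0)
        penalty += np.sum((mu_a - mu_b) ** 2)
    return penalty

# Toy check: 2 scene classes, 8-D features from, e.g., a photo branch and a sketch branch
rng = np.random.default_rng(0)
fa, fb = rng.normal(size=(10, 8)), rng.normal(size=(10, 8))
la, lb = rng.integers(0, 2, 10), rng.integers(0, 2, 10)
print(modality_alignment_penalty(fa, fb, la, lb))
```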
Pages: 2940-2949
Citations: 158
A Task-Oriented Approach for Cost-Sensitive Recognition
Pub Date : 2016-06-27 DOI: 10.1109/CVPR.2016.242
Roozbeh Mottaghi, Hannaneh Hajishirzi, Ali Farhadi
With the recent progress in visual recognition, we have already started to see a surge of vision-related real-world applications. These applications, unlike general scene understanding, are task oriented and require specific information from visual data. Considering the current growth in new sensory devices, feature designs, feature learning methods, and algorithms, the search in the space of features and models becomes combinatorial. In this paper, we propose a novel cost-sensitive task-oriented recognition method that is based on a combination of linguistic semantics and visual cues. Our task-oriented framework is able to generalize to unseen tasks for which there is no training data and outperforms state-of-the-art cost-based recognition baselines on our new task-based dataset.
Pages: 2203-2211
Citations: 4
Geometry-Informed Material Recognition
Pub Date : 2016-06-27 DOI: 10.1109/CVPR.2016.172
Joseph DeGol, M. G. Fard, Derek Hoiem
Our goal is to recognize material categories using images and geometry information. In many applications, such as construction management, coarse geometry information is available. We investigate how 3D geometry (surface normals, camera intrinsic and extrinsic parameters) can be used with 2D features (texture and color) to improve material classification. We introduce a new dataset, GeoMat, which is the first to provide both image and geometry data in the form of: (i) training and testing patches that were extracted at different scales and perspectives from real world examples of each material category, and (ii) a large scale construction site scene that includes 160 images and over 800,000 hand labeled 3D points. Our results show that using 2D and 3D features both jointly and independently to model materials improves classification accuracy across multiple scales and viewing directions for both material patches and images of a large scale construction site scene.
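To make the "2D features plus coarse 3D geometry" idea concrete, here is a minimal fusion sketch: a patch descriptor built by concatenating a color/texture feature with a histogram of surface-normal elevation angles. The descriptor choices are assumptions for illustration, not the authors' exact features or classifier.

```python
import numpy as np

def normal_histogram(normals, bins=8):
    """Histogram of surface-normal elevation angles for a patch; normals: (N, 3) unit vectors."""
    elev = np.arccos(np.clip(normals[:, 2], -1.0, 1.0))
    hist, _ = np.histogram(elev, bins=bins, range=(0.0, np.pi))
    return hist / max(hist.sum(), 1)

def fuse(color_feat, normals):
    """Concatenate a 2D (color/texture) descriptor with a 3D (geometry) descriptor."""
    return np.concatenate([color_feat, normal_histogram(normals)])

rng = np.random.default_rng(1)
normals = rng.normal(size=(500, 3))
normals /= np.linalg.norm(normals, axis=1, keepdims=True)
print(fuse(rng.random(16), normals).shape)   # (24,) fused patch descriptor
```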
Pages: 1554-1562
Citations: 43
Coordinating Multiple Disparity Proposals for Stereo Computation
Pub Date : 2016-06-27 DOI: 10.1109/CVPR.2016.436
Ang Li, Dapeng Chen, Yuanliu Liu, Zejian Yuan
While great progress has been made in stereo computation over the last decades, large textureless regions remain challenging. Segment-based methods can tackle this problem properly, but their performance is sensitive to the segmentation results. In this paper, we alleviate the sensitivity by generating multiple proposals on absolute and relative disparities from multi-segmentations. These proposals supply rich descriptions of surface structures. In particular, the relative disparity between distant pixels can encode the large structure, which is critical for handling large textureless regions. The proposals are coordinated by point-wise competition and pairwise collaboration within an MRF model. During inference, dynamic programming is performed in different directions with various step sizes, so the long-range connections are better preserved. In the experiments, we carefully analyze the effectiveness of the major components. Results on the 2014 Middlebury and KITTI 2015 stereo benchmarks show that our method is comparable to the state of the art.
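The "dynamic programming in different directions" step is in the same spirit as classic scanline cost aggregation. The sketch below shows a single left-to-right pass over a toy disparity cost volume with the usual small/large jump penalties; it is a generic SGM-style illustration under assumed penalties `p1`/`p2`, not the paper's MRF with multiple proposals and step sizes.

```python
import numpy as np

def scanline_dp(cost, p1=0.5, p2=2.0):
    """cost: (W, D) matching cost along one image row; returns directionally aggregated cost."""
    W, D = cost.shape
    agg = np.zeros_like(cost)
    agg[0] = cost[0]
    for x in range(1, W):
        prev = agg[x - 1]
        best_prev = prev.min()
        for d in range(D):
            candidates = [prev[d],                                   # keep the same disparity
                          prev[d - 1] + p1 if d > 0 else np.inf,     # small jump down
                          prev[d + 1] + p1 if d + 1 < D else np.inf, # small jump up
                          best_prev + p2]                            # large jump
            agg[x, d] = cost[x, d] + min(candidates) - best_prev
    return agg

row_cost = np.random.rand(64, 16)                 # toy cost volume for one scanline
disparity = scanline_dp(row_cost).argmin(axis=1)  # winner-take-all after aggregation
print(disparity[:10])
```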
Pages: 4022-4030
Citations: 29
A Holistic Approach to Cross-Channel Image Noise Modeling and Its Application to Image Denoising
Pub Date : 2016-06-27 DOI: 10.1109/CVPR.2016.186
Seonghyeon Nam, Youngbae Hwang, Y. Matsushita, Seon Joo Kim
Modelling and analyzing noise in images is a fundamental task in many computer vision systems. Traditionally, noise has been modelled per color channel assuming that the color channels are independent. Although the color channels can be considered mutually independent in camera RAW images, signals from different color channels get mixed during the imaging process inside the camera due to gamut mapping, tone-mapping, and compression. We show the influence of the in-camera imaging pipeline on noise and propose a new noise model in the 3D RGB space to account for the color channel mix-ups. A data-driven approach for determining the parameters of the new noise model is introduced, as well as its application to image denoising. The experiments show that our noise model represents the noise in regular JPEG images more accurately than previous models and is advantageous in image denoising.
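A small numerical illustration of the distinction the abstract draws: per-channel-independent noise versus noise with cross-channel correlation captured by a full 3x3 RGB covariance. The standard deviations and correlations below are made up for the demo and are not parameters from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
sigma = np.array([2.0, 1.5, 2.5])                 # per-channel std (illustrative)
corr = np.array([[1.0, 0.6, 0.3],
                 [0.6, 1.0, 0.5],
                 [0.3, 0.5, 1.0]])                # assumed cross-channel correlation
cov = corr * np.outer(sigma, sigma)               # full RGB covariance matrix

independent = rng.normal(scale=sigma, size=(100000, 3))            # channel-wise model
correlated = rng.multivariate_normal(np.zeros(3), cov, size=100000)  # cross-channel model

print(np.corrcoef(independent.T).round(2))        # approximately the identity
print(np.corrcoef(correlated.T).round(2))         # recovers the off-diagonal correlations
```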
Pages: 1683-1691
Citations: 184
Deep Relative Distance Learning: Tell the Difference between Similar Vehicles
Pub Date : 2016-06-27 DOI: 10.1109/CVPR.2016.238
Hongye Liu, Yonghong Tian, Yaowei Wang, Lu Pang, Tiejun Huang
The explosive growth in the use of surveillance cameras in public security highlights the importance of vehicle search from a large-scale image or video database. However, compared with person re-identification or face recognition, the vehicle search problem has long been neglected by researchers in the vision community. This paper focuses on an interesting but challenging problem, vehicle re-identification (a.k.a. precise vehicle search). We propose a Deep Relative Distance Learning (DRDL) method which exploits a two-branch deep convolutional network to project raw vehicle images into a Euclidean space where distance can be directly used to measure the similarity of two arbitrary vehicles. To further facilitate future research on this problem, we also present a carefully organized large-scale image database "VehicleID", which includes multiple images of the same vehicle captured by different real-world cameras in a city. We evaluate our DRDL method on our VehicleID dataset and another recently released vehicle model classification dataset "CompCars" in three sets of experiments: vehicle re-identification, vehicle model verification and vehicle retrieval. Experimental results show that our method can achieve promising results and outperforms several state-of-the-art approaches.
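The "distance in Euclidean space directly measures similarity" idea is commonly trained with margin-based relative-distance losses. Below is a plain triplet margin loss as a hedged illustration; the paper's coupled-clusters formulation differs in how positive and negative samples are grouped.

```python
import numpy as np

def triplet_margin_loss(anchor, positive, negative, margin=1.0):
    """Each argument: (N, D) embeddings; returns the mean hinge loss over the batch."""
    d_pos = np.sum((anchor - positive) ** 2, axis=1)   # same-vehicle distances
    d_neg = np.sum((anchor - negative) ** 2, axis=1)   # different-vehicle distances
    return np.maximum(0.0, d_pos - d_neg + margin).mean()

rng = np.random.default_rng(0)
a, p, n = (rng.normal(size=(8, 128)) for _ in range(3))
print(triplet_margin_loss(a, p, n))
```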
Pages: 2167-2175
Citations: 599
Learning Activity Progression in LSTMs for Activity Detection and Early Detection
Pub Date : 2016-06-27 DOI: 10.1109/CVPR.2016.214
Shugao Ma, L. Sigal, S. Sclaroff
In this work we improve training of temporal deep models to better learn activity progression for activity detection and early detection tasks. Conventionally, when training a Recurrent Neural Network, specifically a Long Short Term Memory (LSTM) model, the training loss only considers classification error. However, we argue that the detection score of the correct activity category, or the detection score margin between the correct and incorrect categories, should be monotonically non-decreasing as the model observes more of the activity. We design novel ranking losses that directly penalize the model on violation of such monotonicities, which are used together with classification loss in training of LSTM models. Evaluation on ActivityNet shows significant benefits of the proposed ranking losses in both activity detection and early detection tasks.
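The monotonicity constraint in the abstract can be illustrated in a few lines: penalize every time step at which the ground-truth class's detection score decreases. This is a simplified stand-in for the paper's ranking losses, which also cover the score margin between correct and incorrect classes.

```python
import numpy as np

def monotonicity_penalty(scores_over_time):
    """scores_over_time: (T,) detection scores of the ground-truth activity class."""
    s = np.asarray(scores_over_time, dtype=float)
    drops = np.maximum(0.0, s[:-1] - s[1:])   # positive wherever the score goes down
    return drops.sum()

# Penalizes only the 0.4 -> 0.35 drop as the model observes more of the activity
print(monotonicity_penalty([0.2, 0.4, 0.35, 0.6, 0.9]))
```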
Pages: 1942-1950
Citations: 365
Discriminative Multi-modal Feature Fusion for RGBD Indoor Scene Recognition
Pub Date : 2016-06-27 DOI: 10.1109/CVPR.2016.324
Hongyuan Zhu, Jean-Baptiste Weibel, Shijian Lu
RGBD scene recognition has attracted increasing attention due to the rapid development of depth sensors and their wide range of application scenarios. While much research has been conducted, most work has used hand-crafted features, which struggle to capture high-level semantic structures. Recently, features extracted from deep convolutional neural networks have produced state-of-the-art results for various computer vision tasks, which has inspired researchers to explore incorporating CNN-learned features for RGBD scene understanding. On the other hand, most existing work combines RGB and depth features without adequately exploiting the consistency and complementary information between them. Inspired by some recent work on RGBD object recognition using multi-modal feature fusion, we introduce, for the first time, a novel discriminative multi-modal fusion framework for RGBD scene recognition that simultaneously considers inter- and intra-modality correlations for all samples while regularizing the learned features to be discriminative and compact. The results from the multimodal layer can be back-propagated to the lower CNN layers, so the parameters of the CNN layers and the multimodal layers are updated iteratively until convergence. Experiments on the recently proposed large-scale SUN RGB-D dataset show that our method achieves state-of-the-art results without any image segmentation.
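As a toy illustration of exploiting the consistency between RGB and depth, the sketch below concatenates the two feature sets and computes a simple cross-modality correlation score (cosine similarity of the mean-centered feature matrices) that a fusion layer could encourage. Both the fusion form and the correlation measure are assumptions, not the paper's multimodal layer or its regularizers.

```python
import numpy as np

def fuse_and_correlate(rgb_feat, depth_feat):
    """rgb_feat, depth_feat: (N, D) features; returns fused features and a correlation score."""
    fused = np.concatenate([rgb_feat, depth_feat], axis=1)
    a = rgb_feat - rgb_feat.mean(axis=0)
    b = depth_feat - depth_feat.mean(axis=0)
    corr = np.sum(a * b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-8)
    return fused, corr

rng = np.random.default_rng(0)
rgb, depth = rng.normal(size=(32, 64)), rng.normal(size=(32, 64))
fused, corr = fuse_and_correlate(rgb, depth)
print(fused.shape, round(corr, 3))
```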
Pages: 2969-2976
Citations: 97