
Latest publications from the 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)

Visual Localization by Learning Objects-Of-Interest Dense Match Regression
Pub Date : 2019-06-01 DOI: 10.1109/CVPR.2019.00578
Philippe Weinzaepfel, G. Csurka, Yohann Cabon, M. Humenberger
We introduce a novel CNN-based approach for visual localization from a single RGB image that relies on densely matching a set of Objects-of-Interest (OOIs). In this paper, we focus on planar objects which are highly descriptive in an environment, such as paintings in museums or logos and storefronts in malls or airports. For each OOI, we define a reference image for which 3D world coordinates are available. Given a query image, our CNN model detects the OOIs, segments them and finds a dense set of 2D-2D matches between each detected OOI and its corresponding reference image. Given these 2D-2D matches, together with the 3D world coordinates of each reference image, we obtain a set of 2D-3D matches from which solving a Perspective-n-Point problem gives a pose estimate. We show that 2D-3D matches for reference images, as well as OOI annotations, can be obtained for all training images from a single instance annotation per OOI by leveraging Structure-from-Motion reconstruction. We introduce a novel synthetic dataset, VirtualGallery, which targets challenges such as varying lighting conditions and different occlusion levels. Our results show that our method achieves high precision and is robust to these challenges. We also experiment using the Baidu localization dataset captured in a shopping mall. Our approach is the first deep regression-based method to scale to such a large environment.
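The last step of the pipeline above, recovering the camera pose from 2D-3D correspondences, can be illustrated with a short sketch. This is not the authors' code; it assumes hypothetical matched points and pinhole intrinsics, and uses OpenCV's RANSAC-based Perspective-n-Point solver.

```python
import numpy as np
import cv2

# Hypothetical example data: N matched points (3D world coords and 2D pixel coords)
# that a dense OOI matcher might produce for a detected object.
object_points = np.random.rand(50, 3).astype(np.float32)        # 3D points from the reference OOI
image_points = np.random.rand(50, 2).astype(np.float32) * 640   # their 2D locations in the query image
camera_matrix = np.array([[800, 0, 320],
                          [0, 800, 240],
                          [0, 0, 1]], dtype=np.float32)          # assumed pinhole intrinsics
dist_coeffs = np.zeros((4, 1), dtype=np.float32)                 # assume no lens distortion

# RANSAC-based PnP rejects outlier matches and returns rotation/translation vectors.
ok, rvec, tvec, inliers = cv2.solvePnPRansac(
    object_points, image_points, camera_matrix, dist_coeffs)
if ok:
    R, _ = cv2.Rodrigues(rvec)   # 3x3 rotation matrix of the estimated camera pose
    print("camera rotation:\n", R, "\ntranslation:", tvec.ravel())
```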
{"title":"Visual Localization by Learning Objects-Of-Interest Dense Match Regression","authors":"Philippe Weinzaepfel, G. Csurka, Yohann Cabon, M. Humenberger","doi":"10.1109/CVPR.2019.00578","DOIUrl":"https://doi.org/10.1109/CVPR.2019.00578","url":null,"abstract":"We introduce a novel CNN-based approach for visual localization from a single RGB image that relies on densely matching a set of Objects-of-Interest (OOIs). In this paper, we focus on planar objects which are highly descriptive in an environment, such as paintings in museums or logos and storefronts in malls or airports. For each OOI, we define a reference image for which 3D world coordinates are available. Given a query image, our CNN model detects the OOIs, segments them and finds a dense set of 2D-2D matches between each detected OOI and its corresponding reference image. Given these 2D-2D matches, together with the 3D world coordinates of each reference image, we obtain a set of 2D-3D matches from which solving a Perspective-n-Point problem gives a pose estimate. We show that 2D-3D matches for reference images, as well as OOI annotations can be obtained for all training images from a single instance annotation per OOI by leveraging Structure-from-Motion reconstruction. We introduce a novel synthetic dataset, VirtualGallery, which targets challenges such as varying lighting conditions and different occlusion levels. Our results show that our method achieves high precision and is robust to these challenges. We also experiment using the Baidu localization dataset captured in a shopping mall. Our approach is the first deep regression-based method to scale to such a larger environment.","PeriodicalId":6711,"journal":{"name":"2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)","volume":"27 1","pages":"5627-5636"},"PeriodicalIF":0.0,"publicationDate":"2019-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"80314914","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 34
Learning RoI Transformer for Oriented Object Detection in Aerial Images
Pub Date : 2019-06-01 DOI: 10.1109/CVPR.2019.00296
Jian Ding, Nan Xue, Yang Long, Guisong Xia, Qikai Lu
Object detection in aerial images is an active yet challenging task in computer vision because of the bird's-eye view perspective, the highly complex backgrounds, and the variant appearances of objects. Especially when detecting densely packed objects in aerial images, methods relying on horizontal proposals for common object detection often introduce mismatches between the Regions of Interest (RoIs) and objects. This leads to a common misalignment between the final object classification confidence and localization accuracy. In this paper, we propose a RoI Transformer to address these problems. The core idea of RoI Transformer is to apply spatial transformations on RoIs and learn the transformation parameters under the supervision of oriented bounding box (OBB) annotations. RoI Transformer is lightweight and can be easily embedded into detectors for oriented object detection. Simply applying the RoI Transformer to Light-Head R-CNN achieves state-of-the-art performance on two common and challenging aerial datasets, i.e., DOTA and HRSC2016, with a negligible reduction in detection speed. Our RoI Transformer exceeds the deformable Position Sensitive RoI pooling when oriented bounding-box annotations are available. Extensive experiments have also validated the flexibility and effectiveness of our RoI Transformer.
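As a rough illustration of learning a spatial transform from a horizontal RoI to an oriented one, the sketch below decodes a hypothetical offset vector (dx, dy, dw, dh, dtheta) into an oriented box. The parameterization and names are assumptions for illustration, not the paper's exact formulation.

```python
import numpy as np

def decode_oriented_roi(hroi, offsets):
    """Decode a horizontal RoI (x1, y1, x2, y2) plus learned offsets
    (dx, dy, dw, dh, dtheta) into an oriented box (cx, cy, w, h, theta).
    A sketch of the transform idea under OBB supervision, not the paper's code."""
    x1, y1, x2, y2 = hroi
    w, h = x2 - x1, y2 - y1
    cx, cy = x1 + 0.5 * w, y1 + 0.5 * h
    dx, dy, dw, dh, dtheta = offsets
    new_cx = cx + dx * w              # shift the center relative to the RoI size
    new_cy = cy + dy * h
    new_w = w * np.exp(dw)            # rescale width/height in log space
    new_h = h * np.exp(dh)
    new_theta = dtheta                # rotation angle (radians) w.r.t. the horizontal box
    return new_cx, new_cy, new_w, new_h, new_theta

print(decode_oriented_roi((10, 20, 110, 80), (0.05, -0.02, 0.1, 0.0, 0.3)))
```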
{"title":"Learning RoI Transformer for Oriented Object Detection in Aerial Images","authors":"Jian Ding, Nan Xue, Yang Long, Guisong Xia, Qikai Lu","doi":"10.1109/CVPR.2019.00296","DOIUrl":"https://doi.org/10.1109/CVPR.2019.00296","url":null,"abstract":"Object detection in aerial images is an active yet challenging task in computer vision because of the bird’s-eye view perspective, the highly complex backgrounds, and the variant appearances of objects. Especially when detecting densely packed objects in aerial images, methods relying on horizontal proposals for common object detection often introduce mismatches between the Region of Interests (RoIs) and objects. This leads to the common misalignment between the final object classification confidence and localization accuracy. In this paper, we propose a RoI Transformer to address these problems. The core idea of RoI Transformer is to apply spatial transformations on RoIs and learn the transformation parameters under the supervision of oriented bounding box (OBB) annotations. RoI Transformer is with lightweight and can be easily embedded into detectors for oriented object detection. Simply apply the RoI Transformer to light head RCNN has achieved state-of-the-art performances on two common and challenging aerial datasets, i.e., DOTA and HRSC2016, with a neglectable reduction to detection speed. Our RoI Transformer exceeds the deformable Position Sensitive RoI pooling when oriented bounding-box annotations are available. Extensive experiments have also validated the flexibility and effectiveness of our RoI Transformer.","PeriodicalId":6711,"journal":{"name":"2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)","volume":"18 1","pages":"2844-2853"},"PeriodicalIF":0.0,"publicationDate":"2019-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"83039855","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 521
Spectral Reconstruction From Dispersive Blur: A Novel Light Efficient Spectral Imager
Pub Date : 2019-06-01 DOI: 10.1109/CVPR.2019.01248
Yuanyuan Zhao, Xue-mei Hu, Hui Guo, Zhan Ma, Tao Yue, Xun Cao
Developing imaging techniques with high light efficiency to retrieve high-dimensional optical signals is a long-term goal in computational photography. Multispectral imaging, which captures images of different wavelengths and boosts the ability to reveal scene properties, has developed rapidly in the last few decades. From scanning methods to snapshot imaging, the limit of light-collection efficiency keeps being pushed, which enables wider applications, especially in light-starved scenes. In this work, we propose a novel multispectral imaging technique that can capture multispectral images with high light efficiency. By investigating the dispersive blur caused by spectral dispersers and introducing difference-of-blur (DoB) constraints, we propose a basic theory for capturing multispectral information from a single dispersive-blurred image and an additional spectrum of an arbitrary point in the scene. Based on this theory, we design a prototype system and develop an optimization algorithm to realize snapshot multispectral imaging. The effectiveness of the proposed method is verified on both synthetic data and real captured images.
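The measurement model implied above, where a disperser shifts each wavelength by a different amount before the sensor integrates them, can be sketched as follows. The linear-shift assumption, the wrap-around handling of np.roll, and all names are ours, chosen only to make the image-formation idea concrete.

```python
import numpy as np

def dispersive_blur(cube, dispersion_step=1):
    """Toy forward model: each wavelength channel of a hyperspectral cube (H, W, L)
    is shifted by a wavelength-dependent offset along x, then summed into one
    grayscale measurement. Not the paper's exact optical model."""
    H, W, L = cube.shape
    measurement = np.zeros((H, W))
    for k in range(L):
        shift = k * dispersion_step                        # assumed linear dispersion
        measurement += np.roll(cube[:, :, k], shift, axis=1)
    return measurement

cube = np.random.rand(64, 64, 8)                           # hypothetical 8-band scene
blurred = dispersive_blur(cube)
print(blurred.shape)                                       # (64, 64)
```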
{"title":"Spectral Reconstruction From Dispersive Blur: A Novel Light Efficient Spectral Imager","authors":"Yuanyuan Zhao, Xue-mei Hu, Hui Guo, Zhan Ma, Tao Yue, Xun Cao","doi":"10.1109/CVPR.2019.01248","DOIUrl":"https://doi.org/10.1109/CVPR.2019.01248","url":null,"abstract":"Developing high light efficiency imaging techniques to retrieve high dimensional optical signal is a long-term goal in computational photography. Multispectral imaging, which captures images of different wavelengths and boosting the abilities for revealing scene properties, has developed rapidly in the last few decades. From scanning method to snapshot imaging, the limit of light collection efficiency is kept being pushed which enables wider applications especially under the light-starved scenes. In this work, we propose a novel multispectral imaging technique, that could capture the multispectral images with a high light efficiency. Through investigating the dispersive blur caused by spectral dispersers and introducing the difference of blur (DoB) constraints, we propose a basic theory for capturing multispectral information from a single dispersive-blurred image and an additional spectrum of an arbitrary point in the scene. Based on the theory, we design a prototype system and develop an optimization algorithm to realize snapshot multispectral imaging. The effectiveness of the proposed method is verified on both the synthetic data and real captured images.","PeriodicalId":6711,"journal":{"name":"2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)","volume":"76 1","pages":"12194-12203"},"PeriodicalIF":0.0,"publicationDate":"2019-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"89520518","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 3
Social Relation Recognition From Videos via Multi-Scale Spatial-Temporal Reasoning
Pub Date : 2019-06-01 DOI: 10.1109/CVPR.2019.00368
Xinchen Liu, Wu Liu, Meng Zhang, Jingwen Chen, Lianli Gao, C. Yan, Tao Mei
Discovering social relations, e.g., kinship, friendship, etc., from visual contents can make machines better interpret the behaviors and emotions of human beings. Existing studies mainly focus on recognizing social relations from still images while neglecting another important medium--video. On one hand, the actions and storylines in videos provide more important cues for social relation recognition. On the other hand, the key persons may appear at arbitrary spatial-temporal locations, and may not even appear in the same frame from beginning to end. To overcome these challenges, we propose a Multi-scale Spatial-Temporal Reasoning (MSTR) framework to recognize social relations from videos. For the spatial representation, we not only adopt a temporal segment network to learn global action and scene information, but also design a Triple Graphs model to capture visual relations between persons and objects. For the temporal domain, we propose a Pyramid Graph Convolutional Network to perform temporal reasoning with multi-scale receptive fields, which can obtain both long-term and short-term storylines in videos. By this means, MSTR can comprehensively explore the multi-scale actions and storylines in spatial-temporal dimensions for social relation reasoning in videos. Extensive experiments on a new large-scale Video Social Relation dataset demonstrate the effectiveness of the proposed framework.
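Both the Triple Graphs model and the Pyramid Graph Convolutional Network build on graph convolution; the sketch below shows one generic, symmetrically normalized graph-convolution step on a hypothetical relation graph, not the authors' exact layer.

```python
import numpy as np

def gcn_layer(A, X, W):
    """One graph-convolution step: ReLU(D^-1/2 A D^-1/2 X W).
    A: adjacency with self-loops, X: node features, W: learnable weights."""
    D = np.diag(A.sum(axis=1) ** -0.5)     # symmetric degree normalization
    A_hat = D @ A @ D
    return np.maximum(A_hat @ X @ W, 0)    # ReLU activation

A = np.eye(5) + (np.random.rand(5, 5) > 0.5)   # hypothetical 5-node person/object graph
A = ((A + A.T) > 0).astype(float)              # symmetrize, keep self-loops
X = np.random.rand(5, 16)                      # node features
W = np.random.rand(16, 8)
print(gcn_layer(A, X, W).shape)                # (5, 8)
```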
{"title":"Social Relation Recognition From Videos via Multi-Scale Spatial-Temporal Reasoning","authors":"Xinchen Liu, Wu Liu, Meng Zhang, Jingwen Chen, Lianli Gao, C. Yan, Tao Mei","doi":"10.1109/CVPR.2019.00368","DOIUrl":"https://doi.org/10.1109/CVPR.2019.00368","url":null,"abstract":"Discovering social relations, e.g., kinship, friendship, etc., from visual contents can make machines better interpret the behaviors and emotions of human beings. Existing studies mainly focus on recognizing social relations from still images while neglecting another important media--video. On one hand, the actions and storylines in videos provide more important cues for social relation recognition. On the other hand, the key persons may appear at arbitrary spatial-temporal locations, even not in one same image from beginning to the end. To overcome these challenges, we propose a Multi-scale Spatial-Temporal Reasoning (MSTR) framework to recognize social relations from videos. For the spatial representation, we not only adopt a temporal segment network to learn global action and scene information, but also design a Triple Graphs model to capture visual relations between persons and objects. For the temporal domain, we propose a Pyramid Graph Convolutional Network to perform temporal reasoning with multi-scale receptive fields, which can obtain both long-term and short-term storylines in videos. By this means, MSTR can comprehensively explore the multi-scale actions and storylines in spatial-temporal dimensions for social relation reasoning in videos. Extensive experiments on a new large-scale Video Social Relation dataset demonstrate the effectiveness of the proposed framework.","PeriodicalId":6711,"journal":{"name":"2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)","volume":"104 1","pages":"3561-3569"},"PeriodicalIF":0.0,"publicationDate":"2019-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"87486283","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 60
Instance Segmentation by Jointly Optimizing Spatial Embeddings and Clustering Bandwidth
Pub Date : 2019-06-01 DOI: 10.1109/CVPR.2019.00904
D. Neven, Bert De Brabandere, M. Proesmans, L. Gool
Current state-of-the-art instance segmentation methods are not suited for real-time applications like autonomous driving, which require fast execution times at high accuracy. Although the currently dominant proposal-based methods have high accuracy, they are slow and generate masks at a fixed and low resolution. Proposal-free methods, by contrast, can generate masks at high resolution and are often faster, but fail to reach the same accuracy as the proposal-based methods. In this work we propose a new clustering loss function for proposal-free instance segmentation. The loss function pulls the spatial embeddings of pixels belonging to the same instance together and jointly learns an instance-specific clustering bandwidth, maximizing the intersection-over-union of the resulting instance mask. When combined with a fast architecture, the network can perform instance segmentation in real-time while maintaining a high accuracy. We evaluate our method on the challenging Cityscapes benchmark and achieve top results (5% improvement over Mask R-CNN) at more than 10 fps on 2MP images.
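The clustering idea, a Gaussian assignment of pixel embeddings to an instance center with a learned bandwidth, can be sketched as below. The embedding dimensionality, center, and sigma values are placeholders; the actual training objective in the paper additionally maximizes the IoU of this soft mask.

```python
import numpy as np

def instance_soft_mask(embeddings, center, sigma):
    """Probability that each pixel belongs to an instance, decaying with the
    distance between its spatial embedding and the instance center, with a
    learned bandwidth sigma. A sketch of the idea, not the exact loss."""
    d2 = np.sum((embeddings - center) ** 2, axis=-1)   # squared distance per pixel
    return np.exp(-d2 / (2.0 * sigma ** 2))            # soft foreground probability

emb = np.random.rand(32, 32, 2)        # hypothetical per-pixel 2D spatial embeddings
center = np.array([0.5, 0.5])          # instance center (e.g., mean embedding)
print(instance_soft_mask(emb, center, sigma=0.1).shape)   # (32, 32)
```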
{"title":"Instance Segmentation by Jointly Optimizing Spatial Embeddings and Clustering Bandwidth","authors":"D. Neven, Bert De Brabandere, M. Proesmans, L. Gool","doi":"10.1109/CVPR.2019.00904","DOIUrl":"https://doi.org/10.1109/CVPR.2019.00904","url":null,"abstract":"Current state-of-the-art instance segmentation methods are not suited for real-time applications like autonomous driving, which require fast execution times at high accuracy. Although the currently dominant proposal-based methods have high accuracy, they are slow and generate masks at a fixed and low resolution. Proposal-free methods, by contrast, can generate masks at high resolution and are often faster, but fail to reach the same accuracy as the proposal-based methods. In this work we propose a new clustering loss function for proposal-free instance segmentation. The loss function pulls the spatial embeddings of pixels belonging to the same instance together and jointly learns an instance-specific clustering bandwidth, maximizing the intersection-over-union of the resulting instance mask. When combined with a fast architecture, the network can perform instance segmentation in real-time while maintaining a high accuracy. We evaluate our method on the challenging Cityscapes benchmark and achieve top results (5% improvement over Mask R-CNN) at more than 10 fps on 2MP images.","PeriodicalId":6711,"journal":{"name":"2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)","volume":"1 1","pages":"8829-8837"},"PeriodicalIF":0.0,"publicationDate":"2019-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"87766096","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 215
Atlas of Digital Pathology: A Generalized Hierarchical Histological Tissue Type-Annotated Database for Deep Learning
Pub Date : 2019-06-01 DOI: 10.1109/CVPR.2019.01202
M. S. Hosseini, Lyndon Chan, Gabriel Tse, M. Tang, J. Deng, Sajad Norouzi, C. Rowsell, K. Plataniotis, S. Damaskinos
In recent years, computer vision techniques have made large advances in image recognition and have been applied to aid radiological diagnosis. Computational pathology aims to develop similar tools for aiding pathologists in diagnosing digitized histopathological slides, which would improve diagnostic accuracy and productivity amidst increasing workloads. However, there is a lack of publicly available databases of (1) localized patch-level images annotated with (2) a large range of Histological Tissue Types (HTTs). As a result, computational pathology research is constrained to diagnosing specific diseases or classifying tissues from specific organs, and cannot be readily generalized to handle unexpected diseases and organs. In this paper, we propose a new digital pathology database, the "Atlas of Digital Pathology" (or ADP), which comprises 17,668 patch images extracted from 100 slides annotated with up to 57 hierarchical HTTs. Our data is generalized to different tissue types across different organs and aims to provide training data for supervised multi-label learning of patch-level HTTs in a digitized whole-slide image. We demonstrate the quality of our image labels through pathologist consultation and by training three state-of-the-art neural networks on tissue type classification. Quantitative results support the visual consistency of our data, and we demonstrate a tissue type-based visual attention aid as a sample tool that could be developed from our database.
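Since each patch can carry several of the hierarchical HTT labels at once, the intended use is multi-label learning; a minimal sketch of such a setup is given below, with a placeholder backbone, patch size, and label sampling that are not the authors' models.

```python
import torch
import torch.nn as nn

num_htt = 57                                               # HTT label count from the paper
# Placeholder backbone: one conv layer + pooling + a sigmoid-logit head per label.
backbone = nn.Sequential(nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
                         nn.AdaptiveAvgPool2d(1), nn.Flatten(),
                         nn.Linear(16, num_htt))           # one logit per HTT label
criterion = nn.BCEWithLogitsLoss()                         # standard multi-label objective

patches = torch.randn(4, 3, 272, 272)                      # hypothetical patch batch and size
targets = (torch.rand(4, num_htt) > 0.9).float()           # hypothetical multi-hot HTT annotations
loss = criterion(backbone(patches), targets)
loss.backward()
print(loss.item())
```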
{"title":"Atlas of Digital Pathology: A Generalized Hierarchical Histological Tissue Type-Annotated Database for Deep Learning","authors":"M. S. Hosseini, Lyndon Chan, Gabriel Tse, M. Tang, J. Deng, Sajad Norouzi, C. Rowsell, K. Plataniotis, S. Damaskinos","doi":"10.1109/CVPR.2019.01202","DOIUrl":"https://doi.org/10.1109/CVPR.2019.01202","url":null,"abstract":"In recent years, computer vision techniques have made large advances in image recognition and been applied to aid radiological diagnosis. Computational pathology aims to develop similar tools for aiding pathologists in diagnosing digitized histopathological slides, which would improve diagnostic accuracy and productivity amidst increasing workloads. However, there is a lack of publicly-available databases of (1) localized patch-level images annotated with (2) a large range of Histological Tissue Type (HTT). As a result, computational pathology research is constrained to diagnosing specific diseases or classifying tissues from specific organs, and cannot be readily generalized to handle unexpected diseases and organs. In this paper, we propose a new digital pathology database, the ``Atlas of Digital Pathology'' (or ADP), which comprises of 17,668 patch images extracted from 100 slides annotated with up to 57 hierarchical HTTs. Our data is generalized to different tissue types across different organs and aims to provide training data for supervised multi-label learning of patch-level HTT in a digitized whole slide image. We demonstrate the quality of our image labels through pathologist consultation and by training three state-of-the-art neural networks on tissue type classification. Quantitative results support the visually consistency of our data and we demonstrate a tissue type-based visual attention aid as a sample tool that could be developed from our database.","PeriodicalId":6711,"journal":{"name":"2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)","volume":"31 1","pages":"11739-11748"},"PeriodicalIF":0.0,"publicationDate":"2019-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"89642913","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 43
PIEs: Pose Invariant Embeddings
Pub Date : 2019-06-01 DOI: 10.1109/CVPR.2019.01266
Chih-Hui Ho, Pedro Morgado, Amir Persekian, N. Vasconcelos
The role of pose invariance in image recognition and retrieval is studied. A taxonomic classification of embeddings, according to their level of invariance, is introduced and used to clarify connections between existing embeddings, identify missing approaches, and propose invariant generalizations. This leads to a new family of pose invariant embeddings (PIEs), derived from existing approaches by a combination of two models, which follow from the interpretation of CNNs as estimators of class posterior probabilities: a view-to-object model and an object-to-class model. The new pose-invariant models are shown to have interesting properties, both theoretically and through experiments, where they outperform existing multiview approaches. Most notably, they achieve good performance for both 1) classification and retrieval, and 2) single and multiview inference. These are important properties for the design of real vision systems, where universal embeddings are preferable to task specific ones, and multiple images are usually not available at inference time. Finally, a new multiview dataset of real objects, imaged in the wild against complex backgrounds, is introduced. We believe that this is a much needed complement to the synthetic datasets in wide use and will contribute to the advancement of multiview recognition and retrieval.
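One simple way to realize the view-to-object idea, treating CNN outputs as class posteriors and pooling them over the available views of an object, is sketched below. This is a generic averaging baseline under our own assumptions, not necessarily the exact PIE formulation.

```python
import numpy as np

def multiview_class_posterior(view_logits):
    """Average per-view softmax posteriors into an object-level class posterior.
    view_logits: array of shape (num_views, num_classes)."""
    e = np.exp(view_logits - view_logits.max(axis=1, keepdims=True))
    posteriors = e / e.sum(axis=1, keepdims=True)     # per-view softmax
    return posteriors.mean(axis=0)                    # pooled over views

logits = np.random.rand(5, 10)                        # 5 hypothetical views, 10 classes
print(multiview_class_posterior(logits).argmax())     # predicted class for the object
```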
{"title":"PIEs: Pose Invariant Embeddings","authors":"Chih-Hui Ho, Pedro Morgado, Amir Persekian, N. Vasconcelos","doi":"10.1109/CVPR.2019.01266","DOIUrl":"https://doi.org/10.1109/CVPR.2019.01266","url":null,"abstract":"The role of pose invariance in image recognition and retrieval is studied. A taxonomic classification of embeddings, according to their level of invariance, is introduced and used to clarify connections between existing embeddings, identify missing approaches, and propose invariant generalizations. This leads to a new family of pose invariant embeddings (PIEs), derived from existing approaches by a combination of two models, which follow from the interpretation of CNNs as estimators of class posterior probabilities: a view-to-object model and an object-to-class model. The new pose-invariant models are shown to have interesting properties, both theoretically and through experiments, where they outperform existing multiview approaches. Most notably, they achieve good performance for both 1) classification and retrieval, and 2) single and multiview inference. These are important properties for the design of real vision systems, where universal embeddings are preferable to task specific ones, and multiple images are usually not available at inference time. Finally, a new multiview dataset of real objects, imaged in the wild against complex backgrounds, is introduced. We believe that this is a much needed complement to the synthetic datasets in wide use and will contribute to the advancement of multiview recognition and retrieval.","PeriodicalId":6711,"journal":{"name":"2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)","volume":"18 1","pages":"12369-12378"},"PeriodicalIF":0.0,"publicationDate":"2019-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"89903885","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 11
Spherical Fractal Convolutional Neural Networks for Point Cloud Recognition
Pub Date : 2019-06-01 DOI: 10.1109/CVPR.2019.00054
Yongming Rao, Jiwen Lu, Jie Zhou
We present a generic, flexible and 3D rotation invariant framework based on spherical symmetry for point cloud recognition. By introducing a regular icosahedral lattice and its fractals to approximate and discretize the sphere, convolution can be easily implemented to process 3D points. Based on the fractal structure, a hierarchical feature learning framework together with an adaptive sphere projection module is proposed to learn deep features in an end-to-end manner. Our framework not only inherits the strong representation power and generalization capability of convolutional neural networks for image recognition, but also extends CNNs to learn robust features resistant to rotations and perturbations. The proposed model is effective yet robust. A comprehensive experimental study demonstrates that our approach can achieve competitive performance compared to state-of-the-art techniques on both 3D object classification and part segmentation tasks, and meanwhile outperforms other rotation invariant models on rotated 3D object classification and retrieval tasks by a large margin.
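Two ingredients mentioned above, projecting points onto a sphere and the icosahedral lattice whose fractal subdivisions discretize it, can be sketched as follows. The plain centering-and-normalizing projection stands in for the learned adaptive projection module; function names are ours.

```python
import numpy as np

def project_to_unit_sphere(points):
    """Map 3D points onto the unit sphere after centering; a plain stand-in for
    the learned adaptive sphere projection module."""
    centered = points - points.mean(axis=0)
    return centered / np.linalg.norm(centered, axis=1, keepdims=True)

def icosahedron_vertices():
    """The 12 vertices of a regular icosahedron (golden-ratio construction),
    the base lattice whose fractal subdivisions approximate the sphere."""
    phi = (1 + 5 ** 0.5) / 2
    verts = []
    for a in (-1.0, 1.0):
        for b in (-phi, phi):
            verts += [(0.0, a, b), (a, b, 0.0), (b, 0.0, a)]
    verts = np.asarray(verts)
    return verts / np.linalg.norm(verts, axis=1, keepdims=True)

cloud = np.random.randn(1024, 3)                     # hypothetical point cloud
print(project_to_unit_sphere(cloud).shape, icosahedron_vertices().shape)
```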
{"title":"Spherical Fractal Convolutional Neural Networks for Point Cloud Recognition","authors":"Yongming Rao, Jiwen Lu, Jie Zhou","doi":"10.1109/CVPR.2019.00054","DOIUrl":"https://doi.org/10.1109/CVPR.2019.00054","url":null,"abstract":"We present a generic, flexible and 3D rotation invariant framework based on spherical symmetry for point cloud recognition. By introducing regular icosahedral lattice and its fractals to approximate and discretize sphere, convolution can be easily implemented to process 3D points. Based on the fractal structure, a hierarchical feature learning framework together with an adaptive sphere projection module is proposed to learn deep feature in an end-to-end manner. Our framework not only inherits the strong representation power and generalization capability from convolutional neural networks for image recognition, but also extends CNN to learn robust feature resistant to rotations and perturbations. The proposed model is effective yet robust. Comprehensive experimental study demonstrates that our approach can achieve competitive performance compared to state-of-the-art techniques on both 3D object classification and part segmentation tasks, meanwhile, outperform other rotation invariant models on rotated 3D object classification and retrieval tasks by a large margin.","PeriodicalId":6711,"journal":{"name":"2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)","volume":"65 1","pages":"452-460"},"PeriodicalIF":0.0,"publicationDate":"2019-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"84029254","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 122
Re-Ranking via Metric Fusion for Object Retrieval and Person Re-Identification
Pub Date : 2019-06-01 DOI: 10.1109/CVPR.2019.00083
S. Bai, Peng Tang, Philip H. S. Torr, Longin Jan Latecki
This work studies the unsupervised re-ranking procedure for object retrieval and person re-identification, with a specific concentration on an ensemble of multiple metrics (or similarities). While the re-ranking step is performed by running a diffusion process on the underlying data manifolds, the fusion step can leverage the complementarity of multiple metrics. We give a comprehensive summary of existing fusion-with-diffusion strategies, and systematically analyze their pros and cons. Based on the analysis, we propose a unified yet robust algorithm which inherits their advantages and discards their disadvantages. Hence, we call it Unified Ensemble Diffusion (UED). More interestingly, we show that the inherited properties indeed stem from a theoretical framework, where the relevant works can be elegantly summarized as special cases of UED by imposing additional constraints on the objective function and varying the solver of similarity propagation. Extensive experiments with 3D shape retrieval, image retrieval and person re-identification demonstrate that the proposed framework outperforms the state of the art, and at the same time suggest that re-ranking via metric fusion is a promising tool to further improve the retrieval performance of existing algorithms.
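The diffusion step on a single metric can be sketched with the standard manifold-ranking iteration below; the ensemble version studied in the paper fuses several such similarities, and the alpha value and iteration count here are arbitrary placeholders.

```python
import numpy as np

def diffuse_similarity(S, alpha=0.85, iters=30):
    """Spread similarity along the data manifold: F <- alpha * T F + (1 - alpha) * I,
    with T the row-normalized affinity matrix. One metric only, for illustration."""
    T = S / S.sum(axis=1, keepdims=True)              # transition matrix
    F = np.eye(len(S))                                # start from each item itself
    for _ in range(iters):
        F = alpha * T @ F + (1 - alpha) * np.eye(len(S))
    return F                                          # diffused similarities used for re-ranking

S = np.random.rand(6, 6); S = (S + S.T) / 2           # hypothetical symmetric affinity matrix
np.fill_diagonal(S, 1.0)
print(diffuse_similarity(S).round(2))
```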
{"title":"Re-Ranking via Metric Fusion for Object Retrieval and Person Re-Identification","authors":"S. Bai, Peng Tang, Philip H. S. Torr, Longin Jan Latecki","doi":"10.1109/CVPR.2019.00083","DOIUrl":"https://doi.org/10.1109/CVPR.2019.00083","url":null,"abstract":"This work studies the unsupervised re-ranking procedure for object retrieval and person re-identification with a specific concentration on an ensemble of multiple metrics (or similarities). While the re-ranking step is involved by running a diffusion process on the underlying data manifolds, the fusion step can leverage the complementarity of multiple metrics. We give a comprehensive summary of existing fusion with diffusion strategies, and systematically analyze their pros and cons. Based on the analysis, we propose a unified yet robust algorithm which inherits their advantages and discards their disadvantages. Hence, we call it Unified Ensemble Diffusion (UED). More interestingly, we derive that the inherited properties indeed stem from a theoretical framework, where the relevant works can be elegantly summarized as special cases of UED by imposing additional constraints on the objective function and varying the solver of similarity propagation. Extensive experiments with 3D shape retrieval, image retrieval and person re-identification demonstrate that the proposed framework outperforms the state of the arts, and at the same time suggest that re-ranking via metric fusion is a promising tool to further improve the retrieval performance of existing algorithms.","PeriodicalId":6711,"journal":{"name":"2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)","volume":"73 1","pages":"740-749"},"PeriodicalIF":0.0,"publicationDate":"2019-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"86352676","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 80
Spatially Variant Linear Representation Models for Joint Filtering
Pub Date : 2019-06-01 DOI: 10.1109/CVPR.2019.00180
Jin-shan Pan, Jiangxin Dong, Jimmy S. J. Ren, Liang Lin, Jinhui Tang, Ming-Hsuan Yang
Joint filtering mainly uses an additional guidance image as a prior and transfers its structures to the target image in the filtering process. Different from existing algorithms that rely on locally linear models or hand-designed objective functions to extract the structural information from the guidance image, we propose a new joint filter based on a spatially variant linear representation model (SVLRM), where the target image is linearly represented by the guidance image. However, the SVLRM leads to a highly ill-posed problem. To estimate the linear representation coefficients, we develop an effective algorithm based on a deep convolutional neural network (CNN). The proposed deep CNN (constrained by the SVLRM) is able to estimate the spatially variant linear representation coefficients which are able to model the structural information of both the guidance and input images. We show that the proposed algorithm can be effectively applied to a variety of applications, including depth/RGB image upsampling and restoration, flash/no-flash image deblurring, natural image denoising, scale-aware filtering, etc. Extensive experimental results demonstrate that the proposed algorithm performs favorably against state-of-the-art methods that have been specially designed for each task.
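The spatially variant linear representation itself is simple to state: each target pixel is an affine function of the corresponding guidance pixel, with coefficient maps predicted by the CNN. The sketch below applies such per-pixel coefficients, using random placeholders in place of the network's predictions.

```python
import numpy as np

def svlrm_apply(guidance, a, b):
    """Per-pixel affine model T = a * G + b, where the coefficient maps a and b
    would be predicted by a CNN from the (guidance, input) pair.
    Here a and b are random placeholders, not network outputs."""
    return a * guidance + b

G = np.random.rand(128, 128)          # guidance image (e.g., an RGB image converted to gray)
a = np.random.rand(128, 128)          # hypothetical CNN-predicted coefficient maps
b = np.random.rand(128, 128)
print(svlrm_apply(G, a, b).shape)     # (128, 128) filtered target
```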
{"title":"Spatially Variant Linear Representation Models for Joint Filtering","authors":"Jin-shan Pan, Jiangxin Dong, Jimmy S. J. Ren, Liang Lin, Jinhui Tang, Ming-Hsuan Yang","doi":"10.1109/CVPR.2019.00180","DOIUrl":"https://doi.org/10.1109/CVPR.2019.00180","url":null,"abstract":"Joint filtering mainly uses an additional guidance image as a prior and transfers its structures to the target image in the filtering process. Different from existing algorithms that rely on locally linear models or hand-designed objective functions to extract the structural information from the guidance image, we propose a new joint filter based on a spatially variant linear representation model (SVLRM), where the target image is linearly represented by the guidance image. However, the SVLRM leads to a highly ill-posed problem. To estimate the linear representation coefficients, we develop an effective algorithm based on a deep convolutional neural network (CNN). The proposed deep CNN (constrained by the SVLRM) is able to estimate the spatially variant linear representation coefficients which are able to model the structural information of both the guidance and input images. We show that the proposed algorithm can be effectively applied to a variety of applications, including depth/RGB image upsampling and restoration, flash/no-flash image deblurring, natural image denoising, scale-aware filtering, etc. Extensive experimental results demonstrate that the proposed algorithm performs favorably against state-of-the-art methods that have been specially designed for each task.","PeriodicalId":6711,"journal":{"name":"2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)","volume":"8 1","pages":"1702-1711"},"PeriodicalIF":0.0,"publicationDate":"2019-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"82880234","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 30