
2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR): Latest Publications

Dichromatic Model Based Temporal Color Constancy for AC Light Sources
Pub Date : 2019-06-01 DOI: 10.1109/CVPR.2019.01261
Jun-Sang Yoo, Jong-Ok Kim
Existing dichromatic color constancy approaches commonly require a number of spatial pixels with high specularity. In this paper, we propose a novel approach to estimating the illuminant chromaticity of an AC light source using a high-speed camera. We found that the temporal observations of an image pixel at a fixed location lie on a common dichromatic plane. Instead of spatial pixels with high specularity, multiple temporal samples of each pixel are exploited to select AC pixels, i.e., pixels whose intensity varies sinusoidally, for dichromatic plane estimation. A dichromatic plane is calculated for each AC pixel, and the illuminant chromaticity is determined by the intersection of these dichromatic planes. From the multiple dichromatic planes, an optimal illuminant is estimated with a novel MAP framework. It is shown that the proposed method outperforms both existing dichromatic-model-based methods and temporal color constancy methods, irrespective of the amount of specularity.
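The plane-intersection idea lends itself to a compact numerical sketch. The snippet below is a minimal illustration, not the authors' MAP framework: it assumes synthetic temporal RGB samples for two AC pixels, fits each dichromatic plane through the origin by SVD, and recovers the illuminant chromaticity as the common direction orthogonal to all plane normals; the AC-pixel selection step and the probabilistic weighting of planes are omitted.

```python
import numpy as np

def dichromatic_plane_normal(samples):
    # samples: (T, 3) temporal RGB observations of one AC pixel. Under the
    # dichromatic model every observation is a mix of the diffuse and specular
    # chromaticities, so the samples lie on a plane through the origin; its
    # normal is the right singular vector with the smallest singular value.
    _, _, vt = np.linalg.svd(samples)
    return vt[-1]

def intersect_planes(per_pixel_samples):
    # The illuminant chromaticity direction lies on every dichromatic plane,
    # i.e. it is orthogonal to every plane normal, so it spans the null space
    # of the stacked normals.
    normals = np.stack([dichromatic_plane_normal(s) for s in per_pixel_samples])
    _, _, vt = np.linalg.svd(normals)
    illum = np.abs(vt[-1])
    return illum / illum.sum()   # normalize to a chromaticity

# Synthetic example: two AC pixels lit by an illuminant with chromaticity
# proportional to (1.0, 0.8, 0.6); their diffuse chromaticities differ.
rng = np.random.default_rng(0)
illum_true = np.array([1.0, 0.8, 0.6])
pixels = []
for diffuse in (np.array([0.2, 0.5, 0.3]), np.array([0.6, 0.3, 0.1])):
    w = rng.uniform(0.1, 1.0, size=(50, 2))               # temporal mixing weights
    pixels.append(w[:, :1] * (diffuse * illum_true) + w[:, 1:] * illum_true)
print(intersect_planes(pixels))   # close to illum_true / illum_true.sum()
```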
Citations: 12
Convolutional Neural Networks Can Be Deceived by Visual Illusions
Pub Date : 2019-06-01 DOI: 10.1109/CVPR.2019.01259
Alex Gomez-Villa, Adrián Martín, Javier Vazquez-Corral, M. Bertalmío
Visual illusions teach us that what we see is not always what is represented in the physical world. Their special nature makes them a fascinating tool for testing and validating any newly proposed vision model. In general, current vision models are based on the concatenation of linear and non-linear operations. The similarity of this structure to the operations present in Convolutional Neural Networks (CNNs) has motivated us to study whether CNNs trained for low-level visual tasks are deceived by visual illusions. In particular, we show that CNNs trained for image denoising, image deblurring, and computational color constancy are able to replicate the human response to visual illusions, and that the extent of this replication varies with the architecture and the spatial pattern size. These results suggest that in order to obtain CNNs that better replicate human behaviour, we may need to start aiming for them to better replicate visual illusions.
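As an illustration of the kind of probe this line of work uses, the sketch below builds a simultaneous-contrast stimulus (two physically identical gray patches on dark and light backgrounds) and compares a model's output at the two patch locations. Here `model` is a hypothetical placeholder for any low-level CNN wrapped as an image-to-image function; it is not one of the networks used in the paper.

```python
import numpy as np

def simultaneous_contrast(size=128, patch=24, dark=0.2, light=0.8, gray=0.5):
    # Classic brightness illusion: two identical gray patches, one on a dark
    # half and one on a light half of the image.
    img = np.empty((size, size), dtype=np.float32)
    img[:, : size // 2] = dark
    img[:, size // 2 :] = light
    c, h = size // 4, patch // 2
    img[size // 2 - h : size // 2 + h, c - h : c + h] = gray          # patch on dark side
    img[size // 2 - h : size // 2 + h, 3 * c - h : 3 * c + h] = gray  # patch on light side
    return img

def probe_illusion(model, img, size=128, patch=24):
    # `model` is any callable mapping an image to an image (e.g. a denoising
    # CNN wrapped as a NumPy function); a human-like response yields a higher
    # value for the patch on the dark background.
    out = model(img)
    c, h = size // 4, patch // 2
    left = out[size // 2 - h : size // 2 + h, c - h : c + h].mean()
    right = out[size // 2 - h : size // 2 + h, 3 * c - h : 3 * c + h].mean()
    return left, right

# With an identity "model" both patches read exactly 0.5 (no illusion);
# the paper's finding is that trained low-level CNNs shift them apart.
print(probe_illusion(lambda x: x, simultaneous_contrast()))
```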
Citations: 29
Spatially Variant Linear Representation Models for Joint Filtering
Pub Date : 2019-06-01 DOI: 10.1109/CVPR.2019.00180
Jin-shan Pan, Jiangxin Dong, Jimmy S. J. Ren, Liang Lin, Jinhui Tang, Ming-Hsuan Yang
Joint filtering mainly uses an additional guidance image as a prior and transfers its structures to the target image in the filtering process. Different from existing algorithms that rely on locally linear models or hand-designed objective functions to extract the structural information from the guidance image, we propose a new joint filter based on a spatially variant linear representation model (SVLRM), in which the target image is linearly represented by the guidance image. However, the SVLRM leads to a highly ill-posed problem. To estimate the linear representation coefficients, we develop an effective algorithm based on a deep convolutional neural network (CNN). The proposed deep CNN (constrained by the SVLRM) estimates the spatially variant linear representation coefficients, which model the structural information of both the guidance and input images. We show that the proposed algorithm can be effectively applied to a variety of applications, including depth/RGB image upsampling and restoration, flash/no-flash image deblurring, natural image denoising, and scale-aware filtering. Extensive experimental results demonstrate that the proposed algorithm performs favorably against state-of-the-art methods that have been specially designed for each task.
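A minimal sketch of the SVLRM idea, assuming PyTorch and a toy CNN (the paper's actual network and training losses differ), is: predict spatially varying coefficients (a, b) from the concatenated target and guidance images, then compose the output as a * guidance + b.

```python
import torch
import torch.nn as nn

class SVLRM(nn.Module):
    # A small CNN looks at the concatenated target and guidance images and
    # predicts per-pixel coefficients (a, b); the filtered output is
    # out = a * guidance + b, i.e. the target is linearly represented by the
    # guidance with spatially variant coefficients.
    def __init__(self, channels=1, width=32, depth=4):
        super().__init__()
        layers = [nn.Conv2d(2 * channels, width, 3, padding=1), nn.ReLU(inplace=True)]
        for _ in range(depth - 2):
            layers += [nn.Conv2d(width, width, 3, padding=1), nn.ReLU(inplace=True)]
        layers += [nn.Conv2d(width, 2 * channels, 3, padding=1)]
        self.net = nn.Sequential(*layers)
        self.channels = channels

    def forward(self, target, guidance):
        a, b = self.net(torch.cat([target, guidance], dim=1)).split(self.channels, dim=1)
        return a * guidance + b

# Example: guided upsampling of a low-resolution depth map with a
# high-resolution (grayscale) guidance image.
model = SVLRM(channels=1)
depth_lr = torch.rand(1, 1, 64, 64)
depth_up = nn.functional.interpolate(depth_lr, scale_factor=4, mode="bilinear", align_corners=False)
guidance = torch.rand(1, 1, 256, 256)
print(model(depth_up, guidance).shape)  # torch.Size([1, 1, 256, 256])
```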
Citations: 30
Adaptive Transfer Network for Cross-Domain Person Re-Identification
Pub Date : 2019-06-01 DOI: 10.1109/CVPR.2019.00737
Jiawei Liu, Zhengjun Zha, Di Chen, Richang Hong, Meng Wang
Recent deep learning based person re-identification approaches have steadily improved performance on benchmarks; however, they often fail to generalize well from one domain to another. In this work, we propose a novel adaptive transfer network (ATNet) for effective cross-domain person re-identification. ATNet looks into the essential causes of the domain gap and addresses them following the principle of "divide-and-conquer". It decomposes the complicated cross-domain transfer into a set of factor-wise sub-transfers, each of which concentrates on style transfer with respect to a certain imaging factor, e.g., illumination, resolution, and camera view. An adaptive ensemble strategy is proposed to fuse the factor-wise transfers by perceiving the magnitude of each factor's effect on an image. This "decomposition-and-ensemble" strategy gives ATNet the capability of precise style transfer at the factor level and eventually effective transfer across domains. In particular, ATNet consists of a transfer network composed of multiple factor-wise CycleGANs and an ensemble CycleGAN, as well as a selection network that infers the effects of different factors on transferring each image. Extensive experimental results on three widely used datasets, i.e., Market-1501, DukeMTMC-reID, and PRID2011, have demonstrated the effectiveness of the proposed ATNet, with significant performance improvements over state-of-the-art methods.
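The "decomposition-and-ensemble" step can be sketched as follows, assuming PyTorch and toy stand-in networks: the factor-wise generators below are untrained conv stacks, not the paper's CycleGANs, and the selection network is reduced to a global-pooling classifier. The point of the sketch is only the fusion rule: one generator per imaging factor, with outputs weighted by per-image factor weights.

```python
import torch
import torch.nn as nn

class FactorEnsembleTransfer(nn.Module):
    # One style-transfer generator per imaging factor (illumination,
    # resolution, camera view, ...) plus a selection network that predicts
    # how strongly each factor affects a given image. The fused result is a
    # weighted sum of the factor-wise outputs.
    def __init__(self, factors=("illumination", "resolution", "camera"), ch=3):
        super().__init__()
        self.generators = nn.ModuleList(
            nn.Sequential(nn.Conv2d(ch, 16, 3, padding=1), nn.ReLU(inplace=True),
                          nn.Conv2d(16, ch, 3, padding=1))
            for _ in factors
        )
        self.selector = nn.Sequential(      # one weight per factor
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(ch, len(factors))
        )

    def forward(self, x):
        weights = torch.softmax(self.selector(x), dim=1)             # (B, F)
        outs = torch.stack([g(x) for g in self.generators], dim=1)   # (B, F, C, H, W)
        return (weights[:, :, None, None, None] * outs).sum(dim=1)

x = torch.rand(2, 3, 128, 64)   # person crops (size chosen for illustration)
print(FactorEnsembleTransfer()(x).shape)  # torch.Size([2, 3, 128, 64])
```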
Citations: 217
Spectral Reconstruction From Dispersive Blur: A Novel Light Efficient Spectral Imager
Pub Date : 2019-06-01 DOI: 10.1109/CVPR.2019.01248
Yuanyuan Zhao, Xue-mei Hu, Hui Guo, Zhan Ma, Tao Yue, Xun Cao
Developing high light efficiency imaging techniques to retrieve high-dimensional optical signals is a long-term goal in computational photography. Multispectral imaging, which captures images at different wavelengths and boosts the ability to reveal scene properties, has developed rapidly in the last few decades. From scanning methods to snapshot imaging, the limit of light collection efficiency keeps being pushed, enabling wider applications, especially in light-starved scenes. In this work, we propose a novel multispectral imaging technique that captures multispectral images with high light efficiency. By investigating the dispersive blur caused by spectral dispersers and introducing the difference-of-blur (DoB) constraints, we establish a basic theory for capturing multispectral information from a single dispersive-blurred image and an additional spectrum of an arbitrary point in the scene. Based on this theory, we design a prototype system and develop an optimization algorithm to realize snapshot multispectral imaging. The effectiveness of the proposed method is verified on both synthetic data and real captured images.
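A toy version of the dispersive-blur forward model described above, assuming integer per-band horizontal shifts, a uniform sensor response, and NumPy arrays (not the paper's calibrated optics), looks like this: each wavelength band is displaced by a wavelength-dependent amount and the sensor integrates all bands into a single measurement.

```python
import numpy as np

def dispersive_blur(cube, shift_per_band=1):
    # cube: (L, H, W) multispectral image with L wavelength bands. Each band
    # is shifted horizontally by an amount that grows with the band index
    # (the dispersion), and all bands are integrated into one grayscale
    # measurement, producing the wavelength-dependent "dispersive blur".
    num_bands = cube.shape[0]
    blurred = np.zeros(cube.shape[1:], dtype=np.float64)
    for k in range(num_bands):
        blurred += np.roll(cube[k], shift=k * shift_per_band, axis=1)
    return blurred / num_bands

# An 8-band synthetic scene: this blurred measurement, plus one known spectrum
# at an arbitrary scene point, is what the paper reconstructs the cube from.
cube = np.random.rand(8, 64, 64)
measurement = dispersive_blur(cube)
print(measurement.shape)  # (64, 64)
```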
Citations: 3
Atlas of Digital Pathology: A Generalized Hierarchical Histological Tissue Type-Annotated Database for Deep Learning
Pub Date : 2019-06-01 DOI: 10.1109/CVPR.2019.01202
M. S. Hosseini, Lyndon Chan, Gabriel Tse, M. Tang, J. Deng, Sajad Norouzi, C. Rowsell, K. Plataniotis, S. Damaskinos
In recent years, computer vision techniques have made large advances in image recognition and have been applied to aid radiological diagnosis. Computational pathology aims to develop similar tools for aiding pathologists in diagnosing digitized histopathological slides, which would improve diagnostic accuracy and productivity amidst increasing workloads. However, there is a lack of publicly available databases of (1) localized patch-level images annotated with (2) a large range of Histological Tissue Types (HTTs). As a result, computational pathology research is constrained to diagnosing specific diseases or classifying tissues from specific organs, and cannot be readily generalized to handle unexpected diseases and organs. In this paper, we propose a new digital pathology database, the "Atlas of Digital Pathology" (ADP), which comprises 17,668 patch images extracted from 100 slides annotated with up to 57 hierarchical HTTs. Our data generalizes to different tissue types across different organs and aims to provide training data for supervised multi-label learning of patch-level HTTs in digitized whole slide images. We demonstrate the quality of our image labels through pathologist consultation and by training three state-of-the-art neural networks on tissue type classification. Quantitative results support the visual consistency of our data, and we demonstrate a tissue type-based visual attention aid as a sample tool that could be developed from our database.
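For concreteness, the patch-level HTT task described above is a multi-label problem, so a training step looks roughly like the sketch below. This assumes PyTorch, a stand-in backbone, and an assumed patch size; the paper trains standard image classification networks rather than this toy model.

```python
import torch
import torch.nn as nn

# Each patch can carry several of the 57 hierarchical tissue-type labels at
# once, so training uses one sigmoid output per label rather than a single
# softmax over classes.
num_htt = 57
backbone = nn.Sequential(            # stand-in for a real classification CNN
    nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(inplace=True),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(32, num_htt),
)
criterion = nn.BCEWithLogitsLoss()   # independent binary decision per HTT

patches = torch.rand(4, 3, 272, 272)                  # batch of patches (size assumed)
labels = torch.randint(0, 2, (4, num_htt)).float()    # multi-hot HTT annotations
loss = criterion(backbone(patches), labels)
loss.backward()
```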
Citations: 43
PIEs: Pose Invariant Embeddings
Pub Date : 2019-06-01 DOI: 10.1109/CVPR.2019.01266
Chih-Hui Ho, Pedro Morgado, Amir Persekian, N. Vasconcelos
The role of pose invariance in image recognition and retrieval is studied. A taxonomic classification of embeddings, according to their level of invariance, is introduced and used to clarify connections between existing embeddings, identify missing approaches, and propose invariant generalizations. This leads to a new family of pose invariant embeddings (PIEs), derived from existing approaches by a combination of two models, which follow from the interpretation of CNNs as estimators of class posterior probabilities: a view-to-object model and an object-to-class model. The new pose-invariant models are shown to have interesting properties, both theoretically and through experiments, where they outperform existing multiview approaches. Most notably, they achieve good performance for both 1) classification and retrieval, and 2) single-view and multiview inference. These are important properties for the design of real vision systems, where universal embeddings are preferable to task-specific ones, and multiple images are usually not available at inference time. Finally, a new multiview dataset of real objects, imaged in the wild against complex backgrounds, is introduced. We believe that this is a much-needed complement to the synthetic datasets in wide use and will contribute to the advancement of multiview recognition and retrieval.
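The view-to-object / object-to-class split admits a very small sketch, assuming PyTorch and a toy encoder (the paper's CNN backbones and the full taxonomy of embeddings are not reproduced): view embeddings of one object are averaged into a pose-invariant object embedding before classification, and single-view inference is just the V = 1 case.

```python
import torch
import torch.nn as nn

class PoseInvariantClassifier(nn.Module):
    # A shared encoder embeds each view; the view embeddings of one object
    # are averaged into a pose-invariant object embedding (view-to-object),
    # which a linear classifier maps to class scores (object-to-class).
    def __init__(self, num_classes=10, dim=64):
        super().__init__()
        self.encoder = nn.Sequential(      # stand-in for a real CNN backbone
            nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(inplace=True),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(16, dim),
        )
        self.classifier = nn.Linear(dim, num_classes)

    def forward(self, views):                        # views: (B, V, 3, H, W)
        b, v = views.shape[:2]
        emb = self.encoder(views.flatten(0, 1))      # (B*V, dim)
        obj = emb.view(b, v, -1).mean(dim=1)         # view-to-object pooling
        return self.classifier(obj)                  # object-to-class scores

model = PoseInvariantClassifier()
multi = torch.rand(2, 12, 3, 32, 32)    # 12 views per object
single = torch.rand(2, 1, 3, 32, 32)    # degenerate single-view case
print(model(multi).shape, model(single).shape)  # both torch.Size([2, 10])
```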
Citations: 11
Feedback Adversarial Learning: Spatial Feedback for Improving Generative Adversarial Networks
Pub Date : 2019-06-01 DOI: 10.1109/CVPR.2019.00157
Minyoung Huh, Shao-Hua Sun, Ning Zhang
We propose a feedback adversarial learning (FAL) framework that can improve existing generative adversarial networks by leveraging spatial feedback from the discriminator. We formulate the generation task as a recurrent framework, in which the discriminator's feedback is integrated into the feedforward path of the generation process. Specifically, the generator conditions on the discriminator's spatial output response and on its own previous generation to improve generation quality over time, allowing the generator to attend to and fix its previous mistakes. To effectively utilize the feedback, we propose an adaptive spatial transform layer, which learns to spatially modulate feature maps using its previous generation and the error signal from the discriminator. We demonstrate that one can easily adapt FAL to existing adversarial learning frameworks on a wide range of tasks, including image generation, image-to-image translation, and voxel generation.
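The recurrent feedback loop can be sketched as below, assuming PyTorch; the generator, discriminator, and adaptive spatial transform layer are toy stand-ins rather than the paper's architecture. At each round the discriminator produces a spatial score map for the previous generation, and the generator refines its output conditioned on that map and on the previous image.

```python
import torch
import torch.nn as nn

# Toy stand-ins: D returns a per-pixel real/fake score map, and G refines an
# image given the seed, the previous generation, and D's spatial feedback.
D = nn.Sequential(nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(inplace=True),
                  nn.Conv2d(16, 1, 3, padding=1))
G = nn.Sequential(nn.Conv2d(3 + 3 + 1, 32, 3, padding=1), nn.ReLU(inplace=True),
                  nn.Conv2d(32, 3, 3, padding=1), nn.Tanh())

z = torch.rand(1, 3, 64, 64)    # "noise" image that seeds generation
x = torch.zeros(1, 3, 64, 64)   # previous generation, starts empty
for step in range(3):           # a few feedback rounds
    feedback = D(x)                            # (1, 1, 64, 64) spatial feedback
    x = G(torch.cat([z, x, feedback], dim=1))  # attend to and fix past mistakes
print(x.shape)  # torch.Size([1, 3, 64, 64])
```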
Citations: 25
You Reap What You Sow: Using Videos to Generate High Precision Object Proposals for Weakly-Supervised Object Detection
Pub Date : 2019-06-01 DOI: 10.1109/CVPR.2019.00964
Krishna Kumar Singh, Yong Jae Lee
We propose a novel way of using videos to obtain high-precision object proposals for weakly-supervised object detection. Existing weakly-supervised detection approaches use off-the-shelf proposal methods like edge boxes or selective search to obtain candidate boxes. These methods provide high recall, but at the expense of thousands of noisy proposals. Thus, the entire burden of finding the few relevant object regions is left to the ensuing object mining step. To mitigate this issue, we focus instead on improving the precision of the initial candidate object proposals. Since we cannot rely on localization annotations, we turn to video and leverage motion cues to automatically estimate the extent of objects to train a Weakly-supervised Region Proposal Network (W-RPN). We use the W-RPN to generate high-precision object proposals, which are in turn used to re-rank high-recall proposals like edge boxes or selective search according to their spatial overlap. Our W-RPN proposals lead to significant improvements in performance for state-of-the-art weakly-supervised object detection approaches on PASCAL VOC 2007 and 2012.
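The re-ranking step is straightforward to sketch in NumPy, under the assumption that each high-recall box is scored by its best IoU with any W-RPN box; box coordinates and the example values are illustrative only.

```python
import numpy as np

def iou(a, b):
    # IoU between (N, 4) and (M, 4) arrays of [x1, y1, x2, y2] boxes -> (N, M).
    x1 = np.maximum(a[:, None, 0], b[None, :, 0])
    y1 = np.maximum(a[:, None, 1], b[None, :, 1])
    x2 = np.minimum(a[:, None, 2], b[None, :, 2])
    y2 = np.minimum(a[:, None, 3], b[None, :, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    area_a = (a[:, 2] - a[:, 0]) * (a[:, 3] - a[:, 1])
    area_b = (b[:, 2] - b[:, 0]) * (b[:, 3] - b[:, 1])
    return inter / (area_a[:, None] + area_b[None, :] - inter)

def rerank_proposals(recall_boxes, wrpn_boxes):
    # Score each high-recall proposal (e.g. an edge box) by its best spatial
    # overlap with the high-precision W-RPN proposals, then sort.
    scores = iou(recall_boxes, wrpn_boxes).max(axis=1)
    order = np.argsort(-scores)
    return recall_boxes[order], scores[order]

edge_boxes = np.array([[0, 0, 50, 50], [40, 40, 120, 120], [200, 200, 260, 240]], float)
wrpn_boxes = np.array([[45, 45, 125, 115]], float)
print(rerank_proposals(edge_boxes, wrpn_boxes)[1])  # the middle box ranks first
```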
Citations: 30
KE-GAN: Knowledge Embedded Generative Adversarial Networks for Semi-Supervised Scene Parsing
Pub Date : 2019-06-01 DOI: 10.1109/CVPR.2019.00538
Mengshi Qi, Yunhong Wang, Jie Qin, Annan Li
In recent years, scene parsing has captured increasing attention in computer vision. Previous works have demonstrated promising performance on this task. However, they mainly utilize holistic features, while neglecting the rich semantic knowledge and inter-object relationships in the scene. In addition, these methods usually require a large number of pixel-level annotations, which is too expensive in practice. In this paper, we propose a novel Knowledge Embedded Generative Adversarial Network, dubbed KE-GAN, to tackle this challenging problem in a semi-supervised fashion. KE-GAN captures the semantic consistencies of different categories by devising a Knowledge Graph from a large-scale text corpus. In addition to readily available unlabeled data, we generate synthetic images to unveil the rich structural information underlying the images. Moreover, a pyramid architecture is incorporated into the discriminator to acquire multi-scale contextual information for better parsing results. Extensive experimental results on four standard benchmarks demonstrate that KE-GAN is capable of improving semantic consistency and learning better representations for scene parsing, resulting in state-of-the-art performance.
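The pyramid part of the discriminator can be sketched as follows, assuming PyTorch and a toy feature extractor (the knowledge-graph term and the GAN losses are not shown): features are average-pooled at several scales and concatenated before the real/fake decision, so the discriminator sees multi-scale context.

```python
import torch
import torch.nn as nn

class PyramidDiscriminator(nn.Module):
    # Shared convolutional features are average-pooled at several scales and
    # concatenated, so the real/fake decision uses multi-scale context. This
    # is a stand-in for the paper's discriminator architecture only.
    def __init__(self, in_ch=3, bins=(1, 2, 4)):
        super().__init__()
        self.features = nn.Sequential(nn.Conv2d(in_ch, 32, 3, padding=1),
                                      nn.ReLU(inplace=True))
        self.bins = bins
        self.head = nn.Linear(32 * sum(b * b for b in bins), 1)

    def forward(self, x):
        f = self.features(x)
        pooled = [nn.functional.adaptive_avg_pool2d(f, b).flatten(1) for b in self.bins]
        return self.head(torch.cat(pooled, dim=1))   # real/fake logit per image

print(PyramidDiscriminator()(torch.rand(2, 3, 128, 128)).shape)  # torch.Size([2, 1])
```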
Citations: 34