
Latest publications from the 2019 IEEE/CVF International Conference on Computer Vision (ICCV)

Siamese Networks: The Tale of Two Manifolds
Pub Date : 2019-10-01 DOI: 10.1109/ICCV.2019.00314
S. Roy, M. Harandi, R. Nock, R. Hartley
Siamese networks are non-linear deep models that have found their way into a broad set of problems in learning theory, thanks to their embedding capabilities. In this paper, we study Siamese networks from a new perspective and question the validity of their training procedure. We show that in the majority of cases, the objective of a Siamese network is endowed with an invariance property, and that neglecting this invariance hinders training. To alleviate this issue, we propose two Riemannian structures and generalize a well-established accelerated stochastic gradient descent method to take the proposed Riemannian structures into account. Our empirical evaluations suggest that by making use of the Riemannian geometry, we achieve state-of-the-art results, outperforming several algorithms on the challenging problem of fine-grained image classification.
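To make the invariance claim concrete, here is a minimal Siamese sketch in Python/PyTorch with a standard contrastive loss; the encoder, dimensions and margin are illustrative placeholders rather than the architecture or the two Riemannian structures proposed in the paper. Because a distance-based objective sees the embeddings only through pairwise distances, any orthogonal rotation of the embedding space leaves it unchanged, which is one concrete instance of the kind of invariance the abstract refers to.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SiameseEncoder(nn.Module):
    """A single shared encoder embeds both inputs, the defining trait of a Siamese network."""
    def __init__(self, in_dim=512, emb_dim=128):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(in_dim, 256), nn.ReLU(),
                                 nn.Linear(256, emb_dim))

    def forward(self, x1, x2):
        return self.net(x1), self.net(x2)

def contrastive_loss(z1, z2, same_class, margin=1.0):
    # Pull genuine pairs together, push impostor pairs beyond the margin.
    d = F.pairwise_distance(z1, z2)
    return (same_class * d.pow(2) + (1 - same_class) * F.relu(margin - d).pow(2)).mean()

encoder = SiameseEncoder()
x1, x2 = torch.randn(8, 512), torch.randn(8, 512)
same = torch.randint(0, 2, (8,)).float()
z1, z2 = encoder(x1, x2)
loss = contrastive_loss(z1, z2, same)

# The loss sees the embeddings only through pairwise distances, so rotating the
# embedding space by any orthogonal matrix Q does not change it.
Q, _ = torch.linalg.qr(torch.randn(128, 128))
assert torch.allclose(loss, contrastive_loss(z1 @ Q, z2 @ Q, same), atol=1e-4)
```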
{"title":"Siamese Networks: The Tale of Two Manifolds","authors":"S. Roy, M. Harandi, R. Nock, R. Hartley","doi":"10.1109/ICCV.2019.00314","DOIUrl":"https://doi.org/10.1109/ICCV.2019.00314","url":null,"abstract":"Siamese networks are non-linear deep models that have found their ways into a broad set of problems in learning theory, thanks to their embedding capabilities. In this paper, we study Siamese networks from a new perspective and question the validity of their training procedure. We show that in the majority of cases, the objective of a Siamese network is endowed with an invariance property. Neglecting the invariance property leads to a hindrance in training the Siamese networks. To alleviate this issue, we propose two Riemannian structures and generalize a well-established accelerated stochastic gradient descent method to take into account the proposed Riemannian structures. Our empirical evaluations suggest that by making use of the Riemannian geometry, we achieve state-of-the-art results against several algorithms for the challenging problem of fine-grained image classification.","PeriodicalId":6728,"journal":{"name":"2019 IEEE/CVF International Conference on Computer Vision (ICCV)","volume":"25 1","pages":"3046-3055"},"PeriodicalIF":0.0,"publicationDate":"2019-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"89067744","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 38
Deep Depth From Aberration Map
Pub Date : 2019-10-01 DOI: 10.1109/ICCV.2019.00417
M. Kashiwagi, Nao Mishima, Tatsuo Kozakaya, S. Hiura
Passive and convenient depth estimation from a single-shot image is still an open problem. Existing depth-from-defocus methods require multiple input images or special hardware customization. Recent deep monocular depth estimation is likewise limited to images with sufficient contextual information. In this work, we propose a novel method that realizes single-shot deep depth measurement based on a physical depth cue, using only an off-the-shelf camera and lens. When a defocused image is taken by a camera, it contains various types of aberrations corresponding to distances from the image sensor and positions in the image plane. We call these minute and complex compound aberrations the Aberration Map (A-Map), and we find that the A-Map can be utilized as a reliable physical depth cue. We also propose a deep network, the A-Map Analysis Network (AMA-Net), which can effectively learn and estimate depth from the A-Map. To evaluate the validity and robustness of our approach, we conducted extensive experiments using both real outdoor scenes and simulated images. The qualitative results show the accuracy and applicability of the method in comparison with a state-of-the-art deep context-based method.
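As background for the physical depth cue, the short sketch below evaluates the classical thin-lens relation between object depth and blur-circle (circle of confusion) diameter, assuming an arbitrary 50 mm f/1.8 lens focused at 2 m; the paper's A-Map goes further and encodes aberrations that also vary with position in the image plane.

```python
def circle_of_confusion(depth_m, focus_m, focal_mm=50.0, f_number=1.8):
    """Blur-circle diameter on the sensor (in mm) for an object at depth_m metres,
    for a thin lens of the given focal length focused at focus_m metres."""
    f = focal_mm / 1000.0            # focal length in metres
    aperture = f / f_number          # aperture diameter in metres
    blur_m = aperture * abs(depth_m - focus_m) / depth_m * f / (focus_m - f)
    return blur_m * 1000.0           # back to millimetres

# The blur grows as the object moves away from the focal plane, which is why a single
# defocused image already carries depth information.
for d in (0.5, 1.0, 2.0, 4.0, 8.0):
    print(f"depth {d:4.1f} m -> blur {circle_of_confusion(d, focus_m=2.0):.3f} mm")
```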
{"title":"Deep Depth From Aberration Map","authors":"M. Kashiwagi, Nao Mishima, Tatsuo Kozakaya, S. Hiura","doi":"10.1109/ICCV.2019.00417","DOIUrl":"https://doi.org/10.1109/ICCV.2019.00417","url":null,"abstract":"Passive and convenient depth estimation from single-shot image is still an open problem. Existing depth from defocus methods require multiple input images or special hardware customization. Recent deep monocular depth estimation is also limited to an image with sufficient contextual information. In this work, we propose a novel method which realizes a single-shot deep depth measurement based on physical depth cue using only an off-the-shelf camera and lens. When a defocused image is taken by a camera, it contains various types of aberrations corresponding to distances from the image sensor and positions in the image plane. We call these minute and complexly compound aberrations as Aberration Map (A-Map) and we found that A-Map can be utilized as reliable physical depth cue. Additionally, our deep network named A-Map Analysis Network (AMA-Net) is also proposed, which can effectively learn and estimate depth via A-Map. To evaluate validity and robustness of our approach, we have conducted extensive experiments using both real outdoor scenes and simulated images. The qualitative result shows the accuracy and availability of the method in comparison with a state-of-the-art deep context-based method.","PeriodicalId":6728,"journal":{"name":"2019 IEEE/CVF International Conference on Computer Vision (ICCV)","volume":"81 1","pages":"4069-4078"},"PeriodicalIF":0.0,"publicationDate":"2019-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"90491012","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 8
Deep CG2Real: Synthetic-to-Real Translation via Image Disentanglement
Pub Date : 2019-10-01 DOI: 10.1109/ICCV.2019.00282
Sai Bi, Kalyan Sunkavalli, Federico Perazzi, Eli Shechtman, Vladimir G. Kim, R. Ramamoorthi
We present a method to improve the visual realism of low-quality, synthetic images, e.g. OpenGL renderings. Training an unpaired synthetic-to-real translation network in image space is severely under-constrained and produces visible artifacts. Instead, we propose a semi-supervised approach that operates on the disentangled shading and albedo layers of the image. Our two-stage pipeline first learns to predict accurate shading in a supervised fashion using physically-based renderings as targets, and further increases the realism of the textures and shading with an improved CycleGAN network. Extensive evaluations on the SUNCG indoor scene dataset demonstrate that our approach yields more realistic images compared to other state-of-the-art approaches. Furthermore, networks trained on our generated "real" images predict more accurate depth and normals than domain adaptation approaches, suggesting that improving the visual realism of the images can be more effective than imposing task-specific losses.
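The skeleton below sketches the two-stage idea under the intrinsic-image model image = albedo x shading: a supervised first stage predicts realistic shading, the image is recomposed with the albedo layer, and a second, adversarially trained stage refines it. The tiny convolutional nets, the random target and the single L1 term are placeholders, not the paper's architecture or its improved CycleGAN.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def tiny_net(in_ch, out_ch):
    # Placeholder for a real encoder-decoder.
    return nn.Sequential(nn.Conv2d(in_ch, 32, 3, padding=1), nn.ReLU(),
                         nn.Conv2d(32, out_ch, 3, padding=1))

shading_net = tiny_net(3, 1)    # stage 1: synthetic image -> realistic shading (supervised)
refine_net = tiny_net(3, 3)     # stage 2: recomposed image -> realistic image (adversarial)

cg_image = torch.rand(1, 3, 64, 64)     # low-quality OpenGL-style rendering
albedo = torch.rand(1, 3, 64, 64)       # disentangled albedo layer of the same scene

pred_shading = torch.sigmoid(shading_net(cg_image))   # stage 1 output
recomposed = albedo * pred_shading                    # intrinsic recomposition I = A * S
fake_real = refine_net(recomposed)                    # stage 2 output

# Stage 1 is trained against physically-based renderings (random tensor as a stand-in here);
# stage 2 would additionally use CycleGAN-style adversarial and cycle losses on unpaired photos.
stage1_loss = F.l1_loss(pred_shading, torch.rand_like(pred_shading))
```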
{"title":"Deep CG2Real: Synthetic-to-Real Translation via Image Disentanglement","authors":"Sai Bi, Kalyan Sunkavalli, Federico Perazzi, Eli Shechtman, Vladimir G. Kim, R. Ramamoorthi, U. Diego","doi":"10.1109/ICCV.2019.00282","DOIUrl":"https://doi.org/10.1109/ICCV.2019.00282","url":null,"abstract":"We present a method to improve the visual realism of low-quality, synthetic images, e.g. OpenGL renderings. Training an unpaired synthetic-to-real translation network in image space is severely under-constrained and produces visible artifacts. Instead, we propose a semi-supervised approach that operates on the disentangled shading and albedo layers of the image. Our two-stage pipeline first learns to predict accurate shading in a supervised fashion using physically-based renderings as targets, and further increases the realism of the textures and shading with an improved CycleGAN network. Extensive evaluations on the SUNCG indoor scene dataset demonstrate that our approach yields more realistic images compared to other state-of-the-art approaches. Furthermore, networks trained on our generated ``real'' images predict more accurate depth and normals than domain adaptation approaches, suggesting that improving the visual realism of the images can be more effective than imposing task-specific losses.","PeriodicalId":6728,"journal":{"name":"2019 IEEE/CVF International Conference on Computer Vision (ICCV)","volume":"120 1","pages":"2730-2739"},"PeriodicalIF":0.0,"publicationDate":"2019-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"89138228","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 29
Fair Loss: Margin-Aware Reinforcement Learning for Deep Face Recognition
Pub Date : 2019-10-01 DOI: 10.1109/ICCV.2019.01015
Bingyu Liu, Weihong Deng, Yaoyao Zhong, Mei Wang, Jiani Hu, Xunqiang Tao, Yaohai Huang
Recently, large-margin softmax loss methods, such as angular softmax loss (SphereFace), large margin cosine loss (CosFace), and additive angular margin loss (ArcFace), have demonstrated impressive performance on deep face recognition. These methods apply a fixed additive margin to all classes, ignoring the class imbalance problem. However, imbalance widely exists in real-world face datasets, in which some classes contain far more samples than others. We argue that the sample count of a class influences its demand for the additive margin. In this paper, we introduce a new margin-aware reinforcement learning based loss function, namely Fair Loss, in which each class learns an appropriate adaptive margin by Deep Q-learning. Specifically, we train an agent to learn a margin-adaptive strategy for each class, making the additive margins of different classes more reasonable. Our method outperforms existing large-margin loss functions on three benchmarks, Labeled Faces in the Wild (LFW), YouTube Faces (YTF) and MegaFace, which demonstrates that it can learn better face representations on imbalanced face datasets.
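To ground the idea, here is a minimal additive angular-margin softmax in which every class carries its own margin m_y rather than the single fixed margin of ArcFace; the constant margins below are placeholders, whereas the paper chooses them per class with a Deep Q-learning agent.

```python
import torch
import torch.nn.functional as F

def class_margin_softmax_loss(features, weights, labels, class_margins, scale=64.0):
    # Cosine similarities between L2-normalised features and class weight vectors.
    cos = F.normalize(features) @ F.normalize(weights).t()          # (B, C)
    theta = torch.acos(cos.clamp(-1 + 1e-7, 1 - 1e-7))
    m = class_margins[labels]                                       # per-class additive angular margin
    target_logit = torch.cos(theta.gather(1, labels[:, None]).squeeze(1) + m)
    logits = cos.clone()
    logits[torch.arange(len(labels)), labels] = target_logit        # margin applied to the true class only
    return F.cross_entropy(scale * logits, labels)

B, D, C = 8, 128, 10
features = torch.randn(B, D)
weights = torch.randn(C, D, requires_grad=True)
labels = torch.randint(0, C, (B,))
class_margins = torch.full((C,), 0.5)        # placeholder margins; adapted per class in the paper
loss = class_margin_softmax_loss(features, weights, labels, class_margins)
```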
{"title":"Fair Loss: Margin-Aware Reinforcement Learning for Deep Face Recognition","authors":"Bingyu Liu, Weihong Deng, Yaoyao Zhong, Mei Wang, Jiani Hu, Xunqiang Tao, Yaohai Huang","doi":"10.1109/ICCV.2019.01015","DOIUrl":"https://doi.org/10.1109/ICCV.2019.01015","url":null,"abstract":"Recently, large-margin softmax loss methods, such as angular softmax loss (SphereFace), large margin cosine loss (CosFace), and additive angular margin loss (ArcFace), have demonstrated impressive performance on deep face recognition. These methods incorporate a fixed additive margin to all the classes, ignoring the class imbalance problem. However, imbalanced problem widely exists in various real-world face datasets, in which samples from some classes are in a higher number than others. We argue that the number of a class would influence its demand for the additive margin. In this paper, we introduce a new margin-aware reinforcement learning based loss function, namely fair loss, in which each class will learn an appropriate adaptive margin by Deep Q-learning. Specifically, we train an agent to learn a margin adaptive strategy for each class, and make the additive margins for different classes more reasonable. Our method has better performance than present large-margin loss functions on three benchmarks, Labeled Face in the Wild (LFW), Youtube Faces (YTF) and MegaFace, which demonstrates that our method could learn better face representation on imbalanced face datasets.","PeriodicalId":6728,"journal":{"name":"2019 IEEE/CVF International Conference on Computer Vision (ICCV)","volume":"30 1","pages":"10051-10060"},"PeriodicalIF":0.0,"publicationDate":"2019-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"80557702","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 60
Deep Parametric Indoor Lighting Estimation
Pub Date : 2019-10-01 DOI: 10.1109/ICCV.2019.00727
Marc-André Gardner, Yannick Hold-Geoffroy, Kalyan Sunkavalli, Christian Gagné, Jean-François Lalonde
We present a method to estimate lighting from a single image of an indoor scene. Previous work has used an environment map representation that does not account for the localized nature of indoor lighting. Instead, we represent lighting as a set of discrete 3D lights with geometric and photometric parameters. We train a deep neural network to regress these parameters from a single image, on a dataset of environment maps annotated with depth. We propose a differentiable layer to convert these parameters to an environment map to compute our loss; this bypasses the challenge of establishing correspondences between estimated and ground truth lights. We demonstrate, via quantitative and qualitative evaluations, that our representation and training scheme lead to more accurate results compared to previous work, while allowing for more realistic 3D object compositing with spatially-varying lighting.
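The sketch below illustrates the differentiable parameters-to-environment-map idea: each predicted light is splatted onto an equirectangular map as an angular Gaussian, so a pixel-wise loss on the map back-propagates to the light parameters and no explicit matching between estimated and ground-truth lights is needed. The parameterisation (direction, colour, angular size) and the Gaussian falloff are illustrative assumptions, not the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def lights_to_envmap(directions, colors, sizes, height=32, width=64):
    """directions: (N, 3) unit vectors; colors: (N, 3); sizes: (N,) angular spread in radians."""
    elev, azim = torch.meshgrid(torch.linspace(0, torch.pi, height),
                                torch.linspace(-torch.pi, torch.pi, width), indexing="ij")
    # Unit viewing direction for every pixel of the equirectangular map.
    pix = torch.stack([torch.sin(elev) * torch.cos(azim),
                       torch.sin(elev) * torch.sin(azim),
                       torch.cos(elev)], dim=-1)                                    # (H, W, 3)
    cos_angle = (pix[None] * directions[:, None, None, :]).sum(-1).clamp(-1, 1)     # (N, H, W)
    weight = torch.exp((cos_angle - 1.0) / sizes[:, None, None] ** 2)               # angular Gaussian
    return (weight[..., None] * colors[:, None, None, :]).sum(0)                    # (H, W, 3)

directions = F.normalize(torch.randn(3, 3), dim=1).requires_grad_()
colors = torch.rand(3, 3, requires_grad=True)
sizes = torch.full((3,), 0.3, requires_grad=True)

envmap = lights_to_envmap(directions, colors, sizes)
loss = (envmap - torch.rand_like(envmap)).abs().mean()   # loss on the map, not on matched lights
loss.backward()                                          # gradients reach the light parameters
```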
{"title":"Deep Parametric Indoor Lighting Estimation","authors":"Marc-André Gardner, Yannick Hold-Geoffroy, Kalyan Sunkavalli, Christian Gagné, Jean-François Lalonde","doi":"10.1109/ICCV.2019.00727","DOIUrl":"https://doi.org/10.1109/ICCV.2019.00727","url":null,"abstract":"We present a method to estimate lighting from a single image of an indoor scene. Previous work has used an environment map representation that does not account for the localized nature of indoor lighting. Instead, we represent lighting as a set of discrete 3D lights with geometric and photometric parameters. We train a deep neural network to regress these parameters from a single image, on a dataset of environment maps annotated with depth. We propose a differentiable layer to convert these parameters to an environment map to compute our loss; this bypasses the challenge of establishing correspondences between estimated and ground truth lights. We demonstrate, via quantitative and qualitative evaluations, that our representation and training scheme lead to more accurate results compared to previous work, while allowing for more realistic 3D object compositing with spatially-varying lighting.","PeriodicalId":6728,"journal":{"name":"2019 IEEE/CVF International Conference on Computer Vision (ICCV)","volume":"10 1","pages":"7174-7182"},"PeriodicalIF":0.0,"publicationDate":"2019-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"79699500","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 97
Depth-Induced Multi-Scale Recurrent Attention Network for Saliency Detection
Pub Date : 2019-10-01 DOI: 10.1109/ICCV.2019.00735
Yongri Piao, Wei Ji, Jingjing Li, Miao Zhang, Huchuan Lu
In this work, we propose a novel depth-induced multi-scale recurrent attention network for saliency detection. It achieves strong performance, especially in complex scenarios. Our network makes three main contributions that are experimentally demonstrated to have significant practical merit. First, we design an effective depth refinement block that uses residual connections to fully extract and fuse multi-level paired complementary cues from the RGB and depth streams. Second, depth cues with abundant spatial information are innovatively combined with multi-scale contextual features to accurately locate salient objects. Third, we boost our model's performance with a novel recurrent attention module inspired by the Internal Generative Mechanism of the human brain. This module generates more accurate saliency results by comprehensively learning the internal semantic relations of the fused features and progressively optimizing local details with memory-oriented scene understanding. In addition, we create a large-scale RGB-D dataset containing more complex scenarios, which contributes to a comprehensive evaluation of saliency models. Extensive experiments on six public datasets and our own demonstrate that our method can accurately identify salient objects and achieves consistently superior performance over 16 state-of-the-art RGB and RGB-D approaches.
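As a minimal illustration of the first contribution, the block below fuses an RGB feature map with a depth feature map through a residual connection, so depth acts as a correction on the RGB stream; the channel count and the single fusion level are simplifications, and the recurrent attention module is not shown.

```python
import torch
import torch.nn as nn

class DepthRefineBlock(nn.Module):
    """Residual RGB-depth fusion: a correction computed from both streams is added back to RGB."""
    def __init__(self, channels=64):
        super().__init__()
        self.fuse = nn.Sequential(
            nn.Conv2d(2 * channels, channels, 3, padding=1), nn.ReLU(),
            nn.Conv2d(channels, channels, 3, padding=1))

    def forward(self, rgb_feat, depth_feat):
        return rgb_feat + self.fuse(torch.cat([rgb_feat, depth_feat], dim=1))

rgb_feat = torch.randn(1, 64, 56, 56)      # features from the RGB stream
depth_feat = torch.randn(1, 64, 56, 56)    # features from the depth stream
refined = DepthRefineBlock()(rgb_feat, depth_feat)   # same shape as rgb_feat
```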
{"title":"Depth-Induced Multi-Scale Recurrent Attention Network for Saliency Detection","authors":"Yongri Piao, Wei Ji, Jingjing Li, Miao Zhang, Huchuan Lu","doi":"10.1109/ICCV.2019.00735","DOIUrl":"https://doi.org/10.1109/ICCV.2019.00735","url":null,"abstract":"In this work, we propose a novel depth-induced multi-scale recurrent attention network for saliency detection. It achieves dramatic performance especially in complex scenarios. There are three main contributions of our network that are experimentally demonstrated to have significant practical merits. First, we design an effective depth refinement block using residual connections to fully extract and fuse multi-level paired complementary cues from RGB and depth streams. Second, depth cues with abundant spatial information are innovatively combined with multi-scale context features for accurately locating salient objects. Third, we boost our model's performance by a novel recurrent attention module inspired by Internal Generative Mechanism of human brain. This module can generate more accurate saliency results via comprehensively learning the internal semantic relation of the fused feature and progressively optimizing local details with memory-oriented scene understanding. In addition, we create a large scale RGB-D dataset containing more complex scenarios, which can contribute to comprehensively evaluating saliency models. Extensive experiments on six public datasets and ours demonstrate that our method can accurately identify salient objects and achieve consistently superior performance over 16 state-of-the-art RGB and RGB-D approaches.","PeriodicalId":6728,"journal":{"name":"2019 IEEE/CVF International Conference on Computer Vision (ICCV)","volume":"31 1","pages":"7253-7262"},"PeriodicalIF":0.0,"publicationDate":"2019-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"83191898","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 300
Deep Reinforcement Active Learning for Human-in-the-Loop Person Re-Identification
Pub Date : 2019-10-01 DOI: 10.1109/ICCV.2019.00622
Zimo Liu, Jingya Wang, S. Gong, D. Tao, Huchuan Lu
Most existing person re-identification (Re-ID) approaches achieve superior results under the assumption that a large amount of pre-labelled data is available and can be fed into the training phase all at once. However, this assumption does not hold for most real-world deployments of the Re-ID task. In this work, we propose an alternative reinforcement learning based human-in-the-loop model that removes the restriction of pre-labelling and keeps the model upgrading with progressively collected data. The goal is to minimize human annotation effort while maximizing Re-ID performance. The model works in an iteratively updating framework that refines the RL policy and the CNN parameters alternately. In particular, we formulate a Deep Reinforcement Active Learning (DRAL) method to guide an agent (a model in a reinforcement learning process) in selecting training samples on the fly, with labels provided by a human user/annotator. The reinforcement learning reward is the uncertainty value of each selected sample. Binary feedback (positive or negative) labelled by the human annotator selects the samples that are then used to fine-tune a pre-trained CNN Re-ID model. Extensive experiments demonstrate the superiority of our DRAL method for deep reinforcement learning based human-in-the-loop person Re-ID compared to existing unsupervised and transfer learning models as well as active learning models.
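A toy version of the selection loop is sketched below: at every step the most uncertain unlabelled candidate is picked, a simulated annotator returns binary feedback, and the candidate's uncertainty is taken as the reinforcement reward. The greedy pick and the random "annotator" are stand-ins for the learned policy and the real human user; fine-tuning of the CNN with the collected feedback is omitted.

```python
import torch

torch.manual_seed(0)
gallery = torch.nn.functional.normalize(torch.randn(100, 128), dim=1)  # CNN Re-ID features
query = torch.nn.functional.normalize(torch.randn(128), dim=0)         # feature of the probe image

unlabelled = list(range(100))
feedback_log, rewards = [], []
for step in range(5):
    sims = gallery[unlabelled] @ query                    # cosine similarity to each candidate
    uncertainty = 1.0 - (sims - 0.5).abs() * 2.0          # highest where the similarity is ambiguous
    idx = int(uncertainty.argmax())
    pick = unlabelled.pop(idx)                            # "policy" picks the most uncertain candidate
    same_person = bool(torch.rand(()) < 0.5)              # simulated binary human feedback
    feedback_log.append((pick, same_person))
    rewards.append(float(uncertainty[idx]))               # RL reward = uncertainty of the selection
# feedback_log would fine-tune the pre-trained CNN; rewards would update the selection policy.
```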
{"title":"Deep Reinforcement Active Learning for Human-in-the-Loop Person Re-Identification","authors":"Zimo Liu, Jingya Wang, S. Gong, D. Tao, Huchuan Lu","doi":"10.1109/ICCV.2019.00622","DOIUrl":"https://doi.org/10.1109/ICCV.2019.00622","url":null,"abstract":"Most existing person re-identification(Re-ID) approaches achieve superior results based on the assumption that a large amount of pre-labelled data is usually available and can be put into training phrase all at once. However, this assumption is not applicable to most real-world deployment of the Re-ID task. In this work, we propose an alternative reinforcement learning based human-in-the-loop model which releases the restriction of pre-labelling and keeps model upgrading with progressively collected data. The goal is to minimize human annotation efforts while maximizing Re-ID performance. It works in an iteratively updating framework by refining the RL policy and CNN parameters alternately. In particular, we formulate a Deep Reinforcement Active Learning (DRAL) method to guide an agent (a model in a reinforcement learning process) in selecting training samples on-the-fly by a human user/annotator. The reinforcement learning reward is the uncertainty value of each human selected sample. A binary feedback (positive or negative) labelled by the human annotator is used to select the samples of which are used to fine-tune a pre-trained CNN Re-ID model. Extensive experiments demonstrate the superiority of our DRAL method for deep reinforcement learning based human-in-the-loop person Re-ID when compared to existing unsupervised and transfer learning models as well as active learning models.","PeriodicalId":6728,"journal":{"name":"2019 IEEE/CVF International Conference on Computer Vision (ICCV)","volume":"15 1","pages":"6121-6130"},"PeriodicalIF":0.0,"publicationDate":"2019-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"84641445","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 73
TextDragon: An End-to-End Framework for Arbitrary Shaped Text Spotting
Pub Date : 2019-10-01 DOI: 10.1109/ICCV.2019.00917
Wei Feng, Wenhao He, Fei Yin, Xu-Yao Zhang, Cheng-Lin Liu
Most existing text spotting methods either focus on horizontal/oriented text or perform arbitrary-shaped text spotting with character-level annotations. In this paper, we propose a novel text spotting framework to detect and recognize text of arbitrary shapes in an end-to-end manner, using only word/line-level annotations for training. Motivated by the name of TextSnake, which is only a detection model, we call the proposed text spotting framework TextDragon. In TextDragon, a text detector is designed to describe the shape of text with a series of quadrangles, which can handle text of arbitrary shapes. To extract arbitrary text regions from feature maps, we propose a new differentiable operator named RoISlide, which is the key to connecting arbitrary-shaped text detection and recognition. Based on the features extracted through RoISlide, a CNN and CTC based text recognizer is introduced to free the framework from labeling the locations of characters. The proposed method achieves state-of-the-art performance on two curved-text benchmarks, CTW1500 and Total-Text, and competitive results on the ICDAR 2015 dataset.
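The general mechanism behind differentiable region extraction can be sketched with F.grid_sample: a sampling grid is bilinearly interpolated from a quadrangle's four corners, so the rectified patch remains differentiable with respect to the feature map. This shows only the underlying mechanism, not the RoISlide operator itself; the quadrangle coordinates and output size below are arbitrary, and the rectified slices would then feed a CNN + CTC recognizer (e.g. torch.nn.CTCLoss).

```python
import torch
import torch.nn.functional as F

def crop_quad(feature_map, quad, out_h=8, out_w=32):
    """feature_map: (1, C, H, W); quad: (4, 2) corners in normalised [-1, 1] coordinates,
    ordered top-left, top-right, bottom-right, bottom-left."""
    ty = torch.linspace(0, 1, out_h)[:, None, None]     # vertical interpolation weights
    tx = torch.linspace(0, 1, out_w)[None, :, None]     # horizontal interpolation weights
    top = quad[0] * (1 - tx) + quad[1] * tx             # (1, out_w, 2): points along the top edge
    bottom = quad[3] * (1 - tx) + quad[2] * tx          # (1, out_w, 2): points along the bottom edge
    grid = (top * (1 - ty) + bottom * ty)[None]         # (1, out_h, out_w, 2) sampling grid
    return F.grid_sample(feature_map, grid, align_corners=True)

features = torch.randn(1, 64, 32, 32, requires_grad=True)
quad = torch.tensor([[-0.8, -0.2], [0.7, -0.4], [0.8, 0.3], [-0.7, 0.5]])
patch = crop_quad(features, quad)     # (1, 64, 8, 32): rectified slice of the text region
```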
{"title":"TextDragon: An End-to-End Framework for Arbitrary Shaped Text Spotting","authors":"Wei Feng, Wenhao He, Fei Yin, Xu-Yao Zhang, Cheng-Lin Liu","doi":"10.1109/ICCV.2019.00917","DOIUrl":"https://doi.org/10.1109/ICCV.2019.00917","url":null,"abstract":"Most existing text spotting methods either focus on horizontal/oriented texts or perform arbitrary shaped text spotting with character-level annotations. In this paper, we propose a novel text spotting framework to detect and recognize text of arbitrary shapes in an end-to-end manner, using only word/line-level annotations for training. Motivated from the name of TextSnake, which is only a detection model, we call the proposed text spotting framework TextDragon. In TextDragon, a text detector is designed to describe the shape of text with a series of quadrangles, which can handle text of arbitrary shapes. To extract arbitrary text regions from feature maps, we propose a new differentiable operator named RoISlide, which is the key to connect arbitrary shaped text detection and recognition. Based on the extracted features through RoISlide, a CNN and CTC based text recognizer is introduced to make the framework free from labeling the location of characters. The proposed method achieves state-of-the-art performance on two curved text benchmarks CTW1500 and Total-Text, and competitive results on the ICDAR 2015 Dataset.","PeriodicalId":6728,"journal":{"name":"2019 IEEE/CVF International Conference on Computer Vision (ICCV)","volume":"45 1","pages":"9075-9084"},"PeriodicalIF":0.0,"publicationDate":"2019-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"87359527","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 146
Unsupervised Person Re-Identification by Camera-Aware Similarity Consistency Learning
Pub Date : 2019-10-01 DOI: 10.1109/ICCV.2019.00702
Ancong Wu, Weishi Zheng, J. Lai
For matching pedestrians across disjoint camera views in surveillance, person re-identification (Re-ID) has made great progress in supervised learning. However, it is infeasible to label data in every new scene when extending a Re-ID system, so studying unsupervised learning for Re-ID is important for saving labelling cost. Yet cross-camera scene variation, such as changes in illumination, background and viewpoint, is a key challenge for unsupervised Re-ID: it causes domain shift in the feature space and results in inconsistent pairwise similarity distributions that degrade matching performance. To alleviate the effect of cross-camera scene variation, we propose a Camera-Aware Similarity Consistency Loss to learn consistent pairwise similarity distributions for intra-camera matching and cross-camera matching. To avoid learning ineffective knowledge during consistency learning, we preserve the prior common knowledge of intra-camera matching in the pretrained model as reliable guiding information, since it does not suffer from cross-camera scene variation as cross-camera matching does. To learn similarity consistency more effectively, we further develop a coarse-to-fine consistency learning scheme that learns consistency globally and locally in two steps. Experiments show that our method outperforms the state-of-the-art unsupervised Re-ID methods.
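To give the flavour of a similarity-consistency objective, the sketch below pushes the distribution of cross-camera similarities towards an intra-camera reference distribution with a KL divergence. The temperature, the detached reference and the KL form are illustrative assumptions; the paper's Camera-Aware Similarity Consistency Loss, its use of the pretrained model as guidance and its coarse-to-fine scheme differ in detail.

```python
import torch
import torch.nn.functional as F

def similarity_distribution(anchor, gallery, temperature=0.1):
    """Softmax-normalised cosine similarities of one anchor feature against a gallery."""
    sims = F.normalize(anchor, dim=0) @ F.normalize(gallery, dim=1).t()
    return F.softmax(sims / temperature, dim=0)

anchor = torch.randn(128)                        # feature of one probe image
intra_gallery = torch.randn(32, 128)             # candidates from the same camera
cross_gallery = torch.randn(32, 128)             # candidates from another camera

p_intra = similarity_distribution(anchor, intra_gallery).detach()   # reference, kept fixed
p_cross = similarity_distribution(anchor, cross_gallery)
consistency_loss = F.kl_div(p_cross.log(), p_intra, reduction="batchmean")
```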
{"title":"Unsupervised Person Re-Identification by Camera-Aware Similarity Consistency Learning","authors":"Ancong Wu, Weishi Zheng, J. Lai","doi":"10.1109/ICCV.2019.00702","DOIUrl":"https://doi.org/10.1109/ICCV.2019.00702","url":null,"abstract":"For matching pedestrians across disjoint camera views in surveillance, person re-identification (Re-ID) has made great progress in supervised learning. However, it is infeasible to label data in a number of new scenes when extending a Re-ID system. Thus, studying unsupervised learning for Re-ID is important for saving labelling cost. Yet, cross-camera scene variation is a key challenge for unsupervised Re-ID, such as illumination, background and viewpoint variations, which cause domain shift in the feature space and result in inconsistent pairwise similarity distributions that degrade matching performance. To alleviate the effect of cross-camera scene variation, we propose a Camera-Aware Similarity Consistency Loss to learn consistent pairwise similarity distributions for intra-camera matching and cross-camera matching. To avoid learning ineffective knowledge in consistency learning, we preserve the prior common knowledge of intra-camera matching in the pretrained model as reliable guiding information, which does not suffer from cross-camera scene variation as cross-camera matching. To learn similarity consistency more effectively, we further develop a coarse-to-fine consistency learning scheme to learn consistency globally and locally in two steps. Experiments show that our method outperformed the state-of-the-art unsupervised Re-ID methods.","PeriodicalId":6728,"journal":{"name":"2019 IEEE/CVF International Conference on Computer Vision (ICCV)","volume":"9 1","pages":"6921-6930"},"PeriodicalIF":0.0,"publicationDate":"2019-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"84712628","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 93
AdaTransform: Adaptive Data Transformation
Pub Date : 2019-10-01 DOI: 10.1109/ICCV.2019.00309
Zhiqiang Tang, Xi Peng, Tingfeng Li, Yizhe Zhu, Dimitris N. Metaxas
Data augmentation is widely used to increase data variance when training deep neural networks. However, previous methods require either comprehensive domain knowledge or high computational cost. Can we learn data transformations automatically and efficiently with limited domain knowledge? Furthermore, can we leverage data transformation to improve not only network training but also network testing? In this work, we propose adaptive data transformation, AdaTransform, to achieve these two goals. AdaTransform can increase data variance in training and decrease data variance in testing. Experiments on different tasks show that it improves generalization performance.
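The train/test asymmetry can be made concrete with a toy transformer that adds learned-magnitude jitter in training mode (more variance, harder samples) and shrinks inputs towards the batch mean in testing mode (less variance, easier samples). The additive jitter is a placeholder for the learned transformations, and the competitive training against the task network is omitted.

```python
import torch
import torch.nn as nn

class AdaptiveJitter(nn.Module):
    def __init__(self):
        super().__init__()
        self.strength = nn.Parameter(torch.tensor(0.5))   # learned transformation magnitude

    def forward(self, x, mode="train"):
        if mode == "train":
            # Increase variance: perturb every sample with jitter of learned strength.
            return x + self.strength * torch.randn_like(x)
        # Decrease variance: pull every sample part of the way towards the batch mean.
        return x - self.strength * (x - x.mean(dim=0, keepdim=True))

images = torch.rand(16, 3, 32, 32)
transform = AdaptiveJitter()
harder = transform(images, mode="train")    # augmented batch for training the task network
easier = transform(images, mode="test")     # variance-reduced batch at test time
print(images.var().item(), harder.var().item(), easier.var().item())
```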
{"title":"AdaTransform: Adaptive Data Transformation","authors":"Zhiqiang Tang, Xi Peng, Tingfeng Li, Yizhe Zhu, Dimitris N. Metaxas","doi":"10.1109/ICCV.2019.00309","DOIUrl":"https://doi.org/10.1109/ICCV.2019.00309","url":null,"abstract":"Data augmentation is widely used to increase data variance in training deep neural networks. However, previous methods require either comprehensive domain knowledge or high computational cost. Can we learn data transformation automatically and efficiently with limited domain knowledge? Furthermore, can we leverage data transformation to improve not only network training but also network testing? In this work, we propose adaptive data transformation to achieve the two goals. The AdaTransform can increase data variance in training and decrease data variance in testing. Experiments on different tasks prove that it can improve generalization performance.","PeriodicalId":6728,"journal":{"name":"2019 IEEE/CVF International Conference on Computer Vision (ICCV)","volume":"171 1","pages":"2998-3006"},"PeriodicalIF":0.0,"publicationDate":"2019-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"84794313","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 13