
2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR): Latest Publications

DeepACG: Co-Saliency Detection via Semantic-aware Contrast Gromov-Wasserstein Distance
Pub Date : 2021-06-01 DOI: 10.1109/CVPR46437.2021.01349
Kaihua Zhang, Mengming Michael Dong, Bo Liu, Xiaotong Yuan, Qingshan Liu
The objective of co-saliency detection is to segment the co-occurring salient objects in a group of images. To address this task, we introduce a new deep network architecture via semantic-aware contrast Gromov-Wasserstein distance (DeepACG). We first adopt the Gromov-Wasserstein (GW) distance to build dense 4D correlation volumes for all pairs of image pixels within the image group. These dense correlation volumes enable the network to accurately discover the structured pair-wise pixel similarities among the common salient objects. Second, we develop a semantic-aware co-attention module (SCAM) to enhance the foreground co-saliency through predicted categorical information. Specifically, SCAM recognizes the semantic class of the foreground co-objects, and this information is then modulated into the deep representations to localize the related pixels. Third, we design a contrast edge-enhanced module (EEM) to capture richer contexts and preserve fine-grained spatial information. We validate the effectiveness of our model on the three largest and most challenging benchmark datasets (Cosal2015, CoCA, and CoSOD3k). Extensive experiments demonstrate the substantial practical merit of each module. Compared with existing works, DeepACG shows significant improvements and achieves state-of-the-art performance.
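The core building block described above, a dense 4D correlation volume over all pixel pairs of two images, can be sketched with plain cosine similarities. The snippet below is only that building block (the paper's Gromov-Wasserstein formulation adds structured matching on top of such pairwise similarities), and the tensor names and shapes are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def dense_correlation_volume(feat_a, feat_b):
    """Build a dense 4D correlation volume between two feature maps.

    feat_a, feat_b: (C, H, W) features from a shared backbone (hypothetical
    inputs). Returns an (H, W, H, W) volume where entry [i, j, k, l] is the
    cosine similarity between pixel (i, j) of image A and pixel (k, l) of B.
    """
    c, h, w = feat_a.shape
    a = F.normalize(feat_a.reshape(c, h * w), dim=0)  # unit-norm pixel descriptors
    b = F.normalize(feat_b.reshape(c, h * w), dim=0)
    corr = a.t() @ b                                  # (H*W, H*W) pairwise similarities
    return corr.reshape(h, w, h, w)

# Toy usage with random backbone features.
vol = dense_correlation_volume(torch.randn(64, 32, 32), torch.randn(64, 32, 32))
print(vol.shape)  # torch.Size([32, 32, 32, 32])
```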
Citations: 24
Fine-Grained Shape-Appearance Mutual Learning for Cloth-Changing Person Re-Identification
Pub Date : 2021-06-01 DOI: 10.1109/CVPR46437.2021.01037
Peixian Hong, Tao Wu, Ancong Wu, Xintong Han, Weishi Zheng
Recently, person re-identification (Re-ID) has achieved great progress. However, current methods largely depend on color appearance, which is not reliable when a person changes clothes. Cloth-changing Re-ID is challenging since pedestrian images with clothes change exhibit large intra-class variation and small inter-class variation. Some significant features for identification are embedded in subtle body shape differences across pedestrians. To explore such body shape cues for cloth-changing Re-ID, we propose a Fine-grained Shape-Appearance Mutual learning framework (FSAM), a two-stream framework that learns fine-grained discriminative body shape knowledge in a shape stream and transfers it to an appearance stream to complement the cloth-unrelated knowledge in the appearance features. Specifically, in the shape stream, FSAM learns a fine-grained discriminative mask with the guidance of identities and extracts fine-grained body shape features by a pose-specific multi-branch network. To complement cloth-unrelated shape knowledge in the appearance stream, dense interactive mutual learning is performed across low-level and high-level features to transfer knowledge from the shape stream to the appearance stream, which enables the appearance stream to be deployed independently without extra computation for mask estimation. We evaluated our method on benchmark cloth-changing Re-ID datasets and achieved state-of-the-art performance.
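As a rough picture of the shape-appearance mutual learning, the sketch below uses a symmetric KL term between the two streams' identity predictions, detaching whichever stream acts as the teacher. This is an assumed simplification for illustration, not the released FSAM code.

```python
import torch
import torch.nn.functional as F

def mutual_learning_loss(appearance_logits, shape_logits):
    """Symmetric KL divergence as a stand-in for dense mutual learning between
    the appearance and shape streams (hypothetical simplification). Each stream
    is pushed towards the other's identity distribution, with the target side
    detached so it acts as a fixed teacher for that term."""
    log_app = F.log_softmax(appearance_logits, dim=1)
    log_shape = F.log_softmax(shape_logits, dim=1)
    loss_a = F.kl_div(log_app, F.softmax(shape_logits.detach(), dim=1),
                      reduction="batchmean")
    loss_s = F.kl_div(log_shape, F.softmax(appearance_logits.detach(), dim=1),
                      reduction="batchmean")
    return loss_a + loss_s

# Toy usage: a batch of 8 pedestrian images, 751 identities (Market-1501-sized head).
loss = mutual_learning_loss(torch.randn(8, 751), torch.randn(8, 751))
print(float(loss))
```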
Citations: 54
Semi-supervised Semantic Segmentation with Directional Context-aware Consistency
Pub Date : 2021-06-01 DOI: 10.1109/CVPR46437.2021.00126
Xin Lai, Zhuotao Tian, Li Jiang, Shu Liu, Hengshuang Zhao, Liwei Wang, Jiaya Jia
Semantic segmentation has made tremendous progress in recent years. However, satisfactory performance highly depends on a large number of pixel-level annotations. Therefore, in this paper, we focus on the semi-supervised segmentation problem, where only a small set of labeled data is provided together with a much larger collection of totally unlabeled images. Nevertheless, due to the limited annotations, models may overly rely on the contexts available in the training data, which causes poor generalization to previously unseen scenes. A preferred high-level representation should capture the contextual information while not losing self-awareness. Therefore, we propose to maintain the context-aware consistency between features of the same identity but with different contexts, making the representations robust to varying environments. Moreover, we present the Directional Contrastive Loss (DC Loss) to accomplish the consistency in a pixel-to-pixel manner, only requiring the feature with lower quality to be aligned towards its counterpart. In addition, to avoid false-negative samples and filter uncertain positive samples, we put forward two sampling strategies. Extensive experiments show that our simple yet effective method surpasses current state-of-the-art methods by a large margin and also generalizes well with extra image-level annotations.
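A hedged sketch of the directional part of the DC Loss follows: per pixel, the lower-confidence feature is pulled toward a detached copy of its counterpart. The contrastive negatives and the two sampling strategies are omitted, and the confidence inputs and shapes are assumptions.

```python
import torch
import torch.nn.functional as F

def directional_alignment(feat1, feat2, conf1, conf2):
    """Pixel-to-pixel directional consistency (simplified DC Loss term).

    feat1, feat2: (B, C, H, W) features of the same region under two contexts.
    conf1, conf2: (B, 1, H, W) prediction confidences for each view.
    Only the lower-confidence feature receives gradients; its counterpart is
    treated as the target via detach().
    """
    f1 = F.normalize(feat1, dim=1)
    f2 = F.normalize(feat2, dim=1)
    toward_2 = (conf1 < conf2).float()                 # 1 where view 1 should move toward view 2
    cos12 = (f1 * f2.detach()).sum(dim=1, keepdim=True)  # aligns f1 -> f2
    cos21 = (f1.detach() * f2).sum(dim=1, keepdim=True)  # aligns f2 -> f1
    loss = toward_2 * (1 - cos12) + (1 - toward_2) * (1 - cos21)
    return loss.mean()

loss = directional_alignment(torch.randn(2, 256, 33, 33), torch.randn(2, 256, 33, 33),
                             torch.rand(2, 1, 33, 33), torch.rand(2, 1, 33, 33))
print(float(loss))
```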
Citations: 119
Lesion-Aware Transformers for Diabetic Retinopathy Grading
Pub Date : 2021-06-01 DOI: 10.1109/CVPR46437.2021.01079
Rui Sun, Yihao Li, Tianzhu Zhang, Zhendong Mao, Feng Wu, Yongdong Zhang
Diabetic retinopathy (DR) is the leading cause of permanent blindness in the working-age population. Automatic DR diagnosis can assist ophthalmologists to design tailored treatments for patients, including DR grading and lesion discovery. However, most existing methods treat DR grading and lesion discovery as two independent tasks, which requires lesion annotations as learning guidance and limits practical deployment. To alleviate this problem, we propose a novel lesion-aware transformer (LAT) for DR grading and lesion discovery jointly in a unified deep model via an encoder-decoder structure including a pixel relation based encoder and a lesion filter based decoder. The proposed LAT enjoys several merits. First, to the best of our knowledge, this is the first work to formulate lesion discovery as a weakly supervised lesion localization problem via a transformer decoder. Second, to learn lesion filters well with only image-level labels, we design two effective mechanisms, lesion region importance and lesion region diversity, for identifying diverse lesion regions. Extensive experimental results on three challenging benchmarks, Messidor-1, Messidor-2 and EyePACS, demonstrate that the proposed LAT performs favorably against state-of-the-art DR grading and lesion discovery methods.
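The lesion-filter-based decoder can be pictured as a standard transformer decoder whose learnable queries play the role of lesion filters. The module below is an assumed, simplified stand-in (query count, dimensions, and the pooled grading head are illustrative), not the authors' implementation.

```python
import torch
import torch.nn as nn

class LesionQueryDecoder(nn.Module):
    """Learnable lesion-filter queries attending over pixel-relation features
    (hypothetical, simplified stand-in for the LAT decoder)."""

    def __init__(self, d_model=256, num_queries=8, num_grades=5):
        super().__init__()
        self.queries = nn.Parameter(torch.randn(num_queries, d_model))
        layer = nn.TransformerDecoderLayer(d_model=d_model, nhead=8)
        self.decoder = nn.TransformerDecoder(layer, num_layers=2)
        self.grade_head = nn.Linear(d_model, num_grades)

    def forward(self, pixel_features):
        # pixel_features: (B, d_model, H, W) from the pixel-relation encoder
        b, d, h, w = pixel_features.shape
        memory = pixel_features.flatten(2).permute(2, 0, 1)   # (H*W, B, d)
        queries = self.queries.unsqueeze(1).repeat(1, b, 1)   # (Q, B, d)
        lesion_tokens = self.decoder(queries, memory)         # (Q, B, d)
        # Image-level DR grade from pooled lesion tokens (image-level labels only).
        return self.grade_head(lesion_tokens.mean(dim=0))     # (B, num_grades)

logits = LesionQueryDecoder()(torch.randn(2, 256, 16, 16))
print(logits.shape)  # torch.Size([2, 5])
```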
Citations: 49
Blind Deblurring for Saturated Images
Pub Date : 2021-06-01 DOI: 10.1109/CVPR46437.2021.00624
Liang Chen, Jiawei Zhang, Songnan Lin, Faming Fang, Jimmy S. J. Ren
Blind deblurring has received considerable attention in recent years. However, state-of-the-art methods often fail to process saturated blurry images. The main reason is that pixels around saturated regions do not conform to the commonly used linear blur model. Pioneering works suggest excluding these pixels during the deblurring process, which sometimes simultaneously removes the informative edges around saturated regions and results in insufficient information for kernel estimation when large saturated regions exist. To address this problem, we introduce a new blur model that fits both saturated and unsaturated pixels, so all informative pixels can be considered during the deblurring process. Based on our model, we develop an effective maximum a posteriori (MAP)-based optimization framework. Quantitative and qualitative evaluations on benchmark datasets and challenging real-world examples show that the proposed method performs favorably against existing methods.
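The motivation above hinges on saturation clipping the usual linear blur model B = k * I + n. A minimal forward simulation of that clipped model (kernel, noise level, and the over-exposed test spot are illustrative assumptions) is:

```python
import numpy as np
from scipy.signal import convolve2d

def saturated_blur(sharp, kernel, noise_std=0.01):
    """Simulate a saturated blurry image: linear convolution plus noise,
    followed by sensor clipping. Clipped pixels no longer obey the plain
    linear model B = k * I + n, which is the case the new blur model covers.
    The latent sharp image may exceed the [0, 1] sensor range (bright lights)."""
    blurred = convolve2d(sharp, kernel, mode="same", boundary="symm")
    blurred = blurred + np.random.normal(0.0, noise_std, size=blurred.shape)
    return np.clip(blurred, 0.0, 1.0)  # clipping models saturation

# Toy usage: a bright spot that saturates after blurring.
sharp = np.zeros((64, 64))
sharp[28:36, 28:36] = 5.0            # over-exposed light source above the sensor range
kernel = np.ones((9, 9)) / 81.0      # box blur kernel
blurry = saturated_blur(sharp, kernel)
print(blurry.max())                  # 1.0: the blurred highlight is saturated
```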
Citations: 11
COMPLETER: Incomplete Multi-view Clustering via Contrastive Prediction
Pub Date : 2021-06-01 DOI: 10.1109/CVPR46437.2021.01102
Yijie Lin, Yuanbiao Gou, Zitao Liu, Boyun Li, Jiancheng Lv, Xi Peng
In this paper, we study two challenging problems in incomplete multi-view clustering analysis, namely, i) how to learn an informative and consistent representation among different views without the help of labels, and ii) how to recover the missing views from data. To this end, we propose a novel objective that incorporates representation learning and data recovery into a unified framework from the view of information theory. To be specific, the informative and consistent representation is learned by maximizing the mutual information across different views through contrastive learning, and the missing views are recovered by minimizing the conditional entropy of different views through dual prediction. To the best of our knowledge, this could be the first work to provide a theoretical framework that unifies consistent representation learning and cross-view data recovery. Extensive experimental results show that the proposed method remarkably outperforms 10 competitive multi-view clustering methods on four challenging datasets. The code is available at https://pengxi.me.
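The two objectives can be pictured as an InfoNCE-style contrastive term between paired view latents plus dual-prediction regressors that recover one view's latent from the other. The sketch below is a loose assumption about that combination (temperature, network sizes, and equal weighting are made up), not the released COMPLETER code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def info_nce(z1, z2, temperature=0.5):
    """Contrastive term: matching samples across the two views are positives,
    all other pairs in the batch are negatives; maximizing it raises a lower
    bound on cross-view mutual information."""
    z1, z2 = F.normalize(z1, dim=1), F.normalize(z2, dim=1)
    logits = z1 @ z2.t() / temperature      # (B, B) similarity matrix
    targets = torch.arange(z1.size(0))
    return F.cross_entropy(logits, targets)

# Dual prediction: each view's latent is regressed from the other, which is
# how a missing view can later be recovered from the observed one.
pred_1to2 = nn.Sequential(nn.Linear(128, 128), nn.ReLU(), nn.Linear(128, 128))
pred_2to1 = nn.Sequential(nn.Linear(128, 128), nn.ReLU(), nn.Linear(128, 128))

z1, z2 = torch.randn(32, 128), torch.randn(32, 128)   # paired view latents (toy data)
loss = (info_nce(z1, z2)
        + F.mse_loss(pred_1to2(z1), z2)
        + F.mse_loss(pred_2to1(z2), z1))
print(float(loss))
```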
Citations: 129
Image Change Captioning by Learning from an Auxiliary Task
Pub Date : 2021-06-01 DOI: 10.1109/CVPR46437.2021.00275
M. Hosseinzadeh, Yang Wang
We tackle the challenging task of image change captioning. The goal is to describe the subtle difference between two very similar images by generating a sentence caption. While recent methods mainly focus on proposing new model architectures for this problem, we instead focus on an alternative training scheme. Inspired by the success of multi-task learning, we formulate a training scheme that uses an auxiliary task to improve the training of the change captioning network. We argue that the task of composed query image retrieval is a natural choice for the auxiliary task. Given two very similar images as the input, the primary network generates a caption describing the fine change between those two images. Next, the auxiliary network is provided with the generated caption and one of those two images. It then tries to pick the second image among a set of candidates. This forces the primary network to generate detailed and precise captions via an extra supervision loss from the auxiliary network. Furthermore, we propose a new scheme for selecting the negative set of candidates for the retrieval task that can effectively improve performance. We show that the proposed training strategy performs well on the task of change captioning on benchmark datasets.
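The auxiliary composed-query retrieval task can be written as a softmax over candidate images scored against a joint caption-plus-image query. Everything below (the sum composition, dot-product scoring, candidate set) is an illustrative assumption about how such an auxiliary loss could look, not the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def auxiliary_retrieval_loss(caption_emb, img1_emb, candidate_embs, target_idx):
    """Score each candidate image against the composed query (generated caption
    plus first image) and train with cross-entropy so the true second image
    ranks first. All embeddings are assumed to share one space; the composition
    here is a simple sum."""
    query = F.normalize(caption_emb + img1_emb, dim=1)       # (B, D) composed query
    candidates = F.normalize(candidate_embs, dim=2)          # (B, K, D) candidate images
    scores = torch.einsum("bd,bkd->bk", query, candidates)   # (B, K) similarities
    return F.cross_entropy(scores, target_idx)

# Toy usage: batch of 4 queries, 10 candidate images each, embedding dim 256.
loss = auxiliary_retrieval_loss(torch.randn(4, 256), torch.randn(4, 256),
                                torch.randn(4, 10, 256),
                                torch.randint(0, 10, (4,)))
print(float(loss))
```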
Citations: 25
Leveraging Line-point Consistence to Preserve Structures for Wide Parallax Image Stitching
Pub Date : 2021-06-01 DOI: 10.1109/CVPR46437.2021.01201
Qi Jia, Zheng Li, Xin Fan, Haotian Zhao, Shiyu Teng, Xinchen Ye, Longin Jan Latecki
Generating high-quality stitched images with natural structures is a challenging task in computer vision. In this paper, we succeed in preserving both local and global geometric structures for wide parallax images, while reducing artifacts and distortions. A projective invariant, the Characteristic Number, is used to match co-planar local sub-regions across input images. The homography between these well-matched sub-regions produces consistent line and point pairs, suppressing artifacts in overlapping areas. We explore and introduce global collinear structures into an objective function to specify and balance the desired characteristics for image warping, which can preserve both local and global structures while alleviating distortions. We also develop comprehensive measures of stitching quality to quantify the collinearity of points and the discrepancy of matched line pairs, considering the sensitivity of human vision to linear structures. Extensive experiments demonstrate the superior performance of the proposed method over the state-of-the-art by presenting sharp textures and preserving prominent natural structures in stitched images. In particular, our method not only exhibits lower errors but also the least divergence across all test images. Code is available at https://github.com/dut-media-lab/Image-Stitching.
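One generic way to quantify the collinearity of points, in the spirit of the stitching-quality measures mentioned above (though not the paper's exact metric), is the residual of a total-least-squares line fit:

```python
import numpy as np

def collinearity_residual(points):
    """RMS distance of 2D points to their best-fit line (total least squares).
    Points sampled from a straight structure should keep this near zero after
    warping; larger values indicate a bent structure."""
    pts = np.asarray(points, dtype=float)
    centered = pts - pts.mean(axis=0)
    # The right singular vector with the smallest singular value is the line normal.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    normal = vt[-1]
    distances = centered @ normal            # signed point-to-line distances
    return float(np.sqrt(np.mean(distances ** 2)))

line_pts = [(x, 2.0 * x + 1.0) for x in range(10)]                       # collinear
bent_pts = [(x, 2.0 * x + (0.5 if x > 5 else 0.0)) for x in range(10)]   # kinked
print(collinearity_residual(line_pts))   # ~0.0
print(collinearity_residual(bent_pts))   # > 0
```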
Citations: 43
RPN Prototype Alignment For Domain Adaptive Object Detector
Pub Date : 2021-06-01 DOI: 10.1109/CVPR46437.2021.01224
Y. Zhang, Zilei Wang, Yushi Mao
Recent years have witnessed great progress in object detection. However, due to the domain shift problem, applying the knowledge of an object detector learned from one specific domain to another often suffers severe performance degradation. Most existing methods adopt feature alignment either on the backbone network or the instance classifier to increase the transferability of the object detector. Differently, we propose to perform feature alignment in the RPN stage such that the foreground and background RPN proposals in the target domain can be effectively distinguished. Specifically, we first construct one set of learnable RPN prototypes, and then enforce the RPN features to align with the prototypes for both source and target domains. It essentially couples the learning of RPN prototypes and features to align the source and target RPN features. In particular, we propose a simple yet effective method suitable for RPN feature alignment to generate high-quality pseudo labels of proposals in the target domain, i.e., using the detection results filtered by IoU. Furthermore, we adopt Grad-CAM to find the discriminative region within a foreground proposal and use it to increase the discriminability of RPN features for alignment. We conduct extensive experiments on multiple cross-domain detection scenarios, and the results show the effectiveness of our proposed method against previous state-of-the-art methods.
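The prototype alignment idea can be sketched as pulling each RPN proposal feature toward a learnable foreground or background prototype selected by its IoU-based pseudo label. The module below is an assumed simplification of that loss, not the full framework.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class RPNPrototypeAlignment(nn.Module):
    """Learnable foreground/background prototypes for RPN features
    (hypothetical simplification). Each proposal feature is pulled toward the
    prototype matching its pseudo label (1 = foreground, 0 = background)."""

    def __init__(self, feat_dim=256):
        super().__init__()
        self.prototypes = nn.Parameter(torch.randn(2, feat_dim))  # [bg, fg]

    def forward(self, rpn_feats, pseudo_labels):
        feats = F.normalize(rpn_feats, dim=1)
        protos = F.normalize(self.prototypes, dim=1)
        target = protos[pseudo_labels]                 # bg or fg prototype per proposal
        # Cosine alignment loss between each feature and its assigned prototype.
        return (1.0 - (feats * target).sum(dim=1)).mean()

align = RPNPrototypeAlignment()
loss = align(torch.randn(100, 256), torch.randint(0, 2, (100,)))  # 100 pseudo-labeled proposals
print(float(loss))
```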
Citations: 58
Rich features for perceptual quality assessment of UGC videos
Pub Date : 2021-06-01 DOI: 10.1109/CVPR46437.2021.01323
Yilin Wang, Junjie Ke, Hossein Talebi, Joong Gon Yim, N. Birkbeck, Balu Adsumilli, P. Milanfar, Feng Yang
Video quality assessment for User Generated Content (UGC) is an important topic in both industry and academia. Most existing methods only focus on one aspect of perceptual quality assessment, such as technical quality or compression artifacts. In this paper, we create a large-scale dataset to comprehensively investigate characteristics of generic UGC video quality. Besides the subjective ratings and content labels of the dataset, we also propose a DNN-based framework to thoroughly analyze the importance of content, technical quality, and compression level in perceptual quality. Our model is able to provide quality scores as well as human-friendly quality indicators, to bridge the gap between low-level video signals and human perceptual quality. Experimental results show that our model achieves state-of-the-art correlation with Mean Opinion Scores (MOS).
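Agreement with Mean Opinion Scores is conventionally reported with Spearman (SROCC) and Pearson (PLCC) correlation. A minimal evaluation helper, with random placeholder predictions standing in for model outputs, is:

```python
import numpy as np
from scipy.stats import pearsonr, spearmanr

def evaluate_against_mos(predicted_scores, mos):
    """Standard correlation metrics between model quality scores and MOS:
    SROCC measures monotonic (rank) agreement, PLCC linear agreement."""
    srocc = spearmanr(predicted_scores, mos)[0]
    plcc = pearsonr(predicted_scores, mos)[0]
    return srocc, plcc

# Toy usage with placeholder predictions for 100 UGC clips.
rng = np.random.default_rng(0)
mos = rng.uniform(1.0, 5.0, size=100)
predicted = mos + rng.normal(0.0, 0.3, size=100)   # imperfect but correlated model
print(evaluate_against_mos(predicted, mos))
```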
Citations: 43