SimpleRecon: 3D Reconstruction Without 3D Convolutions
Pub Date: 2022-08-31 | DOI: 10.48550/arXiv.2208.14743
Mohamed Sayed, J. Gibson, Jamie Watson, V. Prisacariu, Michael Firman, Clément Godard
Traditionally, 3D indoor scene reconstruction from posed images happens in two phases: per-image depth estimation, followed by depth merging and surface reconstruction. Recently, a family of methods has emerged that performs reconstruction directly in a final 3D volumetric feature space. While these methods have shown impressive reconstruction results, they rely on expensive 3D convolutional layers, limiting their application in resource-constrained environments. In this work, we instead go back to the traditional route and show how focusing on high-quality multi-view depth prediction leads to highly accurate 3D reconstructions using simple off-the-shelf depth fusion. We propose a simple state-of-the-art multi-view depth estimator with two main contributions: 1) a carefully designed 2D CNN which utilizes strong image priors alongside a plane-sweep feature volume and geometric losses, combined with 2) the integration of keyframe and geometric metadata into the cost volume, which allows informed depth plane scoring. Our method achieves a significant lead over the current state of the art for depth estimation and is close to or better than it for 3D reconstruction on ScanNet and 7-Scenes, while still allowing online, real-time, low-memory reconstruction. Code, models and results are available at https://nianticlabs.github.io/simplerecon
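As a rough illustration of the metadata-augmented cost volume described above, the sketch below scores each depth plane with a small MLP over a dot-product matching score concatenated with per-plane metadata channels. The module name, tensor shapes, and MLP size are assumptions for illustration, not the authors' implementation.

```python
# Illustrative sketch (not the authors' code): a plane-sweep cost volume whose per-depth
# matching scores are augmented with keyframe/geometric metadata before an MLP reduces them.
import torch
import torch.nn as nn

class MetadataCostVolume(nn.Module):
    def __init__(self, meta_dim: int, hidden: int = 64):
        super().__init__()
        # Small MLP applied per pixel and per depth plane to score hypotheses.
        self.mlp = nn.Sequential(
            nn.Linear(1 + meta_dim, hidden), nn.ReLU(inplace=True),
            nn.Linear(hidden, 1),
        )

    def forward(self, ref_feat, warped_src_feat, metadata):
        # ref_feat:        [B, C, H, W]    reference-view features
        # warped_src_feat: [B, D, C, H, W] source features pre-warped to D depth planes
        # metadata:        [B, D, M, H, W] per-plane metadata (e.g. ray angles, pose distance)
        dot = (ref_feat.unsqueeze(1) * warped_src_feat).sum(dim=2, keepdim=True)  # [B, D, 1, H, W]
        x = torch.cat([dot, metadata], dim=2)                                     # [B, D, 1+M, H, W]
        B, D, F, H, W = x.shape
        scores = self.mlp(x.permute(0, 1, 3, 4, 2).reshape(-1, F))                # [(B*D*H*W), 1]
        return scores.view(B, D, H, W)  # per-plane scores, fed to a 2D encoder/decoder

# Example usage with random tensors.
cv = MetadataCostVolume(meta_dim=4)
scores = cv(torch.randn(1, 32, 48, 64), torch.randn(1, 64, 32, 48, 64), torch.randn(1, 64, 4, 48, 64))
print(scores.shape)  # torch.Size([1, 64, 48, 64])
```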
{"title":"SimpleRecon: 3D Reconstruction Without 3D Convolutions","authors":"Mohamed Sayed, J. Gibson, Jamie Watson, V. Prisacariu, Michael Firman, Clément Godard","doi":"10.48550/arXiv.2208.14743","DOIUrl":"https://doi.org/10.48550/arXiv.2208.14743","url":null,"abstract":"Traditionally, 3D indoor scene reconstruction from posed images happens in two phases: per-image depth estimation, followed by depth merging and surface reconstruction. Recently, a family of methods have emerged that perform reconstruction directly in final 3D volumetric feature space. While these methods have shown impressive reconstruction results, they rely on expensive 3D convolutional layers, limiting their application in resource-constrained environments. In this work, we instead go back to the traditional route, and show how focusing on high quality multi-view depth prediction leads to highly accurate 3D reconstructions using simple off-the-shelf depth fusion. We propose a simple state-of-the-art multi-view depth estimator with two main contributions: 1) a carefully-designed 2D CNN which utilizes strong image priors alongside a plane-sweep feature volume and geometric losses, combined with 2) the integration of keyframe and geometric metadata into the cost volume which allows informed depth plane scoring. Our method achieves a significant lead over the current state-of-the-art for depth estimation and close or better for 3D reconstruction on ScanNet and 7-Scenes, yet still allows for online real-time low-memory reconstruction. Code, models and results are available at https://nianticlabs.github.io/simplerecon","PeriodicalId":72676,"journal":{"name":"Computer vision - ECCV ... : ... European Conference on Computer Vision : proceedings. European Conference on Computer Vision","volume":"170 1","pages":"1-19"},"PeriodicalIF":0.0,"publicationDate":"2022-08-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"79386930","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Style-Agnostic Reinforcement Learning
Pub Date: 2022-08-31 | DOI: 10.48550/arXiv.2208.14863
Juyong Lee, Seokjun Ahn, Jaesik Park
We present a novel method for learning style-agnostic representations using both style transfer and adversarial learning within the reinforcement learning framework. Style here refers to task-irrelevant details, such as the color of the background in the images; generalizing the learned policy across environments with different styles remains a challenge. Focusing on learning style-agnostic representations, our method trains the actor with diverse image styles generated by an inherent adversarial style perturbation generator, which plays a min-max game against the actor, without demanding expert knowledge for data augmentation or additional class labels for adversarial training. We verify that our method achieves competitive or better performance than state-of-the-art approaches on the Procgen and Distracting Control Suite benchmarks, and we further investigate the features extracted by our model, showing that it better captures the invariants and is less distracted by shifted styles. The code is available at https://github.com/POSTECH-CVLab/style-agnostic-RL.
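To make the min-max game concrete, the sketch below shows one way an adversarial style perturbation generator could be wired up: a small network produces bounded per-channel scale/shift noise for the actor's image features, trained to maximize the actor's loss while the actor minimizes it. Module names, shapes, and the loss structure in the comments are assumptions, not the released code.

```python
# Illustrative sketch (not the authors' implementation): an AdaIN-style perturbation
# generator that plays a min-max game with the policy, as a rough analogue of the
# adversarial style perturbation described above.
import torch
import torch.nn as nn

class StylePerturb(nn.Module):
    """Generates bounded per-channel scale/shift noise conditioned on a latent code."""
    def __init__(self, channels: int, z_dim: int = 16):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(z_dim, 64), nn.ReLU(), nn.Linear(64, 2 * channels))

    def forward(self, feat, z):
        # feat: [B, C, H, W] encoder features, z: [B, z_dim] random style code
        gamma, beta = self.net(z).chunk(2, dim=-1)          # [B, C] each
        gamma = 1.0 + 0.1 * torch.tanh(gamma)               # keep the perturbation bounded
        beta = 0.1 * torch.tanh(beta)
        return feat * gamma[:, :, None, None] + beta[:, :, None, None]

# Example forward pass.
f_aug = StylePerturb(channels=64)(torch.randn(8, 64, 16, 16), torch.randn(8, 16))

# Hypothetical alternating update:
# 1) generator step: maximize the actor's loss on perturbed features
#    gen_loss = -actor_loss(policy(perturb(feat, z)), actions, returns)
# 2) actor step: minimize the loss on both original and (detached) perturbed features
#    act_loss = actor_loss(policy(feat), ...) + actor_loss(policy(perturb(feat, z).detach()), ...)
```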
{"title":"Style-Agnostic Reinforcement Learning","authors":"Juyong Lee, Seokjun Ahn, Jaesik Park","doi":"10.48550/arXiv.2208.14863","DOIUrl":"https://doi.org/10.48550/arXiv.2208.14863","url":null,"abstract":"We present a novel method of learning style-agnostic representation using both style transfer and adversarial learning in the reinforcement learning framework. The style, here, refers to task-irrelevant details such as the color of the background in the images, where generalizing the learned policy across environments with different styles is still a challenge. Focusing on learning style-agnostic representations, our method trains the actor with diverse image styles generated from an inherent adversarial style perturbation generator, which plays a min-max game between the actor and the generator, without demanding expert knowledge for data augmentation or additional class labels for adversarial training. We verify that our method achieves competitive or better performances than the state-of-the-art approaches on Procgen and Distracting Control Suite benchmarks, and further investigate the features extracted from our model, showing that the model better captures the invariants and is less distracted by the shifted style. The code is available at https://github.com/POSTECH-CVLab/style-agnostic-RL.","PeriodicalId":72676,"journal":{"name":"Computer vision - ECCV ... : ... European Conference on Computer Vision : proceedings. European Conference on Computer Vision","volume":"56 1","pages":"604-620"},"PeriodicalIF":0.0,"publicationDate":"2022-08-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"90567685","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
ASpanFormer: Detector-Free Image Matching with Adaptive Span Transformer
Pub Date: 2022-08-30 | DOI: 10.48550/arXiv.2208.14201
Hongkai Chen, Zixin Luo, Lei Zhou, Yurun Tian, Mingmin Zhen, Tian Fang, D. McKinnon, Yanghai Tsin, Long Quan
Generating robust and reliable correspondences across images is a fundamental task for a diversity of applications. To capture context at both global and local granularity, we propose ASpanFormer, a Transformer-based detector-free matcher built on a hierarchical attention structure, adopting a novel attention operation that adjusts its attention span in a self-adaptive manner. To achieve this goal, flow maps are first regressed in each cross-attention phase to locate the center of the search region. Next, a sampling grid is generated around the center, whose size, instead of being empirically fixed, is adaptively computed from a pixel uncertainty estimated along with the flow map. Finally, attention is computed across the two images within the derived regions, referred to as the attention span. By these means, we not only maintain long-range dependencies but also enable fine-grained attention among pixels of high relevance, which compensates for the essential locality and piece-wise smoothness of matching tasks. State-of-the-art accuracy on a wide range of evaluation benchmarks validates the strong matching capability of our method.
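The adaptive-span sampling step could be sketched as below: a fixed-resolution grid is placed around the flow-regressed center and scaled by the estimated pixel uncertainty before sampling target features with grid_sample. The function name, the scale factor k, and the grid resolution are illustrative assumptions, not the authors' implementation.

```python
# Illustrative sketch (not the authors' code): sample a local window of the target feature
# map around a flow-regressed center, with a window size scaled by the estimated pixel
# uncertainty. Attention over the sampled tokens (omitted) would realize the adaptive span.
import torch
import torch.nn.functional as F

def sample_adaptive_span(feat, center, sigma, grid_size=7, k=3.0):
    # feat:   [B, C, H, W] target-view features
    # center: [B, 2] predicted (x, y) centers in pixel coordinates
    # sigma:  [B] per-sample uncertainty (std. dev. in pixels)
    B, C, H, W = feat.shape
    lin = torch.linspace(-1.0, 1.0, grid_size, device=feat.device)
    gy, gx = torch.meshgrid(lin, lin, indexing="ij")             # [G, G]
    offsets = torch.stack([gx, gy], dim=-1)                      # [G, G, 2] in the unit square
    half = (k * sigma).view(B, 1, 1, 1)                          # half window size in pixels
    pix = center.view(B, 1, 1, 2) + offsets.unsqueeze(0) * half  # [B, G, G, 2] in pixels
    # Normalize pixel coordinates to [-1, 1] for grid_sample.
    norm = torch.empty_like(pix)
    norm[..., 0] = pix[..., 0] / (W - 1) * 2 - 1
    norm[..., 1] = pix[..., 1] / (H - 1) * 2 - 1
    return F.grid_sample(feat, norm, align_corners=True)         # [B, C, G, G]

tokens = sample_adaptive_span(torch.randn(2, 64, 60, 80),
                              torch.tensor([[40.0, 30.0], [10.0, 50.0]]),
                              torch.tensor([2.0, 8.0]))
print(tokens.shape)  # torch.Size([2, 64, 7, 7])
```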
{"title":"ASpanFormer: Detector-Free Image Matching with Adaptive Span Transformer","authors":"Hongkai Chen, Zixin Luo, Lei Zhou, Yurun Tian, Mingmin Zhen, Tian Fang, D. McKinnon, Yanghai Tsin, Long Quan","doi":"10.48550/arXiv.2208.14201","DOIUrl":"https://doi.org/10.48550/arXiv.2208.14201","url":null,"abstract":"Generating robust and reliable correspondences across images is a fundamental task for a diversity of applications. To capture context at both global and local granularity, we propose ASpanFormer, a Transformer-based detector-free matcher that is built on hierarchical attention structure, adopting a novel attention operation which is capable of adjusting attention span in a self-adaptive manner. To achieve this goal, first, flow maps are regressed in each cross attention phase to locate the center of search region. Next, a sampling grid is generated around the center, whose size, instead of being empirically configured as fixed, is adaptively computed from a pixel uncertainty estimated along with the flow map. Finally, attention is computed across two images within derived regions, referred to as attention span. By these means, we are able to not only maintain long-range dependencies, but also enable fine-grained attention among pixels of high relevance that compensates essential locality and piece-wise smoothness in matching tasks. State-of-the-art accuracy on a wide range of evaluation benchmarks validates the strong matching capability of our method.","PeriodicalId":72676,"journal":{"name":"Computer vision - ECCV ... : ... European Conference on Computer Vision : proceedings. European Conference on Computer Vision","volume":"14 1","pages":"20-36"},"PeriodicalIF":0.0,"publicationDate":"2022-08-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"81919749","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Open-Set Semi-Supervised Object Detection
Pub Date: 2022-08-29 | DOI: 10.48550/arXiv.2208.13722
Yen-Cheng Liu, Chih-Yao Ma, Xiaoliang Dai, Junjiao Tian, Péter Vajda, Zijian He, Z. Kira
Recent developments in Semi-Supervised Object Detection (SSOD) have shown the promise of leveraging unlabeled data to improve an object detector. However, these methods have so far assumed that the unlabeled data contains no out-of-distribution (OOD) classes, which is unrealistic for larger-scale unlabeled datasets. In this paper, we consider a more practical yet challenging problem: Open-Set Semi-Supervised Object Detection (OSSOD). We first find that existing SSOD methods obtain a lower performance gain in open-set conditions, caused by semantic expansion, where distracting OOD objects are mispredicted as in-distribution pseudo-labels for semi-supervised training. To address this problem, we consider online and offline OOD detection modules integrated with SSOD methods. Through extensive studies, we find that an offline OOD detector based on a self-supervised vision transformer performs favorably against online OOD detectors due to its robustness to the interference of pseudo-labeling. In experiments, our proposed framework effectively addresses the semantic expansion issue and shows consistent improvements on many OSSOD benchmarks, including the large-scale COCO-OpenImages. We also verify the effectiveness of our framework under different OSSOD conditions, including varying numbers of in-distribution classes, different degrees of supervision, and different combinations of unlabeled sets.
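A minimal sketch of the offline filtering idea, assuming pseudo-labeled box features from a self-supervised backbone are scored against in-distribution class prototypes and thresholded before entering semi-supervised training; the scoring rule and threshold are placeholders, not the paper's exact OOD detector.

```python
# Illustrative sketch (not the authors' pipeline): filter pseudo-labeled boxes with an
# offline OOD score before they enter semi-supervised training. The score here is the
# maximum cosine similarity to in-distribution class prototypes computed from a
# (hypothetical) self-supervised feature extractor.
import torch
import torch.nn.functional as F

def filter_pseudo_labels(box_feats, prototypes, boxes, labels, tau=0.5):
    # box_feats:     [N, D] features of pseudo-labeled box crops (e.g. from a SSL ViT)
    # prototypes:    [K, D] mean features of the K in-distribution classes
    # boxes, labels: [N, 4], [N] pseudo-label geometry and classes
    sims = F.cosine_similarity(box_feats.unsqueeze(1), prototypes.unsqueeze(0), dim=-1)  # [N, K]
    id_score = sims.max(dim=1).values      # high score = looks like a known class
    keep = id_score > tau                  # drop likely-OOD pseudo-labels
    return boxes[keep], labels[keep]

kept_boxes, kept_labels = filter_pseudo_labels(
    torch.randn(10, 128), torch.randn(20, 128), torch.rand(10, 4), torch.randint(0, 20, (10,)))
```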
{"title":"Open-Set Semi-Supervised Object Detection","authors":"Yen-Cheng Liu, Chih-Yao Ma, Xiaoliang Dai, Junjiao Tian, Péter Vajda, Zijian He, Z. Kira","doi":"10.48550/arXiv.2208.13722","DOIUrl":"https://doi.org/10.48550/arXiv.2208.13722","url":null,"abstract":"Recent developments for Semi-Supervised Object Detection (SSOD) have shown the promise of leveraging unlabeled data to improve an object detector. However, thus far these methods have assumed that the unlabeled data does not contain out-of-distribution (OOD) classes, which is unrealistic with larger-scale unlabeled datasets. In this paper, we consider a more practical yet challenging problem, Open-Set Semi-Supervised Object Detection (OSSOD). We first find the existing SSOD method obtains a lower performance gain in open-set conditions, and this is caused by the semantic expansion, where the distracting OOD objects are mispredicted as in-distribution pseudo-labels for the semi-supervised training. To address this problem, we consider online and offline OOD detection modules, which are integrated with SSOD methods. With the extensive studies, we found that leveraging an offline OOD detector based on a self-supervised vision transformer performs favorably against online OOD detectors due to its robustness to the interference of pseudo-labeling. In the experiment, our proposed framework effectively addresses the semantic expansion issue and shows consistent improvements on many OSSOD benchmarks, including large-scale COCO-OpenImages. We also verify the effectiveness of our framework under different OSSOD conditions, including varying numbers of in-distribution classes, different degrees of supervision, and different combinations of unlabeled sets.","PeriodicalId":72676,"journal":{"name":"Computer vision - ECCV ... : ... European Conference on Computer Vision : proceedings. European Conference on Computer Vision","volume":"109 1","pages":"143-159"},"PeriodicalIF":0.0,"publicationDate":"2022-08-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"80715142","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Anti-Retroactive Interference for Lifelong Learning
Pub Date: 2022-08-27 | DOI: 10.48550/arXiv.2208.12967
Runqi Wang, Yuxiang Bao, Baochang Zhang, Jianzhuang Liu, Wentao Zhu, Guodong Guo
Humans can continuously learn new knowledge. However, machine learning models suffer from drastic drops in performance on previous tasks after learning new ones. Cognitive science points out that competition between similar knowledge is an important cause of forgetting. In this paper, we design a paradigm for lifelong learning based on meta-learning and the associative mechanism of the brain. It tackles the problem from two aspects: extracting knowledge and memorizing knowledge. First, we disrupt the sample's background distribution through a background attack, which strengthens the model's ability to extract the key features of each task. Second, according to the similarity between incremental knowledge and base knowledge, we design an adaptive fusion of incremental knowledge, which helps the model allocate capacity to knowledge of different difficulty. We theoretically show that the proposed learning paradigm can make the models of different tasks converge to the same optimum. The proposed method is validated on the MNIST, CIFAR100, CUB200 and ImageNet100 datasets. The code is available at https://github.com/bhrqw/ARI.
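As a loose illustration of similarity-driven fusion, the sketch below blends base and incremental features with a weight derived from their cosine similarity. This is an assumed formulation for illustration only; the paper's actual fusion rule may differ.

```python
# Illustrative sketch (an assumption, not the paper's formulation): fuse incremental-task
# features with base-task features using a weight derived from their cosine similarity,
# so that knowledge very similar to the base model leans on it more heavily.
import torch
import torch.nn.functional as F

def adaptive_fusion(base_feat, inc_feat):
    # base_feat, inc_feat: [B, D] features from the base model and the incremental model
    sim = F.cosine_similarity(base_feat, inc_feat, dim=-1, eps=1e-8)  # [B], in [-1, 1]
    alpha = 0.5 * (sim + 1.0)                                         # map similarity to [0, 1]
    return alpha.unsqueeze(-1) * base_feat + (1.0 - alpha).unsqueeze(-1) * inc_feat

fused = adaptive_fusion(torch.randn(4, 256), torch.randn(4, 256))
print(fused.shape)  # torch.Size([4, 256])
```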
{"title":"Anti-Retroactive Interference for Lifelong Learning","authors":"Runqi Wang, Yuxiang Bao, Baochang Zhang, Jianzhuang Liu, Wentao Zhu, Guodong Guo","doi":"10.48550/arXiv.2208.12967","DOIUrl":"https://doi.org/10.48550/arXiv.2208.12967","url":null,"abstract":". Humans can continuously learn new knowledge. However, machine learning models suffer from drastic dropping in performance on previous tasks after learning new tasks. Cognitive science points out that the competition of similar knowledge is an important cause of forgetting. In this paper, we design a paradigm for lifelong learning based on meta-learning and associative mechanism of the brain. It tackles the problem from two aspects: extracting knowledge and memorizing knowledge. First, we disrupt the sample’s background distribution through a background attack, which strengthens the model to extract the key features of each task. Second, according to the similarity between incremental knowledge and base knowledge, we design an adaptive fusion of incremental knowledge, which helps the model allocate capacity to the knowledge of different difficulties. It is theoretically analyzed that the proposed learning paradigm can make the models of different tasks converge to the same optimum. The proposed method is validated on the MNIST, CIFAR100, CUB200 and ImageNet100 datasets. The code is available at https://github.com/bhrqw/ARI .","PeriodicalId":72676,"journal":{"name":"Computer vision - ECCV ... : ... European Conference on Computer Vision : proceedings. European Conference on Computer Vision","volume":"27 1","pages":"163-178"},"PeriodicalIF":0.0,"publicationDate":"2022-08-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"82486029","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
CMD: Self-supervised 3D Action Representation Learning with Cross-modal Mutual Distillation
Pub Date: 2022-08-26 | DOI: 10.48550/arXiv.2208.12448
Yunyao Mao, Wen-gang Zhou, Zhenbo Lu, Jiajun Deng, Houqiang Li
In 3D action recognition, rich complementary information exists between skeleton modalities. Nevertheless, how to model and utilize this information remains a challenging problem for self-supervised 3D action representation learning. In this work, we formulate the cross-modal interaction as a bidirectional knowledge distillation problem. Unlike classic distillation solutions that transfer the knowledge of a fixed, pre-trained teacher to the student, here the knowledge is continuously updated and bidirectionally distilled between modalities. To this end, we propose a new Cross-modal Mutual Distillation (CMD) framework with the following designs. On the one hand, the neighboring similarity distribution is introduced to model the knowledge learned in each modality, where the relational information is naturally suitable for contrastive frameworks. On the other hand, asymmetrical configurations are used for teacher and student to stabilize the distillation process and to transfer high-confidence information between modalities. By derivation, we find that the cross-modal positive mining in previous works can be regarded as a degenerated version of our CMD. We perform extensive experiments on the NTU RGB+D 60, NTU RGB+D 120, and PKU-MMD II datasets. Our approach outperforms existing self-supervised methods and sets a series of new records. The code is available at: https://github.com/maoyunyao/CMD
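A minimal sketch of bidirectional distillation of neighboring similarity distributions, assuming normalized embeddings from two modalities, a shared bank of neighbor anchors, and asymmetric student/teacher temperatures; the temperatures and bank construction are assumptions rather than the released CMD code.

```python
# Illustrative sketch (not the released code): each modality's embedding is compared against
# a bank of neighbor anchors; the resulting distributions are aligned with KL divergence in
# both directions, with stop-gradient and a sharper temperature on the teacher side.
import torch
import torch.nn.functional as F

def similarity_distribution(z, bank, t):
    # z: [B, D] normalized embeddings, bank: [K, D] normalized neighbor anchors
    return F.log_softmax(z @ bank.t() / t, dim=1)        # log-probabilities over K neighbors

def cmd_loss(z_a, z_b, bank, t_student=0.1, t_teacher=0.05):
    z_a, z_b, bank = F.normalize(z_a, dim=1), F.normalize(z_b, dim=1), F.normalize(bank, dim=1)
    log_a_s = similarity_distribution(z_a, bank, t_student)
    log_b_s = similarity_distribution(z_b, bank, t_student)
    with torch.no_grad():  # teachers: sharper distribution, no gradient
        p_a_t = similarity_distribution(z_a, bank, t_teacher).exp()
        p_b_t = similarity_distribution(z_b, bank, t_teacher).exp()
    # Each modality is distilled toward the other's (detached) distribution.
    loss_ab = F.kl_div(log_a_s, p_b_t, reduction="batchmean")
    loss_ba = F.kl_div(log_b_s, p_a_t, reduction="batchmean")
    return loss_ab + loss_ba

loss = cmd_loss(torch.randn(8, 128), torch.randn(8, 128), torch.randn(4096, 128))
```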
{"title":"CMD: Self-supervised 3D Action Representation Learning with Cross-modal Mutual Distillation","authors":"Yunyao Mao, Wen-gang Zhou, Zhenbo Lu, Jiajun Deng, Houqiang Li","doi":"10.48550/arXiv.2208.12448","DOIUrl":"https://doi.org/10.48550/arXiv.2208.12448","url":null,"abstract":"In 3D action recognition, there exists rich complementary information between skeleton modalities. Nevertheless, how to model and utilize this information remains a challenging problem for self-supervised 3D action representation learning. In this work, we formulate the cross-modal interaction as a bidirectional knowledge distillation problem. Different from classic distillation solutions that transfer the knowledge of a fixed and pre-trained teacher to the student, in this work, the knowledge is continuously updated and bidirectionally distilled between modalities. To this end, we propose a new Cross-modal Mutual Distillation (CMD) framework with the following designs. On the one hand, the neighboring similarity distribution is introduced to model the knowledge learned in each modality, where the relational information is naturally suitable for the contrastive frameworks. On the other hand, asymmetrical configurations are used for teacher and student to stabilize the distillation process and to transfer high-confidence information between modalities. By derivation, we find that the cross-modal positive mining in previous works can be regarded as a degenerated version of our CMD. We perform extensive experiments on NTU RGB+D 60, NTU RGB+D 120, and PKU-MMD II datasets. Our approach outperforms existing self-supervised methods and sets a series of new records. The code is available at: https://github.com/maoyunyao/CMD","PeriodicalId":72676,"journal":{"name":"Computer vision - ECCV ... : ... European Conference on Computer Vision : proceedings. European Conference on Computer Vision","volume":"7 1","pages":"734-752"},"PeriodicalIF":0.0,"publicationDate":"2022-08-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"74250847","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Discovering Transferable Forensic Features for CNN-generated Images Detection
Pub Date: 2022-08-24 | DOI: 10.48550/arXiv.2208.11342
Keshigeyan Chandrasegaran, Ngoc-Trung Tran, A. Binder, Ngai-Man Cheung
Visual counterfeits are increasingly causing an existential conundrum in mainstream media as neural image synthesis methods evolve rapidly. Although detecting such counterfeits has been a taxing problem in the image forensics community, a recent class of forensic detectors, universal detectors, is surprisingly able to spot counterfeit images regardless of generator architectures, loss functions, training datasets, and resolutions. This intriguing property suggests the possible existence of transferable forensic features (T-FF) in universal detectors. In this work, we conduct the first analytical study to discover and understand T-FF in universal detectors. Our contributions are two-fold: 1) we propose a novel forensic feature relevance statistic (FF-RS) to quantify and discover T-FF in universal detectors, and 2) our qualitative and quantitative investigations uncover an unexpected finding: color is a critical T-FF in universal detectors. Code and models are available at https://keshik6.github.io/transferable-forensic-features/
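The FF-RS statistic itself is not specified in the abstract, so the sketch below shows a generic ablation-style relevance score in its place: zero out one feature channel at a time and measure the drop in the detector's predicted fake probability. All names and the stand-in models are hypothetical.

```python
# Illustrative sketch (a generic ablation-style relevance score, not the paper's FF-RS
# definition): estimate how much each feature channel contributes to the "fake" decision
# of a universal detector by zeroing the channel and measuring the average probability drop.
import torch

@torch.no_grad()
def channel_relevance(feature_extractor, head, images):
    # feature_extractor: images -> [B, C, H, W]; head: features -> [B] fake logits
    feats = feature_extractor(images)
    base = torch.sigmoid(head(feats))                      # [B] baseline fake probability
    scores = []
    for c in range(feats.size(1)):
        ablated = feats.clone()
        ablated[:, c] = 0.0                                # knock out one channel
        scores.append((base - torch.sigmoid(head(ablated))).mean())
    return torch.stack(scores)                             # [C] higher = more relevant

# Tiny stand-in models just to exercise the function (assumptions, not real detectors):
backbone = torch.nn.Conv2d(3, 8, 3, padding=1)
head = lambda f: f.mean(dim=(1, 2, 3))                     # collapses features to one logit
rel = channel_relevance(backbone, head, torch.randn(4, 3, 32, 32))
print(rel.shape)  # torch.Size([8])
```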
{"title":"Discovering Transferable Forensic Features for CNN-generated Images Detection","authors":"Keshigeyan Chandrasegaran, Ngoc-Trung Tran, A. Binder, Ngai-Man Cheung","doi":"10.48550/arXiv.2208.11342","DOIUrl":"https://doi.org/10.48550/arXiv.2208.11342","url":null,"abstract":"Visual counterfeits are increasingly causing an existential conundrum in mainstream media with rapid evolution in neural image synthesis methods. Though detection of such counterfeits has been a taxing problem in the image forensics community, a recent class of forensic detectors -- universal detectors -- are able to surprisingly spot counterfeit images regardless of generator architectures, loss functions, training datasets, and resolutions. This intriguing property suggests the possible existence of transferable forensic features (T-FF) in universal detectors. In this work, we conduct the first analytical study to discover and understand T-FF in universal detectors. Our contributions are 2-fold: 1) We propose a novel forensic feature relevance statistic (FF-RS) to quantify and discover T-FF in universal detectors and, 2) Our qualitative and quantitative investigations uncover an unexpected finding: color is a critical T-FF in universal detectors. Code and models are available at https://keshik6.github.io/transferable-forensic-features/","PeriodicalId":72676,"journal":{"name":"Computer vision - ECCV ... : ... European Conference on Computer Vision : proceedings. European Conference on Computer Vision","volume":"3 1","pages":"671-689"},"PeriodicalIF":0.0,"publicationDate":"2022-08-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"88266800","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Ultra-high-resolution unpaired stain transformation via Kernelized Instance Normalization
Pub Date: 2022-08-23 | DOI: 10.48550/arXiv.2208.10730
M. Ho, Min Wu, Che-Ming Wu
While hematoxylin and eosin (H&E) is the standard staining procedure, immunohistochemistry (IHC) staining further serves as a diagnostic and prognostic method. However, acquiring special staining results incurs substantial costs. Hence, we propose a strategy for ultra-high-resolution unpaired image-to-image translation: Kernelized Instance Normalization (KIN), which preserves local information and achieves seamless stain transformation with constant GPU memory usage. Given a patch, its corresponding position, and a kernel, KIN computes local statistics using a convolution operation. In addition, KIN can be easily plugged into most existing frameworks without re-training. We demonstrate that KIN achieves state-of-the-art stain transformation by replacing instance normalization (IN) layers with KIN layers in three popular frameworks and testing on two histopathological datasets. Furthermore, we demonstrate the generalizability of KIN on high-resolution natural images. Finally, human evaluation and several objective metrics are used to compare the performance of different approaches. Overall, this is the first successful study of ultra-high-resolution unpaired image-to-image translation with constant space complexity. Code is available at: https://github.com/Kaminyou/URUST
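A simplified sketch of the idea behind KIN, assuming local statistics are obtained by depthwise box filtering of the feature map instead of the cached per-patch statistics used in the real constant-memory implementation; the kernel size and affine parameters are placeholders.

```python
# Illustrative sketch (simplified from the description above, not the released URUST code):
# normalize each pixel with *local* statistics obtained by box-filtering the feature map,
# instead of the global per-image statistics used by instance normalization.
import torch
import torch.nn.functional as F

def kernelized_instance_norm(x, gamma, beta, kernel_size=31, eps=1e-5):
    # x: [B, C, H, W]; gamma, beta: [C] affine parameters (e.g. taken from a trained IN layer)
    pad = kernel_size // 2
    weight = torch.ones(x.size(1), 1, kernel_size, kernel_size, device=x.device) / kernel_size**2
    local_mean = F.conv2d(F.pad(x, [pad] * 4, mode="reflect"), weight, groups=x.size(1))
    local_sq = F.conv2d(F.pad(x * x, [pad] * 4, mode="reflect"), weight, groups=x.size(1))
    local_var = (local_sq - local_mean**2).clamp_min(0.0)
    x_hat = (x - local_mean) / torch.sqrt(local_var + eps)
    return x_hat * gamma.view(1, -1, 1, 1) + beta.view(1, -1, 1, 1)

out = kernelized_instance_norm(torch.randn(1, 3, 256, 256), torch.ones(3), torch.zeros(3))
print(out.shape)  # torch.Size([1, 3, 256, 256])
```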
{"title":"Ultra-high-resolution unpaired stain transformation via Kernelized Instance Normalization","authors":"M. Ho, Min Wu, Che-Ming Wu","doi":"10.48550/arXiv.2208.10730","DOIUrl":"https://doi.org/10.48550/arXiv.2208.10730","url":null,"abstract":"While hematoxylin and eosin (H&E) is a standard staining procedure, immunohistochemistry (IHC) staining further serves as a diagnostic and prognostic method. However, acquiring special staining results requires substantial costs. Hence, we proposed a strategy for ultra-high-resolution unpaired image-to-image translation: Kernelized Instance Normalization (KIN), which preserves local information and successfully achieves seamless stain transformation with constant GPU memory usage. Given a patch, corresponding position, and a kernel, KIN computes local statistics using convolution operation. In addition, KIN can be easily plugged into most currently developed frameworks without re-training. We demonstrate that KIN achieves state-of-the-art stain transformation by replacing instance normalization (IN) layers with KIN layers in three popular frameworks and testing on two histopathological datasets. Furthermore, we manifest the generalizability of KIN with high-resolution natural images. Finally, human evaluation and several objective metrics are used to compare the performance of different approaches. Overall, this is the first successful study for the ultra-high-resolution unpaired image-to-image translation with constant space complexity. Code is available at: https://github.com/Kaminyou/URUST","PeriodicalId":72676,"journal":{"name":"Computer vision - ECCV ... : ... European Conference on Computer Vision : proceedings. European Conference on Computer Vision","volume":"47 1","pages":"490-505"},"PeriodicalIF":0.0,"publicationDate":"2022-08-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"75992416","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Learning Visibility for Robust Dense Human Body Estimation
Pub Date: 2022-08-23 | DOI: 10.48550/arXiv.2208.10652
Chunfeng Yao, Jimei Yang, Duygu Ceylan, Yi Zhou, Yang Zhou, Ming-Hsuan Yang
Estimating 3D human pose and shape from 2D images is a crucial yet challenging task. While prior methods with model-based representations can perform reasonably well on whole-body images, they often fail when parts of the body are occluded or outside the frame. Moreover, these results usually do not faithfully capture human silhouettes due to the limited representation power of deformable models (e.g., representing only the naked body). An alternative approach is to estimate dense vertices of a predefined template body in the image space. Such representations are effective in localizing vertices within an image but cannot handle out-of-frame body parts. In this work, we learn dense human body estimation that is robust to partial observations. We explicitly model the visibility of human joints and vertices along the x, y, and z axes separately. Visibility in the x and y axes helps distinguish out-of-frame cases, while visibility along the depth axis corresponds to occlusions (either self-occlusions or occlusions by other objects). We obtain pseudo ground truths of visibility labels from dense UV correspondences and train a neural network to predict visibility along with the 3D coordinates. We show that visibility can serve as 1) an additional signal to resolve depth-ordering ambiguities of self-occluded vertices and 2) a regularization term when fitting a human body model to the predictions. Extensive experiments on multiple 3D human datasets demonstrate that visibility modeling significantly improves the accuracy of human body estimation, especially for partial-body cases. Our project page with code is at: https://github.com/chhankyao/visdb.
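A minimal sketch of a visibility-aware training objective under assumed shapes: per-vertex coordinates are supervised only where the vertex is labeled in-frame, and the per-axis visibility predictions get a binary cross-entropy term. The exact losses and weights in the paper may differ.

```python
# Illustrative sketch (shapes and loss weights are assumptions, not the paper's exact losses):
# supervise per-vertex coordinates only where the vertex is labeled visible in the frame, and
# supervise the per-axis visibility predictions themselves with binary cross-entropy.
import torch
import torch.nn.functional as F

def visibility_aware_loss(pred_xyz, pred_vis_logits, gt_xyz, gt_vis, w_vis=1.0):
    # pred_xyz:        [B, V, 3] predicted vertex coordinates
    # pred_vis_logits: [B, V, 3] visibility logits along x, y (inside frame) and z (not occluded)
    # gt_xyz, gt_vis:  [B, V, 3] ground-truth coordinates and {0, 1} visibility labels
    in_frame = gt_vis[..., 0] * gt_vis[..., 1]                     # [B, V] inside the image
    coord_err = (pred_xyz - gt_xyz).abs().mean(dim=-1)             # [B, V]
    coord_loss = (coord_err * in_frame).sum() / in_frame.sum().clamp_min(1.0)
    vis_loss = F.binary_cross_entropy_with_logits(pred_vis_logits, gt_vis)
    return coord_loss + w_vis * vis_loss

loss = visibility_aware_loss(torch.randn(2, 6890, 3), torch.randn(2, 6890, 3),
                             torch.randn(2, 6890, 3), torch.randint(0, 2, (2, 6890, 3)).float())
```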
{"title":"Learning Visibility for Robust Dense Human Body Estimation","authors":"Chunfeng Yao, Jimei Yang, Duygu Ceylan, Yi Zhou, Yang Zhou, Ming-Hsuan Yang","doi":"10.48550/arXiv.2208.10652","DOIUrl":"https://doi.org/10.48550/arXiv.2208.10652","url":null,"abstract":"Estimating 3D human pose and shape from 2D images is a crucial yet challenging task. While prior methods with model-based representations can perform reasonably well on whole-body images, they often fail when parts of the body are occluded or outside the frame. Moreover, these results usually do not faithfully capture the human silhouettes due to their limited representation power of deformable models (e.g., representing only the naked body). An alternative approach is to estimate dense vertices of a predefined template body in the image space. Such representations are effective in localizing vertices within an image but cannot handle out-of-frame body parts. In this work, we learn dense human body estimation that is robust to partial observations. We explicitly model the visibility of human joints and vertices in the x, y, and z axes separately. The visibility in x and y axes help distinguishing out-of-frame cases, and the visibility in depth axis corresponds to occlusions (either self-occlusions or occlusions by other objects). We obtain pseudo ground-truths of visibility labels from dense UV correspondences and train a neural network to predict visibility along with 3D coordinates. We show that visibility can serve as 1) an additional signal to resolve depth ordering ambiguities of self-occluded vertices and 2) a regularization term when fitting a human body model to the predictions. Extensive experiments on multiple 3D human datasets demonstrate that visibility modeling significantly improves the accuracy of human body estimation, especially for partial-body cases. Our project page with code is at: https://github.com/chhankyao/visdb.","PeriodicalId":72676,"journal":{"name":"Computer vision - ECCV ... : ... European Conference on Computer Vision : proceedings. European Conference on Computer Vision","volume":"154 1","pages":"412-428"},"PeriodicalIF":0.0,"publicationDate":"2022-08-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"78907177","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Adversarial Feature Augmentation for Cross-domain Few-shot Classification
Pub Date: 2022-08-23 | DOI: 10.48550/arXiv.2208.11021
Yan Hu, A. J. Ma
Existing methods based on meta-learning predict novel-class labels for (target-domain) testing tasks via meta-knowledge learned from (source-domain) training tasks of base classes. However, most existing works may fail to generalize to novel classes due to the potentially large domain discrepancy across domains. To address this issue, we propose a novel adversarial feature augmentation (AFA) method to bridge the domain gap in few-shot learning. The feature augmentation is designed to simulate distribution variations by maximizing the domain discrepancy. During adversarial training, the domain discriminator is learned by distinguishing the augmented features (unseen domain) from the original ones (seen domain), while the domain discrepancy is minimized to obtain the optimal feature encoder. The proposed method is a plug-and-play module that can be easily integrated into existing few-shot learning methods based on meta-learning. Extensive experiments on nine datasets demonstrate the superiority of our method for cross-domain few-shot classification compared with the state of the art. Code is available at https://github.com/youthhoo/AFA_For_Few_shot_learning
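The adversarial scheme could be sketched as below: a learned feature augmentation perturbs encoder features, a domain discriminator separates original ("seen") from augmented ("unseen") features, and the augmentor and encoder play opposing roles on that discrepancy. Module definitions and losses are assumptions, not the released AFA code.

```python
# Illustrative sketch (module names and losses are assumptions): a feature augmentation layer
# perturbs intermediate features to maximize a domain discrepancy measured by a discriminator,
# while the encoder is trained so that original and augmented features become indistinguishable.
import torch
import torch.nn as nn
import torch.nn.functional as F

class FeatureAugment(nn.Module):
    """Learned per-channel affine perturbation of encoder features."""
    def __init__(self, dim):
        super().__init__()
        self.gamma = nn.Parameter(torch.zeros(dim))
        self.beta = nn.Parameter(torch.zeros(dim))

    def forward(self, f):                       # f: [B, D]
        return f * (1.0 + self.gamma) + self.beta

encoder = nn.Linear(512, 256)                   # stand-in for the few-shot backbone
augment = FeatureAugment(256)
disc = nn.Sequential(nn.Linear(256, 128), nn.ReLU(), nn.Linear(128, 1))

x = torch.randn(32, 512)
f = encoder(x)
f_aug = augment(f)

# Discriminator: original features -> label 0 ("seen"), augmented -> label 1 ("unseen").
d_loss = F.binary_cross_entropy_with_logits(disc(f.detach()), torch.zeros(32, 1)) + \
         F.binary_cross_entropy_with_logits(disc(f_aug.detach()), torch.ones(32, 1))

# The augmentation step would *maximize* this discrepancy (gradient ascent on the augmentation
# parameters), while the encoder step *minimizes* it so augmented features stay close to the
# seen domain; the two updates alternate during meta-training.
enc_loss = F.binary_cross_entropy_with_logits(disc(f_aug), torch.zeros(32, 1))
```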
{"title":"Adversarial Feature Augmentation for Cross-domain Few-shot Classification","authors":"Yan Hu, A. J. Ma","doi":"10.48550/arXiv.2208.11021","DOIUrl":"https://doi.org/10.48550/arXiv.2208.11021","url":null,"abstract":"Existing methods based on meta-learning predict novel-class labels for (target domain) testing tasks via meta knowledge learned from (source domain) training tasks of base classes. However, most existing works may fail to generalize to novel classes due to the probably large domain discrepancy across domains. To address this issue, we propose a novel adversarial feature augmentation (AFA) method to bridge the domain gap in few-shot learning. The feature augmentation is designed to simulate distribution variations by maximizing the domain discrepancy. During adversarial training, the domain discriminator is learned by distinguishing the augmented features (unseen domain) from the original ones (seen domain), while the domain discrepancy is minimized to obtain the optimal feature encoder. The proposed method is a plug-and-play module that can be easily integrated into existing few-shot learning methods based on meta-learning. Extensive experiments on nine datasets demonstrate the superiority of our method for cross-domain few-shot classification compared with the state of the art. Code is available at https://github.com/youthhoo/AFA_For_Few_shot_learning","PeriodicalId":72676,"journal":{"name":"Computer vision - ECCV ... : ... European Conference on Computer Vision : proceedings. European Conference on Computer Vision","volume":"10 2","pages":"20-37"},"PeriodicalIF":0.0,"publicationDate":"2022-08-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"91496337","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}