
Computer vision - ECCV ... : ... European Conference on Computer Vision : proceedings. European Conference on Computer Vision: Latest Publications

Multi-Granularity Prediction for Scene Text Recognition
P. Wang, Cheng Da, C. Yao
Scene text recognition (STR) has been an active research topic in computer vision for years. To tackle this challenging problem, numerous innovative methods have been successively proposed, and incorporating linguistic knowledge into STR models has recently become a prominent trend. In this work, we first draw inspiration from the recent progress in Vision Transformer (ViT) to construct a conceptually simple yet powerful vision STR model, which is built upon ViT and outperforms previous state-of-the-art models for scene text recognition, including both pure vision models and language-augmented methods. To integrate linguistic knowledge, we further propose a Multi-Granularity Prediction strategy to inject information from the language modality into the model in an implicit way, i.e., subword representations (BPE and WordPiece) widely used in NLP are introduced into the output space, in addition to the conventional character-level representation, while no independent language model (LM) is adopted. The resultant algorithm (termed MGP-STR) is able to push the performance envelope of STR to an even higher level. Specifically, it achieves an average recognition accuracy of 93.35% on standard benchmarks. Code will be released soon.
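To make the multi-granularity output space concrete, the sketch below builds character-, BPE-, and WordPiece-level targets for a ground-truth word using off-the-shelf Hugging Face tokenizers; the specific checkpoints (gpt2, bert-base-uncased) are illustrative assumptions, not necessarily the vocabularies used in MGP-STR.

```python
# Minimal sketch of multi-granularity targets for a scene-text label.
# Assumes the `transformers` package; gpt2 / bert-base-uncased stand in for the
# BPE and WordPiece vocabularies (illustrative choice only).
from transformers import AutoTokenizer

bpe_tok = AutoTokenizer.from_pretrained("gpt2")               # byte-level BPE
wp_tok = AutoTokenizer.from_pretrained("bert-base-uncased")   # WordPiece

def multi_granularity_targets(label: str):
    """Return the three target sequences a multi-granularity head would predict."""
    return {
        "char": list(label),                  # conventional character-level target
        "bpe": bpe_tok.tokenize(label),       # subword target #1
        "wordpiece": wp_tok.tokenize(label),  # subword target #2
    }

print(multi_granularity_targets("STARBUCKS"))
# e.g. {'char': ['S','T','A','R','B','U','C','K','S'], 'bpe': [...], 'wordpiece': [...]}
```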
Citations: 19
Unpaired Image Translation via Vector Symbolic Architectures
Justin D. Theiss, Jay Leverett, Daeil Kim, Aayush Prakash
Image-to-image translation has played an important role in enabling synthetic data for computer vision. However, if the source and target domains have a large semantic mismatch, existing techniques often suffer from source content corruption aka semantic flipping. To address this problem, we propose a new paradigm for image-to-image translation using Vector Symbolic Architectures (VSA), a theoretical framework which defines algebraic operations in a high-dimensional vector (hypervector) space. We introduce VSA-based constraints on adversarial learning for source-to-target translations by learning a hypervector mapping that inverts the translation to ensure consistency with source content. We show both qualitatively and quantitatively that our method improves over other state-of-the-art techniques.
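The algebra the paper builds on can be illustrated independently of the translation network: bipolar hypervectors, element-wise binding, and approximate recovery by binding again with the same key. A minimal sketch, assuming nothing beyond NumPy, follows; it shows the VSA operations only, not the authors' adversarial training.

```python
# Minimal sketch of the Vector Symbolic Architecture operations: bipolar
# hypervectors, element-wise binding, and recovery (unbinding) checked with
# cosine similarity.
import numpy as np

rng = np.random.default_rng(0)
D = 10_000                                    # hypervector dimensionality

def hypervector():
    return rng.choice([-1.0, 1.0], size=D)    # random bipolar hypervector

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

content, style = hypervector(), hypervector()
bound = content * style                       # binding: element-wise product
recovered = bound * style                     # unbinding: bind again with the key

print(cosine(recovered, content))             # 1.0: content recovered exactly (style * style = 1)
print(cosine(bound, content))                 # ~0.0: the bound vector resembles neither factor
```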
Citations: 14
Towards Accurate Binary Neural Networks via Modeling Contextual Dependencies
Xingrun Xing, Yangguang Li, Wei Li, Wenrui Ding, Yalong Jiang, Yufeng Wang, Jinghua Shao, Chunlei Liu, Xianglong Liu
Existing Binary Neural Networks (BNNs) mainly operate on local convolutions with a binarization function. However, such simple bit operations lack the ability to model contextual dependencies, which is critical for learning discriminative deep representations in vision models. In this work, we tackle this issue by presenting new designs of binary neural modules, which enable BNNs to learn effective contextual dependencies. First, we propose a binary multi-layer perceptron (MLP) block as an alternative to binary convolution blocks to directly model contextual dependencies. Both short-range and long-range feature dependencies are modeled by binary MLPs, where the former provides local inductive bias and the latter breaks the limited receptive field in binary convolutions. Second, to improve the robustness of binary models with contextual dependencies, we compute contextual dynamic embeddings to determine the binarization thresholds in general binary convolutional blocks. Armed with our binary MLP blocks and improved binary convolution, we build the BNNs with explicit Contextual Dependency modeling, termed BCDNet. On the standard ImageNet-1K classification benchmark, BCDNet achieves 72.3% Top-1 accuracy and outperforms leading binary methods by a large margin. In particular, the proposed BCDNet exceeds the state-of-the-art ReActNet-A by 2.9% Top-1 accuracy with similar operations. Our code is available at https://github.com/Sense-GVT/BCDNet.
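The binarization primitive underlying binary MLP and convolution blocks is a sign activation trained with a straight-through estimator. The sketch below shows that primitive with a learnable per-channel threshold standing in for BCDNet's contextual dynamic thresholds; it is an assumption-laden illustration, not the paper's exact module.

```python
# Sketch of the binarization primitive behind binary MLP / conv blocks: sign()
# in the forward pass, straight-through estimator (STE) in the backward pass.
# The learnable threshold is a stand-in for BCDNet's contextual dynamic
# thresholds, not the paper's exact formulation.
import torch
import torch.nn as nn

def binarize(x: torch.Tensor, threshold: torch.Tensor) -> torch.Tensor:
    shifted = x - threshold
    hard = torch.sign(shifted)
    hard = torch.where(hard == 0, torch.ones_like(hard), hard)   # map 0 -> +1
    # STE: forward uses `hard`, backward treats the op as identity on `shifted`.
    return shifted + (hard - shifted).detach()

class BinaryLinear(nn.Module):
    def __init__(self, in_features: int, out_features: int):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(out_features, in_features) * 0.01)
        self.threshold = nn.Parameter(torch.zeros(in_features))  # per-channel threshold

    def forward(self, x):
        xb = binarize(x, self.threshold)                   # binary activations
        wb = binarize(self.weight, self.weight.new_zeros(()))  # binary weights
        return xb @ wb.t()

layer = BinaryLinear(16, 8)
out = layer(torch.randn(4, 16))
out.sum().backward()                                       # gradients flow via the STE
print(out.shape)                                           # torch.Size([4, 8])
```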
Citations: 1
Meta-Learning with Less Forgetting on Large-Scale Non-Stationary Task Distributions
Zhenyi Wang, Li Shen, Le Fang, Qiuling Suo, Dongling Zhan, Tiehang Duan, Mingchen Gao
The paradigm of machine intelligence is moving from purely supervised learning to a more practical scenario in which many loosely related unlabeled data are available and labeled data is scarce. Most existing algorithms assume that the underlying task distribution is stationary. Here we consider a more realistic and challenging setting in which task distributions evolve over time. We name this problem Semi-supervised meta-learning with Evolving Task diStributions, abbreviated as SETS. Two key challenges arise in this more realistic setting: (i) how to use unlabeled data in the presence of a large amount of unlabeled out-of-distribution (OOD) data; and (ii) how to prevent catastrophic forgetting of previously learned task distributions due to the task distribution shift. We propose an OOD Robust and knowleDge presErved semi-supeRvised meta-learning approach (ORDER) to tackle these two major challenges. Specifically, our ORDER introduces a novel mutual information regularization to robustify the model with unlabeled OOD data and adopts an optimal transport regularization to remember previously learned knowledge in feature space. In addition, we test our method on a very challenging dataset: SETS on large-scale non-stationary semi-supervised task distributions consisting of (at least) 72K tasks. With extensive experiments, we demonstrate that the proposed ORDER alleviates forgetting on evolving task distributions and is more robust to OOD data than related strong baselines.
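One ingredient, the optimal-transport regularization that keeps current features close to remembered ones, can be illustrated with a generic entropic Sinkhorn distance between two feature batches. The sketch below is one plausible instantiation under that assumption, not the authors' exact loss; the mutual-information term is omitted entirely.

```python
# Sketch of an optimal-transport regularizer between current features and a
# stored memory of features from earlier task distributions, via log-domain
# Sinkhorn iterations. A generic instantiation, not necessarily ORDER's loss.
import math
import torch

def sinkhorn_ot(x: torch.Tensor, y: torch.Tensor, eps: float = 0.1, iters: int = 100):
    """x: (n, d) current features, y: (m, d) remembered features -> scalar OT cost."""
    cost = torch.cdist(x, y, p=2) ** 2                 # (n, m) squared distances
    log_mu = torch.full((x.size(0),), -math.log(x.size(0)), device=x.device)
    log_nu = torch.full((y.size(0),), -math.log(y.size(0)), device=y.device)
    f = torch.zeros_like(log_mu)
    g = torch.zeros_like(log_nu)
    for _ in range(iters):                             # dual potential updates
        f = eps * (log_mu - torch.logsumexp((g.unsqueeze(0) - cost) / eps, dim=1))
        g = eps * (log_nu - torch.logsumexp((f.unsqueeze(1) - cost) / eps, dim=0))
    plan = torch.exp((f.unsqueeze(1) + g.unsqueeze(0) - cost) / eps)
    return (plan * cost).sum()

# Usage: add as a penalty so new features stay transportable onto old ones.
current = torch.randn(32, 128, requires_grad=True)
memory = torch.randn(64, 128)
loss = sinkhorn_ot(current, memory)
loss.backward()
```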
Citations: 7
Revisiting Outer Optimization in Adversarial Training
Ali Dabouei, Fariborz Taherkhani, Sobhan Soleymani, N. Nasrabadi
Despite the fundamental distinction between adversarial and natural training (AT and NT), AT methods generally adopt momentum SGD (MSGD) for the outer optimization. This paper aims to analyze this choice by investigating the overlooked role of outer optimization in AT. Our exploratory evaluations reveal that AT induces higher gradient norm and variance compared to NT. This phenomenon hinders the outer optimization in AT since the convergence rate of MSGD is highly dependent on the variance of the gradients. To this end, we propose an optimization method called ENGM which regularizes the contribution of each input example to the average mini-batch gradients. We prove that the convergence rate of ENGM is independent of the variance of the gradients, and thus, it is suitable for AT. We introduce a trick to reduce the computational cost of ENGM using empirical observations on the correlation between the norm of gradients w.r.t. the network parameters and input examples. Our extensive evaluations and ablation studies on CIFAR-10, CIFAR-100, and TinyImageNet demonstrate that ENGM and its variants consistently improve the performance of a wide range of AT methods. Furthermore, ENGM alleviates major shortcomings of AT including robust overfitting and high sensitivity to hyperparameter settings.
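The trick mentioned above rests on per-example input-gradient norms correlating with per-example parameter-gradient norms. The sketch below uses that proxy to reweight example losses so each example contributes more uniformly to the mini-batch gradient; it is a rough illustration of the idea under that assumption, not the exact ENGM update rule.

```python
# Rough sketch of equalising per-example contributions to the mini-batch gradient.
# Per-example *input*-gradient norms act as a cheap proxy for per-example
# parameter-gradient norms; this illustrates the idea, not the exact ENGM update.
import torch
import torch.nn as nn
import torch.nn.functional as F

model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 10))
opt = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9)

x = torch.randn(8, 3, 32, 32, requires_grad=True)   # (adversarial) training batch
y = torch.randint(0, 10, (8,))

per_example = F.cross_entropy(model(x), y, reduction="none")   # (8,) losses
# The gradient of the summed loss w.r.t. the inputs is per-example, since each
# example's loss depends only on its own input.
input_grads = torch.autograd.grad(per_example.sum(), x, retain_graph=True)[0]
norms = input_grads.flatten(1).norm(dim=1)                      # (8,) proxy norms
weights = (1.0 / (norms + 1e-8)).detach()
weights = weights * (len(weights) / weights.sum())              # keep mean weight = 1

opt.zero_grad()
(weights * per_example).mean().backward()                       # reweighted update
opt.step()
```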
Citations: 1
Unified Fully and Timestamp Supervised Temporal Action Segmentation via Sequence to Sequence Translation
Nadine Behrmann, S. Golestaneh, Zico Kolter, Juergen Gall, M. Noroozi
This paper introduces a unified framework for video action segmentation via sequence to sequence (seq2seq) translation in a fully and timestamp supervised setup. In contrast to current state-of-the-art frame-level prediction methods, we view action segmentation as a seq2seq translation task, i.e., mapping a sequence of video frames to a sequence of action segments. Our proposed method involves a series of modifications and auxiliary loss functions on the standard Transformer seq2seq translation model to cope with long input sequences opposed to short output sequences and relatively few videos. We incorporate an auxiliary supervision signal for the encoder via a frame-wise loss and propose a separate alignment decoder for an implicit duration prediction. Finally, we extend our framework to the timestamp supervised setting via our proposed constrained k-medoids algorithm to generate pseudo-segmentations. Our proposed framework performs consistently on both fully and timestamp supervised settings, outperforming or competing state-of-the-art on several datasets. Our code is publicly available at https://github.com/boschresearch/UVAST.
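The central reframing is to predict a short sequence of (action, duration) segments instead of frame-wise labels. The sketch below shows the conversion in both directions (essentially run-length encoding), which is also how segment-level predictions with durations expand back to frame-level output; it illustrates the target format only, not the transformer model itself.

```python
# Sketch of the seq2seq target format: frame-wise labels <-> (action, length)
# segments. Run-length encoding produces the short segment sequence the decoder
# predicts; the inverse expands predicted segments back to frames.
from itertools import groupby
from typing import List, Tuple

def frames_to_segments(frame_labels: List[str]) -> List[Tuple[str, int]]:
    return [(action, sum(1 for _ in run)) for action, run in groupby(frame_labels)]

def segments_to_frames(segments: List[Tuple[str, int]]) -> List[str]:
    return [action for action, length in segments for _ in range(length)]

frames = ["background"] * 3 + ["crack_egg"] * 5 + ["stir"] * 4
segments = frames_to_segments(frames)
print(segments)                                  # [('background', 3), ('crack_egg', 5), ('stir', 4)]
assert segments_to_frames(segments) == frames    # round trip
```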
Citations: 27
MIME: Minority Inclusion for Majority Group Enhancement of AI Performance
Pradyumna Chari, Yunhao Ba, Shreeram S. Athreya, A. Kadambi
Several papers have rightly included minority groups in artificial intelligence (AI) training data to improve test inference for minority groups and/or society-at-large. A society-at-large consists of both minority and majority stakeholders. A common misconception is that minority inclusion does not increase performance for majority groups alone. In this paper, we make the surprising finding that including minority samples can improve test error for the majority group. In other words, minority group inclusion leads to majority group enhancements (MIME) in performance. A theoretical existence proof of the MIME effect is presented and found to be consistent with experimental results on six different datasets. Project webpage: https://visual.ee.ucla.edu/mime.htm/
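The empirical question can be posed as a toy experiment: compare majority-group test accuracy when training with and without minority-group samples. The sketch below sets this up on synthetic Gaussian data with scikit-learn; the group sizes, means, and shift are arbitrary assumptions and it is unrelated to the paper's datasets, theory, or reported results.

```python
# Toy illustration of the question studied: does adding minority-group samples
# change *majority-group* test accuracy? Synthetic Gaussians with arbitrary
# parameters; not the paper's datasets, theory, or result.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

def sample_group(n, shift):
    """Binary classification task; `shift` moves the group's class means."""
    y = rng.integers(0, 2, n)
    x = rng.normal(loc=(2 * y[:, None] - 1) + shift, scale=1.5, size=(n, 2))
    return x, y

maj_train = sample_group(500, shift=0.0)        # majority group
min_train = sample_group(50, shift=0.8)         # smaller, shifted minority group
maj_test = sample_group(5000, shift=0.0)

def majority_test_acc(train_sets):
    x = np.concatenate([s[0] for s in train_sets])
    y = np.concatenate([s[1] for s in train_sets])
    clf = LogisticRegression(max_iter=1000).fit(x, y)
    return clf.score(*maj_test)                  # accuracy on the majority group only

print("majority-only training:", majority_test_acc([maj_train]))
print("majority + minority   :", majority_test_acc([maj_train, min_train]))
```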
Citations: 1
Exploring Gradient-based Multi-directional Controls in GANs
Zikun Chen, R. Jiang, Brendan Duke, Han Zhao, P. Aarabi
Generative Adversarial Networks (GANs) have been widely applied in modeling diverse image distributions. However, despite its impressive applications, the structure of the latent space in GANs largely remains as a black-box, leaving its controllable generation an open problem, especially when spurious correlations between different semantic attributes exist in the image distributions. To address this problem, previous methods typically learn linear directions or individual channels that control semantic attributes in the image space. However, they often suffer from imperfect disentanglement, or are unable to obtain multi-directional controls. In this work, in light of the above challenges, we propose a novel approach that discovers nonlinear controls, which enables multi-directional manipulation as well as effective disentanglement, based on gradient information in the learned GAN latent space. More specifically, we first learn interpolation directions by following the gradients from classification networks trained separately on the attributes, and then navigate the latent space by exclusively controlling channels activated for the target attribute in the learned directions. Empirically, with small training data, our approach is able to gain fine-grained controls over a diverse set of bi-directional and multi-directional attributes, and we showcase its ability to achieve disentanglement significantly better than state-of-the-art methods both qualitatively and quantitatively.
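The core operation is following the gradient of an attribute classifier's score, taken with respect to the latent code, through the generator. The sketch below uses tiny placeholder networks for G and C so it runs stand-alone; a real use would plug in a pretrained GAN generator and attribute classifier, and the channel-masking and multi-direction machinery of the paper is omitted.

```python
# Sketch of discovering an attribute-editing path by following classifier
# gradients in the latent space. G and C are tiny stand-ins so the snippet is
# self-contained; in practice they would be a pretrained generator and an
# attribute classifier.
import torch
import torch.nn as nn

latent_dim, img_dim = 64, 3 * 16 * 16
G = nn.Sequential(nn.Linear(latent_dim, 256), nn.ReLU(), nn.Linear(256, img_dim))  # generator stand-in
C = nn.Sequential(nn.Linear(img_dim, 64), nn.ReLU(), nn.Linear(64, 1))             # classifier stand-in

def edit_latent(z: torch.Tensor, steps: int = 5, step_size: float = 0.5) -> torch.Tensor:
    """Walk z along the classifier-gradient direction to increase the attribute score."""
    z = z.clone().detach()
    for _ in range(steps):
        z.requires_grad_(True)
        score = C(G(z)).sum()                     # attribute logit of the generated image
        (grad,) = torch.autograd.grad(score, z)
        with torch.no_grad():                     # nonlinear path: direction re-evaluated each step
            z = z + step_size * grad / (grad.norm() + 1e-8)
        z = z.detach()
    return z

z0 = torch.randn(1, latent_dim)
z_edit = edit_latent(z0)
print(C(G(z0)).item(), "->", C(G(z_edit)).item())  # attribute score should increase
```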
Citations: 5
SimpleRecon: 3D Reconstruction Without 3D Convolutions
Mohamed Sayed, J. Gibson, Jamie Watson, V. Prisacariu, Michael Firman, Clément Godard
Traditionally, 3D indoor scene reconstruction from posed images happens in two phases: per-image depth estimation, followed by depth merging and surface reconstruction. Recently, a family of methods have emerged that perform reconstruction directly in final 3D volumetric feature space. While these methods have shown impressive reconstruction results, they rely on expensive 3D convolutional layers, limiting their application in resource-constrained environments. In this work, we instead go back to the traditional route, and show how focusing on high quality multi-view depth prediction leads to highly accurate 3D reconstructions using simple off-the-shelf depth fusion. We propose a simple state-of-the-art multi-view depth estimator with two main contributions: 1) a carefully-designed 2D CNN which utilizes strong image priors alongside a plane-sweep feature volume and geometric losses, combined with 2) the integration of keyframe and geometric metadata into the cost volume which allows informed depth plane scoring. Our method achieves a significant lead over the current state-of-the-art for depth estimation and close or better for 3D reconstruction on ScanNet and 7-Scenes, yet still allows for online real-time low-memory reconstruction. Code, models and results are available at https://nianticlabs.github.io/simplerecon
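The plane-sweep feature volume at the heart of such multi-view depth estimators can be sketched compactly: back-project reference pixels at hypothesis depths, warp source features into the reference view, and score each depth by a feature dot product. The snippet below is a stripped-down version under simplifying assumptions (shared intrinsics, a single source view); the paper's keyframe and geometric metadata channels and the 2D CNN that scores the volume are omitted.

```python
# Minimal plane-sweep cost volume: warp source features to the reference view at
# each hypothesis depth and score the match with a mean feature dot product.
import torch
import torch.nn.functional as F

def plane_sweep_cost_volume(ref_feat, src_feat, K, R, t, depths):
    """ref_feat, src_feat: (C, H, W); K: (3, 3) shared intrinsics;
    R, t: reference-to-source rotation (3, 3) and translation (3,);
    depths: (D,) hypotheses. Returns a (D, H, W) cost volume."""
    C, H, W = ref_feat.shape
    device = ref_feat.device
    v, u = torch.meshgrid(torch.arange(H, device=device, dtype=torch.float32),
                          torch.arange(W, device=device, dtype=torch.float32),
                          indexing="ij")
    pix = torch.stack([u, v, torch.ones_like(u)], dim=0).reshape(3, -1)
    rays = torch.linalg.inv(K) @ pix                       # unit-depth viewing rays

    costs = []
    for d in depths:
        pts_ref = rays * d                                 # back-project at depth d
        pts_src = R @ pts_ref + t.unsqueeze(1)             # move into the source frame
        proj = K @ pts_src                                 # project into the source image
        uv = proj[:2] / proj[2:].clamp(min=1e-6)
        grid = torch.stack([uv[0] / (W - 1) * 2 - 1,       # normalise for grid_sample
                            uv[1] / (H - 1) * 2 - 1], dim=-1).reshape(1, H, W, 2)
        warped = F.grid_sample(src_feat.unsqueeze(0), grid,
                               align_corners=True, padding_mode="zeros")[0]
        costs.append((ref_feat * warped).sum(dim=0) / C)   # mean dot-product matching cost
    return torch.stack(costs, dim=0)

feats_ref, feats_src = torch.randn(32, 48, 64), torch.randn(32, 48, 64)
K = torch.tensor([[60., 0., 32.], [0., 60., 24.], [0., 0., 1.]])
R, t = torch.eye(3), torch.tensor([0.1, 0.0, 0.0])
cv = plane_sweep_cost_volume(feats_ref, feats_src, K, R, t, torch.linspace(0.5, 5.0, 16))
print(cv.shape)   # torch.Size([16, 48, 64])
```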
Citations: 31
Style-Agnostic Reinforcement Learning
Juyong Lee, Seokjun Ahn, Jaesik Park
We present a novel method of learning style-agnostic representation using both style transfer and adversarial learning in the reinforcement learning framework. The style, here, refers to task-irrelevant details such as the color of the background in the images, where generalizing the learned policy across environments with different styles is still a challenge. Focusing on learning style-agnostic representations, our method trains the actor with diverse image styles generated from an inherent adversarial style perturbation generator, which plays a min-max game between the actor and the generator, without demanding expert knowledge for data augmentation or additional class labels for adversarial training. We verify that our method achieves competitive or better performances than the state-of-the-art approaches on Procgen and Distracting Control Suite benchmarks, and further investigate the features extracted from our model, showing that the model better captures the invariants and is less distracted by the shifted style. The code is available at https://github.com/POSTECH-CVLab/style-agnostic-RL.
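The min-max game can be sketched with a small perturbation module that applies a learnable channel-wise scale and shift ("style") to image observations: it is updated to increase the policy loss while the actor is updated to decrease it. The architectures and the behaviour-cloning style loss below are placeholders, not the paper's actual objective or generator design.

```python
# Sketch of the min-max game between an actor and a style-perturbation generator.
# The generator applies a channel-wise scale/shift to observations and is updated
# to *increase* the policy loss; the actor is updated to *decrease* it.
import torch
import torch.nn as nn
import torch.nn.functional as F

n_actions = 4

class StylePerturb(nn.Module):
    def __init__(self, channels: int = 3):
        super().__init__()
        self.scale = nn.Parameter(torch.ones(1, channels, 1, 1))
        self.shift = nn.Parameter(torch.zeros(1, channels, 1, 1))

    def forward(self, obs):                       # task-irrelevant "style" change
        return obs * self.scale + self.shift

actor = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 128), nn.ReLU(),
                      nn.Linear(128, n_actions))
perturb = StylePerturb()
opt_actor = torch.optim.Adam(actor.parameters(), lr=3e-4)
opt_style = torch.optim.Adam(perturb.parameters(), lr=3e-4)

obs = torch.rand(8, 3, 32, 32)                    # batch of image observations
actions = torch.randint(0, n_actions, (8,))       # placeholder policy targets

def policy_loss():
    return F.cross_entropy(actor(perturb(obs)), actions)

# Generator step: maximise the actor's loss (ascent via the negated loss).
opt_style.zero_grad()
(-policy_loss()).backward()
opt_style.step()

# Actor step: minimise the loss on the freshly perturbed observations.
opt_actor.zero_grad()
policy_loss().backward()
opt_actor.step()
```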
Citations: 1