
Latest publications: Computer vision - ECCV ... : ... European Conference on Computer Vision : proceedings. European Conference on Computer Vision

CenterFormer: Center-based Transformer for 3D Object Detection
Zixiang Zhou, Xian Zhao, Yu Wang, Panqu Wang, H. Foroosh
Query-based transformer has shown great potential in constructing long-range attention in many image-domain tasks, but has rarely been considered in LiDAR-based 3D object detection due to the overwhelming size of the point cloud data. In this paper, we propose CenterFormer, a center-based transformer network for 3D object detection. CenterFormer first uses a center heatmap to select center candidates on top of a standard voxel-based point cloud encoder. It then uses the feature of the center candidate as the query embedding in the transformer. To further aggregate features from multiple frames, we design an approach to fuse features through cross-attention. Lastly, regression heads are added to predict the bounding box on the output center feature representation. Our design reduces the convergence difficulty and computational complexity of the transformer structure. The results show significant improvements over the strong baseline of anchor-free object detection networks. CenterFormer achieves state-of-the-art performance for a single model on the Waymo Open Dataset, with 73.7% mAPH on the validation set and 75.6% mAPH on the test set, significantly outperforming all previously published CNN and transformer-based methods. Our code is publicly available at https://github.com/TuSimple/centerformer
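To make the center-as-query idea concrete, the sketch below selects the top-K peaks of a predicted center heatmap, gathers their BEV features as transformer queries, and lets them cross-attend to the (possibly multi-frame) BEV feature map before box regression. It is a minimal PyTorch illustration with assumed tensor shapes and a hypothetical module name, not the released TuSimple implementation.

```python
import torch
import torch.nn as nn

class CenterQueryCrossAttention(nn.Module):
    """Minimal sketch: pick top-K center candidates from a heatmap and let
    their features cross-attend to flattened (multi-frame) BEV features."""

    def __init__(self, channels: int = 256, num_heads: int = 8, top_k: int = 500):
        super().__init__()
        self.top_k = top_k
        self.attn = nn.MultiheadAttention(channels, num_heads, batch_first=True)
        self.reg_head = nn.Linear(channels, 7)  # (x, y, z, w, l, h, yaw) residuals

    def forward(self, heatmap: torch.Tensor, bev_feats: torch.Tensor):
        # heatmap:   (B, 1, H, W) center-ness scores
        # bev_feats: (B, C, H, W) BEV features (current frame or fused frames)
        B, C, H, W = bev_feats.shape
        scores = heatmap.flatten(2).squeeze(1)             # (B, H*W)
        top_idx = scores.topk(self.top_k, dim=1).indices   # (B, K)

        flat = bev_feats.flatten(2).transpose(1, 2)        # (B, H*W, C)
        queries = torch.gather(
            flat, 1, top_idx.unsqueeze(-1).expand(-1, -1, C))  # (B, K, C)

        # Cross-attention: center queries attend to all BEV positions.
        fused, _ = self.attn(queries, flat, flat)
        boxes = self.reg_head(fused)                       # (B, K, 7)
        return boxes, top_idx
```

Restricting attention to a few hundred center queries instead of every BEV location is what keeps the transformer cost manageable for large point clouds.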
{"title":"CenterFormer: Center-based Transformer for 3D Object Detection","authors":"Zixiang Zhou, Xian Zhao, Yu Wang, Panqu Wang, H. Foroosh","doi":"10.48550/arXiv.2209.05588","DOIUrl":"https://doi.org/10.48550/arXiv.2209.05588","url":null,"abstract":"Query-based transformer has shown great potential in constructing long-range attention in many image-domain tasks, but has rarely been considered in LiDAR-based 3D object detection due to the overwhelming size of the point cloud data. In this paper, we propose CenterFormer, a center-based transformer network for 3D object detection. CenterFormer first uses a center heatmap to select center candidates on top of a standard voxel-based point cloud encoder. It then uses the feature of the center candidate as the query embedding in the transformer. To further aggregate features from multiple frames, we design an approach to fuse features through cross-attention. Lastly, regression heads are added to predict the bounding box on the output center feature representation. Our design reduces the convergence difficulty and computational complexity of the transformer structure. The results show significant improvements over the strong baseline of anchor-free object detection networks. CenterFormer achieves state-of-the-art performance for a single model on the Waymo Open Dataset, with 73.7% mAPH on the validation set and 75.6% mAPH on the test set, significantly outperforming all previously published CNN and transformer-based methods. Our code is publicly available at https://github.com/TuSimple/centerformer","PeriodicalId":72676,"journal":{"name":"Computer vision - ECCV ... : ... European Conference on Computer Vision : proceedings. European Conference on Computer Vision","volume":"139 1","pages":"496-513"},"PeriodicalIF":0.0,"publicationDate":"2022-09-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"79872637","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 51
Cross-Modal Knowledge Transfer Without Task-Relevant Source Data
Sk. Miraj Ahmed, Suhas Lohit, Kuan-Chuan Peng, Michael Jones, A. Roy-Chowdhury
Cost-effective depth and infrared sensors as alternatives to usual RGB sensors are now a reality, and have some advantages over RGB in domains like autonomous navigation and remote sensing. As such, building computer vision and deep learning systems for depth and infrared data is crucial. However, large labeled datasets for these modalities are still lacking. In such cases, transferring knowledge from a neural network trained on a well-labeled large dataset in the source modality (RGB) to a neural network that works on a target modality (depth, infrared, etc.) is of great value. For reasons like memory and privacy, it may not be possible to access the source data, and knowledge transfer needs to work with only the source models. We describe an effective solution, SOCKET: SOurce-free Cross-modal KnowledgE Transfer, for this challenging task of transferring knowledge from one source modality to a different target modality without access to task-relevant source data. The framework reduces the modality gap using paired task-irrelevant data, as well as by matching the mean and variance of the target features with the batch-norm statistics that are present in the source models. We show through extensive experiments that our method significantly outperforms existing source-free methods for classification tasks which do not account for the modality gap.
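The batch-norm statistics matching mentioned above can be sketched as a loss that pulls the per-channel mean and variance of target-modality activations toward the running statistics frozen in the source model's BatchNorm layers. The helper below is a minimal sketch under assumed inputs (a dict mapping each BN layer's name to the activation feeding it), not the full SOCKET framework.

```python
import torch
import torch.nn as nn

def bn_statistics_matching_loss(source_model: nn.Module, target_feats: dict) -> torch.Tensor:
    """target_feats maps a BatchNorm2d layer name to the (N, C, H, W) activation
    that feeds it when target-modality data is passed through the network.
    (Hypothetical helper illustrating the statistics-matching idea.)"""
    bn_layers = {name: m for name, m in source_model.named_modules()
                 if isinstance(m, nn.BatchNorm2d)}
    loss = 0.0
    for name, feat in target_feats.items():
        bn = bn_layers[name]
        mu = feat.mean(dim=(0, 2, 3))                  # per-channel mean of target batch
        var = feat.var(dim=(0, 2, 3), unbiased=False)  # per-channel variance
        loss = loss + (mu - bn.running_mean).pow(2).mean() \
                    + (var - bn.running_var).pow(2).mean()
    return loss
```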
{"title":"Cross-Modal Knowledge Transfer Without Task-Relevant Source Data","authors":"Sk. Miraj Ahmed, Suhas Lohit, Kuan-Chuan Peng, Michael Jones, A. Roy-Chowdhury","doi":"10.48550/arXiv.2209.04027","DOIUrl":"https://doi.org/10.48550/arXiv.2209.04027","url":null,"abstract":"Cost-effective depth and infrared sensors as alternatives to usual RGB sensors are now a reality, and have some advantages over RGB in domains like autonomous navigation and remote sensing. As such, building computer vision and deep learning systems for depth and infrared data are crucial. However, large labeled datasets for these modalities are still lacking. In such cases, transferring knowledge from a neural network trained on a well-labeled large dataset in the source modality (RGB) to a neural network that works on a target modality (depth, infrared, etc.) is of great value. For reasons like memory and privacy, it may not be possible to access the source data, and knowledge transfer needs to work with only the source models. We describe an effective solution, SOCKET: SOurce-free Cross-modal KnowledgE Transfer for this challenging task of transferring knowledge from one source modality to a different target modality without access to task-relevant source data. The framework reduces the modality gap using paired task-irrelevant data, as well as by matching the mean and variance of the target features with the batch-norm statistics that are present in the source models. We show through extensive experiments that our method significantly outperforms existing source-free methods for classification tasks which do not account for the modality gap.","PeriodicalId":72676,"journal":{"name":"Computer vision - ECCV ... : ... European Conference on Computer Vision : proceedings. European Conference on Computer Vision","volume":"7 1","pages":"111-127"},"PeriodicalIF":0.0,"publicationDate":"2022-09-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"88568569","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 8
Multi-Granularity Prediction for Scene Text Recognition
P. Wang, Cheng Da, C. Yao
Scene text recognition (STR) has been an active research topic in computer vision for years. To tackle this challenging problem, numerous innovative methods have been successively proposed, and incorporating linguistic knowledge into STR models has recently become a prominent trend. In this work, we first draw inspiration from the recent progress in Vision Transformer (ViT) to construct a conceptually simple yet powerful vision STR model, which is built upon ViT and outperforms previous state-of-the-art models for scene text recognition, including both pure vision models and language-augmented methods. To integrate linguistic knowledge, we further propose a Multi-Granularity Prediction strategy to inject information from the language modality into the model in an implicit way, i.e., subword representations (BPE and WordPiece) widely used in NLP are introduced into the output space, in addition to the conventional character-level representation, while no independent language model (LM) is adopted. The resultant algorithm (termed MGP-STR) is able to push the performance envelope of STR to an even higher level. Specifically, it achieves an average recognition accuracy of 93.35% on standard benchmarks. Code will be released soon.
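The multi-granularity strategy can be pictured as several parallel classification heads, one per granularity (character, BPE, WordPiece), attached to shared ViT features via attention-based pooling; during training their losses are simply summed. The sketch below uses placeholder vocabulary sizes and a generic pooling module, so it illustrates the idea rather than reproducing MGP-STR exactly.

```python
import torch
import torch.nn as nn

class MultiGranularityHead(nn.Module):
    """Shared ViT tokens feed three parallel heads that predict the text at
    character, BPE and WordPiece granularity (vocab sizes are placeholders)."""

    def __init__(self, dim=384, max_len=25,
                 char_vocab=97, bpe_vocab=50257, wp_vocab=30522):
        super().__init__()
        # One learned query per output position, shared across granularities.
        self.pos_queries = nn.Parameter(torch.randn(max_len, dim))
        self.attn = nn.MultiheadAttention(dim, num_heads=6, batch_first=True)
        self.char_head = nn.Linear(dim, char_vocab)
        self.bpe_head = nn.Linear(dim, bpe_vocab)
        self.wp_head = nn.Linear(dim, wp_vocab)

    def forward(self, vit_tokens: torch.Tensor):
        # vit_tokens: (B, N, dim) patch embeddings from a ViT backbone
        B = vit_tokens.size(0)
        q = self.pos_queries.unsqueeze(0).expand(B, -1, -1)   # (B, max_len, dim)
        pooled, _ = self.attn(q, vit_tokens, vit_tokens)       # attention-based pooling
        return (self.char_head(pooled),    # (B, max_len, char_vocab)
                self.bpe_head(pooled),     # (B, max_len, bpe_vocab)
                self.wp_head(pooled))      # (B, max_len, wp_vocab)
```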
{"title":"Multi-Granularity Prediction for Scene Text Recognition","authors":"P. Wang, Cheng Da, C. Yao","doi":"10.48550/arXiv.2209.03592","DOIUrl":"https://doi.org/10.48550/arXiv.2209.03592","url":null,"abstract":". Scene text recognition (STR) has been an active research topic in computer vision for years. To tackle this challenging problem, numerous innovative methods have been successively proposed and incorporating linguistic knowledge into STR models has recently become a prominent trend. In this work, we first draw inspiration from the recent progress in Vision Transformer (ViT) to construct a conceptually simple yet powerful vision STR model, which is built upon ViT and outperforms previous state-of-the-art models for scene text recognition, including both pure vision models and language-augmented methods. To integrate linguistic knowledge, we further propose a Multi-Granularity Prediction strategy to inject information from the language modality into the model in an implicit way, i.e. , subword representations (BPE and WordPiece) widely-used in NLP are introduced into the output space, in addition to the conventional character level representation, while no independent language model (LM) is adopted. The resultant algorithm (termed MGP-STR) is able to push the performance envelop of STR to an even higher level. Specifically, it achieves an average recognition accuracy of 93 . 35% on standard benchmarks. Code will be released soon.","PeriodicalId":72676,"journal":{"name":"Computer vision - ECCV ... : ... European Conference on Computer Vision : proceedings. European Conference on Computer Vision","volume":"112 1","pages":"339-355"},"PeriodicalIF":0.0,"publicationDate":"2022-09-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"80658762","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 19
Unpaired Image Translation via Vector Symbolic Architectures
Justin D. Theiss, Jay Leverett, Daeil Kim, Aayush Prakash
Image-to-image translation has played an important role in enabling synthetic data for computer vision. However, if the source and target domains have a large semantic mismatch, existing techniques often suffer from source content corruption aka semantic flipping. To address this problem, we propose a new paradigm for image-to-image translation using Vector Symbolic Architectures (VSA), a theoretical framework which defines algebraic operations in a high-dimensional vector (hypervector) space. We introduce VSA-based constraints on adversarial learning for source-to-target translations by learning a hypervector mapping that inverts the translation to ensure consistency with source content. We show both qualitatively and quantitatively that our method improves over other state-of-the-art techniques.
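For readers unfamiliar with Vector Symbolic Architectures, the key algebraic operation is binding of high-dimensional vectors with an (approximate) inverse. One common instantiation, holographic reduced representations, realizes binding as circular convolution and unbinding as circular correlation, as sketched below; the paper's exact choice of VSA operators may differ.

```python
import torch

def circular_conv(a: torch.Tensor, b: torch.Tensor) -> torch.Tensor:
    """Bind two hypervectors via circular convolution (computed in Fourier space)."""
    return torch.fft.irfft(torch.fft.rfft(a) * torch.fft.rfft(b), n=a.shape[-1])

def circular_corr(a: torch.Tensor, b: torch.Tensor) -> torch.Tensor:
    """Approximate unbinding: circular correlation recovers b from bind(a, b)."""
    return torch.fft.irfft(torch.conj(torch.fft.rfft(a)) * torch.fft.rfft(b), n=a.shape[-1])

d = 4096
key, value = torch.randn(d), torch.randn(d)
bound = circular_conv(key, value)
recovered = circular_corr(key, bound)
# For random hypervectors the recovered vector is a noisy copy of `value`:
# its cosine similarity to `value` is clearly above chance (~0).
print(torch.cosine_similarity(recovered, value, dim=0))
```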
{"title":"Unpaired Image Translation via Vector Symbolic Architectures","authors":"Justin D. Theiss, Jay Leverett, Daeil Kim, Aayush Prakash","doi":"10.48550/arXiv.2209.02686","DOIUrl":"https://doi.org/10.48550/arXiv.2209.02686","url":null,"abstract":"Image-to-image translation has played an important role in enabling synthetic data for computer vision. However, if the source and target domains have a large semantic mismatch, existing techniques often suffer from source content corruption aka semantic flipping. To address this problem, we propose a new paradigm for image-to-image translation using Vector Symbolic Architectures (VSA), a theoretical framework which defines algebraic operations in a high-dimensional vector (hypervector) space. We introduce VSA-based constraints on adversarial learning for source-to-target translations by learning a hypervector mapping that inverts the translation to ensure consistency with source content. We show both qualitatively and quantitatively that our method improves over other state-of-the-art techniques.","PeriodicalId":72676,"journal":{"name":"Computer vision - ECCV ... : ... European Conference on Computer Vision : proceedings. European Conference on Computer Vision","volume":"8 1","pages":"17-32"},"PeriodicalIF":0.0,"publicationDate":"2022-09-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"75183219","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 14
Towards Accurate Binary Neural Networks via Modeling Contextual Dependencies
Xingrun Xing, Yangguang Li, Wei Li, Wenrui Ding, Yalong Jiang, Yufeng Wang, Jinghua Shao, Chunlei Liu, Xianglong Liu
Existing Binary Neural Networks (BNNs) mainly operate on local convolutions with a binarization function. However, such simple bit operations lack the ability to model contextual dependencies, which is critical for learning discriminative deep representations in vision models. In this work, we tackle this issue by presenting new designs of binary neural modules, which enable BNNs to learn effective contextual dependencies. First, we propose a binary multi-layer perceptron (MLP) block as an alternative to binary convolution blocks to directly model contextual dependencies. Both short-range and long-range feature dependencies are modeled by binary MLPs, where the former provides local inductive bias and the latter breaks the limited receptive field of binary convolutions. Second, to improve the robustness of binary models with contextual dependencies, we compute contextual dynamic embeddings to determine the binarization thresholds in general binary convolutional blocks. Armed with our binary MLP blocks and improved binary convolution, we build BNNs with explicit Contextual Dependency modeling, termed BCDNet. On the standard ImageNet-1K classification benchmark, BCDNet achieves 72.3% Top-1 accuracy and outperforms leading binary methods by a large margin. In particular, the proposed BCDNet exceeds the state-of-the-art ReActNet-A by 2.9% Top-1 accuracy with similar operations. Our code is available at https://github.com/Sense-GVT/BCDNet.
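The contextual thresholding idea amounts to binarizing activations against thresholds predicted from a context embedding rather than a fixed zero threshold, trained with a straight-through estimator. The sketch below uses a simple global-average-pooling context branch as a stand-in for the paper's dynamic embeddings and is not the released BCDNet code.

```python
import torch
import torch.nn as nn

class BinarizeSTE(torch.autograd.Function):
    """sign(x) in the forward pass, straight-through (clipped) gradient backward."""
    @staticmethod
    def forward(ctx, x):
        ctx.save_for_backward(x)
        return torch.sign(x)

    @staticmethod
    def backward(ctx, grad_out):
        (x,) = ctx.saved_tensors
        return grad_out * (x.abs() <= 1).float()

class ContextualBinaryActivation(nn.Module):
    """Binarize activations against per-channel thresholds predicted from a
    global context vector (a simplified stand-in for dynamic embeddings)."""
    def __init__(self, channels: int):
        super().__init__()
        self.threshold_fc = nn.Linear(channels, channels)

    def forward(self, x: torch.Tensor):
        # x: (B, C, H, W)
        context = x.mean(dim=(2, 3))                           # (B, C) global context
        thresh = self.threshold_fc(context)[..., None, None]   # (B, C, 1, 1)
        return BinarizeSTE.apply(x - thresh)
```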
{"title":"Towards Accurate Binary Neural Networks via Modeling Contextual Dependencies","authors":"Xingrun Xing, Yangguang Li, Wei Li, Wenrui Ding, Yalong Jiang, Yufeng Wang, Jinghua Shao, Chunlei Liu, Xianglong Liu","doi":"10.48550/arXiv.2209.01404","DOIUrl":"https://doi.org/10.48550/arXiv.2209.01404","url":null,"abstract":", Abstract. Existing Binary Neural Networks (BNNs) mainly operate on local convolutions with binarization function. However, such simple bit operations lack the ability of modeling contextual dependencies, which is critical for learning discriminative deep representations in vision models. In this work, we tackle this issue by presenting new designs of binary neural modules, which enables BNNs to learn effective contextual dependencies. First, we propose a binary multi-layer perceptron (MLP) block as an alternative to binary convolution blocks to directly model contextual dependencies. Both short-range and long-range feature dependencies are modeled by binary MLPs, where the former provides local inductive bias and the latter breaks limited receptive field in binary convolutions. Second, to improve the robustness of binary models with contextual dependencies, we compute the contextual dynamic embeddings to determine the binarization thresholds in general binary convolutional blocks. Armed with our binary MLP blocks and improved binary convolution, we build the BNNs with explicit Contextual De-pendency modeling, termed as BCDNet. On the standard ImageNet-1K classification benchmark, the BCDNet achieves 72.3% Top-1 accuracy and outperforms leading binary methods by a large margin. In particu-lar, the proposed BCDNet exceeds the state-of-the-art ReActNet-A by 2.9% Top-1 accuracy with similar operations. Our code is available at https://github.com/Sense-GVT/BCDNet .","PeriodicalId":72676,"journal":{"name":"Computer vision - ECCV ... : ... European Conference on Computer Vision : proceedings. European Conference on Computer Vision","volume":"8 1","pages":"536-552"},"PeriodicalIF":0.0,"publicationDate":"2022-09-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"86602070","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 1
Meta-Learning with Less Forgetting on Large-Scale Non-Stationary Task Distributions
Zhenyi Wang, Li Shen, Le Fang, Qiuling Suo, Dongling Zhan, Tiehang Duan, Mingchen Gao
The paradigm of machine intelligence moves from purely supervised learning to a more practical scenario in which many loosely related unlabeled data are available and labeled data is scarce. Most existing algorithms assume that the underlying task distribution is stationary. Here we consider a more realistic and challenging setting in which task distributions evolve over time. We name this problem Semi-supervised meta-learning with Evolving Task diStributions, abbreviated as SETS. Two key challenges arise in this more realistic setting: (i) how to use unlabeled data in the presence of a large amount of unlabeled out-of-distribution (OOD) data; and (ii) how to prevent catastrophic forgetting of previously learned task distributions due to the task distribution shift. We propose an OOD Robust and knowleDge presErved semi-supeRvised meta-learning approach (ORDER) to tackle these two major challenges. Specifically, ORDER introduces a novel mutual information regularization to robustify the model with unlabeled OOD data and adopts an optimal transport regularization to remember previously learned knowledge in feature space. In addition, we test our method on a very challenging dataset: SETS on large-scale non-stationary semi-supervised task distributions consisting of (at least) 72K tasks. With extensive experiments, we demonstrate that the proposed ORDER alleviates forgetting on evolving task distributions and is more robust to OOD data than related strong baselines.
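The optimal-transport part of the method can be pictured as penalizing how far current features drift from features stored for earlier task distributions. Below is a minimal entropic-OT (Sinkhorn) regularizer that could serve that role; it is a generic sketch, not the authors' exact formulation.

```python
import torch

def sinkhorn_ot(x: torch.Tensor, y: torch.Tensor, eps: float = 0.1, iters: int = 50):
    """Entropic optimal-transport cost between two feature sets x (n, d) and y (m, d)."""
    cost = torch.cdist(x, y) ** 2                    # (n, m) squared Euclidean costs
    n, m = cost.shape
    mu = torch.full((n,), 1.0 / n, device=x.device)  # uniform marginal over x
    nu = torch.full((m,), 1.0 / m, device=x.device)  # uniform marginal over y
    K = torch.exp(-cost / eps)
    b = torch.ones_like(nu)
    for _ in range(iters):                           # Sinkhorn scaling iterations
        a = mu / (K @ b + 1e-8)
        b = nu / (K.t() @ a + 1e-8)
    plan = a[:, None] * K * b[None, :]               # approximate transport plan
    return (plan * cost).sum()

# Possible usage (hypothetical): total_loss = task_loss + lam * sinkhorn_ot(feats_now, feats_stored)
```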
{"title":"Meta-Learning with Less Forgetting on Large-Scale Non-Stationary Task Distributions","authors":"Zhenyi Wang, Li Shen, Le Fang, Qiuling Suo, Dongling Zhan, Tiehang Duan, Mingchen Gao","doi":"10.48550/arXiv.2209.01501","DOIUrl":"https://doi.org/10.48550/arXiv.2209.01501","url":null,"abstract":". The paradigm of machine intelligence moves from purely supervised learning to a more practical scenario when many loosely related unlabeled data are available and labeled data is scarce. Most existing algo-rithms assume that the underlying task distribution is stationary. Here we consider a more realistic and challenging setting in that task distributions evolve over time. We name this problem as S emi-supervised meta-learning with E volving T ask di S tributions, abbreviated as SETS . Two key challenges arise in this more realistic setting: (i) how to use unlabeled data in the presence of a large amount of unlabeled out-of-distribution (OOD) data; and (ii) how to prevent catastrophic forgetting on previously learned task distributions due to the task distribution shift. We propose an O OD R obust and knowle D ge pres E rved semi-supe R vised meta-learning approach ( ORDER ) ‡ , to tackle these two major challenges. Specifically, our ORDER introduces a novel mutual information regularization to robustify the model with unlabeled OOD data and adopts an optimal transport regularization to remember previously learned knowledge in feature space. In addition, we test our method on a very challenging dataset: SETS on large-scale non-stationary semi-supervised task distributions consisting of (at least) 72K tasks. With extensive experiments, we demonstrate the proposed ORDER alleviates forgetting on evolving task distributions and is more robust to OOD data than related strong baselines.","PeriodicalId":72676,"journal":{"name":"Computer vision - ECCV ... : ... European Conference on Computer Vision : proceedings. European Conference on Computer Vision","volume":"38 1","pages":"221-238"},"PeriodicalIF":0.0,"publicationDate":"2022-09-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"86359811","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 7
Revisiting Outer Optimization in Adversarial Training
Ali Dabouei, Fariborz Taherkhani, Sobhan Soleymani, N. Nasrabadi
Despite the fundamental distinction between adversarial and natural training (AT and NT), AT methods generally adopt momentum SGD (MSGD) for the outer optimization. This paper aims to analyze this choice by investigating the overlooked role of outer optimization in AT. Our exploratory evaluations reveal that AT induces higher gradient norm and variance compared to NT. This phenomenon hinders the outer optimization in AT since the convergence rate of MSGD is highly dependent on the variance of the gradients. To this end, we propose an optimization method called ENGM which regularizes the contribution of each input example to the average mini-batch gradients. We prove that the convergence rate of ENGM is independent of the variance of the gradients, and thus it is suitable for AT. We introduce a trick to reduce the computational cost of ENGM using empirical observations on the correlation between the norm of gradients w.r.t. the network parameters and input examples. Our extensive evaluations and ablation studies on CIFAR-10, CIFAR-100, and TinyImageNet demonstrate that ENGM and its variants consistently improve the performance of a wide range of AT methods. Furthermore, ENGM alleviates major shortcomings of AT including robust overfitting and high sensitivity to hyperparameter settings.
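The core of ENGM is to equalize how much each example contributes to the mini-batch gradient. A naive way to realize that is to compute per-example gradients, rescale each to unit norm, and average them, as sketched below; this loop-based version only illustrates the idea and ignores the paper's cheaper approximation based on the gradient-norm correlation mentioned above.

```python
import torch

def normalized_batch_gradient(model, loss_fn, inputs, targets):
    """Average of per-example gradients, each rescaled to unit L2 norm, so that
    no single (e.g. strongly adversarial) example dominates the update.
    (Illustrative sketch; not the efficient ENGM implementation.)"""
    params = [p for p in model.parameters() if p.requires_grad]
    avg = [torch.zeros_like(p) for p in params]
    n = inputs.size(0)
    for i in range(n):                                   # naive per-example loop
        loss = loss_fn(model(inputs[i:i + 1]), targets[i:i + 1])
        grads = torch.autograd.grad(loss, params)
        norm = torch.sqrt(sum(g.pow(2).sum() for g in grads)) + 1e-12
        for a, g in zip(avg, grads):
            a.add_(g / norm / n)
    for p, a in zip(params, avg):
        p.grad = a                                       # hand off to any optimizer.step()
```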
{"title":"Revisiting Outer Optimization in Adversarial Training","authors":"Ali Dabouei, Fariborz Taherkhani, Sobhan Soleymani, N. Nasrabadi","doi":"10.48550/arXiv.2209.01199","DOIUrl":"https://doi.org/10.48550/arXiv.2209.01199","url":null,"abstract":". Despite the fundamental distinction between adversarial and natural training (AT and NT), AT methods generally adopt momentum SGD (MSGD) for the outer optimization. This paper aims to analyze this choice by investigating the overlooked role of outer optimization in AT. Our exploratory evaluations reveal that AT induces higher gradient norm and variance compared to NT. This phenomenon hinders the outer optimization in AT since the convergence rate of MSGD is highly dependent on the variance of the gradients. To this end, we propose an optimization method called ENGM which regularizes the contribution of each input example to the average mini-batch gradients. We prove that the convergence rate of ENGM is independent of the variance of the gradients, and thus, it is suitable for AT. We introduce a trick to reduce the computational cost of ENGM using empirical observations on the correlation between the norm of gradients w.r.t. the network parameters and input examples. Our extensive evaluations and ablation studies on CIFAR-10, CIFAR-100, and TinyImageNet demonstrate that ENGM and its variants consistently improve the performance of a wide range of AT methods. Furthermore, ENGM alleviates major shortcomings of AT including robust overfitting and high sensitivity to hyperparameter settings.","PeriodicalId":72676,"journal":{"name":"Computer vision - ECCV ... : ... European Conference on Computer Vision : proceedings. European Conference on Computer Vision","volume":"1 1","pages":"244-261"},"PeriodicalIF":0.0,"publicationDate":"2022-09-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"88710069","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 1
Unified Fully and Timestamp Supervised Temporal Action Segmentation via Sequence to Sequence Translation
Nadine Behrmann, S. Golestaneh, Zico Kolter, Juergen Gall, M. Noroozi
This paper introduces a unified framework for video action segmentation via sequence to sequence (seq2seq) translation in a fully and timestamp supervised setup. In contrast to current state-of-the-art frame-level prediction methods, we view action segmentation as a seq2seq translation task, i.e., mapping a sequence of video frames to a sequence of action segments. Our proposed method involves a series of modifications and auxiliary loss functions on the standard Transformer seq2seq translation model to cope with long input sequences as opposed to short output sequences and relatively few videos. We incorporate an auxiliary supervision signal for the encoder via a frame-wise loss and propose a separate alignment decoder for an implicit duration prediction. Finally, we extend our framework to the timestamp supervised setting via our proposed constrained k-medoids algorithm to generate pseudo-segmentations. Our proposed framework performs consistently on both fully and timestamp supervised settings, outperforming or competing with the state-of-the-art on several datasets. Our code is publicly available at https://github.com/boschresearch/UVAST.
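To see how sparse timestamps can be turned into frame-level pseudo-labels, the sketch below anchors one contiguous segment at each annotated frame and searches, between every pair of neighbouring timestamps, for the split point that minimizes the distance of frames to their segment's annotated feature. This is a simplified, hypothetical stand-in for the paper's constrained k-medoids algorithm, shown only to convey the idea.

```python
import numpy as np

def pseudo_segment(features: np.ndarray, ts_frames: list, ts_labels: list) -> np.ndarray:
    """features: (T, d) frame features; ts_frames: sorted annotated frame indices;
    ts_labels: their action labels. Returns a (T,) array of pseudo frame labels.
    (Hypothetical helper, not the authors' exact constrained k-medoids.)"""
    T = len(features)
    boundaries = [0]
    for i in range(len(ts_frames) - 1):
        lo, hi = ts_frames[i], ts_frames[i + 1]
        d_left = np.linalg.norm(features - features[lo], axis=1)    # distance to left anchor
        d_right = np.linalg.norm(features - features[hi], axis=1)   # distance to right anchor
        # Candidate split b: frames [lo, b) join the left segment, [b, hi] the right one.
        costs = [d_left[lo:b].sum() + d_right[b:hi + 1].sum() for b in range(lo + 1, hi + 1)]
        boundaries.append(lo + 1 + int(np.argmin(costs)))
    boundaries.append(T)
    labels = np.empty(T, dtype=int)
    for seg, (s, e) in enumerate(zip(boundaries[:-1], boundaries[1:])):
        labels[s:e] = ts_labels[seg]                                 # contiguous segments
    return labels
```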
{"title":"Unified Fully and Timestamp Supervised Temporal Action Segmentation via Sequence to Sequence Translation","authors":"Nadine Behrmann, S. Golestaneh, Zico Kolter, Juergen Gall, M. Noroozi","doi":"10.48550/arXiv.2209.00638","DOIUrl":"https://doi.org/10.48550/arXiv.2209.00638","url":null,"abstract":"This paper introduces a unified framework for video action segmentation via sequence to sequence (seq2seq) translation in a fully and timestamp supervised setup. In contrast to current state-of-the-art frame-level prediction methods, we view action segmentation as a seq2seq translation task, i.e., mapping a sequence of video frames to a sequence of action segments. Our proposed method involves a series of modifications and auxiliary loss functions on the standard Transformer seq2seq translation model to cope with long input sequences opposed to short output sequences and relatively few videos. We incorporate an auxiliary supervision signal for the encoder via a frame-wise loss and propose a separate alignment decoder for an implicit duration prediction. Finally, we extend our framework to the timestamp supervised setting via our proposed constrained k-medoids algorithm to generate pseudo-segmentations. Our proposed framework performs consistently on both fully and timestamp supervised settings, outperforming or competing state-of-the-art on several datasets. Our code is publicly available at https://github.com/boschresearch/UVAST.","PeriodicalId":72676,"journal":{"name":"Computer vision - ECCV ... : ... European Conference on Computer Vision : proceedings. European Conference on Computer Vision","volume":"5 1","pages":"52-68"},"PeriodicalIF":0.0,"publicationDate":"2022-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"83706870","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 27
MIME: Minority Inclusion for Majority Group Enhancement of AI Performance
Pradyumna Chari, Yunhao Ba, Shreeram S. Athreya, A. Kadambi
Several papers have rightly included minority groups in artificial intelligence (AI) training data to improve test inference for minority groups and/or society-at-large. A society-at-large consists of both minority and majority stakeholders. A common misconception is that minority inclusion does not increase performance for majority groups alone. In this paper, we make the surprising finding that including minority samples can improve test error for the majority group. In other words, minority group inclusion leads to majority group enhancements (MIME) in performance. A theoretical existence proof of the MIME effect is presented and found to be consistent with experimental results on six different datasets. Project webpage: https://visual.ee.ucla.edu/mime.htm/
{"title":"MIME: Minority Inclusion for Majority Group Enhancement of AI Performance","authors":"Pradyumna Chari, Yunhao Ba, Shreeram S. Athreya, A. Kadambi","doi":"10.48550/arXiv.2209.00746","DOIUrl":"https://doi.org/10.48550/arXiv.2209.00746","url":null,"abstract":"Several papers have rightly included minority groups in artificial intelligence (AI) training data to improve test inference for minority groups and/or society-at-large. A society-at-large consists of both minority and majority stakeholders. A common misconception is that minority inclusion does not increase performance for majority groups alone. In this paper, we make the surprising finding that including minority samples can improve test error for the majority group. In other words, minority group inclusion leads to majority group enhancements (MIME) in performance. A theoretical existence proof of the MIME effect is presented and found to be consistent with experimental results on six different datasets. Project webpage: https://visual.ee.ucla.edu/mime.htm/","PeriodicalId":72676,"journal":{"name":"Computer vision - ECCV ... : ... European Conference on Computer Vision : proceedings. European Conference on Computer Vision","volume":"47 1","pages":"326-343"},"PeriodicalIF":0.0,"publicationDate":"2022-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"76737266","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 1
Exploring Gradient-based Multi-directional Controls in GANs
Zikun Chen, R. Jiang, Brendan Duke, Han Zhao, P. Aarabi
Generative Adversarial Networks (GANs) have been widely applied in modeling diverse image distributions. However, despite their impressive applications, the structure of the latent space in GANs largely remains a black box, leaving its controllable generation an open problem, especially when spurious correlations between different semantic attributes exist in the image distributions. To address this problem, previous methods typically learn linear directions or individual channels that control semantic attributes in the image space. However, they often suffer from imperfect disentanglement, or are unable to obtain multi-directional controls. In this work, in light of the above challenges, we propose a novel approach that discovers nonlinear controls, which enables multi-directional manipulation as well as effective disentanglement, based on gradient information in the learned GAN latent space. More specifically, we first learn interpolation directions by following the gradients from classification networks trained separately on the attributes, and then navigate the latent space by exclusively controlling channels activated for the target attribute in the learned directions. Empirically, with small training data, our approach is able to gain fine-grained controls over a diverse set of bi-directional and multi-directional attributes, and we showcase its ability to achieve disentanglement significantly better than state-of-the-art methods both qualitatively and quantitatively.
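The direction-finding step can be approximated by taking the gradient of an attribute classifier's logit with respect to the latent code and stepping along it, optionally restricted to the most responsive latent channels. The sketch below assumes hypothetical pretrained `generator` and `classifier` modules and is only an illustration of the gradient-based control idea, not the paper's full nonlinear method.

```python
import torch

def attribute_edit(generator, classifier, z, attr_idx, step=0.5, top_channels=None):
    """Move latent z along the classifier-gradient direction for one attribute.
    `generator` and `classifier` are assumed pretrained modules (hypothetical here)."""
    z = z.clone().detach().requires_grad_(True)
    logit = classifier(generator(z))[:, attr_idx].sum()   # attribute score of generated images
    (direction,) = torch.autograd.grad(logit, z)          # gradient w.r.t. the latent code
    if top_channels is not None:                          # optionally edit only the most
        mask = torch.zeros_like(direction)                # responsive latent channels
        idx = direction.abs().topk(top_channels, dim=1).indices
        mask.scatter_(1, idx, 1.0)
        direction = direction * mask
    direction = direction / (direction.norm(dim=1, keepdim=True) + 1e-8)
    return (z + step * direction).detach()
```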
{"title":"Exploring Gradient-based Multi-directional Controls in GANs","authors":"Zikun Chen, R. Jiang, Brendan Duke, Han Zhao, P. Aarabi","doi":"10.48550/arXiv.2209.00698","DOIUrl":"https://doi.org/10.48550/arXiv.2209.00698","url":null,"abstract":"Generative Adversarial Networks (GANs) have been widely applied in modeling diverse image distributions. However, despite its impressive applications, the structure of the latent space in GANs largely remains as a black-box, leaving its controllable generation an open problem, especially when spurious correlations between different semantic attributes exist in the image distributions. To address this problem, previous methods typically learn linear directions or individual channels that control semantic attributes in the image space. However, they often suffer from imperfect disentanglement, or are unable to obtain multi-directional controls. In this work, in light of the above challenges, we propose a novel approach that discovers nonlinear controls, which enables multi-directional manipulation as well as effective disentanglement, based on gradient information in the learned GAN latent space. More specifically, we first learn interpolation directions by following the gradients from classification networks trained separately on the attributes, and then navigate the latent space by exclusively controlling channels activated for the target attribute in the learned directions. Empirically, with small training data, our approach is able to gain fine-grained controls over a diverse set of bi-directional and multi-directional attributes, and we showcase its ability to achieve disentanglement significantly better than state-of-the-art methods both qualitatively and quantitatively.","PeriodicalId":72676,"journal":{"name":"Computer vision - ECCV ... : ... European Conference on Computer Vision : proceedings. European Conference on Computer Vision","volume":"33 1","pages":"104-119"},"PeriodicalIF":0.0,"publicationDate":"2022-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"76670434","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 5