
Latest publications in Computer vision - ECCV ... : ... European Conference on Computer Vision : proceedings. European Conference on Computer Vision

DCL-Net: Deep Correspondence Learning Network for 6D Pose Estimation
Hongyang Li, Jiehong Lin, K. Jia
Establishment of point correspondence between camera and object coordinate systems is a promising way to solve 6D object poses. However, surrogate objectives of correspondence learning in 3D space are a step away from the true ones of object pose estimation, making the learning suboptimal for the end task. In this paper, we address this shortcoming by introducing a new method of Deep Correspondence Learning Network for direct 6D object pose estimation, shortened as DCL-Net. Specifically, DCL-Net employs dual newly proposed Feature Disengagement and Alignment (FDA) modules to establish, in the feature space, partial-to-partial correspondence and complete-to-complete correspondence for the partial object observation and its complete CAD model, respectively, which result in aggregated pose and match feature pairs from two coordinate systems; these two FDA modules thus bring complementary advantages. The match feature pairs are used to learn confidence scores for measuring the quality of deep correspondence, while the pose feature pairs are weighted by the confidence scores for direct object pose regression. A confidence-based pose refinement network is also proposed to further improve pose precision in an iterative manner. Extensive experiments show that DCL-Net outperforms existing methods on three benchmark datasets, including YCB-Video, LineMOD, and Occlusion-LineMOD; ablation studies also confirm the efficacy of our novel designs. Our code is released publicly at https://github.com/Gorilla-Lab-SCUT/DCL-Net.
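A minimal sketch (not the authors' code) of the confidence-weighted pose regression step described in the abstract: hypothetical match features predict per-correspondence confidences, which then weight the pose features before a direct rotation/translation regression head. Tensor shapes and layer sizes are illustrative assumptions.

```python
import torch
import torch.nn as nn


class ConfidenceWeightedPoseHead(nn.Module):
    def __init__(self, feat_dim: int = 128):
        super().__init__()
        # Confidence of each deep correspondence, learned from match features.
        self.conf_mlp = nn.Sequential(
            nn.Linear(feat_dim, 64), nn.ReLU(), nn.Linear(64, 1), nn.Sigmoid()
        )
        # Direct pose regression from the confidence-weighted pose features.
        self.pose_mlp = nn.Sequential(
            nn.Linear(feat_dim, 256), nn.ReLU(),
            nn.Linear(256, 7),  # 4D quaternion + 3D translation
        )

    def forward(self, pose_feats, match_feats):
        # pose_feats, match_feats: (B, N, feat_dim), paired per correspondence.
        conf = self.conf_mlp(match_feats)                   # (B, N, 1)
        weights = conf / (conf.sum(dim=1, keepdim=True) + 1e-8)
        pooled = (weights * pose_feats).sum(dim=1)          # (B, feat_dim)
        out = self.pose_mlp(pooled)
        quat = nn.functional.normalize(out[:, :4], dim=-1)  # unit quaternion
        trans = out[:, 4:]
        return quat, trans, conf


# Toy usage with random features standing in for the FDA outputs.
head = ConfidenceWeightedPoseHead()
quat, trans, conf = head(torch.randn(2, 512, 128), torch.randn(2, 512, 128))
```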
{"title":"DCL-Net: Deep Correspondence Learning Network for 6D Pose Estimation","authors":"Hongyang Li, Jiehong Lin, K. Jia","doi":"10.48550/arXiv.2210.05232","DOIUrl":"https://doi.org/10.48550/arXiv.2210.05232","url":null,"abstract":". Establishment of point correspondence between camera and object coordinate systems is a promising way to solve 6D object poses. However, surrogate objectives of correspondence learning in 3D space are a step away from the true ones of object pose estimation, making the learning suboptimal for the end task. In this paper, we address this short-coming by introducing a new method of Deep Correspondence Learning Network for direct 6D object pose estimation, shortened as DCL-Net . Specifically, DCL-Net employs dual newly proposed Feature Disengagement and Alignment (FDA) modules to establish, in the feature space, partial-to-partial correspondence and complete-to-complete one for partial object observation and its complete CAD model, respectively, which result in aggregated pose and match feature pairs from two coordinate systems; these two FDA modules thus bring complementary advantages. The match feature pairs are used to learn confidence scores for measuring the qualities of deep correspondence, while the pose feature pairs are weighted by confidence scores for direct object pose regression. A confidence-based pose refinement network is also proposed to further improve pose precision in an iterative manner. Extensive experiments show that DCL-Net outperforms existing methods on three benchmarking datasets, including YCB-Video, LineMOD, and Oclussion-LineMOD; ablation studies also confirm the efficacy of our novel designs. Our code is released publicly at https://github.com/Gorilla-Lab-SCUT/DCL-Net .","PeriodicalId":72676,"journal":{"name":"Computer vision - ECCV ... : ... European Conference on Computer Vision : proceedings. European Conference on Computer Vision","volume":"38 1","pages":"369-385"},"PeriodicalIF":0.0,"publicationDate":"2022-10-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"78053634","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 3
Map-free Visual Relocalization: Metric Pose Relative to a Single Image
Eduardo Arnold, Jamie Wynn, S. Vicente, Guillermo Garcia-Hernando, Áron Monszpart, V. Prisacariu, Daniyar Turmukhambetov, Eric Brachmann
Can we relocalize in a scene represented by a single reference image? Standard visual relocalization requires hundreds of images and scale calibration to build a scene-specific 3D map. In contrast, we propose Map-free Relocalization, i.e., using only one photo of a scene to enable instant, metric-scaled relocalization. Existing datasets are not suitable to benchmark map-free relocalization, due to their focus on large scenes or their limited variability. Thus, we have constructed a new dataset of 655 small places of interest, such as sculptures, murals and fountains, collected worldwide. Each place comes with a reference image to serve as a relocalization anchor, and dozens of query images with known, metric camera poses. The dataset features changing conditions, stark viewpoint changes, high variability across places, and queries with low to no visual overlap with the reference image. We identify two viable families of existing methods to provide baseline results: relative pose regression, and feature matching combined with single-image depth prediction. While these methods show reasonable performance on some favorable scenes in our dataset, map-free relocalization proves to be a challenge that requires new, innovative solutions.
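A minimal sketch, under stated assumptions rather than the paper's implementation, of the second baseline family named above: 2D-2D feature matches between the reference and query images are lifted to metric 3D points using a monocular depth prediction for the reference image, and the metric relative pose is then recovered with PnP + RANSAC. The function name and the shared-intrinsics assumption are mine.

```python
import cv2
import numpy as np


def metric_relative_pose(ref_kpts, qry_kpts, ref_depth, K):
    """ref_kpts, qry_kpts: (N, 2) matched pixel coordinates.
    ref_depth: (H, W) predicted metric depth for the reference image.
    K: (3, 3) camera intrinsics (assumed shared by both images)."""
    z = ref_depth[ref_kpts[:, 1].astype(int), ref_kpts[:, 0].astype(int)]
    # Back-project reference keypoints to metric 3D camera coordinates.
    x = (ref_kpts[:, 0] - K[0, 2]) * z / K[0, 0]
    y = (ref_kpts[:, 1] - K[1, 2]) * z / K[1, 1]
    pts3d = np.stack([x, y, z], axis=1).astype(np.float64)
    ok, rvec, tvec, inliers = cv2.solvePnPRansac(
        pts3d, qry_kpts.astype(np.float64), K, None,
        reprojectionError=3.0, flags=cv2.SOLVEPNP_EPNP)
    if not ok:
        return None
    R, _ = cv2.Rodrigues(rvec)   # query-from-reference rotation
    return R, tvec               # translation is metric thanks to the depth
```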
{"title":"Map-free Visual Relocalization: Metric Pose Relative to a Single Image","authors":"Eduardo Arnold, Jamie Wynn, S. Vicente, Guillermo Garcia-Hernando, 'Aron Monszpart, V. Prisacariu, Daniyar Turmukhambetov, Eric Brachmann","doi":"10.48550/arXiv.2210.05494","DOIUrl":"https://doi.org/10.48550/arXiv.2210.05494","url":null,"abstract":". Can we relocalize in a scene represented by a single reference image? Standard visual relocalization requires hundreds of images and scale calibration to build a scene-specific 3D map. In contrast, we propose Map-free Relocalization , i.e. , using only one photo of a scene to enable instant, metric scaled relocalization. Existing datasets are not suitable to benchmark map-free relocalization, due to their focus on large scenes or their limited variability. Thus, we have constructed a new dataset of 655 small places of interest, such as sculptures, murals and fountains, collected worldwide. Each place comes with a reference image to serve as a relocalization anchor, and dozens of query images with known, metric camera poses. The dataset features changing conditions, stark viewpoint changes, high variability across places, and queries with low to no visual overlap with the reference image. We identify two viable families of existing methods to provide baseline results: relative pose regression, and feature matching combined with single-image depth prediction. While these methods show reasonable performance on some favorable scenes in our dataset, map-free relocalization proves to be a challenge that requires new, innovative solutions.","PeriodicalId":72676,"journal":{"name":"Computer vision - ECCV ... : ... European Conference on Computer Vision : proceedings. European Conference on Computer Vision","volume":"11 1","pages":"690-708"},"PeriodicalIF":0.0,"publicationDate":"2022-10-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"73238683","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 9
LidarNAS: Unifying and Searching Neural Architectures for 3D Point Clouds
Chenxi Liu, Zhaoqi Leng, Peigen Sun, Shuyang Cheng, C. Qi, Yin Zhou, Mingxing Tan, Drago Anguelov
Developing neural models that accurately understand objects in 3D point clouds is essential for the success of robotics and autonomous driving. However, arguably due to the higher-dimensional nature of the data (as compared to images), existing neural architectures exhibit a large variety in their designs, including but not limited to the views considered, the format of the neural features, and the neural operations used. The lack of a unified framework and interpretation makes it hard to put these designs in perspective, as well as to systematically explore new ones. In this paper, we begin by proposing such a unified framework, with the key idea being to factorize the neural networks into a series of view transforms and neural layers. We demonstrate that this modular framework can reproduce a variety of existing works while allowing a fair comparison of backbone designs. Then, we show how this framework can easily materialize into a concrete neural architecture search (NAS) space, allowing a principled NAS-for-3D exploration. In performing evolutionary NAS on the 3D object detection task on the Waymo Open Dataset, not only do we outperform the state-of-the-art models, but we also report the interesting finding that NAS tends to discover the same macro-level architecture concept for both the vehicle and pedestrian classes.
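A minimal sketch, under assumed interfaces, of the factorization idea above: a 3D backbone is expressed as alternating stages, each of which first applies a view transform (e.g. point to pillar/voxel/perspective) and then a stack of neural layers operating in that view. The `Stage` container and the identity placeholders are illustrative, not the paper's actual search space.

```python
from dataclasses import dataclass
from typing import Any, Callable, List


@dataclass
class Stage:
    view_transform: Callable[[Any], Any]  # changes the representation/view
    layers: List[Callable[[Any], Any]]    # neural layers applied in that view


def run_backbone(features: Any, stages: List[Stage]) -> Any:
    """Apply each stage's view transform, then its layers, in sequence."""
    for stage in stages:
        features = stage.view_transform(features)
        for layer in stage.layers:
            features = layer(features)
    return features


# A NAS search would mutate the choice of view_transform and layers per stage;
# identity placeholders keep this sketch runnable without any 3D data.
identity = lambda x: x
backbone = [Stage(identity, [identity]), Stage(identity, [identity, identity])]
out = run_backbone([[0.0, 1.0, 2.0]], backbone)
```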
{"title":"LidarNAS: Unifying and Searching Neural Architectures for 3D Point Clouds","authors":"Chenxi Liu, Zhaoqi Leng, Peigen Sun, Shuyang Cheng, C. Qi, Yin Zhou, Mingxing Tan, Drago Anguelov","doi":"10.48550/arXiv.2210.05018","DOIUrl":"https://doi.org/10.48550/arXiv.2210.05018","url":null,"abstract":"Developing neural models that accurately understand objects in 3D point clouds is essential for the success of robotics and autonomous driving. However, arguably due to the higher-dimensional nature of the data (as compared to images), existing neural architectures exhibit a large variety in their designs, including but not limited to the views considered, the format of the neural features, and the neural operations used. Lack of a unified framework and interpretation makes it hard to put these designs in perspective, as well as systematically explore new ones. In this paper, we begin by proposing a unified framework of such, with the key idea being factorizing the neural networks into a series of view transforms and neural layers. We demonstrate that this modular framework can reproduce a variety of existing works while allowing a fair comparison of backbone designs. Then, we show how this framework can easily materialize into a concrete neural architecture search (NAS) space, allowing a principled NAS-for-3D exploration. In performing evolutionary NAS on the 3D object detection task on the Waymo Open Dataset, not only do we outperform the state-of-the-art models, but also report the interesting finding that NAS tends to discover the same macro-level architecture concept for both the vehicle and pedestrian classes.","PeriodicalId":72676,"journal":{"name":"Computer vision - ECCV ... : ... European Conference on Computer Vision : proceedings. European Conference on Computer Vision","volume":"9 1","pages":"158-175"},"PeriodicalIF":0.0,"publicationDate":"2022-10-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"82512092","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 4
SCAM! Transferring humans between images with Semantic Cross Attention Modulation
Nicolas Dufour, David Picard, Vicky S. Kalogeiton
A large body of recent work targets semantically conditioned image generation. Most such methods focus on the narrower task of pose transfer and ignore the more challenging task of subject transfer, which consists in transferring not only the pose but also the appearance and background. In this work, we introduce SCAM (Semantic Cross Attention Modulation), a system that encodes rich and diverse information in each semantic region of the image (including foreground and background), thus achieving precise generation with emphasis on fine details. This is enabled by the Semantic Attention Transformer Encoder that extracts multiple latent vectors for each semantic region, and the corresponding generator that exploits these multiple latents by using semantic cross attention modulation. It is trained only using a reconstruction setup, while subject transfer is performed at test time. Our analysis shows that our proposed architecture is successful at encoding the diversity of appearance in each semantic region. Extensive experiments on the iDesigner and CelebAMask-HD datasets show that SCAM outperforms SEAN and SPADE; moreover, it sets the new state of the art on subject transfer.
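A minimal sketch (an illustrative formulation, not the authors' architecture) of what "semantic cross attention" could look like: each pixel query may only attend to the latent vectors belonging to its own semantic region, so region-specific appearance codes modulate generation locally. All shapes and the masking scheme are assumptions.

```python
import torch
import torch.nn.functional as F


def semantic_cross_attention(queries, latents, pixel_labels, latent_labels):
    """queries: (B, P, D) pixel features; latents: (B, L, D) per-region latents.
    pixel_labels: (B, P) semantic id per pixel; latent_labels: (B, L).
    Assumes every pixel's region owns at least one latent, otherwise a
    softmax row would be all -inf."""
    attn = torch.einsum("bpd,bld->bpl", queries, latents) / queries.shape[-1] ** 0.5
    # Mask out latents from other semantic regions before the softmax.
    mask = pixel_labels.unsqueeze(-1) != latent_labels.unsqueeze(1)  # (B, P, L)
    attn = attn.masked_fill(mask, float("-inf"))
    attn = F.softmax(attn, dim=-1)
    return torch.einsum("bpl,bld->bpd", attn, latents)


# Toy usage: 2 regions with one latent each; 4 pixels split between them.
q, lat = torch.randn(1, 4, 8), torch.randn(1, 2, 8)
out = semantic_cross_attention(q, lat, torch.tensor([[0, 0, 1, 1]]),
                               torch.tensor([[0, 1]]))
```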
{"title":"SCAM! Transferring humans between images with Semantic Cross Attention Modulation","authors":"Nicolas Dufour, David Picard, Vicky S. Kalogeiton","doi":"10.48550/arXiv.2210.04883","DOIUrl":"https://doi.org/10.48550/arXiv.2210.04883","url":null,"abstract":"A large body of recent work targets semantically conditioned image generation. Most such methods focus on the narrower task of pose transfer and ignore the more challenging task of subject transfer that consists in not only transferring the pose but also the appearance and background. In this work, we introduce SCAM (Semantic Cross Attention Modulation), a system that encodes rich and diverse information in each semantic region of the image (including foreground and background), thus achieving precise generation with emphasis on fine details. This is enabled by the Semantic Attention Transformer Encoder that extracts multiple latent vectors for each semantic region, and the corresponding generator that exploits these multiple latents by using semantic cross attention modulation. It is trained only using a reconstruction setup, while subject transfer is performed at test time. Our analysis shows that our proposed architecture is successful at encoding the diversity of appearance in each semantic region. Extensive experiments on the iDesigner and CelebAMask-HD datasets show that SCAM outperforms SEAN and SPADE; moreover, it sets the new state of the art on subject transfer.","PeriodicalId":72676,"journal":{"name":"Computer vision - ECCV ... : ... European Conference on Computer Vision : proceedings. European Conference on Computer Vision","volume":"1 1","pages":"713-729"},"PeriodicalIF":0.0,"publicationDate":"2022-10-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"85072113","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 5
Super-Resolution by Predicting Offsets: An Ultra-Efficient Super-Resolution Network for Rasterized Images
Jinjin Gu, Haoming Cai, Chenyu Dong, Ruofan Zhang, Yulun Zhang, Wenming Yang, Chun Yuan
Rendering high-resolution (HR) graphics brings substantial computational costs. Efficient graphics super-resolution (SR) methods may achieve HR rendering with small computing resources and have attracted extensive research interest in industry and the research community. We present a new method for real-time SR for computer graphics, namely Super-Resolution by Predicting Offsets (SRPO). Our algorithm divides the image into two parts for processing, i.e., sharp edges and flatter areas. For edges, different from previous SR methods that take anti-aliased images as inputs, our proposed SRPO takes advantage of the characteristics of rasterized images to conduct SR on the rasterized images. To complement the residual between HR and low-resolution (LR) rasterized images, we train an ultra-efficient network to predict offset maps that move the appropriate surrounding pixels to new positions. For flat areas, we found that simple interpolation methods can already generate reasonable output. We finally use a guided fusion operation to integrate the sharp edges generated by the network and the flat areas produced by interpolation to get the final SR image. The proposed network contains only 8,434 parameters and can be accelerated by network quantization. Extensive experiments show that the proposed SRPO can achieve superior visual effects at a smaller computational cost than existing state-of-the-art methods.
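A minimal sketch, with assumed tensor conventions, of the two-branch idea above: predicted per-pixel offsets resample the low-resolution rasterized image to form sharp edges, plain interpolation (bilinear here, as a stand-in) handles flat areas, and an edge mask fuses the two. This is illustrative and not the 8,434-parameter SRPO network itself.

```python
import torch
import torch.nn.functional as F


def offset_super_resolution(lr, offsets, edge_mask, scale=2):
    """lr: (B, C, H, W) low-res rasterized image.
    offsets: (B, 2, sH, sW) predicted sampling offsets in normalized coords.
    edge_mask: (B, 1, sH, sW) soft edge mask in [0, 1]."""
    b, c, h, w = lr.shape
    sh, sw = h * scale, w * scale
    # Base sampling grid at the upscaled resolution, in [-1, 1].
    ys, xs = torch.meshgrid(
        torch.linspace(-1, 1, sh), torch.linspace(-1, 1, sw), indexing="ij")
    grid = torch.stack([xs, ys], dim=-1).expand(b, sh, sw, 2)
    # Edge branch: move surrounding pixels to new positions via the offsets.
    edge = F.grid_sample(lr, grid + offsets.permute(0, 2, 3, 1),
                         mode="nearest", align_corners=True)
    # Flat branch: simple interpolation is already adequate.
    flat = F.interpolate(lr, size=(sh, sw), mode="bilinear", align_corners=False)
    # Guided fusion by the edge mask.
    return edge_mask * edge + (1.0 - edge_mask) * flat


# Toy usage: zero offsets and a random mask, just to show the shapes.
sr = offset_super_resolution(torch.rand(1, 3, 32, 32),
                             torch.zeros(1, 2, 64, 64), torch.rand(1, 1, 64, 64))
```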
{"title":"Super-Resolution by Predicting Offsets: An Ultra-Efficient Super-Resolution Network for Rasterized Images","authors":"Jinjin Gu, Haoming Cai, Chenyu Dong, Ruofan Zhang, Yulun Zhang, Wenming Yang, Chun Yuan","doi":"10.48550/arXiv.2210.04198","DOIUrl":"https://doi.org/10.48550/arXiv.2210.04198","url":null,"abstract":"Rendering high-resolution (HR) graphics brings substantial computational costs. Efficient graphics super-resolution (SR) methods may achieve HR rendering with small computing resources and have attracted extensive research interests in industry and research communities. We present a new method for real-time SR for computer graphics, namely Super-Resolution by Predicting Offsets (SRPO). Our algorithm divides the image into two parts for processing, i.e., sharp edges and flatter areas. For edges, different from the previous SR methods that take the anti-aliased images as inputs, our proposed SRPO takes advantage of the characteristics of rasterized images to conduct SR on the rasterized images. To complement the residual between HR and low-resolution (LR) rasterized images, we train an ultra-efficient network to predict the offset maps to move the appropriate surrounding pixels to the new positions. For flat areas, we found simple interpolation methods can already generate reasonable output. We finally use a guided fusion operation to integrate the sharp edges generated by the network and flat areas by the interpolation method to get the final SR image. The proposed network only contains 8,434 parameters and can be accelerated by network quantization. Extensive experiments show that the proposed SRPO can achieve superior visual effects at a smaller computational cost than the existing state-of-the-art methods.","PeriodicalId":72676,"journal":{"name":"Computer vision - ECCV ... : ... European Conference on Computer Vision : proceedings. European Conference on Computer Vision","volume":"143 1","pages":"583-598"},"PeriodicalIF":0.0,"publicationDate":"2022-10-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"82897307","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 3
Attention Diversification for Domain Generalization
Rang Meng, Xianfeng Li, Weijie Chen, Shicai Yang, Jie Song, Xinchao Wang, Lei Zhang, Mingli Song, Di Xie, Shiliang Pu
Convolutional neural networks (CNNs) have demonstrated gratifying results at learning discriminative features. However, when applied to unseen domains, state-of-the-art models are usually prone to errors due to domain shift. After investigating this issue from the perspective of shortcut learning, we find the devil lies in the fact that models trained on different domains merely bias toward different domain-specific features yet overlook diverse task-related features. Under this guidance, a novel Attention Diversification framework is proposed, in which Intra-Model and Inter-Model Attention Diversification Regularization collaborate to reassign appropriate attention to diverse task-related features. Briefly, Intra-Model Attention Diversification Regularization is applied on the high-level feature maps to achieve in-channel discrimination and cross-channel diversification by forcing different channels to pay their most salient attention to different spatial locations. Besides, Inter-Model Attention Diversification Regularization is proposed to further provide task-related attention diversification and domain-related attention suppression, following a paradigm of "simulate, divide and assemble": simulate domain shift by exploiting multiple domain-specific models, divide attention maps into task-related and domain-related groups, and assemble them within each group respectively to execute regularization. Extensive experiments and analyses are conducted on various benchmarks to demonstrate that our method achieves state-of-the-art performance over other competing methods. Code is available at https://github.com/hikvision-research/DomainGeneralization.
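A minimal sketch (an assumed formulation, not the paper's exact loss) of the intra-model diversification idea: per-channel spatial attention maps are computed from a high-level feature map and their pairwise overlap is penalized, so different channels attend to different spatial locations.

```python
import torch
import torch.nn.functional as F


def intra_model_diversification_loss(feat):
    """feat: (B, C, H, W) high-level feature map."""
    b, c, h, w = feat.shape
    # Spatial attention per channel via a softmax over locations.
    attn = F.softmax(feat.view(b, c, h * w), dim=-1)
    attn = F.normalize(attn, dim=-1)                  # unit-norm rows
    sim = torch.bmm(attn, attn.transpose(1, 2))       # (B, C, C) channel overlaps
    off_diag = sim - torch.diag_embed(torch.diagonal(sim, dim1=1, dim2=2))
    # Mean pairwise overlap between distinct channels; lower means more diverse.
    return off_diag.sum(dim=(1, 2)).mean() / (c * (c - 1))


loss = intra_model_diversification_loss(torch.randn(2, 8, 7, 7))
```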
{"title":"Attention Diversification for Domain Generalization","authors":"Rang Meng, Xianfeng Li, Weijie Chen, Shicai Yang, Jie Song, Xinchao Wang, Lei Zhang, Mingli Song, Di Xie, Shiliang Pu","doi":"10.48550/arXiv.2210.04206","DOIUrl":"https://doi.org/10.48550/arXiv.2210.04206","url":null,"abstract":"Convolutional neural networks (CNNs) have demonstrated gratifying results at learning discriminative features. However, when applied to unseen domains, state-of-the-art models are usually prone to errors due to domain shift. After investigating this issue from the perspective of shortcut learning, we find the devils lie in the fact that models trained on different domains merely bias to different domain-specific features yet overlook diverse task-related features. Under this guidance, a novel Attention Diversification framework is proposed, in which Intra-Model and Inter-Model Attention Diversification Regularization are collaborated to reassign appropriate attention to diverse task-related features. Briefly, Intra-Model Attention Diversification Regularization is equipped on the high-level feature maps to achieve in-channel discrimination and cross-channel diversification via forcing different channels to pay their most salient attention to different spatial locations. Besides, Inter-Model Attention Diversification Regularization is proposed to further provide task-related attention diversification and domain-related attention suppression, which is a paradigm of\"simulate, divide and assemble\": simulate domain shift via exploiting multiple domain-specific models, divide attention maps into task-related and domain-related groups, and assemble them within each group respectively to execute regularization. Extensive experiments and analyses are conducted on various benchmarks to demonstrate that our method achieves state-of-the-art performance over other competing methods. Code is available at https://github.com/hikvision-research/DomainGeneralization.","PeriodicalId":72676,"journal":{"name":"Computer vision - ECCV ... : ... European Conference on Computer Vision : proceedings. European Conference on Computer Vision","volume":"6 1","pages":"322-340"},"PeriodicalIF":0.0,"publicationDate":"2022-10-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"88474156","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 23
IDa-Det: An Information Discrepancy-aware Distillation for 1-bit Detectors
Sheng Xu, Yanjing Li, Bo-Wen Zeng, Teli Ma, Baochang Zhang, Xianbin Cao, Penglei Gao, Jinhu Lv
Knowledge distillation (KD) has been proven to be useful for training compact object detection models. However, we observe that KD is often effective when the teacher model and student counterpart share similar proposal information. This explains why existing KD methods are less effective for 1-bit detectors, which is caused by a significant information discrepancy between the real-valued teacher and the 1-bit student. This paper presents an Information Discrepancy-aware strategy (IDa-Det) to distill 1-bit detectors that can effectively eliminate information discrepancies and significantly reduce the performance gap between a 1-bit detector and its real-valued counterpart. We formulate the distillation process as a bi-level optimization problem. At the inner level, we select the representative proposals with maximum information discrepancy. We then introduce a novel entropy distillation loss to reduce the disparity based on the selected proposals. Extensive experiments demonstrate IDa-Det's superiority over state-of-the-art 1-bit detectors and KD methods on both PASCAL VOC and COCO datasets. IDa-Det achieves a 76.9% mAP for a 1-bit Faster-RCNN with a ResNet-18 backbone. Our code is open-sourced at https://github.com/SteveTsui/IDa-Det.
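A minimal sketch under assumptions: per-proposal information discrepancy is approximated by the feature distance between the real-valued teacher and the 1-bit student, the most discrepant proposals are selected, and a softened KL distillation loss (a stand-in for the paper's entropy distillation loss, whose exact form is not given in the abstract) is applied only to that subset.

```python
import torch
import torch.nn.functional as F


def ida_style_distillation(t_feats, s_feats, t_logits, s_logits, k=64, tau=2.0):
    """t_feats/s_feats: (N, D) per-proposal features from teacher/student.
    t_logits/s_logits: (N, C) per-proposal classification logits."""
    # Proxy for information discrepancy: squared feature distance per proposal.
    discrepancy = (t_feats - s_feats).pow(2).sum(dim=1)            # (N,)
    idx = discrepancy.topk(min(k, discrepancy.numel())).indices
    # Distill only the representative, maximally-discrepant proposals.
    p_t = F.log_softmax(t_logits[idx] / tau, dim=-1)
    p_s = F.log_softmax(s_logits[idx] / tau, dim=-1)
    return F.kl_div(p_s, p_t, log_target=True, reduction="batchmean") * tau ** 2


loss = ida_style_distillation(torch.randn(256, 64), torch.randn(256, 64),
                              torch.randn(256, 21), torch.randn(256, 21))
```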
{"title":"IDa-Det: An Information Discrepancy-aware Distillation for 1-bit Detectors","authors":"Sheng Xu, Yanjing Li, Bo-Wen Zeng, Teli Ma, Baochang Zhang, Xianbin Cao, Penglei Gao, Jinhu Lv","doi":"10.48550/arXiv.2210.03477","DOIUrl":"https://doi.org/10.48550/arXiv.2210.03477","url":null,"abstract":"Knowledge distillation (KD) has been proven to be useful for training compact object detection models. However, we observe that KD is often effective when the teacher model and student counterpart share similar proposal information. This explains why existing KD methods are less effective for 1-bit detectors, caused by a significant information discrepancy between the real-valued teacher and the 1-bit student. This paper presents an Information Discrepancy-aware strategy (IDa-Det) to distill 1-bit detectors that can effectively eliminate information discrepancies and significantly reduce the performance gap between a 1-bit detector and its real-valued counterpart. We formulate the distillation process as a bi-level optimization formulation. At the inner level, we select the representative proposals with maximum information discrepancy. We then introduce a novel entropy distillation loss to reduce the disparity based on the selected proposals. Extensive experiments demonstrate IDa-Det's superiority over state-of-the-art 1-bit detectors and KD methods on both PASCAL VOC and COCO datasets. IDa-Det achieves a 76.9% mAP for a 1-bit Faster-RCNN with ResNet-18 backbone. Our code is open-sourced on https://github.com/SteveTsui/IDa-Det.","PeriodicalId":72676,"journal":{"name":"Computer vision - ECCV ... : ... European Conference on Computer Vision : proceedings. European Conference on Computer Vision","volume":"60 1","pages":"346-361"},"PeriodicalIF":0.0,"publicationDate":"2022-10-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"88015808","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 7
FloatingFusion: Depth from ToF and Image-stabilized Stereo Cameras
Andreas Meuleman, Hak-Il Kim, J. Tompkin, Min H. Kim
{"title":"FloatingFusion: Depth from ToF and Image-stabilized Stereo Cameras","authors":"Andreas Meuleman, Hak-Il Kim, J. Tompkin, Min H. Kim","doi":"10.1007/978-3-031-19769-7_35","DOIUrl":"https://doi.org/10.1007/978-3-031-19769-7_35","url":null,"abstract":"","PeriodicalId":72676,"journal":{"name":"Computer vision - ECCV ... : ... European Conference on Computer Vision : proceedings. European Conference on Computer Vision","volume":"36 1","pages":"602-618"},"PeriodicalIF":0.0,"publicationDate":"2022-10-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"84319181","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 2
Differentiable Raycasting for Self-supervised Occupancy Forecasting
Tarasha Khurana, Peiyun Hu, Achal Dave, Jason Ziglar, David Held, Deva Ramanan
Motion planning for safe autonomous driving requires learning how the environment around an ego-vehicle evolves with time. Ego-centric perception of driveable regions in a scene not only changes with the motion of actors in the environment, but also with the movement of the ego-vehicle itself. Self-supervised representations proposed for large-scale planning, such as ego-centric freespace, confound these two motions, making the representation difficult to use for downstream motion planners. In this paper, we use geometric occupancy as a natural alternative to view-dependent representations such as freespace. Occupancy maps naturally disentangle the motion of the environment from the motion of the ego-vehicle. However, one cannot directly observe the full 3D occupancy of a scene (due to occlusion), making it difficult to use as a signal for learning. Our key insight is to use differentiable raycasting to "render" future occupancy predictions into future LiDAR sweep predictions, which can be compared with ground-truth sweeps for self-supervised learning. The use of differentiable raycasting allows occupancy to emerge as an internal representation within the forecasting network. In the absence of ground-truth occupancy, we quantitatively evaluate the forecasting of raycasted LiDAR sweeps and show improvements of up to 15 F1 points. For downstream motion planners, where emergent occupancy can be directly used to guide non-driveable regions, this representation relatively reduces the number of collisions with objects by up to 17% as compared to freespace-centric motion planners.
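A minimal sketch (an assumed volume-rendering-style formula, not the paper's code) of differentiable raycasting: points are sampled along each LiDAR ray, occupancy is read from a dense grid, and an expected ray depth is computed from per-sample hit probabilities, so gradients flow back into the predicted occupancy.

```python
import torch


def render_expected_depth(occupancy, ray_dirs, origin, t_max=70.0, n_samples=256):
    """occupancy: (D, H, W) values in [0, 1]; ray_dirs: (R, 3) unit directions;
    origin: (3,) sensor position, all in the same voxel-index coordinates."""
    t = torch.linspace(0.0, t_max, n_samples)                     # (S,)
    pts = origin + ray_dirs[:, None, :] * t[None, :, None]        # (R, S, 3)
    idx = pts.round().long()
    d, h, w = occupancy.shape
    valid = ((idx >= 0) & (idx < torch.tensor([d, h, w]))).all(dim=-1)
    idx = idx.clamp(min=0)
    idx[..., 0].clamp_(max=d - 1)
    idx[..., 1].clamp_(max=h - 1)
    idx[..., 2].clamp_(max=w - 1)
    # Out-of-grid samples contribute zero occupancy.
    occ = occupancy[idx[..., 0], idx[..., 1], idx[..., 2]] * valid.float()  # (R, S)
    # Probability that a ray first terminates at each sample (transmittance * occ).
    trans = torch.cumprod(torch.cat(
        [torch.ones_like(occ[:, :1]), 1.0 - occ[:, :-1]], dim=1), dim=1)
    hit = trans * occ
    return (hit * t).sum(dim=1) / (hit.sum(dim=1) + 1e-6)         # expected depth


# Toy usage: gradients reach the occupancy grid through the rendered depths.
occ_grid = torch.rand(64, 64, 64, requires_grad=True)
dirs = torch.nn.functional.normalize(torch.randn(128, 3), dim=-1)
depth = render_expected_depth(occ_grid, dirs, torch.tensor([32.0, 32.0, 32.0]))
```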
{"title":"Differentiable Raycasting for Self-supervised Occupancy Forecasting","authors":"Tarasha Khurana, Peiyun Hu, Achal Dave, Jason Ziglar, David Held, Deva Ramanan","doi":"10.48550/arXiv.2210.01917","DOIUrl":"https://doi.org/10.48550/arXiv.2210.01917","url":null,"abstract":"Motion planning for safe autonomous driving requires learning how the environment around an ego-vehicle evolves with time. Ego-centric perception of driveable regions in a scene not only changes with the motion of actors in the environment, but also with the movement of the ego-vehicle itself. Self-supervised representations proposed for large-scale planning, such as ego-centric freespace, confound these two motions, making the representation difficult to use for downstream motion planners. In this paper, we use geometric occupancy as a natural alternative to view-dependent representations such as freespace. Occupancy maps naturally disentangle the motion of the environment from the motion of the ego-vehicle. However, one cannot directly observe the full 3D occupancy of a scene (due to occlusion), making it difficult to use as a signal for learning. Our key insight is to use differentiable raycasting to\"render\"future occupancy predictions into future LiDAR sweep predictions, which can be compared with ground-truth sweeps for self-supervised learning. The use of differentiable raycasting allows occupancy to emerge as an internal representation within the forecasting network. In the absence of groundtruth occupancy, we quantitatively evaluate the forecasting of raycasted LiDAR sweeps and show improvements of upto 15 F1 points. For downstream motion planners, where emergent occupancy can be directly used to guide non-driveable regions, this representation relatively reduces the number of collisions with objects by up to 17% as compared to freespace-centric motion planners.","PeriodicalId":72676,"journal":{"name":"Computer vision - ECCV ... : ... European Conference on Computer Vision : proceedings. European Conference on Computer Vision","volume":"61 1","pages":"353-369"},"PeriodicalIF":0.0,"publicationDate":"2022-10-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"89246636","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 18
From Face to Natural Image: Learning Real Degradation for Blind Image Super-Resolution
Xiaoming Li, Chaofeng Chen, Xianhui Lin, W. Zuo, Lei Zhang
How to design proper training pairs is critical for super-resolving real-world low-quality (LQ) images, which suffers from the difficulty of either acquiring paired ground-truth high-quality (HQ) images or synthesizing photo-realistic degraded LQ observations. Recent works mainly focus on modeling the degradation with handcrafted or estimated degradation parameters, which are, however, incapable of modeling complicated real-world degradation types, resulting in limited quality improvement. Notably, LQ face images, which may have the same degradation process as natural images, can be robustly restored with photo-realistic textures by exploiting their strong structural priors. This motivates us to use real-world LQ face images and their restored HQ counterparts to model the complex real-world degradation (namely ReDegNet), and then transfer it to HQ natural images to synthesize their realistic LQ counterparts. By taking these paired HQ-LQ face images as inputs to explicitly predict the degradation-aware and content-independent representations, we can control the degraded image generation, and subsequently transfer these degradation representations from face to natural images to synthesize degraded LQ natural images. Experiments show that our ReDegNet can learn the real degradation process well from face images. The restoration network trained with our synthetic pairs performs favorably against SOTAs. More importantly, our method provides a new way to handle real-world complex scenarios by learning their degradation representations from the facial portions, which can be used to significantly improve the quality of non-facial areas. The source code is available at https://github.com/csxmli2016/ReDegNet.
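A minimal sketch, with assumed module shapes and names, of the transfer idea above: a degradation encoder extracts a content-independent degradation code from a real LQ face and its restored HQ counterpart, and a small synthesis network applies that code to an HQ natural image to produce a realistic LQ training pair. This is illustrative and much smaller than the actual ReDegNet.

```python
import torch
import torch.nn as nn


class DegradationTransfer(nn.Module):
    def __init__(self, code_dim: int = 64):
        super().__init__()
        self.encoder = nn.Sequential(   # (B, 6, H, W) LQ+HQ face -> degradation code
            nn.Conv2d(6, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, code_dim, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten())
        self.synth = nn.Sequential(     # HQ natural image + broadcast code -> LQ image
            nn.Conv2d(3 + code_dim, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 3, 3, padding=1))

    def forward(self, lq_face, hq_face, hq_natural):
        code = self.encoder(torch.cat([lq_face, hq_face], dim=1))      # (B, code_dim)
        code_map = code[:, :, None, None].expand(-1, -1, *hq_natural.shape[-2:])
        return self.synth(torch.cat([hq_natural, code_map], dim=1))    # synthetic LQ


model = DegradationTransfer()
lq_natural = model(torch.randn(1, 3, 64, 64), torch.randn(1, 3, 64, 64),
                   torch.randn(1, 3, 128, 128))
```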
{"title":"From Face to Natural Image: Learning Real Degradation for Blind Image Super-Resolution","authors":"Xiaoming Li, Chaofeng Chen, Xianhui Lin, W. Zuo, Lei Zhang","doi":"10.48550/arXiv.2210.00752","DOIUrl":"https://doi.org/10.48550/arXiv.2210.00752","url":null,"abstract":"How to design proper training pairs is critical for super-resolving real-world low-quality (LQ) images, which suffers from the difficulties in either acquiring paired ground-truth high-quality (HQ) images or synthesizing photo-realistic degraded LQ observations. Recent works mainly focus on modeling the degradation with handcrafted or estimated degradation parameters, which are however incapable to model complicated real-world degradation types, resulting in limited quality improvement. Notably, LQ face images, which may have the same degradation process as natural images, can be robustly restored with photo-realistic textures by exploiting their strong structural priors. This motivates us to use the real-world LQ face images and their restored HQ counterparts to model the complex real-world degradation (namely ReDegNet), and then transfer it to HQ natural images to synthesize their realistic LQ counterparts. By taking these paired HQ-LQ face images as inputs to explicitly predict the degradation-aware and content-independent representations, we could control the degraded image generation, and subsequently transfer these degradation representations from face to natural images to synthesize the degraded LQ natural images. Experiments show that our ReDegNet can well learn the real degradation process from face images. The restoration network trained with our synthetic pairs performs favorably against SOTAs. More importantly, our method provides a new way to handle the real-world complex scenarios by learning their degradation representations from the facial portions, which can be used to significantly improve the quality of non-facial areas. The source code is available at https://github.com/csxmli2016/ReDegNet.","PeriodicalId":72676,"journal":{"name":"Computer vision - ECCV ... : ... European Conference on Computer Vision : proceedings. European Conference on Computer Vision","volume":"56 1","pages":"376-392"},"PeriodicalIF":0.0,"publicationDate":"2022-10-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"79827963","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 8