DeepMend: Learning Occupancy Functions to Represent Shape for Repair
Pub Date: 2022-10-11 | DOI: 10.48550/arXiv.2210.05728
Nikolas Lamb, Sean Banerjee, N. Banerjee
We present DeepMend, a novel approach to reconstruct restorations to fractured shapes using learned occupancy functions. Existing shape repair approaches predict low-resolution voxelized restorations, or require symmetries or access to a pre-existing complete oracle. We represent the occupancy of a fractured shape as the conjunction of the occupancy of an underlying complete shape and the fracture surface, which we model as functions of latent codes using neural networks. Given occupancy samples from an input fractured shape, we estimate latent codes using an inference loss augmented with novel penalty terms that avoid empty or voluminous restorations. We use inferred codes to reconstruct the restoration shape. We show results with simulated fractures on synthetic and real-world scanned objects, and with scanned real fractured mugs. Compared to the existing voxel approach and two baseline methods, our work shows state-of-the-art results in accuracy and avoiding restoration artifacts over non-fracture regions of the fractured shape.
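A minimal sketch of the occupancy factorization described above, assuming hypothetical latent-conditioned decoders f_complete and f_fracture (the decoder architecture and names are illustrative assumptions, not the authors' implementation):

```python
import torch
import torch.nn as nn

class OccupancyDecoder(nn.Module):
    """Maps a latent code and 3D query points to occupancy probabilities."""
    def __init__(self, latent_dim=128, hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(latent_dim + 3, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1), nn.Sigmoid(),
        )

    def forward(self, latent, xyz):
        # latent: (B, latent_dim), xyz: (B, N, 3) -> occupancy: (B, N)
        z = latent.unsqueeze(1).expand(-1, xyz.shape[1], -1)
        return self.net(torch.cat([z, xyz], dim=-1)).squeeze(-1)

f_complete = OccupancyDecoder()  # occupancy of the underlying complete shape
f_fracture = OccupancyDecoder()  # soft indicator of which side of the fracture surface a point lies on

def fractured_and_restoration_occupancy(z_complete, z_fracture, xyz):
    occ_c = f_complete(z_complete, xyz)
    occ_b = f_fracture(z_fracture, xyz)
    occ_fractured = occ_c * occ_b            # conjunction: inside complete shape AND on the fractured side
    occ_restoration = occ_c * (1.0 - occ_b)  # inside complete shape AND on the missing side
    return occ_fractured, occ_restoration
```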
{"title":"DeepMend: Learning Occupancy Functions to Represent Shape for Repair","authors":"Nikolas Lamb, Sean Banerjee, N. Banerjee","doi":"10.48550/arXiv.2210.05728","DOIUrl":"https://doi.org/10.48550/arXiv.2210.05728","url":null,"abstract":"We present DeepMend, a novel approach to reconstruct restorations to fractured shapes using learned occupancy functions. Existing shape repair approaches predict low-resolution voxelized restorations, or require symmetries or access to a pre-existing complete oracle. We represent the occupancy of a fractured shape as the conjunction of the occupancy of an underlying complete shape and the fracture surface, which we model as functions of latent codes using neural networks. Given occupancy samples from an input fractured shape, we estimate latent codes using an inference loss augmented with novel penalty terms that avoid empty or voluminous restorations. We use inferred codes to reconstruct the restoration shape. We show results with simulated fractures on synthetic and real-world scanned objects, and with scanned real fractured mugs. Compared to the existing voxel approach and two baseline methods, our work shows state-of-the-art results in accuracy and avoiding restoration artifacts over non-fracture regions of the fractured shape.","PeriodicalId":72676,"journal":{"name":"Computer vision - ECCV ... : ... European Conference on Computer Vision : proceedings. European Conference on Computer Vision","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2022-10-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"81595178","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Global Spectral Filter Memory Network for Video Object Segmentation
Pub Date: 2022-10-11 | DOI: 10.48550/arXiv.2210.05567
Yong Liu, R. Yu, Jiahao Wang, Xinyuan Zhao, Yitong Wang, Yansong Tang, Yujiu Yang
This paper studies semi-supervised video object segmentation through boosting intra-frame interaction. Recent memory network-based methods focus on exploiting inter-frame temporal reference while paying little attention to intra-frame spatial dependency. Specifically, these segmentation models tend to be susceptible to interference from unrelated non-target objects in a given frame. To this end, we propose the Global Spectral Filter Memory network (GSFM), which improves intra-frame interaction by learning long-term spatial dependencies in the spectral domain. The key component of GSFM is the 2D (inverse) discrete Fourier transform used for spatial information mixing. In addition, we empirically find that low-frequency features should be enhanced in the encoder (backbone) and high-frequency features in the decoder (segmentation head). We attribute this to the encoder's role of extracting semantic information and the decoder's role of highlighting fine-grained details, and propose a Low (High) Frequency Module to fit each case. Extensive experiments on the popular DAVIS and YouTube-VOS benchmarks demonstrate that GSFM noticeably outperforms the baseline method and achieves state-of-the-art performance. Further analysis shows that the proposed modules are effective and generalize well. Our source code is available at https://github.com/workforai/GSFM.
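To make "spatial information mixing in the spectral domain" concrete, here is a minimal sketch of a learnable global spectral filter built on PyTorch's FFT routines; the module name and per-frequency parameterization are assumptions for illustration, not the GSFM code.

```python
import torch
import torch.nn as nn

class GlobalSpectralFilter(nn.Module):
    """Mixes spatial information globally by filtering features in the 2D Fourier domain."""
    def __init__(self, channels, height, width):
        super().__init__()
        # One complex weight per channel and frequency bin, stored as (real, imag).
        self.weight = nn.Parameter(torch.randn(channels, height, width // 2 + 1, 2) * 0.02)

    def forward(self, x):                        # x: (B, C, H, W)
        freq = torch.fft.rfft2(x, norm="ortho")  # (B, C, H, W//2+1), complex
        freq = freq * torch.view_as_complex(self.weight)
        return torch.fft.irfft2(freq, s=x.shape[-2:], norm="ortho")

# Example: filter a 64-channel feature map of spatial size 32x32.
y = GlobalSpectralFilter(64, 32, 32)(torch.randn(2, 64, 32, 32))
```

Because the filter acts on the full frequency grid, every output location depends on every input location, which is one way to obtain long-range spatial dependencies at low cost.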
{"title":"Global Spectral Filter Memory Network for Video Object Segmentation","authors":"Yong Liu, R. Yu, Jiahao Wang, Xinyuan Zhao, Yitong Wang, Yansong Tang, Yujiu Yang","doi":"10.48550/arXiv.2210.05567","DOIUrl":"https://doi.org/10.48550/arXiv.2210.05567","url":null,"abstract":"This paper studies semi-supervised video object segmentation through boosting intra-frame interaction. Recent memory network-based methods focus on exploiting inter-frame temporal reference while paying little attention to intra-frame spatial dependency. Specifically, these segmentation model tends to be susceptible to interference from unrelated nontarget objects in a certain frame. To this end, we propose Global Spectral Filter Memory network (GSFM), which improves intra-frame interaction through learning long-term spatial dependencies in the spectral domain. The key components of GSFM is 2D (inverse) discrete Fourier transform for spatial information mixing. Besides, we empirically find low frequency feature should be enhanced in encoder (backbone) while high frequency for decoder (segmentation head). We attribute this to semantic information extracting role for encoder and fine-grained details highlighting role for decoder. Thus, Low (High) Frequency Module is proposed to fit this circumstance. Extensive experiments on the popular DAVIS and YouTube-VOS benchmarks demonstrate that GSFM noticeably outperforms the baseline method and achieves state-of-the-art performance. Besides, extensive analysis shows that the proposed modules are reasonable and of great generalization ability. Our source code is available at https://github.com/workforai/GSFM.","PeriodicalId":72676,"journal":{"name":"Computer vision - ECCV ... : ... European Conference on Computer Vision : proceedings. European Conference on Computer Vision","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2022-10-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"77418941","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
DCL-Net: Deep Correspondence Learning Network for 6D Pose Estimation
Pub Date: 2022-10-11 | DOI: 10.48550/arXiv.2210.05232
Hongyang Li, Jiehong Lin, K. Jia
Establishing point correspondences between the camera and object coordinate systems is a promising way to solve for 6D object poses. However, surrogate objectives of correspondence learning in 3D space are a step away from the true objective of object pose estimation, making the learning suboptimal for the end task. In this paper, we address this shortcoming by introducing a new Deep Correspondence Learning Network for direct 6D object pose estimation, abbreviated as DCL-Net. Specifically, DCL-Net employs two newly proposed Feature Disengagement and Alignment (FDA) modules to establish, in feature space, partial-to-partial correspondence for the partial object observation and complete-to-complete correspondence for its complete CAD model, yielding aggregated pose and match feature pairs from the two coordinate systems; the two FDA modules thus bring complementary advantages. The match feature pairs are used to learn confidence scores that measure the quality of the deep correspondences, while the pose feature pairs, weighted by these confidence scores, are used for direct object pose regression. A confidence-based pose refinement network is also proposed to further improve pose precision in an iterative manner. Extensive experiments show that DCL-Net outperforms existing methods on three benchmark datasets, including YCB-Video, LineMOD, and Occlusion-LineMOD; ablation studies also confirm the efficacy of our novel designs. Our code is released publicly at https://github.com/Gorilla-Lab-SCUT/DCL-Net.
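A hedged sketch of the confidence-weighted direct pose regression idea described above: per-correspondence pose features are pooled by predicted confidences before regressing a rotation and translation. The head, feature dimension, and quaternion parameterization are assumptions, not DCL-Net's actual layers.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ConfidenceWeightedPoseHead(nn.Module):
    """Pools per-correspondence pose features by predicted confidence and regresses a pose."""
    def __init__(self, feat_dim=256):
        super().__init__()
        self.regress = nn.Linear(feat_dim, 7)  # 4 quaternion + 3 translation values

    def forward(self, pose_feats, conf_logits):
        # pose_feats: (B, N, D) pose feature pairs; conf_logits: (B, N) correspondence confidences
        conf = torch.softmax(conf_logits, dim=1)                # normalize over correspondences
        pooled = (pose_feats * conf.unsqueeze(-1)).sum(dim=1)   # confidence-weighted pooling
        out = self.regress(pooled)
        quat = F.normalize(out[:, :4], dim=-1)                  # unit quaternion for rotation
        trans = out[:, 4:]                                      # translation
        return quat, trans
```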
{"title":"DCL-Net: Deep Correspondence Learning Network for 6D Pose Estimation","authors":"Hongyang Li, Jiehong Lin, K. Jia","doi":"10.48550/arXiv.2210.05232","DOIUrl":"https://doi.org/10.48550/arXiv.2210.05232","url":null,"abstract":". Establishment of point correspondence between camera and object coordinate systems is a promising way to solve 6D object poses. However, surrogate objectives of correspondence learning in 3D space are a step away from the true ones of object pose estimation, making the learning suboptimal for the end task. In this paper, we address this short-coming by introducing a new method of Deep Correspondence Learning Network for direct 6D object pose estimation, shortened as DCL-Net . Specifically, DCL-Net employs dual newly proposed Feature Disengagement and Alignment (FDA) modules to establish, in the feature space, partial-to-partial correspondence and complete-to-complete one for partial object observation and its complete CAD model, respectively, which result in aggregated pose and match feature pairs from two coordinate systems; these two FDA modules thus bring complementary advantages. The match feature pairs are used to learn confidence scores for measuring the qualities of deep correspondence, while the pose feature pairs are weighted by confidence scores for direct object pose regression. A confidence-based pose refinement network is also proposed to further improve pose precision in an iterative manner. Extensive experiments show that DCL-Net outperforms existing methods on three benchmarking datasets, including YCB-Video, LineMOD, and Oclussion-LineMOD; ablation studies also confirm the efficacy of our novel designs. Our code is released publicly at https://github.com/Gorilla-Lab-SCUT/DCL-Net .","PeriodicalId":72676,"journal":{"name":"Computer vision - ECCV ... : ... European Conference on Computer Vision : proceedings. European Conference on Computer Vision","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2022-10-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"78053634","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Map-free Visual Relocalization: Metric Pose Relative to a Single Image
Pub Date: 2022-10-11 | DOI: 10.48550/arXiv.2210.05494
Eduardo Arnold, Jamie Wynn, S. Vicente, Guillermo Garcia-Hernando, Áron Monszpart, V. Prisacariu, Daniyar Turmukhambetov, Eric Brachmann
Can we relocalize in a scene represented by a single reference image? Standard visual relocalization requires hundreds of images and scale calibration to build a scene-specific 3D map. In contrast, we propose Map-free Relocalization, i.e., using only one photo of a scene to enable instant, metric-scaled relocalization. Existing datasets are not suitable for benchmarking map-free relocalization, due to their focus on large scenes or their limited variability. Thus, we have constructed a new dataset of 655 small places of interest, such as sculptures, murals and fountains, collected worldwide. Each place comes with a reference image to serve as a relocalization anchor, and dozens of query images with known, metric camera poses. The dataset features changing conditions, stark viewpoint changes, high variability across places, and queries with low to no visual overlap with the reference image. We identify two viable families of existing methods to provide baseline results: relative pose regression, and feature matching combined with single-image depth prediction. While these methods show reasonable performance on some favorable scenes in our dataset, map-free relocalization proves to be a challenge that requires new, innovative solutions.
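One plausible way to assemble the second baseline family (feature matching plus single-image depth prediction) into a metric pose: back-project the matched reference keypoints with the predicted depth and solve a PnP problem against the query keypoints. This is a sketch under assumed inputs (shared intrinsics, precomputed matches and depth), not the authors' evaluation pipeline.

```python
import cv2
import numpy as np

def metric_pose_from_matches(ref_kpts, qry_kpts, ref_depth, K):
    # ref_kpts, qry_kpts: (N, 2) matched pixel coordinates in the reference/query images
    # ref_depth: (H, W) metric depth predicted for the reference image
    # K: (3, 3) camera intrinsics, assumed shared by both images in this sketch
    z = ref_depth[ref_kpts[:, 1].astype(int), ref_kpts[:, 0].astype(int)]
    x = (ref_kpts[:, 0] - K[0, 2]) * z / K[0, 0]
    y = (ref_kpts[:, 1] - K[1, 2]) * z / K[1, 1]
    pts3d = np.stack([x, y, z], axis=-1).astype(np.float32)      # back-projected metric 3D points
    ok, rvec, tvec, inliers = cv2.solvePnPRansac(
        pts3d, qry_kpts.astype(np.float32), K.astype(np.float32), None)
    R, _ = cv2.Rodrigues(rvec)                                   # metric rotation and translation
    return ok, R, tvec
```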
{"title":"Map-free Visual Relocalization: Metric Pose Relative to a Single Image","authors":"Eduardo Arnold, Jamie Wynn, S. Vicente, Guillermo Garcia-Hernando, 'Aron Monszpart, V. Prisacariu, Daniyar Turmukhambetov, Eric Brachmann","doi":"10.48550/arXiv.2210.05494","DOIUrl":"https://doi.org/10.48550/arXiv.2210.05494","url":null,"abstract":". Can we relocalize in a scene represented by a single reference image? Standard visual relocalization requires hundreds of images and scale calibration to build a scene-specific 3D map. In contrast, we propose Map-free Relocalization , i.e. , using only one photo of a scene to enable instant, metric scaled relocalization. Existing datasets are not suitable to benchmark map-free relocalization, due to their focus on large scenes or their limited variability. Thus, we have constructed a new dataset of 655 small places of interest, such as sculptures, murals and fountains, collected worldwide. Each place comes with a reference image to serve as a relocalization anchor, and dozens of query images with known, metric camera poses. The dataset features changing conditions, stark viewpoint changes, high variability across places, and queries with low to no visual overlap with the reference image. We identify two viable families of existing methods to provide baseline results: relative pose regression, and feature matching combined with single-image depth prediction. While these methods show reasonable performance on some favorable scenes in our dataset, map-free relocalization proves to be a challenge that requires new, innovative solutions.","PeriodicalId":72676,"journal":{"name":"Computer vision - ECCV ... : ... European Conference on Computer Vision : proceedings. European Conference on Computer Vision","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2022-10-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"73238683","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
LidarNAS: Unifying and Searching Neural Architectures for 3D Point Clouds
Chenxi Liu, Zhaoqi Leng, Peigen Sun, Shuyang Cheng, C. Qi, Yin Zhou, Mingxing Tan, Drago Anguelov
Pub Date: 2022-10-10 | DOI: 10.48550/arXiv.2210.05018
Developing neural models that accurately understand objects in 3D point clouds is essential for the success of robotics and autonomous driving. However, arguably due to the higher-dimensional nature of the data (as compared to images), existing neural architectures exhibit a large variety in their designs, including but not limited to the views considered, the format of the neural features, and the neural operations used. The lack of a unified framework and interpretation makes it hard to put these designs in perspective, as well as to systematically explore new ones. In this paper, we begin by proposing such a unified framework, with the key idea being to factorize neural networks into a series of view transforms and neural layers. We demonstrate that this modular framework can reproduce a variety of existing works while allowing a fair comparison of backbone designs. Then, we show how this framework can easily materialize into a concrete neural architecture search (NAS) space, allowing a principled NAS-for-3D exploration. In performing evolutionary NAS on the 3D object detection task on the Waymo Open Dataset, not only do we outperform the state-of-the-art models, but we also report the interesting finding that NAS tends to discover the same macro-level architecture concept for both the vehicle and pedestrian classes.
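A toy sketch of the factorized representation described above: an architecture as a sequence of (view transform, neural layer) choices that an evolutionary search can mutate. The view and layer vocabularies below are invented for illustration and do not reproduce the paper's search space.

```python
import random

VIEWS = ["point", "pillar", "voxel", "perspective"]      # assumed view-transform vocabulary
LAYERS = ["mlp", "sparse_conv", "dense_conv", "transformer"]  # assumed layer vocabulary

def random_architecture(num_stages=4):
    """Sample an architecture as a sequence of (view, layer) stages."""
    return [(random.choice(VIEWS), random.choice(LAYERS)) for _ in range(num_stages)]

def mutate(arch):
    """Evolutionary NAS step: perturb one stage's view transform or layer choice."""
    arch = list(arch)
    i = random.randrange(len(arch))
    view, layer = arch[i]
    if random.random() < 0.5:
        view = random.choice(VIEWS)
    else:
        layer = random.choice(LAYERS)
    arch[i] = (view, layer)
    return arch
```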
{"title":"LidarNAS: Unifying and Searching Neural Architectures for 3D Point Clouds","authors":"Chenxi Liu, Zhaoqi Leng, Peigen Sun, Shuyang Cheng, C. Qi, Yin Zhou, Mingxing Tan, Drago Anguelov","doi":"10.48550/arXiv.2210.05018","DOIUrl":"https://doi.org/10.48550/arXiv.2210.05018","url":null,"abstract":"Developing neural models that accurately understand objects in 3D point clouds is essential for the success of robotics and autonomous driving. However, arguably due to the higher-dimensional nature of the data (as compared to images), existing neural architectures exhibit a large variety in their designs, including but not limited to the views considered, the format of the neural features, and the neural operations used. Lack of a unified framework and interpretation makes it hard to put these designs in perspective, as well as systematically explore new ones. In this paper, we begin by proposing a unified framework of such, with the key idea being factorizing the neural networks into a series of view transforms and neural layers. We demonstrate that this modular framework can reproduce a variety of existing works while allowing a fair comparison of backbone designs. Then, we show how this framework can easily materialize into a concrete neural architecture search (NAS) space, allowing a principled NAS-for-3D exploration. In performing evolutionary NAS on the 3D object detection task on the Waymo Open Dataset, not only do we outperform the state-of-the-art models, but also report the interesting finding that NAS tends to discover the same macro-level architecture concept for both the vehicle and pedestrian classes.","PeriodicalId":72676,"journal":{"name":"Computer vision - ECCV ... : ... European Conference on Computer Vision : proceedings. European Conference on Computer Vision","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2022-10-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"82512092","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
SCAM! Transferring humans between images with Semantic Cross Attention Modulation
Pub Date: 2022-10-10 | DOI: 10.48550/arXiv.2210.04883
Nicolas Dufour, David Picard, Vicky S. Kalogeiton
A large body of recent work targets semantically conditioned image generation. Most such methods focus on the narrower task of pose transfer and ignore the more challenging task of subject transfer, which consists of transferring not only the pose but also the appearance and background. In this work, we introduce SCAM (Semantic Cross Attention Modulation), a system that encodes rich and diverse information in each semantic region of the image (including foreground and background), thus achieving precise generation with emphasis on fine details. This is enabled by the Semantic Attention Transformer Encoder, which extracts multiple latent vectors for each semantic region, and the corresponding generator, which exploits these multiple latents using semantic cross attention modulation. The model is trained only with a reconstruction setup, while subject transfer is performed at test time. Our analysis shows that the proposed architecture successfully encodes the diversity of appearance in each semantic region. Extensive experiments on the iDesigner and CelebAMask-HD datasets show that SCAM outperforms SEAN and SPADE; moreover, it sets the new state of the art on subject transfer.
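To sketch the core idea of restricting each pixel's attention to the latent vectors of its own semantic region, here is a minimal masked cross-attention in PyTorch; the single-head formulation, tensor shapes, and function name are simplifying assumptions rather than the SCAM generator's actual layers.

```python
import torch

def semantic_cross_attention(pixel_feats, region_latents, pixel_labels, latent_labels):
    # pixel_feats:    (P, D) query features, one per pixel
    # region_latents: (K, D) latent vectors, several per semantic region
    # pixel_labels:   (P,)   semantic class of each pixel
    # latent_labels:  (K,)   semantic class each latent encodes
    # Assumes every class appearing in pixel_labels has at least one latent,
    # otherwise the masked softmax below would produce NaNs.
    scores = pixel_feats @ region_latents.t() / pixel_feats.shape[-1] ** 0.5  # (P, K)
    mask = pixel_labels.unsqueeze(1) == latent_labels.unsqueeze(0)            # (P, K)
    scores = scores.masked_fill(~mask, float("-inf"))
    attn = torch.softmax(scores, dim=-1)
    return attn @ region_latents  # each pixel is modulated only by its own region's latents
```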
{"title":"SCAM! Transferring humans between images with Semantic Cross Attention Modulation","authors":"Nicolas Dufour, David Picard, Vicky S. Kalogeiton","doi":"10.48550/arXiv.2210.04883","DOIUrl":"https://doi.org/10.48550/arXiv.2210.04883","url":null,"abstract":"A large body of recent work targets semantically conditioned image generation. Most such methods focus on the narrower task of pose transfer and ignore the more challenging task of subject transfer that consists in not only transferring the pose but also the appearance and background. In this work, we introduce SCAM (Semantic Cross Attention Modulation), a system that encodes rich and diverse information in each semantic region of the image (including foreground and background), thus achieving precise generation with emphasis on fine details. This is enabled by the Semantic Attention Transformer Encoder that extracts multiple latent vectors for each semantic region, and the corresponding generator that exploits these multiple latents by using semantic cross attention modulation. It is trained only using a reconstruction setup, while subject transfer is performed at test time. Our analysis shows that our proposed architecture is successful at encoding the diversity of appearance in each semantic region. Extensive experiments on the iDesigner and CelebAMask-HD datasets show that SCAM outperforms SEAN and SPADE; moreover, it sets the new state of the art on subject transfer.","PeriodicalId":72676,"journal":{"name":"Computer vision - ECCV ... : ... European Conference on Computer Vision : proceedings. European Conference on Computer Vision","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2022-10-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"85072113","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Super-Resolution by Predicting Offsets: An Ultra-Efficient Super-Resolution Network for Rasterized Images
Jinjin Gu, Haoming Cai, Chenyu Dong, Ruofan Zhang, Yulun Zhang, Wenming Yang, Chun Yuan
Pub Date: 2022-10-09 | DOI: 10.48550/arXiv.2210.04198
Rendering high-resolution (HR) graphics brings substantial computational costs. Efficient graphics super-resolution (SR) methods may achieve HR rendering with small computing resources and have attracted extensive interest in both industry and the research community. We present a new method for real-time SR for computer graphics, namely Super-Resolution by Predicting Offsets (SRPO). Our algorithm divides the image into two parts for processing: sharp edges and flatter areas. For edges, unlike previous SR methods that take anti-aliased images as inputs, SRPO exploits the characteristics of rasterized images and performs SR directly on them. To complement the residual between the HR and low-resolution (LR) rasterized images, we train an ultra-efficient network to predict offset maps that move the appropriate surrounding pixels to new positions. For flat areas, we find that simple interpolation methods already generate reasonable output. Finally, a guided fusion operation integrates the sharp edges generated by the network with the flat areas produced by interpolation to obtain the final SR image. The proposed network contains only 8,434 parameters and can be further accelerated by network quantization. Extensive experiments show that SRPO achieves superior visual quality at a smaller computational cost than existing state-of-the-art methods.
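A hedged sketch of how the pieces described above could fit together: bilinear interpolation for flat areas, offset-based resampling of the rasterized LR image for edges, and a mask-guided fusion. The network that would predict `offsets` and `edge_mask` is assumed to exist; this is not the authors' implementation.

```python
import torch
import torch.nn.functional as F

def srpo_like_upscale(lr_rasterized, offsets, edge_mask, scale=2):
    # lr_rasterized: (B, 3, H, W)   rasterized (aliased) low-resolution frame
    # offsets:       (B, 2, sH, sW) predicted per-pixel sampling offsets, in LR pixels
    # edge_mask:     (B, 1, sH, sW) 1 near sharp edges, 0 in flat areas
    B, _, H, W = lr_rasterized.shape
    sH, sW = H * scale, W * scale

    # Flat areas: plain bilinear interpolation is already adequate.
    flat = F.interpolate(lr_rasterized, size=(sH, sW), mode="bilinear", align_corners=False)

    # Edges: resample the LR image at positions shifted by the predicted offsets.
    ys, xs = torch.meshgrid(
        torch.linspace(-1, 1, sH), torch.linspace(-1, 1, sW), indexing="ij")
    base = torch.stack((xs, ys), dim=-1).expand(B, sH, sW, 2)
    norm = torch.tensor([2.0 / (W - 1), 2.0 / (H - 1)])          # pixels -> normalized coordinates
    grid = base + offsets.permute(0, 2, 3, 1) * norm
    edges = F.grid_sample(lr_rasterized, grid, mode="nearest", align_corners=True)

    # Guided fusion: network-driven edges where the mask fires, interpolation elsewhere.
    return edge_mask * edges + (1 - edge_mask) * flat
```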
{"title":"Super-Resolution by Predicting Offsets: An Ultra-Efficient Super-Resolution Network for Rasterized Images","authors":"Jinjin Gu, Haoming Cai, Chenyu Dong, Ruofan Zhang, Yulun Zhang, Wenming Yang, Chun Yuan","doi":"10.48550/arXiv.2210.04198","DOIUrl":"https://doi.org/10.48550/arXiv.2210.04198","url":null,"abstract":"Rendering high-resolution (HR) graphics brings substantial computational costs. Efficient graphics super-resolution (SR) methods may achieve HR rendering with small computing resources and have attracted extensive research interests in industry and research communities. We present a new method for real-time SR for computer graphics, namely Super-Resolution by Predicting Offsets (SRPO). Our algorithm divides the image into two parts for processing, i.e., sharp edges and flatter areas. For edges, different from the previous SR methods that take the anti-aliased images as inputs, our proposed SRPO takes advantage of the characteristics of rasterized images to conduct SR on the rasterized images. To complement the residual between HR and low-resolution (LR) rasterized images, we train an ultra-efficient network to predict the offset maps to move the appropriate surrounding pixels to the new positions. For flat areas, we found simple interpolation methods can already generate reasonable output. We finally use a guided fusion operation to integrate the sharp edges generated by the network and flat areas by the interpolation method to get the final SR image. The proposed network only contains 8,434 parameters and can be accelerated by network quantization. Extensive experiments show that the proposed SRPO can achieve superior visual effects at a smaller computational cost than the existing state-of-the-art methods.","PeriodicalId":72676,"journal":{"name":"Computer vision - ECCV ... : ... European Conference on Computer Vision : proceedings. European Conference on Computer Vision","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2022-10-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"82897307","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Attention Diversification for Domain Generalization
Pub Date: 2022-10-09 | DOI: 10.48550/arXiv.2210.04206
Rang Meng, Xianfeng Li, Weijie Chen, Shicai Yang, Jie Song, Xinchao Wang, Lei Zhang, Mingli Song, Di Xie, Shiliang Pu
Convolutional neural networks (CNNs) have demonstrated gratifying results at learning discriminative features. However, when applied to unseen domains, state-of-the-art models are usually prone to errors due to domain shift. After investigating this issue from the perspective of shortcut learning, we find that the devil lies in the fact that models trained on different domains merely bias toward different domain-specific features yet overlook diverse task-related features. Under this guidance, a novel Attention Diversification framework is proposed, in which Intra-Model and Inter-Model Attention Diversification Regularization collaborate to reassign appropriate attention to diverse task-related features. Briefly, Intra-Model Attention Diversification Regularization is applied to the high-level feature maps to achieve in-channel discrimination and cross-channel diversification by forcing different channels to pay their most salient attention to different spatial locations. In addition, Inter-Model Attention Diversification Regularization is proposed to further provide task-related attention diversification and domain-related attention suppression, following a paradigm of "simulate, divide and assemble": simulate domain shift by exploiting multiple domain-specific models, divide attention maps into task-related and domain-related groups, and assemble them within each group respectively to perform regularization. Extensive experiments and analyses on various benchmarks demonstrate that our method achieves state-of-the-art performance over other competing methods. Code is available at https://github.com/hikvision-research/DomainGeneralization.
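As a toy version of the intra-model regularizer described above, the penalty below encourages different channels' spatial attention maps to overlap as little as possible; the exact form of the paper's loss differs, so treat this purely as an illustration.

```python
import torch

def intra_model_attention_diversification(feat):
    """Penalizes pairwise overlap between the per-channel spatial attention maps of a feature map."""
    B, C, H, W = feat.shape
    attn = torch.softmax(feat.flatten(2), dim=-1)   # (B, C, H*W) per-channel spatial attention
    overlap = attn @ attn.transpose(1, 2)           # (B, C, C) pairwise attention overlap
    off_diag = overlap - torch.diag_embed(torch.diagonal(overlap, dim1=1, dim2=2))
    return off_diag.mean()                          # lower when channels attend to different locations
```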
{"title":"Attention Diversification for Domain Generalization","authors":"Rang Meng, Xianfeng Li, Weijie Chen, Shicai Yang, Jie Song, Xinchao Wang, Lei Zhang, Mingli Song, Di Xie, Shiliang Pu","doi":"10.48550/arXiv.2210.04206","DOIUrl":"https://doi.org/10.48550/arXiv.2210.04206","url":null,"abstract":"Convolutional neural networks (CNNs) have demonstrated gratifying results at learning discriminative features. However, when applied to unseen domains, state-of-the-art models are usually prone to errors due to domain shift. After investigating this issue from the perspective of shortcut learning, we find the devils lie in the fact that models trained on different domains merely bias to different domain-specific features yet overlook diverse task-related features. Under this guidance, a novel Attention Diversification framework is proposed, in which Intra-Model and Inter-Model Attention Diversification Regularization are collaborated to reassign appropriate attention to diverse task-related features. Briefly, Intra-Model Attention Diversification Regularization is equipped on the high-level feature maps to achieve in-channel discrimination and cross-channel diversification via forcing different channels to pay their most salient attention to different spatial locations. Besides, Inter-Model Attention Diversification Regularization is proposed to further provide task-related attention diversification and domain-related attention suppression, which is a paradigm of\"simulate, divide and assemble\": simulate domain shift via exploiting multiple domain-specific models, divide attention maps into task-related and domain-related groups, and assemble them within each group respectively to execute regularization. Extensive experiments and analyses are conducted on various benchmarks to demonstrate that our method achieves state-of-the-art performance over other competing methods. Code is available at https://github.com/hikvision-research/DomainGeneralization.","PeriodicalId":72676,"journal":{"name":"Computer vision - ECCV ... : ... European Conference on Computer Vision : proceedings. European Conference on Computer Vision","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2022-10-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"88474156","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
IDa-Det: An Information Discrepancy-aware Distillation for 1-bit Detectors
Sheng Xu, Yanjing Li, Bo-Wen Zeng, Teli Ma, Baochang Zhang, Xianbin Cao, Penglei Gao, Jinhu Lv
Pub Date: 2022-10-07 | DOI: 10.48550/arXiv.2210.03477
Knowledge distillation (KD) has been proven useful for training compact object detection models. However, we observe that KD is often effective only when the teacher model and the student counterpart share similar proposal information. This explains why existing KD methods are less effective for 1-bit detectors: there is a significant information discrepancy between the real-valued teacher and the 1-bit student. This paper presents an Information Discrepancy-aware strategy (IDa-Det) for distilling 1-bit detectors that effectively eliminates information discrepancies and significantly reduces the performance gap between a 1-bit detector and its real-valued counterpart. We formulate the distillation process as a bi-level optimization problem. At the inner level, we select the representative proposals with maximum information discrepancy. We then introduce a novel entropy distillation loss to reduce the disparity based on the selected proposals. Extensive experiments demonstrate IDa-Det's superiority over state-of-the-art 1-bit detectors and KD methods on both the PASCAL VOC and COCO datasets. IDa-Det achieves 76.9% mAP for a 1-bit Faster-RCNN with a ResNet-18 backbone. Our code is open-sourced at https://github.com/SteveTsui/IDa-Det.
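The bi-level structure described above could be sketched as follows: an inner step that ranks proposals by an information-discrepancy proxy, and an outer step that distills only on the selected ones. The symmetric-KL proxy, `k`, and `tau` are assumptions made for illustration; the paper's entropy distillation loss is defined differently.

```python
import torch
import torch.nn.functional as F

def select_and_distill(teacher_feats, student_feats, k=64, tau=1.0):
    # teacher_feats, student_feats: (N, D) features of the same N candidate proposals
    t = F.log_softmax(teacher_feats / tau, dim=-1)
    s = F.log_softmax(student_feats / tau, dim=-1)
    # Inner step: rank proposals by a symmetric-KL discrepancy proxy.
    disc = (F.kl_div(s, t, log_target=True, reduction="none").sum(-1)
            + F.kl_div(t, s, log_target=True, reduction="none").sum(-1))
    idx = disc.topk(min(k, disc.numel())).indices   # proposals with the largest discrepancy
    # Outer step: distill the student toward the teacher on the selected proposals only.
    return F.kl_div(s[idx], t[idx], log_target=True, reduction="batchmean")
```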
{"title":"IDa-Det: An Information Discrepancy-aware Distillation for 1-bit Detectors","authors":"Sheng Xu, Yanjing Li, Bo-Wen Zeng, Teli Ma, Baochang Zhang, Xianbin Cao, Penglei Gao, Jinhu Lv","doi":"10.48550/arXiv.2210.03477","DOIUrl":"https://doi.org/10.48550/arXiv.2210.03477","url":null,"abstract":"Knowledge distillation (KD) has been proven to be useful for training compact object detection models. However, we observe that KD is often effective when the teacher model and student counterpart share similar proposal information. This explains why existing KD methods are less effective for 1-bit detectors, caused by a significant information discrepancy between the real-valued teacher and the 1-bit student. This paper presents an Information Discrepancy-aware strategy (IDa-Det) to distill 1-bit detectors that can effectively eliminate information discrepancies and significantly reduce the performance gap between a 1-bit detector and its real-valued counterpart. We formulate the distillation process as a bi-level optimization formulation. At the inner level, we select the representative proposals with maximum information discrepancy. We then introduce a novel entropy distillation loss to reduce the disparity based on the selected proposals. Extensive experiments demonstrate IDa-Det's superiority over state-of-the-art 1-bit detectors and KD methods on both PASCAL VOC and COCO datasets. IDa-Det achieves a 76.9% mAP for a 1-bit Faster-RCNN with ResNet-18 backbone. Our code is open-sourced on https://github.com/SteveTsui/IDa-Det.","PeriodicalId":72676,"journal":{"name":"Computer vision - ECCV ... : ... European Conference on Computer Vision : proceedings. European Conference on Computer Vision","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2022-10-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"88015808","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
FloatingFusion: Depth from ToF and Image-stabilized Stereo Cameras
Pub Date: 2022-10-06 | DOI: 10.1007/978-3-031-19769-7_35
Andreas Meuleman, Hak-Il Kim, J. Tompkin, Min H. Kim
{"title":"FloatingFusion: Depth from ToF and Image-stabilized Stereo Cameras","authors":"Andreas Meuleman, Hak-Il Kim, J. Tompkin, Min H. Kim","doi":"10.1007/978-3-031-19769-7_35","DOIUrl":"https://doi.org/10.1007/978-3-031-19769-7_35","url":null,"abstract":"","PeriodicalId":72676,"journal":{"name":"Computer vision - ECCV ... : ... European Conference on Computer Vision : proceedings. European Conference on Computer Vision","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2022-10-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"84319181","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}