Computer vision - ECCV ... : ... European Conference on Computer Vision : proceedings. European Conference on Computer Vision: Latest Articles

Reliability-Aware Prediction via Uncertainty Learning for Person Image Retrieval

Pub Date : 2022-10-24 DOI: 10.1007/978-3-031-19781-9_34

Zhaopeng Dou, Zhongdao Wang, Weihua Chen, Yali Li, Shengjin Wang

Citations: 4

SC-wLS: Towards Interpretable Feed-forward Camera Re-localization

Pub Date : 2022-10-23 DOI: 10.1007/978-3-031-19769-7_34

Xin Wu, Hao Zhao, Shunkai Li, Yingdian Cao, H. Zha

Citations: 7

Photo-realistic Neural Domain Randomization

Pub Date : 2022-10-23 DOI: 10.1007/978-3-031-19806-9_18

Sergey Zakharov, Rares Ambrus, V. Guizilini, Wadim Kehl, Adrien Gaidon

Citations: 5

PoseScript: 3D Human Poses from Natural Language

Pub Date : 2022-10-21 DOI: 10.48550/arXiv.2210.11795

Ginger Delmas, Philippe Weinzaepfel, Thomas Lucas, F. Moreno-Noguer, Grégory Rogez

Natural language is leveraged in many computer vision tasks, such as image captioning, cross-modal retrieval or visual question answering, to provide fine-grained semantic information. While human pose is key to human understanding, current 3D human pose datasets lack detailed language descriptions. In this work, we introduce the PoseScript dataset, which pairs a few thousand 3D human poses from AMASS with rich human-annotated descriptions of the body parts and their spatial relationships. To increase the size of this dataset to a scale compatible with typical data-hungry learning algorithms, we propose an elaborate captioning process that generates automatic synthetic descriptions in natural language from given 3D keypoints. This process extracts low-level pose information -- the posecodes -- using a set of simple but generic rules on the 3D keypoints. The posecodes are then combined into higher-level textual descriptions using syntactic rules. Automatic annotations substantially increase the amount of available data, and make it possible to effectively pretrain deep models for finetuning on human captions. To demonstrate the potential of annotated poses, we show applications of the PoseScript dataset to retrieval of relevant poses from large-scale datasets and to synthetic pose generation, both based on a textual pose description.

Pages: 346-362

Citations: 16
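
The rule-based captioning step described in the PoseScript abstract (posecodes extracted from 3D keypoints, then combined by syntactic rules) can be illustrated with a minimal Python sketch. This is not the authors' pipeline: the function names, the angle thresholds, and the sentence template are all hypothetical, chosen only to show how keypoints might be turned into a categorical code and then into text.

```python
import math

def joint_angle(a, b, c):
    """Angle in degrees at joint b, formed by the 3D points a-b-c."""
    v1 = [a[i] - b[i] for i in range(3)]
    v2 = [c[i] - b[i] for i in range(3)]
    dot = sum(x * y for x, y in zip(v1, v2))
    n1 = math.sqrt(sum(x * x for x in v1))
    n2 = math.sqrt(sum(x * x for x in v2))
    # Clamp to [-1, 1] to guard against floating-point drift before acos.
    cos = max(-1.0, min(1.0, dot / (n1 * n2)))
    return math.degrees(math.acos(cos))

def angle_posecode(angle_deg):
    """Map an angle to a categorical code (thresholds are hypothetical)."""
    if angle_deg < 75:
        return "completely bent"
    if angle_deg < 135:
        return "partially bent"
    return "straight"

def describe(joint_name, a, b, c):
    """Fill a fixed sentence template with the posecode (a toy syntactic rule)."""
    return f"The {joint_name} is {angle_posecode(joint_angle(a, b, c))}."

# Made-up shoulder, elbow, and wrist positions forming a right angle at the elbow.
shoulder, elbow, wrist = (0.0, 0.0, 0.0), (0.3, 0.0, 0.0), (0.3, 0.3, 0.0)
print(describe("left elbow", shoulder, elbow, wrist))
# prints: The left elbow is partially bent.
```

A real system would need many such rules (distances, relative positions, contact) plus rules for aggregating and ordering the resulting phrases; the sketch covers a single angle rule only.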

Distilling the Undistillable: Learning from a Nasty Teacher

Pub Date : 2022-10-21 DOI: 10.48550/arXiv.2210.11728

Surgan Jandial, Yash Khasbage, Arghya Pal, V. Balasubramanian, Balaji Krishnamurthy

The inadvertent stealing of private/sensitive information using Knowledge Distillation (KD) has been getting significant attention recently and has guided subsequent defense efforts considering its critical nature. Recent work Nasty Teacher proposed to develop teachers which cannot be distilled or imitated by models attacking them. However, the promise of confidentiality offered by a nasty teacher is not well studied, and as a further step to strengthen against such loopholes, we attempt to bypass its defense and steal (or extract) information in its presence successfully. Specifically, we analyze Nasty Teacher from two different directions and subsequently leverage them carefully to develop simple yet efficient methodologies, named HTC and SCM, which increase the learning from Nasty Teacher by up to 68.63% on standard datasets. We also explore an improvised defense method based on our insights of stealing. Our detailed set of experiments and ablations on diverse models/settings demonstrates the efficacy of our approach.

Pages: 587-603

Citations: 2
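
For readers unfamiliar with the Knowledge Distillation (KD) setting that the abstract above attacks and defends, the following Python sketch shows the standard temperature-softened distillation loss. It is the generic KD objective, not the paper's HTC or SCM methods, and the function names and example logits are illustrative assumptions.

```python
import math

def softmax(logits, temperature=1.0):
    """Numerically stable softmax over temperature-scaled logits."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def kd_loss(student_logits, teacher_logits, temperature=4.0):
    """KL(teacher || student) on softened distributions, scaled by T^2."""
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    kl = sum(t * math.log(t / s)
             for t, s in zip(p_teacher, p_student) if t > 0)
    return temperature ** 2 * kl

# Made-up logits for a 3-class problem.
teacher = [5.0, 1.0, -2.0]
student = [2.0, 2.5, 0.0]
print(kd_loss(student, teacher))   # positive: the student disagrees with the teacher
print(kd_loss(teacher, teacher))   # 0.0: identical outputs incur no loss
```

A student minimizing this loss imitates the teacher's softened output distribution. A "nasty" teacher in the sense of the abstract is one trained so that this imitation transfers little useful knowledge.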

GraphCSPN: Geometry-Aware Depth Completion via Dynamic GCNs

Pub Date : 2022-10-19 DOI: 10.48550/arXiv.2210.10758

Xin Liu, Xiaofei Shao, Boqian Wang, Yali Li, Shengjin Wang

Image-guided depth completion aims to recover per-pixel dense depth maps from sparse depth measurements with the help of aligned color images, which has a wide range of applications from robotics to autonomous driving. However, the 3D nature of sparse-to-dense depth completion has not been fully explored by previous methods. In this work, we propose a Graph Convolution based Spatial Propagation Network (GraphCSPN) as a general approach for depth completion. First, unlike previous methods, we leverage convolutional neural networks as well as graph neural networks in a complementary way for geometric representation learning. In addition, the proposed networks explicitly incorporate learnable geometric constraints to regularize the propagation process performed in three-dimensional space rather than in a two-dimensional plane. Furthermore, we construct the graph utilizing sequences of feature patches, and update it dynamically with an edge attention module during propagation, so as to better capture both the local neighboring features and global relationships over long distances. Extensive experiments on both indoor NYU-Depth-v2 and outdoor KITTI datasets demonstrate that our method achieves state-of-the-art performance, especially when compared in the case of using only a few propagation steps. Code and models are available at the project page.

Pages: 90-107

Citations: 7
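
The GraphCSPN abstract contrasts propagation in three-dimensional space with propagation in the two-dimensional image plane. One way to see why the distinction matters is the Python sketch below, which back-projects pixels with a pinhole camera model and links each point to its nearest neighbour in 3D. This is only an illustration under stated assumptions: the camera intrinsics and pixel values are made up, and the paper's own graph (built from feature patches and updated with edge attention) is not reproduced here.

```python
import math

def backproject(u, v, depth, fx, fy, cx, cy):
    """Pinhole back-projection of pixel (u, v) with known depth to a 3D point."""
    return ((u - cx) * depth / fx, (v - cy) * depth / fy, depth)

def knn_3d(points, k):
    """For each point, indices of its k nearest neighbours by Euclidean distance."""
    neighbours = []
    for i, p in enumerate(points):
        others = [j for j in range(len(points)) if j != i]
        others.sort(key=lambda j: math.dist(p, points[j]))
        neighbours.append(others[:k])
    return neighbours

# Four horizontally adjacent pixels (u, v, depth) straddling a depth edge.
pixels = [(10, 10, 1.0), (11, 10, 1.0), (12, 10, 5.0), (13, 10, 5.1)]
points = [backproject(u, v, d, fx=500.0, fy=500.0, cx=0.0, cy=0.0)
          for u, v, d in pixels]
graph = knn_3d(points, k=1)
print(graph)  # [[1], [0], [3], [2]]
```

Pixels 1 and 2 are neighbours in the image but lie on different surfaces, so the 3D graph does not connect them. A purely 2D neighbourhood would mix depth values across that edge.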

LaMAR: Benchmarking Localization and Mapping for Augmented Reality

Pub Date : 2022-10-19 DOI: 10.48550/arXiv.2210.10770

Paul-Edouard Sarlin, Mihai Dusmanu, Johannes L. Schönberger, Pablo Speciale, Lukas Gruber, Viktor Larsson, O. Mikšík, M. Pollefeys

Localization and mapping is the foundational technology for augmented reality (AR) that enables sharing and persistence of digital content in the real world. While significant progress has been made, researchers are still mostly driven by unrealistic benchmarks not representative of real-world AR scenarios. These benchmarks are often based on small-scale datasets with low scene diversity, captured from stationary cameras, and lack other sensor inputs like inertial, radio, or depth data. Furthermore, their ground-truth (GT) accuracy is mostly insufficient to satisfy AR requirements. To close this gap, we introduce LaMAR, a new benchmark with a comprehensive capture and GT pipeline that co-registers realistic trajectories and sensor streams captured by heterogeneous AR devices in large, unconstrained scenes. To establish an accurate GT, our pipeline robustly aligns the trajectories against laser scans in a fully automated manner. As a result, we publish a benchmark dataset of diverse and large-scale scenes recorded with head-mounted and hand-held AR devices. We extend several state-of-the-art methods to take advantage of the AR-specific setup and evaluate them on our benchmark. The results offer new insights on current research and reveal promising avenues for future work in the field of localization and mapping for AR.

Pages: 686-704

Citations: 23

Attaining Class-level Forgetting in Pretrained Model using Few Samples

Pub Date : 2022-10-19 DOI: 10.48550/arXiv.2210.10670

Pravendra Singh, Pratik Mazumder, M. A. Karim

To address real-world problems, deep learning models are jointly trained on many classes. However, in the future, some classes may become restricted due to privacy/ethical concerns, and the restricted class knowledge has to be removed from the models that have been trained on them. The available data may also be limited due to privacy/ethical concerns, and re-training the model will not be possible. We propose a novel approach to address this problem without affecting the model's prediction power for the remaining classes. Our approach identifies the model parameters that are highly relevant to the restricted classes and removes the knowledge regarding the restricted classes from them using the limited available training data. Our approach is significantly faster and performs similarly to the model re-trained on the complete data of the remaining classes.

Citations: 1

Scaling Adversarial Training to Large Perturbation Bounds

Pub Date : 2022-10-18 DOI: 10.48550/arXiv.2210.09852

Sravanti Addepalli, Samyak Jain, Gaurang Sriramanan, R. Venkatesh Babu

The vulnerability of Deep Neural Networks to Adversarial Attacks has fuelled research towards building robust models. While most Adversarial Training algorithms aim at defending against attacks constrained within low magnitude Lp norm bounds, real-world adversaries are not limited by such constraints. In this work, we aim to achieve adversarial robustness within larger bounds, against perturbations that may be perceptible, but do not change human (or Oracle) prediction. The presence of images that flip Oracle predictions and those that do not makes this a challenging setting for adversarial robustness. We discuss the ideal goals of an adversarial defense algorithm beyond perceptual limits, and further highlight the shortcomings of naively extending existing training algorithms to higher perturbation bounds. In order to overcome these shortcomings, we propose a novel defense, Oracle-Aligned Adversarial Training (OA-AT), to align the predictions of the network with those of an Oracle during adversarial training. The proposed approach achieves state-of-the-art performance at large epsilon bounds (such as an L-inf bound of 16/255 on CIFAR-10) while outperforming existing defenses (AWP, TRADES, PGD-AT) at standard bounds (8/255) as well.

Pages: 301-316

Citations: 6
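
The L-inf bounds quoted in the abstract above (8/255 as the standard bound, 16/255 as the large one) limit how far any single pixel may move from its clean value. The Python sketch below shows the projection step that enforces such a bound, assuming pixel intensities normalised to [0, 1]. It illustrates the threat model only, not the OA-AT defense itself, and the function name and sample values are hypothetical.

```python
def project_linf(clean, perturbed, eps):
    """Project `perturbed` onto the L-inf ball of radius eps around `clean`,
    then clip to the valid pixel range [0, 1]."""
    projected = []
    for x, x_adv in zip(clean, perturbed):
        delta = max(-eps, min(eps, x_adv - x))   # per-pixel clamp of the change
        projected.append(max(0.0, min(1.0, x + delta)))
    return projected

clean = [0.50, 0.10, 0.98]        # made-up pixel intensities
perturbed = [0.70, 0.05, 1.20]    # an unconstrained adversarial candidate

standard = project_linf(clean, perturbed, 8 / 255)
large = project_linf(clean, perturbed, 16 / 255)
print(standard)
print(large)
```

Doubling the bound from 8/255 to 16/255 doubles the per-pixel budget, which is why perturbations in the larger regime may become perceptible, as the abstract notes.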

ARAH: Animatable Volume Rendering of Articulated Human SDFs

Pub Date : 2022-10-18 DOI: 10.48550/arXiv.2210.10036

Shaofei Wang, Katja Schwarz, Andreas Geiger, Siyu Tang

Combining human body models with differentiable rendering has recently enabled animatable avatars of clothed humans from sparse sets of multi-view RGB videos. While state-of-the-art approaches achieve realistic appearance with neural radiance fields (NeRF), the inferred geometry often lacks detail due to missing geometric constraints. Further, animating avatars in out-of-distribution poses is not yet possible because the mapping from observation space to canonical space does not generalize faithfully to unseen poses. In this work, we address these shortcomings and propose a model to create animatable clothed human avatars with detailed geometry that generalize well to out-of-distribution poses. To achieve detailed geometry, we combine an articulated implicit surface representation with volume rendering. For generalization, we propose a novel joint root-finding algorithm for simultaneous ray-surface intersection search and correspondence search. Our algorithm enables efficient point sampling and accurate point canonicalization while generalizing well to unseen poses. We demonstrate that our proposed pipeline can generate clothed avatars with high-quality pose-dependent geometry and appearance from a sparse set of multi-view RGB videos. Our method achieves state-of-the-art performance on geometry and appearance reconstruction while creating animatable avatars that generalize well to out-of-distribution poses beyond the small number of training poses.

Pages: 1-19

Citations: 55
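
The ray-surface intersection search mentioned in the ARAH abstract is a root-finding problem on a signed distance function (SDF). The Python sketch below shows generic sphere tracing against an analytic sphere, as a simplified stand-in. It is not the authors' joint root-finding algorithm, which also searches for correspondences in canonical space; the SDF, the function names, and the parameters here are assumptions for illustration.

```python
import math

def sphere_sdf(p, center=(0.0, 0.0, 3.0), radius=1.0):
    """Signed distance from point p to a sphere (negative inside)."""
    return math.dist(p, center) - radius

def sphere_trace(origin, direction, sdf, max_steps=64, tol=1e-6, t_max=100.0):
    """March along a unit-length ray; return the hit distance t, or None on a miss."""
    t = 0.0
    for _ in range(max_steps):
        point = tuple(o + t * w for o, w in zip(origin, direction))
        distance = sdf(point)
        if distance < tol:
            return t
        # The SDF value is a safe step size: no surface lies closer than this.
        t += distance
        if t > t_max:
            break
    return None

hit = sphere_trace((0.0, 0.0, 0.0), (0.0, 0.0, 1.0), sphere_sdf)
miss = sphere_trace((0.0, 0.0, 0.0), (0.0, 1.0, 0.0), sphere_sdf)
print(hit)   # 2.0
print(miss)  # None
```

In a learned setting the analytic sphere would be replaced by a neural SDF, and each query would additionally require mapping the point from observation space to canonical space.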
