Computer vision - ECCV ... : ... European Conference on Computer Vision : proceedings. European Conference on Computer Vision: latest articles

3D Scene Inference from Transient Histograms

Pub Date : 2022-11-09 DOI: 10.48550/arXiv.2211.05094

Sacha Jungerman, A. Ingle, Yin Li, Mohit Gupta

Time-resolved image sensors that capture light at pico-to-nanosecond timescales were once limited to niche applications but are now rapidly becoming mainstream in consumer devices. We propose low-cost and low-power imaging modalities that capture scene information from minimal time-resolved image sensors with as few as one pixel. The key idea is to flood-illuminate large scene patches (or the entire scene) with a pulsed light source and measure the time-resolved reflected light by integrating over the entire illuminated area. The measured one-dimensional temporal waveform, called a transient, encodes both distances and albedos at all visible scene points and as such is an aggregate proxy for the scene's 3D geometry. We explore the viability and limitations of the transient waveforms by themselves for recovering scene information, and also when combined with traditional RGB cameras. We show that plane estimation can be performed from a single transient and that using only a few more it is possible to recover a depth map of the whole scene. We also show two proof-of-concept hardware prototypes that demonstrate the feasibility of our approach for compact, mobile, and budget-limited applications.
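The way a single transient aggregates distance and albedo over the illuminated area can be illustrated with a small simulation. This is only a sketch under stated assumptions, not the authors' image formation model: it assumes a co-located source and sensor, single-bounce returns, inverse-square falloff, and no noise or pulse shape. The function name `simulate_transient` and its parameters are hypothetical.

```python
import numpy as np

C = 299_792_458.0  # speed of light in m/s


def simulate_transient(depth_m, albedo, bin_width_s=100e-12, num_bins=256):
    """Aggregate per-point returns into one 1D transient histogram.

    Assumptions (illustrative only): co-located source and sensor,
    single-bounce light, inverse-square falloff, no noise, ideal pulse.
    """
    depth = np.asarray(depth_m, dtype=float).ravel()
    rho = np.asarray(albedo, dtype=float).ravel()

    # Round-trip time of flight for each visible scene point.
    tof = 2.0 * depth / C
    bins = np.floor(tof / bin_width_s).astype(int)

    # Each point contributes its albedo, attenuated by distance squared.
    weights = rho / np.maximum(depth, 1e-6) ** 2

    # Drop returns that fall outside the histogram's time range.
    valid = (bins >= 0) & (bins < num_bins)
    return np.bincount(bins[valid], weights=weights[valid], minlength=num_bins)


# A fronto-parallel wall at 1.5 m puts all of its energy into one bin.
wall = simulate_transient(np.full((4, 4), 1.5), np.full((4, 4), 0.5))
print(int(np.argmax(wall)), wall.sum())
```

A tilted plane would instead spread its energy over a range of bins, which is the kind of cue that makes plane estimation from a single transient plausible.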

Pages: 401-417

Citations: 3

Editable Indoor Lighting Estimation

Pub Date : 2022-11-08 DOI: 10.48550/arXiv.2211.03928

Henrique Weber, Mathieu Garon, Jean-François Lalonde

We present a method for estimating lighting from a single perspective image of an indoor scene. Previous methods for predicting indoor illumination usually focus either on simple, parametric lighting that lacks realism, or on richer representations that are difficult or even impossible to understand or modify after prediction. We propose a pipeline that estimates a parametric light that is easy to edit and allows renderings with strong shadows, alongside a non-parametric texture with the high-frequency information necessary for realistic rendering of specular objects. Once estimated, the predictions obtained with our model are interpretable and can easily be modified by an artist or user with a few mouse clicks. Quantitative and qualitative results show that our approach makes indoor lighting estimation easier for a casual user to handle, while still producing competitive results.

Citations: 1

RRSR: Reciprocal Reference-based Image Super-Resolution with Progressive Feature Alignment and Selection

Pub Date : 2022-11-08 DOI: 10.48550/arXiv.2211.04203

Lin Zhang, Xin Li, Dongliang He, Fu Li, Yili Wang, Zhao Zhang

Reference-based image super-resolution (RefSR) is a promising SR branch and has shown great potential in overcoming the limitations of single image super-resolution. While previous state-of-the-art RefSR methods mainly focus on improving the efficacy and robustness of reference feature transfer, it is generally overlooked that a well-reconstructed SR image should, when it is itself used as a reference, enable better SR reconstruction for similar LR images. Therefore, in this work, we propose a reciprocal learning framework that leverages this fact to reinforce the learning of a RefSR network. In addition, we design a progressive feature alignment and selection module to further improve the RefSR task. The newly proposed module aligns reference and input images in multi-scale feature spaces and performs reference-aware feature selection in a progressive manner, so that more precise reference features can be transferred into the input features and the network capability is enhanced. Our reciprocal learning paradigm is model-agnostic and can be applied to arbitrary RefSR models. We empirically show that multiple recent state-of-the-art RefSR models can be consistently improved with our reciprocal learning paradigm. Furthermore, our proposed model together with the reciprocal learning strategy sets new state-of-the-art performance on multiple benchmarks.

Pages: 648-664

Citations: 3

Towards Real World HDRTV Reconstruction: A Data Synthesis-based Approach

Pub Date : 2022-11-06 DOI: 10.48550/arXiv.2211.03058

Zhen Cheng, Tao Wang, Yong Li, Fenglong Song, C. Chen, Zhiwei Xiong

Existing deep-learning-based HDRTV reconstruction methods assume one kind of tone mapping operator (TMO) as the degradation procedure to synthesize SDRTV-HDRTV pairs for supervised training. In this paper, we argue that, although traditional TMOs exploit efficient dynamic range compression priors, they have several drawbacks in modeling realistic degradation: information over-preservation, color bias, and possible artifacts, which make the trained reconstruction networks hard to generalize to real-world cases. To solve this problem, we propose a learning-based data synthesis approach that learns the properties of real-world SDRTVs by integrating several tone mapping priors into both network structures and loss functions. Specifically, we design a conditioned two-stream network that uses prior tone mapping results as guidance to synthesize SDRTVs through both global and local transformations. To train the data synthesis network, we form a novel self-supervised content loss that constrains different aspects of the synthesized SDRTVs in regions with different brightness distributions, and an adversarial loss that makes the details more realistic. To validate the effectiveness of our approach, we synthesize SDRTV-HDRTV pairs with our method and use them to train several HDRTV reconstruction networks. We then collect two inference datasets containing labeled and unlabeled real-world SDRTVs, respectively. Experimental results demonstrate that the networks trained with our synthesized data generalize significantly better to these two real-world datasets than existing solutions.
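For context, the conventional degradation that the abstract argues against can be sketched with a fixed global tone mapping operator. The example below uses the well-known global Reinhard curve followed by gamma encoding and 8-bit quantization; it is an illustration of a hand-crafted TMO baseline, not the learned synthesis network the authors propose. The function name `reinhard_sdr` and the gamma value are assumptions.

```python
import numpy as np


def reinhard_sdr(hdr, gamma=2.2):
    """Synthesize an 8-bit SDR signal from linear HDR values.

    Illustrative fixed TMO (global Reinhard curve), standing in for the
    kind of hand-crafted degradation used to build training pairs.
    """
    hdr = np.clip(np.asarray(hdr, dtype=float), 0.0, None)

    # Global Reinhard curve: maps [0, inf) into [0, 1).
    compressed = hdr / (1.0 + hdr)

    # Simple gamma encoding followed by 8-bit quantization.
    encoded = compressed ** (1.0 / gamma)
    return np.round(encoded * 255.0).astype(np.uint8)


# Dark, mid-range, and very bright linear values.
print(reinhard_sdr(np.array([0.0, 1.0, 1e6])))
```

Because the same curve is applied everywhere, such an operator has no notion of local content, which is one reason a learned synthesis with both global and local transformations may match real SDRTV footage more closely.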

Pages: 199-216

Citations: 5

Large Scale Real-World Multi-Person Tracking

Pub Date : 2022-11-03 DOI: 10.48550/arXiv.2211.02175

Bing Shuai, Alessandro Bergamo, Uta Buechler, Andrew G. Berneshawi, Alyssa Boden, Joseph Tighe

This paper presents a new large-scale multi-person tracking dataset, PersonPath22, which is over an order of magnitude larger than currently available high-quality multi-object tracking datasets such as MOT17, HiEve, and MOT20. The lack of large-scale training and test data for this task has limited the community's ability to understand the performance of their tracking systems across a wide range of scenarios and conditions, such as variations in person density, actions being performed, weather, and time of day. The PersonPath22 dataset was specifically sourced to provide a wide variety of these conditions, and our annotations include rich metadata so that the performance of a tracker can be evaluated along these different dimensions. The lack of training data has also limited the ability to perform end-to-end training of tracking systems. As such, the highest-performing tracking systems all rely on strong detectors trained on external image datasets. We hope that the release of this dataset will enable new lines of research that take advantage of large-scale video-based training data.

Pages: 504-521

Citations: 3

Bridging the Visual Semantic Gap in VLN via Semantically Richer Instructions

Pub Date : 2022-10-27 DOI: 10.1007/978-3-031-19836-6_4

Joaquín Ossandón, Benjamín Earle, Álvaro Soto

Citations: 0

Addressing Heterogeneity in Federated Learning via Distributional Transformation

Pub Date : 2022-10-26 DOI: 10.1007/978-3-031-19839-7_11

Haolin Yuan, Bo Hui, Yuchen Yang, P. Burlina, N. Gong, Yinzhi Cao

Citations: 8

SUPR: A Sparse Unified Part-Based Human Representation

Pub Date : 2022-10-25 DOI: 10.48550/arXiv.2210.13861

Ahmed A. A. Osman, Timo Bolkart, Dimitrios Tzionas, Michael J. Black

Statistical 3D shape models of the head, hands, and full body are widely used in computer vision and graphics. Despite their wide use, we show that existing models of the head and hands fail to capture the full range of motion for these parts. Moreover, existing work largely ignores the feet, which are crucial for modeling human movement and have applications in biomechanics, animation, and the footwear industry. The problem is that previous body part models are trained using 3D scans that are isolated to the individual parts. Such data does not capture the full range of motion for such parts, e.g. the motion of the head relative to the neck. Our observation is that full-body scans provide important information about the motion of the body parts. Consequently, we propose a new learning scheme that jointly trains a full-body model and specific part models using a federated dataset of full-body and body-part scans. Specifically, we train an expressive human body model called SUPR (Sparse Unified Part-Based Human Representation), where each joint strictly influences a sparse set of model vertices. The factorized representation enables separating SUPR into an entire suite of body part models. Because the feet have received little attention, existing 3D body models have highly under-actuated feet. Using novel 4D scans of feet, we train a model with an extended kinematic tree that captures the range of motion of the toes. Additionally, feet deform due to ground contact. To model this, we include a novel non-linear deformation function that predicts foot deformation conditioned on the foot pose, shape, and ground contact. We train SUPR on an unprecedented number of scans: 1.2 million body, head, hand, and foot scans. We quantitatively compare SUPR and the separated body parts and find that our suite of models generalizes better than existing models. SUPR is available at http://supr.is.tue.mpg.de

Pages: 568-585

Citations: 13

Pointly-Supervised Panoptic Segmentation

Pub Date : 2022-10-25 DOI: 10.48550/arXiv.2210.13950

Junsong Fan, Zhaoxiang Zhang, T. Tan

In this paper, we propose a new approach to applying point-level annotations for weakly-supervised panoptic segmentation. Instead of the dense pixel-level labels used by fully supervised methods, point-level labels provide only a single point for each target as supervision, significantly reducing the annotation burden. We formulate the problem in an end-to-end framework by simultaneously generating panoptic pseudo-masks from point-level labels and learning from them. To tackle the core challenge, i.e., panoptic pseudo-mask generation, we propose a principled approach to parsing pixels by minimizing pixel-to-point traversing costs, which model semantic similarity, low-level texture cues, and high-level manifold knowledge to discriminate panoptic targets. We conduct experiments on the Pascal VOC and MS COCO datasets to demonstrate the approach's effectiveness and show state-of-the-art performance on the weakly-supervised panoptic segmentation problem. Code is available at https://github.com/BraveGroup/PSPS.git.
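The idea of assigning each pixel to the annotated point with the lowest traversing cost can be sketched with a multi-source shortest-path search on the pixel grid. This is a deliberately simplified illustration, not the method in the paper: the edge cost here is just the absolute intensity difference between 4-connected neighbours, whereas the abstract describes costs that combine semantic similarity, texture cues, and manifold knowledge. The function name `assign_pixels_to_points` and its input format are hypothetical.

```python
import heapq

import numpy as np


def assign_pixels_to_points(image, points):
    """Label each pixel with the annotated point that is cheapest to reach.

    image:  2D array of intensities (stand-in for richer features).
    points: iterable of (row, col, label) point annotations.
    Edge cost (an assumption): absolute intensity difference between
    4-connected neighbours.
    """
    img = np.asarray(image, dtype=float)
    h, w = img.shape
    cost = np.full((h, w), np.inf)
    labels = np.full((h, w), -1, dtype=int)

    # Multi-source Dijkstra: every annotated point starts with zero cost.
    heap = []
    for r, c, lab in points:
        cost[r, c] = 0.0
        labels[r, c] = lab
        heapq.heappush(heap, (0.0, r, c, lab))

    while heap:
        d, r, c, lab = heapq.heappop(heap)
        if d > cost[r, c]:
            continue  # stale entry; a cheaper path was already found
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < h and 0 <= nc < w:
                nd = d + abs(img[nr, nc] - img[r, c])
                if nd < cost[nr, nc]:
                    cost[nr, nc] = nd
                    labels[nr, nc] = lab
                    heapq.heappush(heap, (nd, nr, nc, lab))
    return labels


# Two flat regions, one point annotation in each.
image = [[0, 0, 1, 1],
         [0, 0, 1, 1]]
print(assign_pixels_to_points(image, [(0, 0, 1), (0, 3, 2)]))
```

In this toy case the label boundary falls on the intensity edge, since crossing it is the only step with non-zero cost.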

Citations: 10

Multi-Person 3D Pose and Shape Estimation via Inverse Kinematics and Refinement

Pub Date : 2022-10-24 DOI: 10.48550/arXiv.2210.13529

Junuk Cha, Muhammad Saqlain, Geonu Kim, Minjung Shin, Seungryul Baek

Estimating 3D poses and shapes in the form of meshes from monocular RGB images is challenging, and more difficult than estimating 3D poses only in the form of skeletons or heatmaps. When interacting persons are involved, 3D mesh reconstruction becomes even more challenging due to the ambiguity introduced by person-to-person occlusions. To tackle these challenges, we propose a coarse-to-fine pipeline that benefits from 1) inverse kinematics applied to occlusion-robust 3D skeleton estimates and 2) Transformer-based relation-aware refinement techniques. In our pipeline, we first obtain occlusion-robust 3D skeletons for multiple persons from an RGB image. Then, we apply inverse kinematics to convert the estimated skeletons to deformable 3D mesh parameters. Finally, we apply Transformer-based mesh refinement that refines the obtained mesh parameters by considering intra- and inter-person relations of 3D meshes. Via extensive experiments, we demonstrate the effectiveness of our method, outperforming state-of-the-art methods on the 3DPW, MuPoTS, and AGORA datasets.

Pages: 660-677

Citations: 5
