
Computer vision - ECCV ... : ... European Conference on Computer Vision : proceedings. European Conference on Computer Vision - Latest Publications

RRSR: Reciprocal Reference-based Image Super-Resolution with Progressive Feature Alignment and Selection
Lin Zhang, Xin Li, Dongliang He, Fu Li, Yili Wang, Zhao Zhang
Reference-based image super-resolution (RefSR) is a promising SR branch and has shown great potential in overcoming the limitations of single image super-resolution. While previous state-of-the-art RefSR methods mainly focus on improving the efficacy and robustness of reference feature transfer, it is generally overlooked that a well-reconstructed SR image should enable better SR reconstruction for its similar LR images when it is used as the reference. Therefore, in this work, we propose a reciprocal learning framework that appropriately leverages this fact to reinforce the learning of a RefSR network. In addition, we design a progressive feature alignment and selection module to further improve the RefSR task. The newly proposed module aligns reference and input images in multi-scale feature spaces and performs reference-aware feature selection in a progressive manner, so that more precise reference features are transferred into the input features and the network capability is enhanced. Our reciprocal learning paradigm is model-agnostic and can be applied to arbitrary RefSR models. We empirically show that multiple recent state-of-the-art RefSR models are consistently improved with our reciprocal learning paradigm. Furthermore, our proposed model together with the reciprocal learning strategy sets new state-of-the-art performance on multiple benchmarks.
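To make the reciprocal idea concrete, here is a minimal PyTorch sketch of one training step: the SR output of an image is reused as the reference for a similar LR image, and both reconstructions are supervised. `RefSRNet`, the 4x scale factor, and the loss weight `alpha` are hypothetical placeholders for illustration, not the authors' implementation.

```python
# Minimal sketch of reciprocal learning for RefSR; all modules and weights are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F


class RefSRNet(nn.Module):
    """Stand-in RefSR model: upsamples the LR input and fuses reference features."""

    def __init__(self, channels: int = 32):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(6, channels, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(channels, 3, 3, padding=1),
        )

    def forward(self, lr: torch.Tensor, ref: torch.Tensor) -> torch.Tensor:
        lr_up = F.interpolate(lr, scale_factor=4, mode="bicubic", align_corners=False)
        ref = F.interpolate(ref, size=lr_up.shape[-2:], mode="bicubic", align_corners=False)
        return lr_up + self.body(torch.cat([lr_up, ref], dim=1))


def reciprocal_step(model, lr_a, hr_a, lr_b, hr_b, ref, alpha: float = 0.5):
    """One training step: the SR result of image A is reused as the reference
    for a similar image B, so a good reconstruction of A also helps B."""
    sr_a = model(lr_a, ref)          # primary pass with an external reference
    sr_b = model(lr_b, sr_a)         # reciprocal pass: sr_a acts as the reference
    loss = F.l1_loss(sr_a, hr_a) + alpha * F.l1_loss(sr_b, hr_b)
    return loss, sr_a, sr_b


if __name__ == "__main__":
    model = RefSRNet()
    lr_a, lr_b = torch.rand(1, 3, 32, 32), torch.rand(1, 3, 32, 32)
    hr_a, hr_b = torch.rand(1, 3, 128, 128), torch.rand(1, 3, 128, 128)
    ref = torch.rand(1, 3, 128, 128)
    loss, _, _ = reciprocal_step(model, lr_a, hr_a, lr_b, hr_b, ref)
    loss.backward()
    print(float(loss))
```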
{"title":"RRSR: Reciprocal Reference-based Image Super-Resolution with Progressive Feature Alignment and Selection","authors":"Lin Zhang, Xin Li, Dongliang He, Fu Li, Yili Wang, Zhao Zhang","doi":"10.48550/arXiv.2211.04203","DOIUrl":"https://doi.org/10.48550/arXiv.2211.04203","url":null,"abstract":"Reference-based image super-resolution (RefSR) is a promising SR branch and has shown great potential in overcoming the limitations of single image super-resolution. While previous state-of-the-art RefSR methods mainly focus on improving the efficacy and robustness of reference feature transfer, it is generally overlooked that a well reconstructed SR image should enable better SR reconstruction for its similar LR images when it is referred to as. Therefore, in this work, we propose a reciprocal learning framework that can appropriately leverage such a fact to reinforce the learning of a RefSR network. Besides, we deliberately design a progressive feature alignment and selection module for further improving the RefSR task. The newly proposed module aligns reference-input images at multi-scale feature spaces and performs reference-aware feature selection in a progressive manner, thus more precise reference features can be transferred into the input features and the network capability is enhanced. Our reciprocal learning paradigm is model-agnostic and it can be applied to arbitrary RefSR models. We empirically show that multiple recent state-of-the-art RefSR models can be consistently improved with our reciprocal learning paradigm. Furthermore, our proposed model together with the reciprocal learning strategy sets new state-of-the-art performances on multiple benchmarks.","PeriodicalId":72676,"journal":{"name":"Computer vision - ECCV ... : ... European Conference on Computer Vision : proceedings. European Conference on Computer Vision","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2022-11-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"89730799","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 3
Towards Real World HDRTV Reconstruction: A Data Synthesis-based Approach
Zhen Cheng, Tao Wang, Yong Li, Fenglong Song, C. Chen, Zhiwei Xiong
Existing deep learning based HDRTV reconstruction methods assume one kind of tone mapping operator (TMO) as the degradation procedure to synthesize SDRTV-HDRTV pairs for supervised training. In this paper, we argue that, although traditional TMOs exploit efficient dynamic range compression priors, they have several drawbacks in modeling the realistic degradation: information over-preservation, color bias, and possible artifacts, making the trained reconstruction networks hard to generalize well to real-world cases. To solve this problem, we propose a learning-based data synthesis approach to learn the properties of real-world SDRTVs by integrating several tone mapping priors into both network structures and loss functions. Specifically, we design a conditioned two-stream network that uses prior tone mapping results as guidance to synthesize SDRTVs through both global and local transformations. To train the data synthesis network, we form a novel self-supervised content loss that constrains different aspects of the synthesized SDRTVs in regions with different brightness distributions, and an adversarial loss that emphasizes details to be more realistic. To validate the effectiveness of our approach, we synthesize SDRTV-HDRTV pairs with our method and use them to train several HDRTV reconstruction networks. We then collect two inference datasets containing labeled and unlabeled real-world SDRTVs, respectively. Experimental results demonstrate that networks trained with our synthesized data generalize significantly better to these two real-world datasets than existing solutions.
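The loss structure described above can be illustrated with a toy PyTorch sketch: a content term weighted differently in bright and dark regions of the HDR source, plus a non-saturating adversarial term. The luminance threshold, region weights, and the use of a baseline tone-mapped SDRTV as the content target are assumptions for illustration, not the paper's actual formulation.

```python
# Toy sketch of a brightness-aware content loss plus adversarial loss; values are made up.
import torch
import torch.nn.functional as F


def luminance(img: torch.Tensor) -> torch.Tensor:
    """Per-pixel luminance of an RGB image in [0, 1], shape (B, 1, H, W)."""
    r, g, b = img[:, 0:1], img[:, 1:2], img[:, 2:3]
    return 0.2126 * r + 0.7152 * g + 0.0722 * b


def synthesis_loss(fake_sdr, hdr_source, baseline_sdr, disc_logits_fake,
                   bright_thresh: float = 0.8, w_bright: float = 2.0, w_adv: float = 0.05):
    """Content loss split by brightness of the HDR source, plus an adversarial loss.

    fake_sdr:         SDRTV produced by the data-synthesis network
    baseline_sdr:     a reference SDRTV from a conventional tone-mapping operator (assumed target)
    disc_logits_fake: discriminator logits for the synthesized SDRTV
    """
    bright_mask = (luminance(hdr_source) > bright_thresh).float()
    dark_mask = 1.0 - bright_mask

    per_pixel = torch.abs(fake_sdr - baseline_sdr)
    # Highlights get a stronger constraint than dark regions (weights are invented).
    content = (w_bright * bright_mask * per_pixel + dark_mask * per_pixel).mean()

    # Non-saturating GAN loss: the generator wants the discriminator to output "real".
    adv = F.binary_cross_entropy_with_logits(
        disc_logits_fake, torch.ones_like(disc_logits_fake))
    return content + w_adv * adv


if __name__ == "__main__":
    fake = torch.rand(2, 3, 64, 64, requires_grad=True)
    hdr = torch.rand(2, 3, 64, 64)
    base = torch.rand(2, 3, 64, 64)
    logits = torch.randn(2, 1)
    loss = synthesis_loss(fake, hdr, base, logits)
    loss.backward()
    print(float(loss))
```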
{"title":"Towards Real World HDRTV Reconstruction: A Data Synthesis-based Approach","authors":"Zhen Cheng, Tao Wang, Yong Li, Fenglong Song, C. Chen, Zhiwei Xiong","doi":"10.48550/arXiv.2211.03058","DOIUrl":"https://doi.org/10.48550/arXiv.2211.03058","url":null,"abstract":"Existing deep learning based HDRTV reconstruction methods assume one kind of tone mapping operators (TMOs) as the degradation procedure to synthesize SDRTV-HDRTV pairs for supervised training. In this paper, we argue that, although traditional TMOs exploit efficient dynamic range compression priors, they have several drawbacks on modeling the realistic degradation: information over-preservation, color bias and possible artifacts, making the trained reconstruction networks hard to generalize well to real-world cases. To solve this problem, we propose a learning-based data synthesis approach to learn the properties of real-world SDRTVs by integrating several tone mapping priors into both network structures and loss functions. In specific, we design a conditioned two-stream network with prior tone mapping results as a guidance to synthesize SDRTVs by both global and local transformations. To train the data synthesis network, we form a novel self-supervised content loss to constraint different aspects of the synthesized SDRTVs at regions with different brightness distributions and an adversarial loss to emphasize the details to be more realistic. To validate the effectiveness of our approach, we synthesize SDRTV-HDRTV pairs with our method and use them to train several HDRTV reconstruction networks. Then we collect two inference datasets containing both labeled and unlabeled real-world SDRTVs, respectively. Experimental results demonstrate that, the networks trained with our synthesized data generalize significantly better to these two real-world datasets than existing solutions.","PeriodicalId":72676,"journal":{"name":"Computer vision - ECCV ... : ... European Conference on Computer Vision : proceedings. European Conference on Computer Vision","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2022-11-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"84938790","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 5
Large Scale Real-World Multi-Person Tracking
Bing Shuai, Alessandro Bergamo, Uta Buechler, Andrew G. Berneshawi, Alyssa Boden, Joseph Tighe
This paper presents a new large scale multi-person tracking dataset, PersonPath22, which is over an order of magnitude larger than currently available high quality multi-object tracking datasets such as MOT17, HiEve, and MOT20. The lack of large scale training and test data for this task has limited the community's ability to understand the performance of their tracking systems on a wide range of scenarios and conditions such as variations in person density, actions being performed, weather, and time of day. The PersonPath22 dataset was specifically sourced to provide a wide variety of these conditions, and our annotations include rich meta-data such that the performance of a tracker can be evaluated along these different dimensions. The lack of training data has also limited the ability to perform end-to-end training of tracking systems. As such, the highest performing tracking systems all rely on strong detectors trained on external image datasets. We hope that the release of this dataset will enable new lines of research that take advantage of large scale video based training data.
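As a small illustration of meta-data-sliced evaluation, the pandas sketch below groups per-sequence tracking scores by annotation attributes; the attribute names, metric columns, and numbers are invented for illustration and are not taken from PersonPath22.

```python
# Sketch: slice tracking metrics by meta-data attributes; all values are fictional.
import pandas as pd

results = pd.DataFrame(
    {
        "sequence": ["seq001", "seq002", "seq003", "seq004"],
        "weather": ["sunny", "rain", "sunny", "night"],
        "person_density": ["low", "high", "high", "low"],
        "MOTA": [71.2, 54.8, 58.3, 66.9],
        "IDF1": [74.5, 51.0, 55.7, 69.2],
    }
)

# Average tracking quality per condition, one table per meta-data dimension.
for dim in ["weather", "person_density"]:
    print(results.groupby(dim)[["MOTA", "IDF1"]].mean().round(1), "\n")
```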
{"title":"Large Scale Real-World Multi-Person Tracking","authors":"Bing Shuai, Alessandro Bergamo, Uta Buechler, Andrew G. Berneshawi, Alyssa Boden, Joseph Tighe","doi":"10.48550/arXiv.2211.02175","DOIUrl":"https://doi.org/10.48550/arXiv.2211.02175","url":null,"abstract":"This paper presents a new large scale multi-person tracking dataset -- texttt{PersonPath22}, which is over an order of magnitude larger than currently available high quality multi-object tracking datasets such as MOT17, HiEve, and MOT20 datasets. The lack of large scale training and test data for this task has limited the community's ability to understand the performance of their tracking systems on a wide range of scenarios and conditions such as variations in person density, actions being performed, weather, and time of day. texttt{PersonPath22} dataset was specifically sourced to provide a wide variety of these conditions and our annotations include rich meta-data such that the performance of a tracker can be evaluated along these different dimensions. The lack of training data has also limited the ability to perform end-to-end training of tracking systems. As such, the highest performing tracking systems all rely on strong detectors trained on external image datasets. We hope that the release of this dataset will enable new lines of research that take advantage of large scale video based training data.","PeriodicalId":72676,"journal":{"name":"Computer vision - ECCV ... : ... European Conference on Computer Vision : proceedings. European Conference on Computer Vision","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2022-11-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"73440223","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 3
Bridging the Visual Semantic Gap in VLN via Semantically Richer Instructions
Joaquín Ossandón, Benjamín Earle, Álvaro Soto
{"title":"Bridging the Visual Semantic Gap in VLN via Semantically Richer Instructions","authors":"Joaquín Ossandón, Benjamín Earle, Álvaro Soto","doi":"10.1007/978-3-031-19836-6_4","DOIUrl":"https://doi.org/10.1007/978-3-031-19836-6_4","url":null,"abstract":"","PeriodicalId":72676,"journal":{"name":"Computer vision - ECCV ... : ... European Conference on Computer Vision : proceedings. European Conference on Computer Vision","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2022-10-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"72899856","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Addressing Heterogeneity in Federated Learning via Distributional Transformation
Haolin Yuan, Bo Hui, Yuchen Yang, P. Burlina, N. Gong, Yinzhi Cao
{"title":"Addressing Heterogeneity in Federated Learning via Distributional Transformation","authors":"Haolin Yuan, Bo Hui, Yuchen Yang, P. Burlina, N. Gong, Yinzhi Cao","doi":"10.1007/978-3-031-19839-7_11","DOIUrl":"https://doi.org/10.1007/978-3-031-19839-7_11","url":null,"abstract":"","PeriodicalId":72676,"journal":{"name":"Computer vision - ECCV ... : ... European Conference on Computer Vision : proceedings. European Conference on Computer Vision","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2022-10-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"86155336","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 8
SUPR: A Sparse Unified Part-Based Human Representation
Ahmed A. A. Osman, Timo Bolkart, Dimitrios Tzionas, Michael J. Black
Statistical 3D shape models of the head, hands, and full body are widely used in computer vision and graphics. Despite their wide use, we show that existing models of the head and hands fail to capture the full range of motion for these parts. Moreover, existing work largely ignores the feet, which are crucial for modeling human movement and have applications in biomechanics, animation, and the footwear industry. The problem is that previous body-part models are trained using 3D scans that are isolated to the individual parts. Such data does not capture the full range of motion for such parts, e.g., the motion of the head relative to the neck. Our observation is that full-body scans provide important information about the motion of the body parts. Consequently, we propose a new learning scheme that jointly trains a full-body model and specific part models using a federated dataset of full-body and body-part scans. Specifically, we train an expressive human body model called SUPR (Sparse Unified Part-Based Human Representation), where each joint strictly influences a sparse set of model vertices. The factorized representation enables separating SUPR into an entire suite of body-part models. Note that the feet have received little attention and existing 3D body models have highly under-actuated feet. Using novel 4D scans of feet, we train a model with an extended kinematic tree that captures the range of motion of the toes. Additionally, feet deform due to ground contact. To model this, we include a novel non-linear deformation function that predicts foot deformation conditioned on the foot pose, shape, and ground contact. We train SUPR on an unprecedented number of scans: 1.2 million body, head, hand, and foot scans. We quantitatively compare SUPR and the separated body parts and find that our suite of models generalizes better than existing models. SUPR is available at http://supr.is.tue.mpg.de
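The sparse, factorized skinning idea can be illustrated with a short numpy sketch: every vertex is driven by only a few joints, so a body-part model can be cut out by keeping a vertex subset together with the joints that actually influence it. The tiny mesh, weights, and transforms below are toy values, not SUPR itself.

```python
# Toy numpy sketch of sparse linear blend skinning and part extraction (not SUPR's data).
import numpy as np

n_vertices, n_joints = 6, 4
rng = np.random.default_rng(0)

vertices = rng.normal(size=(n_vertices, 3))           # rest-pose vertices

# Sparse skinning weights: each vertex is influenced by at most 2 joints.
weights = np.zeros((n_vertices, n_joints))
for v in range(n_vertices):
    joints = rng.choice(n_joints, size=2, replace=False)
    w = rng.random(2)
    weights[v, joints] = w / w.sum()

# Per-joint rigid transforms (rotation + translation); identity rotations here.
rotations = np.tile(np.eye(3), (n_joints, 1, 1))
translations = rng.normal(scale=0.1, size=(n_joints, 3))


def skin(verts, w, R, t):
    """Linear blend skinning: v' = sum_j w[v, j] * (R_j @ v + t_j)."""
    posed_per_joint = np.einsum("jab,vb->vja", R, verts) + t[None, :, :]
    return np.einsum("vj,vja->va", w, posed_per_joint)


posed = skin(vertices, weights, rotations, translations)

# "Separating" a part model: pick a vertex subset and drop joints with zero weight on it.
part_vertex_ids = np.array([0, 1, 2])
active_joints = np.where(weights[part_vertex_ids].sum(axis=0) > 0)[0]
part_weights = weights[np.ix_(part_vertex_ids, active_joints)]
part_posed = skin(vertices[part_vertex_ids], part_weights,
                  rotations[active_joints], translations[active_joints])

# Because the weights are sparse, the extracted part reproduces the full-model result.
assert np.allclose(part_posed, posed[part_vertex_ids])
print("joints driving the extracted part:", active_joints)
```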
{"title":"SUPR: A Sparse Unified Part-Based Human Representation","authors":"Ahmed A. A. Osman, Timo Bolkart, Dimitrios Tzionas, Michael J. Black","doi":"10.48550/arXiv.2210.13861","DOIUrl":"https://doi.org/10.48550/arXiv.2210.13861","url":null,"abstract":"Statistical 3D shape models of the head, hands, and fullbody are widely used in computer vision and graphics. Despite their wide use, we show that existing models of the head and hands fail to capture the full range of motion for these parts. Moreover, existing work largely ignores the feet, which are crucial for modeling human movement and have applications in biomechanics, animation, and the footwear industry. The problem is that previous body part models are trained using 3D scans that are isolated to the individual parts. Such data does not capture the full range of motion for such parts, e.g. the motion of head relative to the neck. Our observation is that full-body scans provide important information about the motion of the body parts. Consequently, we propose a new learning scheme that jointly trains a full-body model and specific part models using a federated dataset of full-body and body-part scans. Specifically, we train an expressive human body model called SUPR (Sparse Unified Part-Based Human Representation), where each joint strictly influences a sparse set of model vertices. The factorized representation enables separating SUPR into an entire suite of body part models. Note that the feet have received little attention and existing 3D body models have highly under-actuated feet. Using novel 4D scans of feet, we train a model with an extended kinematic tree that captures the range of motion of the toes. Additionally, feet deform due to ground contact. To model this, we include a novel non-linear deformation function that predicts foot deformation conditioned on the foot pose, shape, and ground contact. We train SUPR on an unprecedented number of scans: 1.2 million body, head, hand and foot scans. We quantitatively compare SUPR and the separated body parts and find that our suite of models generalizes better than existing models. SUPR is available at http://supr.is.tue.mpg.de","PeriodicalId":72676,"journal":{"name":"Computer vision - ECCV ... : ... European Conference on Computer Vision : proceedings. European Conference on Computer Vision","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2022-10-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"81404466","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 13
Pointly-Supervised Panoptic Segmentation
Junsong Fan, Zhaoxiang Zhang, T. Tan
In this paper, we propose a new approach to applying point-level annotations for weakly-supervised panoptic segmentation. Instead of the dense pixel-level labels used by fully supervised methods, point-level labels only provide a single point for each target as supervision, significantly reducing the annotation burden. We formulate the problem in an end-to-end framework by simultaneously generating panoptic pseudo-masks from point-level labels and learning from them. To tackle the core challenge, i.e., panoptic pseudo-mask generation, we propose a principled approach to parsing pixels by minimizing pixel-to-point traversing costs, which model semantic similarity, low-level texture cues, and high-level manifold knowledge to discriminate panoptic targets. We conduct experiments on the Pascal VOC and the MS COCO datasets to demonstrate the approach's effectiveness and show state-of-the-art performance in the weakly-supervised panoptic segmentation problem. Codes are available at https://github.com/BraveGroup/PSPS.git.
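A much-simplified numpy sketch of the pseudo-mask generation is given below: each pixel is assigned to the annotated point with the lowest combined cost (semantic-feature distance plus a color term). The real method minimizes traversing costs along pixel-to-point paths; the direct per-pixel cost and the 0.5 color weight here are simplifications for illustration.

```python
# Simplified sketch: assign each pixel to the cheapest annotated point (illustrative only).
import numpy as np

rng = np.random.default_rng(1)
H, W, C = 16, 16, 8

features = rng.normal(size=(H, W, C))          # per-pixel semantic features
colors = rng.random(size=(H, W, 3))            # per-pixel low-level color cue

# Point-level supervision: one (row, col, class_id) triple per target instance.
points = [(3, 4, 1), (10, 12, 2), (13, 2, 3)]
color_weight = 0.5

costs = np.stack([
    np.linalg.norm(features - features[r, c], axis=-1)
    + color_weight * np.linalg.norm(colors - colors[r, c], axis=-1)
    for r, c, _ in points
], axis=0)                                      # (num_points, H, W)

assignment = costs.argmin(axis=0)               # index of the cheapest annotated point
pseudo_mask = np.array([cls for _, _, cls in points])[assignment]
print(pseudo_mask.shape, np.unique(pseudo_mask))
```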
{"title":"Pointly-Supervised Panoptic Segmentation","authors":"Junsong Fan, Zhaoxiang Zhang, T. Tan","doi":"10.48550/arXiv.2210.13950","DOIUrl":"https://doi.org/10.48550/arXiv.2210.13950","url":null,"abstract":"In this paper, we propose a new approach to applying point-level annotations for weakly-supervised panoptic segmentation. Instead of the dense pixel-level labels used by fully supervised methods, point-level labels only provide a single point for each target as supervision, significantly reducing the annotation burden. We formulate the problem in an end-to-end framework by simultaneously generating panoptic pseudo-masks from point-level labels and learning from them. To tackle the core challenge, i.e., panoptic pseudo-mask generation, we propose a principled approach to parsing pixels by minimizing pixel-to-point traversing costs, which model semantic similarity, low-level texture cues, and high-level manifold knowledge to discriminate panoptic targets. We conduct experiments on the Pascal VOC and the MS COCO datasets to demonstrate the approach's effectiveness and show state-of-the-art performance in the weakly-supervised panoptic segmentation problem. Codes are available at https://github.com/BraveGroup/PSPS.git.","PeriodicalId":72676,"journal":{"name":"Computer vision - ECCV ... : ... European Conference on Computer Vision : proceedings. European Conference on Computer Vision","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2022-10-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"80289702","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 10
Multi-Person 3D Pose and Shape Estimation via Inverse Kinematics and Refinement
Junuk Cha, Muhammad Saqlain, Geonu Kim, Minjung Shin, Seungryul Baek
Estimating 3D poses and shapes in the form of meshes from monocular RGB images is challenging. Obviously, it is more difficult than estimating 3D poses only in the form of skeletons or heatmaps. When interacting persons are involved, the 3D mesh reconstruction becomes more challenging due to the ambiguity introduced by person-to-person occlusions. To tackle the challenges, we propose a coarse-to-fine pipeline that benefits from 1) inverse kinematics from the occlusion-robust 3D skeleton estimation and 2) Transformer-based relation-aware refinement techniques. In our pipeline, we first obtain occlusion-robust 3D skeletons for multiple persons from an RGB image. Then, we apply inverse kinematics to convert the estimated skeletons to deformable 3D mesh parameters. Finally, we apply a Transformer-based mesh refinement that refines the obtained mesh parameters considering intra- and inter-person relations of the 3D meshes. Via extensive experiments, we demonstrate the effectiveness of our method, outperforming the state of the art on the 3DPW, MuPoTS and AGORA datasets.
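The coarse-to-fine structure can be sketched in PyTorch as three stages: per-person 3D skeletons, an inverse-kinematics-style mapping to mesh parameters, and a Transformer that refines all persons jointly so inter-person relations are modeled via self-attention. Every module, dimension, and name below (SkeletonToMeshIK, RelationAwareRefiner, the 85-dimensional parameter vector) is a hypothetical placeholder, not the authors' network.

```python
# Structural sketch of a skeletons -> IK -> Transformer-refinement pipeline (placeholders only).
import torch
import torch.nn as nn

NUM_JOINTS, PARAM_DIM = 17, 85      # e.g. pose + shape + camera (made-up split)


class SkeletonToMeshIK(nn.Module):
    """Stand-in for inverse kinematics: maps a 3D skeleton to mesh parameters."""

    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(NUM_JOINTS * 3, 256), nn.ReLU(), nn.Linear(256, PARAM_DIM))

    def forward(self, skeletons):               # (B, P, J, 3) -> (B, P, PARAM_DIM)
        b, p = skeletons.shape[:2]
        return self.net(skeletons.reshape(b, p, -1))


class RelationAwareRefiner(nn.Module):
    """Self-attention over the persons in an image refines each person's
    coarse mesh parameters using the other persons as context."""

    def __init__(self, d_model: int = 128):
        super().__init__()
        self.embed = nn.Linear(PARAM_DIM, d_model)
        layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.head = nn.Linear(d_model, PARAM_DIM)

    def forward(self, coarse_params):            # (B, P, PARAM_DIM)
        tokens = self.encoder(self.embed(coarse_params))
        return coarse_params + self.head(tokens)  # residual refinement


if __name__ == "__main__":
    skeletons = torch.randn(2, 3, NUM_JOINTS, 3)   # 2 images, 3 persons each
    coarse = SkeletonToMeshIK()(skeletons)
    refined = RelationAwareRefiner()(coarse)
    print(refined.shape)                            # torch.Size([2, 3, 85])
```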
{"title":"Multi-Person 3D Pose and Shape Estimation via Inverse Kinematics and Refinement","authors":"Junuk Cha, Muhammad Saqlain, Geonu Kim, Minjung Shin, Seungryul Baek","doi":"10.48550/arXiv.2210.13529","DOIUrl":"https://doi.org/10.48550/arXiv.2210.13529","url":null,"abstract":"Estimating 3D poses and shapes in the form of meshes from monocular RGB images is challenging. Obviously, it is more difficult than estimating 3D poses only in the form of skeletons or heatmaps. When interacting persons are involved, the 3D mesh reconstruction becomes more challenging due to the ambiguity introduced by person-to-person occlusions. To tackle the challenges, we propose a coarse-to-fine pipeline that benefits from 1) inverse kinematics from the occlusion-robust 3D skeleton estimation and 2) Transformer-based relation-aware refinement techniques. In our pipeline, we first obtain occlusion-robust 3D skeletons for multiple persons from an RGB image. Then, we apply inverse kinematics to convert the estimated skeletons to deformable 3D mesh parameters. Finally, we apply the Transformer-based mesh refinement that refines the obtained mesh parameters considering intra- and inter-person relations of 3D meshes. Via extensive experiments, we demonstrate the effectiveness of our method, outperforming state-of-the-arts on 3DPW, MuPoTS and AGORA datasets.","PeriodicalId":72676,"journal":{"name":"Computer vision - ECCV ... : ... European Conference on Computer Vision : proceedings. European Conference on Computer Vision","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2022-10-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"84988368","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 5
Reliability-Aware Prediction via Uncertainty Learning for Person Image Retrieval
Zhaopeng Dou, Zhongdao Wang, Weihua Chen, Yali Li, Shengjin Wang
{"title":"Reliability-Aware Prediction via Uncertainty Learning for Person Image Retrieval","authors":"Zhaopeng Dou, Zhongdao Wang, Weihua Chen, Yali Li, Shengjin Wang","doi":"10.1007/978-3-031-19781-9_34","DOIUrl":"https://doi.org/10.1007/978-3-031-19781-9_34","url":null,"abstract":"","PeriodicalId":72676,"journal":{"name":"Computer vision - ECCV ... : ... European Conference on Computer Vision : proceedings. European Conference on Computer Vision","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2022-10-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"78737290","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 4
SC-wLS: Towards Interpretable Feed-forward Camera Re-localization
Xin Wu, Hao Zhao, Shunkai Li, Yingdian Cao, H. Zha
{"title":"SC-wLS: Towards Interpretable Feed-forward Camera Re-localization","authors":"Xin Wu, Hao Zhao, Shunkai Li, Yingdian Cao, H. Zha","doi":"10.1007/978-3-031-19769-7_34","DOIUrl":"https://doi.org/10.1007/978-3-031-19769-7_34","url":null,"abstract":"","PeriodicalId":72676,"journal":{"name":"Computer vision - ECCV ... : ... European Conference on Computer Vision : proceedings. European Conference on Computer Vision","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2022-10-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"85065927","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 7