
2021 IEEE/CVF International Conference on Computer Vision (ICCV): Latest Publications

DWKS: A Local Descriptor of Deformations Between Meshes and Point Clouds
Pub Date : 2021-10-01 DOI: 10.1109/ICCV48922.2021.00377
Robin Magnet, M. Ovsjanikov
We propose a novel pointwise descriptor, called DWKS, aimed at finding correspondences across two deformable shape collections. Unlike the majority of existing descriptors, rather than capturing local geometry, DWKS captures the deformation around a point within a collection in a multi-scale and informative manner. This, in turn, allows computing inter-collection correspondences without using landmarks. To this end, we build upon the successful spectral WKS descriptors but, rather than using the Laplace-Beltrami operator, show that a similar construction can be performed on shape difference operators, which capture differences or distortion within a collection. By leveraging the collection information, our descriptor facilitates difficult non-rigid shape matching tasks, even in the presence of strong partiality and significant deformations. We demonstrate the utility of our approach across a range of challenging matching problems on both meshes and point clouds. The code for this paper can be found at https://github.com/RobinMagnet/DWKS
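For readers unfamiliar with the spectral WKS construction that DWKS builds on, the following minimal sketch shows the standard WKS computation from precomputed Laplace-Beltrami eigenvalues and eigenfunctions (evals, evecs and all other names are illustrative, not from the paper's code); according to the abstract, DWKS performs a similar construction using the spectrum of shape difference operators instead.

```python
import numpy as np

def wks_descriptor(evals, evecs, n_energies=100, sigma_factor=7.0):
    """Wave Kernel Signature sketch (Aubry et al., 2011).

    evals : (K,) non-negative eigenvalues in ascending order
    evecs : (N, K) corresponding eigenfunctions sampled at N points
    Returns an (N, n_energies) pointwise descriptor.
    """
    log_evals = np.log(np.maximum(evals, 1e-8))        # work in log-eigenvalue space
    e_min, e_max = log_evals[1], log_evals[-1]         # skip the (near-)zero eigenvalue
    energies = np.linspace(e_min, e_max, n_energies)
    sigma = sigma_factor * (e_max - e_min) / n_energies

    # Gaussian band weights of each eigenvalue at each energy level: (E, K)
    weights = np.exp(-(energies[:, None] - log_evals[None, :]) ** 2 / (2.0 * sigma ** 2))
    weights /= weights.sum(axis=1, keepdims=True) + 1e-12

    # WKS(x, e) = sum_k phi_k(x)^2 * weight_k(e)
    return (evecs ** 2) @ weights.T
```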
{"title":"DWKS : A Local Descriptor of Deformations Between Meshes and Point Clouds","authors":"Robin Magnet, M. Ovsjanikov","doi":"10.1109/ICCV48922.2021.00377","DOIUrl":"https://doi.org/10.1109/ICCV48922.2021.00377","url":null,"abstract":"We propose a novel pointwise descriptor, called DWKS, aimed at finding correspondences across two deformable shape collections. Unlike the majority of existing descriptors, rather than capturing local geometry, DWKS captures the deformation around a point within a collection in a multi-scale and informative manner. This, in turn, allows to compute inter-collection correspondences without using landmarks. To this end, we build upon the successful spectral WKS descriptors, but rather than using the Laplace-Beltrami operator, show that a similar construction can be performed on shape difference operators, that capture differences or distortion within a collection. By leveraging the collection information our descriptor facilitates difficult non-rigid shape matching tasks, even in the presence of strong partiality and significant deformations. We demonstrate the utility of our approach across a range of challenging matching problems on both meshes and point clouds. The code for this paper can be found at https://github.com/RobinMagnet/DWKS","PeriodicalId":6820,"journal":{"name":"2021 IEEE/CVF International Conference on Computer Vision (ICCV)","volume":"104 1","pages":"3773-3782"},"PeriodicalIF":0.0,"publicationDate":"2021-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"85654959","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 1
Cross-Modality Person Re-Identification via Modality Confusion and Center Aggregation
Pub Date : 2021-10-01 DOI: 10.1109/ICCV48922.2021.01609
Xin Hao, Sanyuan Zhao, Mang Ye, Jianbing Shen
Cross-modality person re-identification is a challenging task due to large cross-modality discrepancy and intra-modality variations. Currently, most existing methods focus on learning modality-specific or modality-shareable features by using identity supervision or modality labels. Different from existing methods, this paper presents a novel Modality Confusion Learning Network (MCLNet). Its basic idea is to confuse two modalities, ensuring that the optimization is explicitly concentrated on the modality-irrelevant perspective. Specifically, MCLNet is designed to learn modality-invariant features by simultaneously minimizing inter-modality discrepancy while maximizing cross-modality similarity among instances in a single framework. Furthermore, an identity-aware marginal center aggregation strategy is introduced to extract centralization features, while keeping diversity with a marginal constraint. Finally, we design a camera-aware learning scheme to enrich the discriminability. Extensive experiments on the SYSU-MM01 and RegDB datasets show that MCLNet outperforms the state of the art by a large margin. On the large-scale SYSU-MM01 dataset, our model achieves 65.40% Rank-1 accuracy and 61.98% mAP.
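To illustrate the center-aggregation idea, the sketch below implements a plain identity-center pulling loss over a mixed-modality batch; it is a simplified stand-in for the paper's identity-aware marginal center aggregation (the marginal constraint is omitted, and all names are illustrative assumptions).

```python
import torch
import torch.nn.functional as F

def center_aggregation_loss(features, labels):
    """Pull each feature toward the centroid of its identity (simplified sketch).

    features : (B, D) embedding batch mixing both modalities
    labels   : (B,)   person-identity labels
    """
    loss, count = features.new_zeros(()), 0
    for pid in labels.unique():
        mask = labels == pid
        if mask.sum() < 2:
            continue
        center = features[mask].mean(dim=0, keepdim=True)   # identity centroid over both modalities
        loss = loss + F.mse_loss(features[mask], center.expand_as(features[mask]))
        count += 1
    return loss / max(count, 1)
```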
{"title":"Cross-Modality Person Re-Identification via Modality Confusion and Center Aggregation","authors":"Xin Hao, Sanyuan Zhao, Mang Ye, Jianbing Shen","doi":"10.1109/ICCV48922.2021.01609","DOIUrl":"https://doi.org/10.1109/ICCV48922.2021.01609","url":null,"abstract":"Cross-modality person re-identification is a challenging task due to large cross-modality discrepancy and intramodality variations. Currently, most existing methods focus on learning modality-specific or modality-shareable features by using the identity supervision or modality label. Different from existing methods, this paper presents a novel Modality Confusion Learning Network (MCLNet). Its basic idea is to confuse two modalities, ensuring that the optimization is explicitly concentrated on the modality-irrelevant perspective. Specifically, MCLNet is designed to learn modality-invariant features by simultaneously minimizing inter-modality discrepancy while maximizing cross-modality similarity among instances in a single framework. Furthermore, an identity-aware marginal center aggregation strategy is introduced to extract the centralization features, while keeping diversity with a marginal constraint. Finally, we design a camera-aware learning scheme to enrich the discriminability. Extensive experiments on SYSU-MM01 and RegDB datasets show that MCLNet outperforms the state-of-the-art by a large margin. On the large-scale SYSU-MM01 dataset, our model can achieve 65.40 % and 61.98 % in terms of Rank-1 accuracy and mAP value.","PeriodicalId":6820,"journal":{"name":"2021 IEEE/CVF International Conference on Computer Vision (ICCV)","volume":"35 1","pages":"16383-16392"},"PeriodicalIF":0.0,"publicationDate":"2021-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"85746695","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 67
Gait Recognition in the Wild: A Benchmark
Pub Date : 2021-10-01 DOI: 10.1109/iccv48922.2021.01452
Zheng Zhu, Xianda Guo, Tian Yang, Junjie Huang, Jiankang Deng, Guan Huang, Dalong Du, Jiwen Lu, Jie Zhou
Gait benchmarks empower the research community to train and evaluate high-performance gait recognition systems. Even though growing efforts have been devoted to cross-view recognition, academia is restricted by existing databases captured in controlled environments. In this paper, we contribute a new benchmark for Gait REcognition in the Wild (GREW). The GREW dataset is constructed from natural videos, covering hundreds of cameras and thousands of hours of streams in open systems. With extensive manual annotation, GREW consists of 26K identities and 128K sequences with rich attributes for unconstrained gait recognition. Moreover, we add a distractor set of over 233K sequences, making it more suitable for real-world applications. Compared with prevailing predefined cross-view datasets, GREW has diverse and practical view variations, as well as more natural challenging factors. To the best of our knowledge, this is the first large-scale dataset for gait recognition in the wild. Equipped with this benchmark, we dissect the unconstrained gait recognition problem. Representative appearance-based and model-based methods are explored, and comprehensive baselines are established. Experimental results show that (1) the proposed GREW benchmark is necessary for training and evaluating gait recognizers in the wild; (2) for state-of-the-art gait recognition approaches, there is a lot of room for improvement; (3) the GREW benchmark can be used as effective pre-training for controlled gait recognition. The benchmark website is https://www.grew-benchmark.org/.
{"title":"Gait Recognition in the Wild: A Benchmark","authors":"Zheng Zhu, Xianda Guo, Tian Yang, Junjie Huang, Jiankang Deng, Guan Huang, Dalong Du, Jiwen Lu, Jie Zhou","doi":"10.1109/iccv48922.2021.01452","DOIUrl":"https://doi.org/10.1109/iccv48922.2021.01452","url":null,"abstract":"Gait benchmarks empower the research community to train and evaluate high-performance gait recognition systems. Even though growing efforts have been devoted to cross-view recognition, academia is restricted by current existing databases captured in the controlled environment. In this paper, we contribute a new benchmark for Gait REcognition in the Wild (GREW). The GREW dataset is constructed from natural videos, which contains hundreds of cameras and thousands of hours streams in open systems. With tremendous manual annotations, the GREW consists of 26K identities and 128K sequences with rich attributes for unconstrained gait recognition. Moreover, we add a distractor set of over 233K sequences, making it more suitable for real-world applications. Compared with prevailing predefined cross-view datasets, the GREW has diverse and practical view variations, as well as more natural challenging factors. To the best of our knowledge, this is the first large-scale dataset for gait recognition in the wild. Equipped with this benchmark, we dissect the unconstrained gait recognition problem. Representative appearance-based and model-based methods are explored, and comprehensive baselines are established. Experimental results show (1) The proposed GREW benchmark is necessary for training and evaluating gait recognizer in the wild. (2) For state-of-the-art gait recognition approaches, there is a lot of room for improvement. (3) The GREW benchmark can be used as effective pre-training for controlled gait recognition. Benchmark website is https://www.grew-benchmark.org/.","PeriodicalId":6820,"journal":{"name":"2021 IEEE/CVF International Conference on Computer Vision (ICCV)","volume":"108 1","pages":"14769-14779"},"PeriodicalIF":0.0,"publicationDate":"2021-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"85797240","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 76
Can Scale-Consistent Monocular Depth Be Learned in a Self-Supervised Scale-Invariant Manner?
Pub Date : 2021-10-01 DOI: 10.1109/ICCV48922.2021.01249
Lijun Wang, Yifan Wang, Linzhao Wang, Yu-Wei Zhan, Ying Wang, Huchuan Lu
Geometric constraints are shown to enforce scale consistency and remedy the scale ambiguity issue in self-supervised monocular depth estimation. Meanwhile, scale-invariant losses focus on learning relative depth, leading to accurate relative depth prediction. To combine the best of both worlds, we learn scale-consistent self-supervised depth in a scale-invariant manner. Towards this goal, we present a scale-aware geometric (SAG) loss, which enforces scale consistency through point cloud alignment. Compared to prior art, the SAG loss takes relative scale into consideration during relative motion estimation, enabling more precise alignment and explicit supervision for scale inference. In addition, a novel two-stream architecture for depth estimation is designed, which disentangles scale from depth estimation and allows depth to be learned in a scale-invariant manner. The integration of the SAG loss and the two-stream network enables more consistent scale inference and more accurate relative depth estimation. Our method achieves state-of-the-art performance under both scale-invariant and scale-dependent evaluation settings.
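The connection between point cloud alignment and relative scale can be illustrated with a standard Umeyama-style similarity alignment between two sets of corresponding points; this is generic background on how a relative scale factor emerges from alignment, not the SAG loss itself, and the function and argument names are illustrative.

```python
import numpy as np

def similarity_align(src, tgt):
    """Estimate scale s, rotation R, translation t with tgt ≈ s * R @ src + t.

    src, tgt : (N, 3) corresponding 3D points (Umeyama, 1991).
    """
    mu_s, mu_t = src.mean(axis=0), tgt.mean(axis=0)
    src_c, tgt_c = src - mu_s, tgt - mu_t
    var_s = (src_c ** 2).sum() / len(src)          # variance of the source cloud
    cov = tgt_c.T @ src_c / len(src)               # 3x3 cross-covariance
    U, D, Vt = np.linalg.svd(cov)
    S = np.eye(3)
    if np.linalg.det(U) * np.linalg.det(Vt) < 0:   # avoid reflections
        S[2, 2] = -1.0
    R = U @ S @ Vt
    s = np.trace(np.diag(D) @ S) / var_s           # relative scale between the two clouds
    t = mu_t - s * R @ mu_s
    return s, R, t
```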
{"title":"Can Scale-Consistent Monocular Depth Be Learned in a Self-Supervised Scale-Invariant Manner?","authors":"Lijun Wang, Yifan Wang, Linzhao Wang, Yu-Wei Zhan, Ying Wang, Huchuan Lu","doi":"10.1109/ICCV48922.2021.01249","DOIUrl":"https://doi.org/10.1109/ICCV48922.2021.01249","url":null,"abstract":"Geometric constraints are shown to enforce scale consistency and remedy the scale ambiguity issue in self-supervised monocular depth estimation. Meanwhile, scale-invariant losses focus on learning relative depth, leading to accurate relative depth prediction. To combine the best of both worlds, we learn scale-consistent self-supervised depth in a scale-invariant manner. Towards this goal, we present a scale-aware geometric (SAG) loss, which enforces scale consistency through point cloud alignment. Compared to prior arts, SAG loss takes relative scale into consideration during relative motion estimation, enabling more precise alignment and explicit supervision for scale inference. In addition, a novel two-stream architecture for depth estimation is designed, which disentangles scale from depth estimation and allows depth to be learned in a scale-invariant manner. The integration of SAG loss and two-stream network enables more consistent scale inference and more accurate relative depth estimation. Our method achieves state-of-the-art performance under both scale-invariant and scale-dependent evaluation settings.","PeriodicalId":6820,"journal":{"name":"2021 IEEE/CVF International Conference on Computer Vision (ICCV)","volume":"6 1","pages":"12707-12716"},"PeriodicalIF":0.0,"publicationDate":"2021-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"86279441","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 25
Defending against Universal Adversarial Patches by Clipping Feature Norms
Pub Date : 2021-10-01 DOI: 10.1109/ICCV48922.2021.01612
Cheng Yu, Jiansheng Chen, Youze Xue, Yuyang Liu, Weitao Wan, Jiayu Bao, Huimin Ma
Physical-world adversarial attacks based on universal adversarial patches have been shown to mislead deep convolutional neural networks (CNNs), exposing the vulnerability of real-world visual classification systems based on CNNs. In this paper, we empirically reveal and mathematically explain that universal adversarial patches usually lead to deep feature vectors with very large norms in popular CNNs. Inspired by this, we propose a simple yet effective defense using a new feature norm clipping (FNC) layer, a differentiable module that can be flexibly inserted into different CNNs to adaptively suppress the generation of large-norm deep feature vectors. FNC introduces no trainable parameters and only very low computational overhead. Nevertheless, experiments on multiple datasets validate that it can effectively improve the robustness of different CNNs against white-box universal patch attacks while maintaining satisfactory recognition accuracy on clean samples.
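A feature norm clipping layer of the kind described can be sketched as a small parameter-free, differentiable module that rescales feature vectors whose norm exceeds a threshold; the particular threshold rule below (a multiple of the batch-mean norm) is an illustrative assumption rather than the paper's exact formulation.

```python
import torch
import torch.nn as nn

class FeatureNormClip(nn.Module):
    """Clip the L2 norm of per-position deep features (illustrative sketch)."""

    def __init__(self, max_ratio=1.0):
        super().__init__()
        self.max_ratio = max_ratio   # clip threshold as a multiple of the batch-mean norm

    def forward(self, x):            # x: (B, C, H, W) feature map
        norms = x.norm(p=2, dim=1, keepdim=True)                  # per-position channel norms
        threshold = self.max_ratio * norms.mean()                 # batch statistic, no parameters
        scale = torch.clamp(threshold / (norms + 1e-6), max=1.0)  # shrink only large-norm vectors
        return x * scale
```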
{"title":"Defending against Universal Adversarial Patches by Clipping Feature Norms","authors":"Cheng Yu, Jiansheng Chen, Youze Xue, Yuyang Liu, Weitao Wan, Jiayu Bao, Huimin Ma","doi":"10.1109/ICCV48922.2021.01612","DOIUrl":"https://doi.org/10.1109/ICCV48922.2021.01612","url":null,"abstract":"Physical-world adversarial attacks based on universal adversarial patches have been proved to be able to mislead deep convolutional neural networks (CNNs), exposing the vulnerability of real-world visual classification systems based on CNNs. In this paper, we empirically reveal and mathematically explain that the universal adversarial patches usually lead to deep feature vectors with very large norms in popular CNNs. Inspired by this, we propose a simple yet effective defending approach using a new feature norm clipping (FNC) layer which is a differentiable module that can be flexibly inserted in different CNNs to adaptively suppress the generation of large norm deep feature vectors. FNC introduces no trainable parameter and only very low computational overhead. However, experiments on multiple datasets validate that it can effectively improve the robustness of different CNNs towards white-box universal patch attacks while maintaining a satisfactory recognition accuracy for clean samples.","PeriodicalId":6820,"journal":{"name":"2021 IEEE/CVF International Conference on Computer Vision (ICCV)","volume":"1 1","pages":"16414-16422"},"PeriodicalIF":0.0,"publicationDate":"2021-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"87395430","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 13
PrimitiveNet: Primitive Instance Segmentation with Local Primitive Embedding under Adversarial Metric
Pub Date : 2021-10-01 DOI: 10.1109/ICCV48922.2021.01506
Jingwei Huang, Yanfeng Zhang, Mingwei Sun
We present PrimitiveNet, a novel approach for high-resolution primitive instance segmentation from point clouds on a large scale. Our key idea is to transform the global segmentation problem into easier local tasks. We train a high-resolution primitive embedding network to predict explicit geometry features and implicit latent features for each point. The embedding is jointly trained with an adversarial network acting as a primitive discriminator that decides whether points are from the same primitive instance in local neighborhoods. Such local supervision encourages the learned embedding and discriminator to describe local surface properties and robustly distinguish different instances. At inference time, network predictions are followed by a region growing method to finalize the segmentation. Experiments show that our method outperforms existing state-of-the-art methods in mean average precision by a significant margin (46.3%) on the ABC dataset [31]. We can process extremely large real scenes covering more than 0.1 km². Ablation studies highlight the contribution of our core designs. Finally, our method can improve geometry processing algorithms to abstract scans as lightweight models. Code and data will be made available based on PyTorch and MindSpore.
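The final region-growing step can be illustrated with a simple breadth-first flood over a point adjacency graph that merges neighbors with similar predicted embeddings; the cosine similarity test and threshold below are illustrative assumptions (a full implementation could instead use the learned discriminator's decision for each merge).

```python
import numpy as np
from collections import deque

def region_growing(embeddings, neighbors, sim_threshold=0.9):
    """Group points into instances by flooding over similar per-point embeddings.

    embeddings : (N, D) L2-normalized per-point embeddings
    neighbors  : list of N lists with each point's k-nearest-neighbor indices
    Returns an (N,) array of instance labels.
    """
    n = len(embeddings)
    labels = -np.ones(n, dtype=int)
    current = 0
    for seed in range(n):
        if labels[seed] != -1:
            continue
        labels[seed] = current
        queue = deque([seed])
        while queue:
            p = queue.popleft()
            for q in neighbors[p]:
                if labels[q] != -1:
                    continue
                # cosine similarity between unit-norm embeddings
                if float(embeddings[p] @ embeddings[q]) > sim_threshold:
                    labels[q] = current
                    queue.append(q)
        current += 1
    return labels
```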
{"title":"PrimitiveNet: Primitive Instance Segmentation with Local Primitive Embedding under Adversarial Metric","authors":"Jingwei Huang, Yanfeng Zhang, Mingwei Sun","doi":"10.1109/ICCV48922.2021.01506","DOIUrl":"https://doi.org/10.1109/ICCV48922.2021.01506","url":null,"abstract":"We present PrimitiveNet, a novel approach for high-resolution primitive instance segmentation from point clouds on a large scale. Our key idea is to transform the global segmentation problem into easier local tasks. We train a high-resolution primitive embedding network to predict explicit geometry features and implicit latent features for each point. The embedding is jointly trained with an adversarial network as a primitive discriminator to decide whether points are from the same primitive instance in local neighborhoods. Such local supervision encourages the learned embedding and discriminator to describe local surface properties and robustly distinguish different instances. At inference time, network predictions are followed by a region growing method to finalize the segmentation. Experiments show that our method outperforms existing state-of-the-arts based on mean average precision by a significant margin (46.3%) on ABC dataset [31]. We can process extremely large real scenes covering more than 0.1km2. Ablation studies highlight the contribution of our core designs. Finally, our method can improve geometry processing algorithms to abstract scans as lightweight models. Code and data will be available based on Pytorch1 and Mindspore2.","PeriodicalId":6820,"journal":{"name":"2021 IEEE/CVF International Conference on Computer Vision (ICCV)","volume":"40 1","pages":"15323-15333"},"PeriodicalIF":0.0,"publicationDate":"2021-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"82250295","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 14
PARTS: Unsupervised segmentation with slots, attention and independence maximization
Pub Date : 2021-10-01 DOI: 10.1109/ICCV48922.2021.01027
Daniel Zoran, Rishabh Kabra, Alexander Lerchner, Danilo Jimenez Rezende
From an early age, humans perceive the visual world as composed of coherent objects with distinctive properties such as shape, size, and color. There is great interest in building models that are able to learn similar structure, ideally in an unsupervised manner. Learning such structure from complex 3D scenes that include clutter, occlusions, interactions, and camera motion is still an open challenge. We present a model that is able to segment visual scenes from complex 3D environments into distinct objects, learn disentangled representations of individual objects, and form consistent and coherent predictions of future frames, in a fully unsupervised manner. Our model (named PARTS) builds on recent approaches that utilize iterative amortized inference and transition dynamics for deep generative models. We achieve dramatic improvements in performance by introducing several novel contributions. We introduce a recurrent slot-attention-like encoder which allows for top-down influence during inference. We argue that when inferring scene structure from image sequences it is better to use a fixed prior which is shared across the sequence rather than an auto-regressive prior as often used in prior work. We demonstrate our model’s success on three different video datasets (the popular benchmark CLEVRER; a simulated 3D Playroom environment; and a real-world Robotics Arm dataset). Finally, we analyze the contributions of the various model components and the representations learned by the model.
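As background for the recurrent slot-attention-like encoder, a minimal slot-attention module in the style of Locatello et al. (2020) is sketched below; the dimensions, initialization, and omission of the per-slot residual MLP are simplifications, and this is not the PARTS encoder itself.

```python
import torch
import torch.nn as nn

class SlotAttention(nn.Module):
    """Minimal slot-attention module (after Locatello et al., 2020); a sketch, not PARTS."""

    def __init__(self, num_slots=7, dim=64, iters=3):
        super().__init__()
        self.num_slots, self.iters, self.scale = num_slots, iters, dim ** -0.5
        self.slots_mu = nn.Parameter(torch.randn(1, 1, dim))
        self.slots_logsigma = nn.Parameter(torch.zeros(1, 1, dim))
        self.to_q = nn.Linear(dim, dim)
        self.to_k = nn.Linear(dim, dim)
        self.to_v = nn.Linear(dim, dim)
        self.gru = nn.GRUCell(dim, dim)
        self.norm_in, self.norm_slots = nn.LayerNorm(dim), nn.LayerNorm(dim)

    def forward(self, inputs):                       # inputs: (B, N, dim) image features
        b = inputs.shape[0]
        inputs = self.norm_in(inputs)
        k, v = self.to_k(inputs), self.to_v(inputs)
        # sample initial slots from a learned Gaussian
        slots = self.slots_mu + self.slots_logsigma.exp() * torch.randn(
            b, self.num_slots, k.shape[-1], device=inputs.device)
        for _ in range(self.iters):
            q = self.to_q(self.norm_slots(slots))
            attn = (q @ k.transpose(1, 2) * self.scale).softmax(dim=1)  # slots compete per input
            attn = attn / (attn.sum(dim=-1, keepdim=True) + 1e-8)       # weighted mean over inputs
            updates = attn @ v                                           # (B, S, dim)
            slots = self.gru(updates.reshape(-1, updates.shape[-1]),
                             slots.reshape(-1, slots.shape[-1])).reshape(b, self.num_slots, -1)
        return slots
```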
{"title":"PARTS: Unsupervised segmentation with slots, attention and independence maximization","authors":"Daniel Zoran, Rishabh Kabra, Alexander Lerchner, Danilo Jimenez Rezende","doi":"10.1109/ICCV48922.2021.01027","DOIUrl":"https://doi.org/10.1109/ICCV48922.2021.01027","url":null,"abstract":"From an early age, humans perceive the visual world as composed of coherent objects with distinctive properties such as shape, size, and color. There is great interest in building models that are able to learn similar structure, ideally in an unsupervised manner. Learning such structure from complex 3D scenes that include clutter, occlusions, interactions, and camera motion is still an open challenge. We present a model that is able to segment visual scenes from complex 3D environments into distinct objects, learn disentangled representations of individual objects, and form consistent and coherent predictions of future frames, in a fully unsupervised manner. Our model (named PARTS) builds on recent approaches that utilize iterative amortized inference and transition dynamics for deep generative models. We achieve dramatic improvements in performance by introducing several novel contributions. We introduce a recurrent slot-attention like encoder which allows for top-down influence during inference. We argue that when inferring scene structure from image sequences it is better to use a fixed prior which is shared across the sequence rather than an auto-regressive prior as often used in prior work. We demonstrate our model’s success on three different video datasets (the popular benchmark CLEVRER; a simulated 3D Playroom environment; and a real-world Robotics Arm dataset). Finally, we analyze the contributions of the various model components and the representations learned by the model.","PeriodicalId":6820,"journal":{"name":"2021 IEEE/CVF International Conference on Computer Vision (ICCV)","volume":"16 1","pages":"10419-10427"},"PeriodicalIF":0.0,"publicationDate":"2021-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"82547817","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 34
Synchronization of Group-labelled Multi-graphs
Pub Date : 2021-10-01 DOI: 10.1109/ICCV48922.2021.00639
Andrea Porfiri Dal Cin, L. Magri, F. Arrigoni, Andrea Fusiello, G. Boracchi
Synchronization refers to the problem of inferring the unknown values attached to vertices of a graph whose edges are labelled with the ratio of the incident vertices, where the labels belong to a group. This paper addresses the synchronization problem on multi-graphs, i.e., graphs with more than one edge connecting the same pair of nodes. The problem naturally arises when multiple measures are available to model the relationship between two vertices. This happens when different sensors measure the same quantity, or when the original graph is partitioned into sub-graphs that are solved independently. In this case, the relationships among sub-graphs give rise to multi-edges and the problem can be traced back to a multi-graph synchronization. The baseline solution reduces multi-graphs to simple ones by averaging their multi-edges; however, this approach falls short because: i) averaging is well defined only for some groups and ii) the resulting estimator is less precise and accurate, as we show empirically. Specifically, we present MULTISYNC, a synchronization algorithm for multi-graphs that is based on a principled constrained eigenvalue optimization. MULTISYNC is a general solution that can cope with any linear group and which we show to be profitably usable on both synthetic and real problems.
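For background on spectral synchronization and on handling repeated edges, the sketch below solves the simplest instance, angle (U(1)) synchronization, via the leading eigenvector of a Hermitian measurement matrix in which multi-edges are accumulated; this is classical machinery, not the MULTISYNC algorithm, and the input format is an assumption for illustration.

```python
import numpy as np

def angular_synchronization(n, edges):
    """Recover node angles (up to a global offset) from noisy relative angles.

    n     : number of vertices
    edges : list of (i, j, theta_ij) with theta_ij ≈ theta_i - theta_j; the same
            (i, j) pair may appear several times (multi-edges).
    """
    H = np.zeros((n, n), dtype=complex)
    for i, j, theta in edges:
        z = np.exp(1j * theta)
        H[i, j] += z            # multi-edges accumulate (multiplicity-weighted averaging)
        H[j, i] += np.conj(z)   # keep the matrix Hermitian
    evals, evecs = np.linalg.eigh(H)
    v = evecs[:, -1]            # leading eigenvector
    return np.angle(v)          # estimated angles, defined up to a global rotation
```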
{"title":"Synchronization of Group-labelled Multi-graphs","authors":"Andrea Porfiri Dal Cin, L. Magri, F. Arrigoni, Andrea Fusiello, G. Boracchi","doi":"10.1109/ICCV48922.2021.00639","DOIUrl":"https://doi.org/10.1109/ICCV48922.2021.00639","url":null,"abstract":"Synchronization refers to the problem of inferring the unknown values attached to vertices of a graph where edges are labelled with the ratio of the incident vertices, and labels belong to a group. This paper addresses the synchronization problem on multi-graphs, that are graphs with more than one edge connecting the same pair of nodes. The problem naturally arises when multiple measures are available to model the relationship between two vertices. This happens when different sensors measure the same quantity, or when the original graph is partitioned into sub-graphs that are solved independently. In this case, the relationships among sub-graphs give rise to multi-edges and the problem can be traced back to a multi-graph synchronization. The baseline solution reduces multi-graphs to simple ones by averaging their multi-edges, however this approach falls short because: i) averaging is well defined only for some groups and ii) the resulting estimator is less precise and accurate, as we prove empirically. Specifically, we present MULTISYNC, a synchronization algorithm for multi-graphs that is based on a principled constrained eigenvalue optimization. MULTISYNC is a general solution that can cope with any linear group and we show to be profitably usable both on synthetic and real problems.","PeriodicalId":6820,"journal":{"name":"2021 IEEE/CVF International Conference on Computer Vision (ICCV)","volume":"34 1","pages":"6433-6443"},"PeriodicalIF":0.0,"publicationDate":"2021-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"81289292","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 2
Persistent Homology based Graph Convolution Network for Fine-grained 3D Shape Segmentation
Pub Date : 2021-10-01 DOI: 10.1109/ICCV48922.2021.00701
Chi-Chong Wong, C. Vong
Fine-grained 3D segmentation is an important task in 3D object understanding, especially in applications such as intelligent manufacturing or parts analysis for 3D objects. However, many challenges involved in this problem are yet to be solved, such as i) interpreting the complex structures located in different regions of 3D objects; ii) capturing fine-grained structures with sufficient topological correctness. Current deep learning and graph machine learning methods fail to tackle such challenges and thus provide inferior performance in fine-grained 3D analysis. In this work, methods from topological data analysis are incorporated into a geometric deep learning model for the task of fine-grained segmentation of 3D objects. We propose a novel neural network model called Persistent Homology based Graph Convolution Network (PHGCN), which i) integrates persistent homology into a graph convolution network to capture multi-scale structural information that can accurately represent complex structures of 3D objects; ii) applies a novel Persistence Diagram Loss (ℒPD) that provides sufficient topological correctness for segmentation over fine-grained structures. Extensive experiments on fine-grained 3D segmentation validate the effectiveness of the proposed PHGCN model and show significant improvements over current state-of-the-art methods.
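As background for the persistence-diagram machinery, the 0-dimensional persistence of a scalar function on a graph can be computed with a Kruskal-style union-find over the edge filtration using the elder rule; the sketch below is standard persistent homology, not the full PHGCN pipeline, and the input names are illustrative.

```python
def zero_dim_persistence(vertex_values, edges):
    """0-dimensional persistence pairs of a lower-star graph filtration (sketch).

    vertex_values : (N,) scalar function on the vertices (e.g. a learned feature)
    edges         : iterable of (i, j) vertex index pairs
    A vertex is born at its value; an edge appears at max(value_i, value_j).
    When an edge merges two components, the younger one dies (elder rule).
    """
    n = len(vertex_values)
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]   # path halving
            x = parent[x]
        return x

    pairs = []
    edge_list = sorted(edges, key=lambda e: max(vertex_values[e[0]], vertex_values[e[1]]))
    for i, j in edge_list:
        t = max(vertex_values[i], vertex_values[j])   # filtration value of the edge
        ri, rj = find(i), find(j)
        if ri == rj:
            continue                                  # closes a loop, no 0-dim event
        if vertex_values[ri] > vertex_values[rj]:     # elder rule: younger root dies
            ri, rj = rj, ri
        pairs.append((vertex_values[rj], t))          # (birth, death) of the dying component
        parent[rj] = ri
    return pairs                                      # surviving components are essential classes
```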
{"title":"Persistent Homology based Graph Convolution Network for Fine-grained 3D Shape Segmentation","authors":"Chi-Chong Wong, C. Vong","doi":"10.1109/ICCV48922.2021.00701","DOIUrl":"https://doi.org/10.1109/ICCV48922.2021.00701","url":null,"abstract":"Fine-grained 3D segmentation is an important task in 3D object understanding, especially in applications such as intelligent manufacturing or parts analysis for 3D objects. However, many challenges involved in such problem are yet to be solved, such as i) interpreting the complex structures located in different regions for 3D objects; ii) capturing fine-grained structures with sufficient topology correctness. Current deep learning and graph machine learning methods fail to tackle such challenges and thus provide inferior performance in fine-grained 3D analysis. In this work, methods in topological data analysis are incorporated with geometric deep learning model for the task of fine-grained segmentation for 3D objects. We propose a novel neural network model called Persistent Homology based Graph Convolution Network (PHGCN), which i) integrates persistent homology into graph convolution network to capture multi-scale structural information that can accurately represent complex structures for 3D objects; ii) applies a novel Persistence Diagram Loss (ℒPD) that provides sufficient topology correctness for segmentation over the fine-grained structures. Extensive experiments on fine-grained 3D segmentation validate the effectiveness of the proposed PHGCN model and show significant improvements over current state-of-the-art methods.","PeriodicalId":6820,"journal":{"name":"2021 IEEE/CVF International Conference on Computer Vision (ICCV)","volume":"1 1","pages":"7078-7087"},"PeriodicalIF":0.0,"publicationDate":"2021-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"84367691","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 9
Efficient Action Recognition via Dynamic Knowledge Propagation
Pub Date : 2021-10-01 DOI: 10.1109/ICCV48922.2021.01346
Hanul Kim, Mihir Jain, Jun-Tae Lee, Sungrack Yun, F. Porikli
Efficient action recognition has become crucial to extend the success of action recognition to many real-world applications. Contrary to most existing methods, which mainly focus on selecting salient frames to reduce the computation cost, we focus more on making the most of the selected frames. To this end, we employ two networks of different capabilities that operate in tandem to efficiently recognize actions. Given a video, the lighter network processes more frames while the heavier one only processes a few. In order to enable effective interaction between the two, we propose dynamic knowledge propagation based on a cross-attention mechanism. This is the main component of our framework, which is essentially a student-teacher architecture; but since the teacher model continues to interact with the student model during inference, we call it a dynamic student-teacher framework. Through extensive experiments, we demonstrate the effectiveness of each component of our framework. Our method outperforms competing state-of-the-art methods on two video datasets: ActivityNet-v1.3 and Mini-Kinetics.
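The cross-attention interaction between the two networks can be sketched with a standard multi-head attention block in which features from the heavier (teacher) network act as keys and values for the lighter (student) network's queries; this is a generic illustration of the mechanism with assumed dimensions, not the exact module from the paper.

```python
import torch
import torch.nn as nn

class CrossAttentionPropagation(nn.Module):
    """Propagate teacher frame features into student frame features (sketch)."""

    def __init__(self, dim=512, heads=8):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, student_feats, teacher_feats):
        # student_feats: (B, T_s, dim) from the light network over many frames
        # teacher_feats: (B, T_t, dim) from the heavy network over a few frames
        attended, _ = self.attn(query=student_feats,
                                key=teacher_feats,
                                value=teacher_feats)
        return self.norm(student_feats + attended)    # residual fusion of propagated knowledge
```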
{"title":"Efficient Action Recognition via Dynamic Knowledge Propagation","authors":"Hanul Kim, Mihir Jain, Jun-Tae Lee, Sungrack Yun, F. Porikli","doi":"10.1109/ICCV48922.2021.01346","DOIUrl":"https://doi.org/10.1109/ICCV48922.2021.01346","url":null,"abstract":"Efficient action recognition has become crucial to extend the success of action recognition to many real-world applications. Contrary to most existing methods, which mainly focus on selecting salient frames to reduce the computation cost, we focus more on making the most of the selected frames. To this end, we employ two networks of different capabilities that operate in tandem to efficiently recognize actions. Given a video, the lighter network processes more frames while the heavier one only processes a few. In order to enable the effective interaction between the two, we propose dynamic knowledge propagation based on a cross-attention mechanism. This is the main component of our framework that is essentially a student-teacher architecture, but as the teacher model continues to interact with the student model during inference, we call it a dynamic student-teacher framework. Through extensive experiments, we demonstrate the effectiveness of each component of our framework. Our method outperforms competing state-of-the-art methods on two video datasets: ActivityNet-v1.3 and Mini-Kinetics.","PeriodicalId":6820,"journal":{"name":"2021 IEEE/CVF International Conference on Computer Vision (ICCV)","volume":"42 2 1","pages":"13699-13708"},"PeriodicalIF":0.0,"publicationDate":"2021-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"82859374","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 15