
Latest publications from the 2021 IEEE/CVF International Conference on Computer Vision (ICCV)

Occluded Person Re-Identification with Single-scale Global Representations
Pub Date : 2021-10-01 DOI: 10.1109/ICCV48922.2021.01166
Cheng Yan, Guansong Pang, J. Jiao, Xiaolong Bai, Xuetao Feng, Chunhua Shen
Occluded person re-identification (ReID) aims at re-identifying occluded pedestrians from occluded or holistic images taken across multiple cameras. Current state-of-the-art (SOTA) occluded ReID models rely on some auxiliary modules, including pose estimation, feature pyramid and graph matching modules, to learn multi-scale and/or part-level features to tackle the occlusion challenges. This unfortunately leads to complex ReID models that (i) fail to generalize to challenging occlusions of diverse appearance, shape or size, and (ii) become ineffective in handling non-occluded pedestrians. However, real-world ReID applications typically have highly diverse occlusions and involve a hybrid of occluded and non-occluded pedestrians. To address these two issues, we introduce a novel ReID model that learns discriminative single-scale global-level pedestrian features by enforcing a novel exponentially sensitive yet bounded distance loss on occlusion-based augmented data. We show for the first time that learning single-scale global features without using these auxiliary modules is able to outperform the SOTA multi-scale and/or part-level feature-based models. Further, our simple model can achieve new SOTA performance in both occluded and non-occluded ReID, as shown by extensive results on three occluded and two general ReID benchmarks. Additionally, we create a large-scale occluded person ReID dataset with various occlusions in different scenes, which is significantly larger and contains more diverse occlusions and pedestrian dressings than existing occluded ReID datasets, providing a more faithful occluded ReID benchmark. The dataset is available at: https://git.io/OPReID
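The abstract's central technical ingredient is the "exponentially sensitive yet bounded" distance loss applied to occlusion-augmented training pairs. The exact formulation is not given here, so the PyTorch sketch below is only one plausible reading of that phrase: a saturating exponential penalty on the embedding distance between an occlusion-augmented image and its original. The function name, the coefficient `alpha`, and the specific form `1 - exp(-alpha * d)` are assumptions made for illustration, not the paper's definition.

```python
import torch
import torch.nn.functional as F

def bounded_exp_distance_loss(feat_aug, feat_orig, alpha=2.0):
    """Hypothetical 'exponentially sensitive yet bounded' distance loss:
    penalises the embedding distance between occlusion-augmented and original
    images, but saturates at 1 so that extreme occlusions cannot dominate the
    batch. The paper's exact formulation may differ."""
    d = F.pairwise_distance(feat_aug, feat_orig)   # per-pair L2 distance
    return (1.0 - torch.exp(-alpha * d)).mean()    # bounded in [0, 1)

# toy usage: 8 pairs of 256-d global pedestrian features
f_aug, f_orig = torch.randn(8, 256), torch.randn(8, 256)
print(bounded_exp_distance_loss(f_aug, f_orig))
```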
{"title":"Occluded Person Re-Identification with Single-scale Global Representations","authors":"Cheng Yan, Guansong Pang, J. Jiao, Xiaolong Bai, Xuetao Feng, Chunhua Shen","doi":"10.1109/ICCV48922.2021.01166","DOIUrl":"https://doi.org/10.1109/ICCV48922.2021.01166","url":null,"abstract":"Occluded person re-identification (ReID) aims at re-identifying occluded pedestrians from occluded or holistic images taken across multiple cameras. Current state-of-the-art (SOTA) occluded ReID models rely on some auxiliary modules, including pose estimation, feature pyramid and graph matching modules, to learn multi-scale and/or part-level features to tackle the occlusion challenges. This unfortunately leads to complex ReID models that (i) fail to generalize to challenging occlusions of diverse appearance, shape or size, and (ii) become ineffective in handling non-occluded pedestrians. However, real-world ReID applications typically have highly diverse occlusions and involve a hybrid of occluded and non-occluded pedestrians. To address these two issues, we introduce a novel ReID model that learns discriminative single-scale global-level pedestrian features by enforcing a novel exponentially sensitive yet bounded distance loss on occlusion-based augmented data. We show for the first time that learning single-scale global features without using these auxiliary modules is able to outperform the SOTA multi-scale and/or part-level feature-based models. Further, our simple model can achieve new SOTA performance in both occluded and non-occluded ReID, as shown by extensive results on three occluded and two general ReID benchmarks. Additionally, we create a large-scale occluded person ReID dataset with various occlusions in different scenes, which is significantly larger and contains more diverse occlusions and pedestrian dressings than existing occluded ReID datasets, providing a more faithful occluded ReID benchmark. The dataset is available at: https://git.io/OPReID","PeriodicalId":6820,"journal":{"name":"2021 IEEE/CVF International Conference on Computer Vision (ICCV)","volume":"4 1","pages":"11855-11864"},"PeriodicalIF":0.0,"publicationDate":"2021-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"74096042","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 24
BEV-Net: Assessing Social Distancing Compliance by Joint People Localization and Geometric Reasoning
Pub Date : 2021-10-01 DOI: 10.1109/ICCV48922.2021.00535
Zhirui Dai, Yue-Ren Jiang, Yi Li, Bo Liu, Antoni B. Chan, Nuno Vasconcelos
Social distancing, an essential public health measure to limit the spread of contagious diseases, has gained significant attention since the outbreak of the COVID-19 pandemic. In this work, the problem of visual social distancing compliance assessment in busy public areas, with wide field-of-view cameras, is considered. A dataset of crowd scenes with people annotations under a bird’s eye view (BEV) and ground truth for metric distances is introduced, and several measures for the evaluation of social distance detection systems are proposed. A multi-branch network, BEV-Net, is proposed to localize individuals in world coordinates and identify high-risk regions where social distancing is violated. BEV-Net combines detection of head and feet locations, camera pose estimation, a differentiable homography module to map image into BEV coordinates, and geometric reasoning to produce a BEV map of the people locations in the scene. Experiments on complex crowded scenes demonstrate the power of the approach and show superior performance over baselines derived from methods in the literature. Applications of interest for public health decision makers are finally discussed. Datasets, code and pretrained models are publicly available on GitHub.
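A minimal NumPy sketch of the last geometric step of such a pipeline is given below: projecting detected feet locations to bird's-eye-view coordinates with a homography and flagging pairs of people closer than a distance threshold. The homography `H` is assumed to be known here (in BEV-Net it comes from a differentiable module driven by the estimated camera pose), and the 2.0 m threshold and function names are illustrative, not taken from the paper.

```python
import numpy as np

def image_to_bev(points_px, H):
    """Project image-plane feet locations (N, 2) to metric bird's-eye-view
    coordinates with a 3x3 homography H (assumed known here)."""
    pts = np.hstack([points_px, np.ones((len(points_px), 1))])  # homogeneous coords
    bev = (H @ pts.T).T
    return bev[:, :2] / bev[:, 2:3]                             # dehomogenise

def violating_pairs(bev_xy, min_dist=2.0):
    """Return index pairs of people closer than `min_dist` metres in BEV."""
    diff = bev_xy[:, None, :] - bev_xy[None, :, :]
    dist = np.linalg.norm(diff, axis=-1)
    i, j = np.where((dist > 0) & (dist < min_dist))
    return [(int(a), int(b)) for a, b in zip(i, j) if a < b]

# toy example: identity homography, three people in metric positions
H = np.eye(3)
feet = np.array([[0.0, 0.0], [1.5, 0.0], [5.0, 5.0]])
print(violating_pairs(image_to_bev(feet, H)))   # [(0, 1)]
```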
{"title":"BEV-Net: Assessing Social Distancing Compliance by Joint People Localization and Geometric Reasoning","authors":"Zhirui Dai, Yue-Ren Jiang, Yi Li, Bo Liu, Antoni B. Chan, Nuno Vasconcelos","doi":"10.1109/ICCV48922.2021.00535","DOIUrl":"https://doi.org/10.1109/ICCV48922.2021.00535","url":null,"abstract":"Social distancing, an essential public health measure to limit the spread of contagious diseases, has gained significant attention since the outbreak of the COVID-19 pandemic. In this work, the problem of visual social distancing compliance assessment in busy public areas, with wide field-of-view cameras, is considered. A dataset of crowd scenes with people annotations under a bird’s eye view (BEV) and ground truth for metric distances is introduced, and several measures for the evaluation of social distance detection systems are proposed. A multi-branch network, BEV-Net, is proposed to localize individuals in world coordinates and identify high-risk regions where social distancing is violated. BEV-Net combines detection of head and feet locations, camera pose estimation, a differentiable homography module to map image into BEV coordinates, and geometric reasoning to produce a BEV map of the people locations in the scene. Experiments on complex crowded scenes demonstrate the power of the approach and show superior performance over baselines derived from methods in the literature. Applications of interest for public health decision makers are finally discussed. Datasets, code and pretrained models are publicly available at GitHub1.","PeriodicalId":6820,"journal":{"name":"2021 IEEE/CVF International Conference on Computer Vision (ICCV)","volume":"47 47 1","pages":"5381-5391"},"PeriodicalIF":0.0,"publicationDate":"2021-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"77050331","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 6
Fast Light-field Disparity Estimation with Multi-disparity-scale Cost Aggregation
Pub Date : 2021-10-01 DOI: 10.1109/ICCV48922.2021.00626
Zhicong Huang, Xue-mei Hu, Zhou Xue, Weizhu Xu, Tao Yue
Light field images contain both angular and spatial information of captured light rays. The rich information of light fields enables straightforward disparity recovery capability but demands high computational cost as well. In this paper, we design a lightweight disparity estimation model with physical-based multi-disparity-scale cost volume aggregation for fast disparity estimation. By introducing a sub-network of edge guidance, we significantly improve the recovery of geometric details near edges and improve the overall performance. We test the proposed model extensively on both synthetic and real-captured datasets, which provide both densely and sparsely sampled light fields. Finally, we significantly reduce computation cost and GPU memory consumption, while achieving comparable performance with state-of-the-art disparity estimation methods for light fields. Our source code is available at https://github.com/zcong17huang/FastLFnet.
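To make the notion of a disparity cost volume concrete, the PyTorch sketch below builds a single-scale L1 cost volume between a central view and one horizontally displaced view and takes a per-pixel winner-takes-all disparity. It omits the paper's multi-disparity-scale aggregation, edge-guidance sub-network, and real light-field geometry; the circular shift and disparity range are simplifications for illustration.

```python
import torch

def disparity_cost_volume(center, side, max_disp=8):
    """Build an L1 matching cost volume (B, D+1, H, W) between the central
    view and a horizontally displaced view by testing integer disparities.
    A circular shift stands in for proper resampling at the image border."""
    costs = []
    for d in range(max_disp + 1):
        shifted = torch.roll(side, shifts=-d, dims=-1)        # undo a shift of d pixels
        costs.append((center - shifted).abs().mean(dim=1))    # per-pixel cost, (B, H, W)
    return torch.stack(costs, dim=1)

center = torch.rand(1, 3, 32, 32)
side = torch.roll(center, shifts=3, dims=-1)                  # synthetic disparity of 3 px
disparity = disparity_cost_volume(center, side).argmin(dim=1) # winner-takes-all
print(disparity.float().mean().item())                        # ~3.0
```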
{"title":"Fast Light-field Disparity Estimation with Multi-disparity-scale Cost Aggregation","authors":"Zhicong Huang, Xue-mei Hu, Zhou Xue, Weizhu Xu, Tao Yue","doi":"10.1109/ICCV48922.2021.00626","DOIUrl":"https://doi.org/10.1109/ICCV48922.2021.00626","url":null,"abstract":"Light field images contain both angular and spatial information of captured light rays. The rich information of light fields enables straightforward disparity recovery capability but demands high computational cost as well. In this paper, we design a lightweight disparity estimation model with physical-based multi-disparity-scale cost volume aggregation for fast disparity estimation. By introducing a sub-network of edge guidance, we significantly improve the recovery of geometric details near edges and improve the overall performance. We test the proposed model extensively on both synthetic and real-captured datasets, which provide both densely and sparsely sampled light fields. Finally, we significantly reduce computation cost and GPU memory consumption, while achieving comparable performance with state-of-the-art disparity estimation methods for light fields. Our source code is available at https://github.com/zcong17huang/FastLFnet.","PeriodicalId":6820,"journal":{"name":"2021 IEEE/CVF International Conference on Computer Vision (ICCV)","volume":"1 1","pages":"6300-6309"},"PeriodicalIF":0.0,"publicationDate":"2021-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"77120881","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 23
Unsupervised Segmentation incorporating Shape Prior via Generative Adversarial Networks
Pub Date : 2021-10-01 DOI: 10.1109/ICCV48922.2021.00723
Dahye Kim, Byung-Woo Hong
We present an image segmentation algorithm that is developed in an unsupervised deep learning framework. The delineation of object boundaries often fails due to the nuisance factors such as illumination changes and occlusions. Thus, we initially propose an unsupervised image decomposition algorithm to obtain an intrinsic representation that is robust with respect to undesirable bias fields based on a multiplicative image model. The obtained intrinsic image is subsequently provided to an unsupervised segmentation procedure that is developed based on a piecewise smooth model. The segmentation model is further designed to incorporate a geometric constraint imposed in the generative adversarial network framework where the discrepancy between the distribution of partitioning functions and the distribution of prior shapes is minimized. We demonstrate the effectiveness and robustness of the proposed algorithm in particular with bias fields and occlusions using simple yet illustrative synthetic examples and a benchmark dataset for image segmentation.
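The multiplicative image model mentioned here writes an observed image as I = B · R, a smooth bias field times an intrinsic image. The sketch below is a hand-crafted stand-in for that decomposition, not the paper's learned network: it estimates log B by heavily low-pass filtering log I and divides it out; the kernel size and toy input are arbitrary choices.

```python
import torch
import torch.nn.functional as F

def remove_bias_field(image, kernel_size=31):
    """Hand-crafted illustration of the multiplicative model I = B * R:
    approximate the smooth bias field B by heavy low-pass filtering of log(I)
    and divide it out to obtain an intrinsic image R."""
    log_i = torch.log(image.clamp_min(1e-6))
    pad = kernel_size // 2
    log_b = F.avg_pool2d(F.pad(log_i, (pad,) * 4, mode='reflect'),
                         kernel_size, stride=1)    # smooth bias-field estimate
    return torch.exp(log_i - log_b)

# toy input: random texture multiplied by a horizontal illumination gradient
img = torch.rand(1, 1, 64, 64) * torch.linspace(0.2, 1.0, 64).view(1, 1, 1, 64)
print(remove_bias_field(img).shape)               # torch.Size([1, 1, 64, 64])
```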
{"title":"Unsupervised Segmentation incorporating Shape Prior via Generative Adversarial Networks","authors":"Dahye Kim, Byung-Woo Hong","doi":"10.1109/ICCV48922.2021.00723","DOIUrl":"https://doi.org/10.1109/ICCV48922.2021.00723","url":null,"abstract":"We present an image segmentation algorithm that is developed in an unsupervised deep learning framework. The delineation of object boundaries often fails due to the nuisance factors such as illumination changes and occlusions. Thus, we initially propose an unsupervised image decomposition algorithm to obtain an intrinsic representation that is robust with respect to undesirable bias fields based on a multiplicative image model. The obtained intrinsic image is subsequently provided to an unsupervised segmentation procedure that is developed based on a piecewise smooth model. The segmentation model is further designed to incorporate a geometric constraint imposed in the generative adversarial network framework where the discrepancy between the distribution of partitioning functions and the distribution of prior shapes is minimized. We demonstrate the effectiveness and robustness of the proposed algorithm in particular with bias fields and occlusions using simple yet illustrative synthetic examples and a benchmark dataset for image segmentation.","PeriodicalId":6820,"journal":{"name":"2021 IEEE/CVF International Conference on Computer Vision (ICCV)","volume":"38 1","pages":"7304-7314"},"PeriodicalIF":0.0,"publicationDate":"2021-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"77645751","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 5
An Elastica Geodesic Approach with Convexity Shape Prior
Pub Date : 2021-10-01 DOI: 10.1109/ICCV48922.2021.00682
Da Chen, L. Cohen, J. Mirebeau, X. Tai
The minimal geodesic models based on the Eikonal equations are capable of finding suitable solutions in various image segmentation scenarios. Existing geodesic-based segmentation approaches usually exploit the image features in conjunction with geometric regularization terms (such as curve length or elastica length) for computing geodesic paths. In this paper, we consider a more complicated problem: finding simple, closed geodesic curves on which a convexity shape prior is imposed. The proposed approach relies on an orientation-lifting strategy, by which a planar curve can be mapped to a high-dimensional orientation space. The convexity shape prior serves as a constraint for the construction of local metrics. The geodesic curves in the lifted space can then be efficiently computed through the fast marching method. In addition, we introduce a way to incorporate region-based homogeneity features into the proposed geodesic model so as to solve region-based segmentation issues with shape prior constraints.
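For readers unfamiliar with Eikonal-type geodesic models, the snippet below computes geodesic distances on a discrete grid with Dijkstra's algorithm, a combinatorial cousin of the fast marching method mentioned above. The orientation lifting, elastica term, and convexity shape prior of the paper are not modelled; the cost map and seed point are toy inputs.

```python
import heapq
import numpy as np

def grid_geodesic(cost, seed):
    """Dijkstra distances on a 4-connected grid with a positive cost map: a
    discrete stand-in for solving the Eikonal equation by fast marching."""
    h, w = cost.shape
    dist = np.full((h, w), np.inf)
    dist[seed] = 0.0
    heap = [(0.0, seed)]
    while heap:
        d, (y, x) = heapq.heappop(heap)
        if d > dist[y, x]:
            continue
        for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            ny, nx = y + dy, x + dx
            if 0 <= ny < h and 0 <= nx < w:
                nd = d + 0.5 * (cost[y, x] + cost[ny, nx])  # average edge cost
                if nd < dist[ny, nx]:
                    dist[ny, nx] = nd
                    heapq.heappush(heap, (nd, (ny, nx)))
    return dist

cost = np.ones((32, 32))
cost[10:22, 16] = 50.0                      # a costly vertical wall
print(grid_geodesic(cost, (16, 2))[16, 30]) # geodesic detours around the wall
```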
{"title":"An Elastica Geodesic Approach with Convexity Shape Prior","authors":"Da Chen, L. Cohen, J. Mirebeau, X. Tai","doi":"10.1109/ICCV48922.2021.00682","DOIUrl":"https://doi.org/10.1109/ICCV48922.2021.00682","url":null,"abstract":"The minimal geodesic models based on the Eikonal equations are capable of finding suitable solutions in various image segmentation scenarios. Existing geodesic-based segmentation approaches usually exploit the image features in conjunction with geometric regularization terms (such as curve length or elastica length) for computing geodesic paths. In this paper, we consider a more complicated problem: finding simple and closed geodesic curves which are imposed a convexity shape prior. The proposed approach relies on an orientation-lifting strategy, by which a planar curve can be mapped to an high-dimensional orientation space. The convexity shape prior serves as a constraint for the construction of local metrics. The geodesic curves in the lifted space then can be efficiently computed through the fast marching method. In addition, we introduce a way to incorporate region-based homogeneity features into the proposed geodesic model so as to solve the region-based segmentation issues with shape prior constraints.","PeriodicalId":6820,"journal":{"name":"2021 IEEE/CVF International Conference on Computer Vision (ICCV)","volume":"33 1","pages":"6880-6889"},"PeriodicalIF":0.0,"publicationDate":"2021-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"75082018","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 5
Multi-Modal Multi-Action Video Recognition
Pub Date : 2021-10-01 DOI: 10.1109/ICCV48922.2021.01342
Zhensheng Shi, Ju Liang, Qianqian Li, Haiyong Zheng, Zhaorui Gu, Junyu Dong, Bing Zheng
Multi-action video recognition is much more challenging due to the requirement to recognize multiple actions co-occurring simultaneously or sequentially. Modeling multi-action relations is beneficial and crucial to understanding videos with multiple actions, and actions in a video are usually presented in multiple modalities. In this paper, we propose a novel multi-action relation model for videos by leveraging both relational graph convolutional networks (GCNs) and video multi-modality. We first build multi-modal GCNs to explore modality-aware multi-action relations, fed by modality-specific action representations as node features, i.e., spatiotemporal features learned by a 3D convolutional neural network (CNN), and audio and textual embeddings queried from respective feature lexicons. We then jointly exploit both multi-modal CNN-GCN models and multi-modal feature representations to learn better relational action predictions. Ablation studies, multi-action relation visualization, and boost analysis all show the efficacy of our multi-modal multi-action relation modeling. Our method also achieves state-of-the-art performance on the large-scale multi-action M-MiT benchmark. Our code is made publicly available at https://github.com/zhenglab/multi-action-video.
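To ground the idea of modality-aware relation modelling over action nodes, here is a minimal PyTorch sketch: a simple graph layer propagates modality-specific node features over an action-relation adjacency matrix, and two modalities are fused naively at the end. The adjacency matrix, feature dimensions, and fusion rule are placeholders; the paper's graph construction and joint CNN-GCN training are not reproduced.

```python
import torch
import torch.nn as nn

class SimpleGCN(nn.Module):
    """Minimal graph layer over action nodes: `adj` encodes action relations
    and `feats` holds one modality's per-action representation."""
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.lin = nn.Linear(in_dim, out_dim)

    def forward(self, feats, adj):                       # feats: (N, in_dim), adj: (N, N)
        deg = adj.sum(dim=1, keepdim=True).clamp_min(1e-6)
        return torch.relu(self.lin((adj / deg) @ feats)) # row-normalised propagation

n_actions, visual_dim, audio_dim = 5, 512, 128
adj = torch.rand(n_actions, n_actions)                   # placeholder action relations
visual = SimpleGCN(visual_dim, 64)(torch.randn(n_actions, visual_dim), adj)
audio = SimpleGCN(audio_dim, 64)(torch.randn(n_actions, audio_dim), adj)
scores = (visual + audio).sum(dim=1)                     # naive late fusion into logits
print(scores.shape)                                      # torch.Size([5])
```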
{"title":"Multi-Modal Multi-Action Video Recognition","authors":"Zhensheng Shi, Ju Liang, Qianqian Li, Haiyong Zheng, Zhaorui Gu, Junyu Dong, Bing Zheng","doi":"10.1109/ICCV48922.2021.01342","DOIUrl":"https://doi.org/10.1109/ICCV48922.2021.01342","url":null,"abstract":"Multi-action video recognition is much more challenging due to the requirement to recognize multiple actions co-occurring simultaneously or sequentially. Modeling multi-action relations is beneficial and crucial to understand videos with multiple actions, and actions in a video are usually presented in multiple modalities. In this paper, we propose a novel multi-action relation model for videos, by leveraging both relational graph convolutional networks (GCNs) and video multi-modality. We first build multi-modal GCNs to explore modality-aware multi-action relations, fed by modality-specific action representation as node features, i.e., spatiotemporal features learned by 3D convolutional neural network (CNN), audio and textual embeddings queried from respective feature lexicons. We then joint both multi-modal CNN-GCN models and multi-modal feature representations for learning better relational action predictions. Ablation study, multi-action relation visualization, and boosts analysis, all show efficacy of our multi-modal multi-action relation modeling. Also our method achieves state-of-the-art performance on large-scale multi-action M-MiT benchmark. Our code is made publicly available at https://github.com/zhenglab/multi-action-video.","PeriodicalId":6820,"journal":{"name":"2021 IEEE/CVF International Conference on Computer Vision (ICCV)","volume":"99 1","pages":"13658-13667"},"PeriodicalIF":0.0,"publicationDate":"2021-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"76585518","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 6
EC-DARTS: Inducing Equalized and Consistent Optimization into DARTS
Pub Date : 2021-10-01 DOI: 10.1109/ICCV48922.2021.01177
Qinqin Zhou, Xiawu Zheng, Liujuan Cao, Bineng Zhong, Teng Xi, Gang Zhang, Errui Ding, Mingliang Xu, Rongrong Ji
Based on the relaxed search space, differentiable architecture search (DARTS) is efficient in searching for a high-performance architecture. However, the unbalanced competition among operations that have different trainable parameters causes model collapse. Besides, the inconsistent structures in the search and retraining stages cause cross-stage evaluation to be unstable. In this paper, we refer to these issues as the operation gap and the structure gap in DARTS. To shrink these gaps, we propose to induce equalized and consistent optimization in differentiable architecture search (EC-DARTS). EC-DARTS decouples different operations based on their categories to optimize the operation weights, so that the operation gap between them is shrunk. Besides, we introduce an induced structural transition to bridge the structure gap between the model structures in the search and retraining stages. Extensive experiments on CIFAR10 and ImageNet demonstrate the effectiveness of our method. Specifically, on CIFAR10, we achieve a test error of 2.39% using only 0.3 GPU days on an NVIDIA TITAN V. On ImageNet, our method achieves a top-1 error of 23.6% under the mobile setting.
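The decoupling-by-category idea can be illustrated with a DARTS-style mixed operation whose architecture weights are normalised within each operation category rather than by one global softmax, so parametric and parameter-free operations do not compete directly. The sketch below is a guess at that mechanism for illustration only; EC-DARTS' actual formulation and its induced structural transition may differ.

```python
import torch
import torch.nn as nn

class CategoryDecoupledMixedOp(nn.Module):
    """DARTS-style mixed operation with a softmax over architecture weights
    applied within each operation category, so parametric and parameter-free
    ops are not forced into one global competition (an illustrative guess at
    the decoupling idea, not EC-DARTS' exact rule)."""
    def __init__(self, ops, categories):
        super().__init__()
        self.ops = nn.ModuleList(ops)
        self.categories = categories                        # lists of op indices
        self.alpha = nn.Parameter(1e-3 * torch.randn(len(ops)))

    def forward(self, x):
        out = 0.0
        for group in self.categories:
            w = torch.softmax(self.alpha[group], dim=0)     # per-category softmax
            out = out + sum(wi * self.ops[i](x) for wi, i in zip(w, group))
        return out / len(self.categories)

ops = [nn.Conv2d(8, 8, 3, padding=1), nn.Identity(), nn.AvgPool2d(3, 1, 1)]
mixed = CategoryDecoupledMixedOp(ops, categories=[[0], [1, 2]])
print(mixed(torch.randn(2, 8, 16, 16)).shape)               # torch.Size([2, 8, 16, 16])
```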
{"title":"EC-DARTS: Inducing Equalized and Consistent Optimization into DARTS","authors":"Qinqin Zhou, Xiawu Zheng, Liujuan Cao, Bineng Zhong, Teng Xi, Gang Zhang, Errui Ding, Mingliang Xu, Rongrong Ji","doi":"10.1109/ICCV48922.2021.01177","DOIUrl":"https://doi.org/10.1109/ICCV48922.2021.01177","url":null,"abstract":"Based on the relaxed search space, differential architecture search (DARTS) is efficient in searching for a high-performance architecture. However, the unbalanced competition among operations that have different trainable parameters causes the model collapse. Besides, the inconsistent structures in the search and retraining stages causes cross-stage evaluation to be unstable. In this paper, we call these issues as an operation gap and a structure gap in DARTS. To shrink these gaps, we propose to induce equalized and consistent optimization in differentiable architecture search (EC-DARTS). EC-DARTS decouples different operations based on their categories to optimize the operation weights so that the operation gap between them is shrinked. Besides, we introduce an induced structural transition to bridge the structure gap between the model structures in the search and retraining stages. Extensive experiments on CIFAR10 and ImageNet demonstrate the effectiveness of our method. Specifically, on CIFAR10, we achieve a test error of 2.39%, while only 0.3 GPU days on NVIDIA TITAN V. On ImageNet, our method achieves a top-1 error of 23.6% under the mobile setting.","PeriodicalId":6820,"journal":{"name":"2021 IEEE/CVF International Conference on Computer Vision (ICCV)","volume":"48 1","pages":"11966-11975"},"PeriodicalIF":0.0,"publicationDate":"2021-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"77026827","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 4
FashionMirror: Co-attention Feature-remapping Virtual Try-on with Sequential Template Poses
Pub Date : 2021-10-01 DOI: 10.1109/ICCV48922.2021.01355
Chieh-Yun Chen, Ling Lo, Pin-Jui Huang, Hong-Han Shuai, Wen-Huang Cheng
Virtual try-on tasks have drawn increased attention. Prior works focus on tackling this task by warping clothes and fusing the information at the pixel level with the help of semantic segmentation. However, conducting semantic segmentation is time-consuming and easily causes error accumulation over time. Besides, warping the information at the pixel level instead of the feature level limits the performance (e.g., being unable to generate different views) and is unstable, since any misalignment appears directly in the results. In contrast, information fused at the feature level can be further refined by convolution to obtain the final results. Based on these assumptions, we propose a co-attention feature-remapping framework, namely FashionMirror, that generates the try-on results according to the driven-pose sequence in two stages. In the first stage, we consider the source human image and the target try-on clothes to predict the removed mask and the try-on clothing mask, which replaces the pre-processed semantic segmentation and reduces the inference time. In the second stage, we first remove the clothes on the source human via the removed mask and warp the clothing features conditioned on the try-on clothing mask to fit the human in the next frame. Meanwhile, we predict optical flows from the consecutive 2D poses and warp the source human to the next frame at the feature level. Then, we enhance the clothing features and source human features in every frame to generate realistic try-on results with spatiotemporal smoothness. Both qualitative and quantitative results show that FashionMirror outperforms state-of-the-art virtual try-on approaches.
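The feature-level warping step can be made concrete with a standard flow-warping routine built on `grid_sample`, sketched below in PyTorch. This is a generic implementation of warping a feature map by dense optical flow, not FashionMirror's actual module; the tensor shapes and the zero-flow sanity check are illustrative.

```python
import torch
import torch.nn.functional as F

def warp_features(feat, flow):
    """Warp a feature map (B, C, H, W) with a dense optical flow field
    (B, 2, H, W) given in pixels, using bilinear sampling."""
    b, _, h, w = feat.shape
    ys, xs = torch.meshgrid(torch.arange(h), torch.arange(w), indexing='ij')
    base = torch.stack((xs, ys), dim=0).float().unsqueeze(0)      # (1, 2, H, W)
    coords = base + flow                                          # target pixel coords
    grid = torch.stack((2 * coords[:, 0] / (w - 1) - 1,
                        2 * coords[:, 1] / (h - 1) - 1), dim=-1)  # normalised to [-1, 1]
    return F.grid_sample(feat, grid, align_corners=True)

feat = torch.randn(1, 64, 32, 32)
flow = torch.zeros(1, 2, 32, 32)            # zero flow: warping is (numerically) identity
print(torch.allclose(warp_features(feat, flow), feat, atol=1e-5))  # True
```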
{"title":"FashionMirror: Co-attention Feature-remapping Virtual Try-on with Sequential Template Poses","authors":"Chieh-Yun Chen, Ling Lo, Pin-Jui Huang, Hong-Han Shuai, Wen-Huang Cheng","doi":"10.1109/ICCV48922.2021.01355","DOIUrl":"https://doi.org/10.1109/ICCV48922.2021.01355","url":null,"abstract":"Virtual try-on tasks have drawn increased attention. Prior arts focus on tackling this task via warping clothes and fusing the information at the pixel level with the help of semantic segmentation. However, conducting semantic segmentation is time-consuming and easily causes error accumulation over time. Besides, warping the information at the pixel level instead of the feature level limits the performance (e.g., unable to generate different views) and is unstable since it directly demonstrates the results even with a misalignment. In contrast, fusing information at the feature level can be further refined by the convolution to obtain the final results. Based on these assumptions, we propose a co-attention feature-remapping framework, namely FashionMirror, that generates the try-on results according to the driven-pose sequence in two stages. In the first stage, we consider the source human image and the target try-on clothes to predict the removed mask and the try-on clothing mask, which replaces the pre-processed semantic segmentation and reduces the inference time. In the second stage, we first remove the clothes on the source human via the removed mask and warp the clothing features conditioning on the try-on clothing mask to fit the next frame human. Meanwhile, we predict the optical flows from the consecutive 2D poses and warp the source human to the next frame at the feature level. Then, we enhance the clothing features and source human features in every frame to generate realistic try-on results with spatiotemporal smoothness. Both qualitative and quantitative results show that FashionMirror outperforms the state-of-the-art virtual try-on approaches.","PeriodicalId":6820,"journal":{"name":"2021 IEEE/CVF International Conference on Computer Vision (ICCV)","volume":"43 1","pages":"13789-13798"},"PeriodicalIF":0.0,"publicationDate":"2021-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"77093870","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 19
TempNet: Online Semantic Segmentation on Large-scale Point Cloud Series
Pub Date : 2021-10-01 DOI: 10.1109/ICCV48922.2021.00703
Yunsong Zhou, Hongzi Zhu, Chunqin Li, Tiankai Cui, Shan Chang, Minyi Guo
Online semantic segmentation on a time series of point cloud frames is an essential task in autonomous driving. Existing models focus on single-frame segmentation, which cannot achieve satisfactory segmentation accuracy and suffers from unstable flicker across frames. In this paper, we propose a light-weight semantic segmentation framework for large-scale point cloud series, called TempNet, which can improve both the accuracy and the stability of existing semantic segmentation models by incorporating a novel frame aggregation scheme. To be computationally efficient, feature extraction and aggregation are only conducted on a small portion of key frames via a temporal feature aggregation (TFA) network using an attentional pooling mechanism, and such enhanced features are propagated to the intermediate non-key frames. To avoid information loss on non-key frames, a partial feature update (PFU) network is designed to partially update the propagated features with the local features extracted on a non-key frame whenever a quick assessment finds a large disparity between the two. As a result, consistent and information-rich features can be obtained for each frame. We implement TempNet on five state-of-the-art (SOTA) point cloud segmentation models and conduct extensive experiments on the SemanticKITTI dataset. Results demonstrate that TempNet outperforms SOTA competitors by wide margins with little extra computational cost.
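A rough sketch of attentional pooling over key-frame features is given below: per-point features from K key frames are fused with weights predicted from the features themselves, so more informative frames dominate. This is only a schematic stand-in; the paper's TFA and PFU networks, key-frame selection, and propagation to non-key frames are not modelled.

```python
import torch
import torch.nn as nn

class AttentionalPool(nn.Module):
    """Fuse per-point features from K key frames with attention weights
    predicted from the features themselves."""
    def __init__(self, dim):
        super().__init__()
        self.score = nn.Linear(dim, 1)

    def forward(self, key_feats):                             # (K, N, dim)
        attn = torch.softmax(self.score(key_feats), dim=0)    # weights over key frames
        return (attn * key_feats).sum(dim=0)                  # aggregated (N, dim)

pool = AttentionalPool(dim=96)
print(pool(torch.randn(3, 4096, 96)).shape)                   # torch.Size([4096, 96])
```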
{"title":"TempNet: Online Semantic Segmentation on Large-scale Point Cloud Series","authors":"Yunsong Zhou, Hongzi Zhu, Chunqin Li, Tiankai Cui, Shan Chang, Minyi Guo","doi":"10.1109/ICCV48922.2021.00703","DOIUrl":"https://doi.org/10.1109/ICCV48922.2021.00703","url":null,"abstract":"Online semantic segmentation on a time series of point cloud frames is an essential task in autonomous driving. Existing models focus on single-frame segmentation, which cannot achieve satisfactory segmentation accuracy and offer unstably flicker among frames. In this paper, we propose a light-weight semantic segmentation framework for largescale point cloud series, called TempNet, which can improve both the accuracy and the stability of existing semantic segmentation models by combining a novel frame aggregation scheme. To be computational cost-efficient, feature extraction and aggregation are only conducted on a small portion of key frames via a temporal feature aggregation (TFA) network using an attentional pooling mechanism, and such enhanced features are propagated to the intermediate non-key frames. To avoid information loss from non-key frames, a partial feature update (PFU) network is designed to partially update the propagated features with the local features extracted on a non-key frame if a large disparity between the two is quickly assessed. As a result, consistent and information-rich features can be obtained for each frame. We implement TempNet on five state-of-the-art (SOTA) point cloud segmentation models and conduct extensive experiments on the SemanticKITTI dataset. Results demonstrate that TempNet outperforms SOTA competitors by wide margins with little extra computational cost.","PeriodicalId":6820,"journal":{"name":"2021 IEEE/CVF International Conference on Computer Vision (ICCV)","volume":"14 1","pages":"7098-7107"},"PeriodicalIF":0.0,"publicationDate":"2021-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"79032589","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 3
DeepPRO: Deep Partial Point Cloud Registration of Objects
Pub Date : 2021-10-01 DOI: 10.1109/ICCV48922.2021.00563
Donghoon Lee, Onur C. Hamsici, Steven Feng, P. Sharma, Thorsten Gernoth, Apple
We consider the problem of online and real-time registration of partial point clouds obtained from an unseen real-world rigid object without knowing its 3D model. The point cloud is partial as it is obtained by a depth sensor capturing only the visible part of the object from a certain viewpoint. This introduces two main challenges: 1) two partial point clouds do not fully overlap, and 2) keypoints tend to be less reliable when the visible part of the object does not have salient local structures. To address these issues, we propose DeepPRO, a keypoint-free and end-to-end trainable deep neural network. Its core idea is inspired by how humans align two point clouds: we can imagine what two point clouds will look like after registration based on their shapes. To realize this idea, DeepPRO takes two partial point clouds as input and directly predicts the point-wise locations of the aligned point cloud. By preserving the ordering of points during the prediction, we enjoy dense correspondences between the input and predicted point clouds when inferring rigid transform parameters. We conduct extensive experiments on the real-world Linemod and synthetic ModelNet40 datasets. In addition, we collect and evaluate on the PRO1k dataset, a large-scale version of Linemod meant to test generalization to real-world scans. Results show that DeepPRO achieves the best accuracy against thirteen strong baseline methods, e.g., 2.2 mm ADD on the Linemod dataset, while running at 50 fps on mobile devices.
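Inferring rigid transform parameters from dense point-wise correspondences is a standard closed-form step, sketched below with the Kabsch/SVD solution in PyTorch. This is a generic routine rather than code from the paper; the synthetic rotation in the sanity check is arbitrary.

```python
import math
import torch

def rigid_from_correspondences(src, dst):
    """Kabsch/SVD estimate of the rotation R and translation t aligning `src`
    to `dst`, given dense point-wise correspondences of shape (N, 3)."""
    src_c, dst_c = src - src.mean(0), dst - dst.mean(0)   # centre both clouds
    u, _, vt = torch.linalg.svd(src_c.T @ dst_c)          # SVD of the cross-covariance
    d = torch.linalg.det(vt.T @ u.T).sign().item()        # guard against reflections
    r = vt.T @ torch.diag(torch.tensor([1.0, 1.0, d])) @ u.T
    t = dst.mean(0) - r @ src.mean(0)
    return r, t

# sanity check: rotate and translate a random cloud, then recover the motion
c, s = math.cos(0.3), math.sin(0.3)
r_gt = torch.tensor([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])
src = torch.randn(100, 3)
r_est, t_est = rigid_from_correspondences(src, src @ r_gt.T + 1.0)
print(torch.allclose(r_est, r_gt, atol=1e-4), torch.allclose(t_est, torch.ones(3), atol=1e-4))
```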
{"title":"DeepPRO: Deep Partial Point Cloud Registration of Objects","authors":"Donghoon Lee, Onur C. Hamsici, Steven Feng, P. Sharma, Thorsten Gernoth, Apple","doi":"10.1109/ICCV48922.2021.00563","DOIUrl":"https://doi.org/10.1109/ICCV48922.2021.00563","url":null,"abstract":"We consider the problem of online and real-time registration of partial point clouds obtained from an unseen real-world rigid object without knowing its 3D model. The point cloud is partial as it is obtained by a depth sensor capturing only the visible part of the object from a certain viewpoint. It introduces two main challenges: 1) two partial point clouds do not fully overlap and 2) keypoints tend to be less reliable when the visible part of the object does not have salient local structures. To address these issues, we propose DeepPRO, a keypoint-free and an end-to-end trainable deep neural network. Its core idea is inspired by how humans align two point clouds: we can imagine how two point clouds will look like after the registration based on their shape. To realize the idea, DeepPRO has inputs of two partial point clouds and directly predicts the point-wise location of the aligned point cloud. By preserving the ordering of points during the prediction, we enjoy dense correspondences between input and predicted point clouds when inferring rigid transform parameters. We conduct extensive experiments on the real-world Linemod and synthetic ModelNet40 datasets. In addition, we collect and evaluate on the PRO1k dataset, a large-scale version of Linemod meant to test generalization to real-world scans. Results show that DeepPRO achieves the best accuracy against thirteen strong baseline methods, e.g., 2.2mm ADD on the Linemod dataset, while running 50 fps on mobile devices.","PeriodicalId":6820,"journal":{"name":"2021 IEEE/CVF International Conference on Computer Vision (ICCV)","volume":"18 1","pages":"5663-5672"},"PeriodicalIF":0.0,"publicationDate":"2021-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"75529339","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 10