
Latest Publications: 2021 IEEE/CVF International Conference on Computer Vision (ICCV)

Learning Multiple Pixelwise Tasks Based on Loss Scale Balancing
Pub Date : 2021-10-01 DOI: 10.1109/ICCV48922.2021.00506
Jae-Han Lee, Chulwoo Lee, Chang-Su Kim
We propose a novel loss weighting algorithm, called loss scale balancing (LSB), for multi-task learning (MTL) of pixelwise vision tasks. An MTL model is trained to estimate multiple pixelwise predictions using an overall loss, which is a linear combination of individual task losses. The proposed algorithm dynamically adjusts the linear weights to learn all tasks effectively. Instead of controlling the trend of each loss value directly, we balance the loss scale — the product of the loss value and its weight — periodically. In addition, by evaluating the difficulty of each task based on the previous loss record, the proposed algorithm focuses more on difficult tasks during training. Experimental results show that the proposed algorithm outperforms conventional weighting algorithms for MTL of various pixelwise tasks. Codes are available at https://github.com/jaehanlee-mcl/LSB-MTL.
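As a rough illustration of the loss-scale idea (a hypothetical sketch, not the authors' released code; the rebalancing period, the difficulty heuristic, and all names are assumptions), the weights below are periodically reset so that each product weight × loss contributes a comparable share of the overall loss, with a mild extra boost for tasks whose loss has improved least:

```python
# Hypothetical sketch of periodic loss-scale balancing for multi-task training.
# Assumptions: `losses` holds the current scalar loss of each task; weights are
# rebalanced every `period` steps so every weighted loss (the "loss scale")
# contributes roughly equally; tasks with the least relative improvement are
# treated as harder and receive a small extra boost.

def rebalance_weights(losses, prev_losses, eps=1e-8, focus=0.5):
    """Return one weight per task so that weight * loss is roughly equal across tasks."""
    mean_loss = sum(losses) / len(losses)
    weights = [mean_loss / (l + eps) for l in losses]            # equalize loss scales
    # Difficulty heuristic: slower relative improvement -> harder task.
    progress = [l / (p + eps) for l, p in zip(losses, prev_losses)]
    mean_prog = sum(progress) / len(progress)
    return [w * (1.0 + focus * (pr - mean_prog)) for w, pr in zip(weights, progress)]

def overall_loss(losses, weights):
    return sum(w * l for w, l in zip(weights, losses))

# Toy usage: rebalance every `period` iterations, otherwise keep the last weights.
period = 100
weights, prev = [1.0, 1.0, 1.0], [1.0, 1.0, 1.0]
for step, losses in enumerate([[0.9, 2.1, 0.3], [0.8, 1.9, 0.25]]):  # toy loss values
    if step % period == 0:
        weights = rebalance_weights(losses, prev)
    total = overall_loss(losses, weights)
    prev = losses
```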
Citations: 7
Video Question Answering Using Language-Guided Deep Compressed-Domain Video Feature
Pub Date : 2021-10-01 DOI: 10.1109/ICCV48922.2021.00173
Nayoung Kim, S. Ha, Jewon Kang
Video Question Answering (Video QA) aims to answer a question through semantic reasoning between visual and linguistic information. Recently, handling the large amounts of multi-modal video and language information in a video has been considered important in the industry. However, current video QA models use deep features and suffer from significant computational complexity and insufficient representation capability in both training and testing. Existing features are extracted using pre-trained networks after all the frames are decoded, which is not always suitable for video QA tasks. In this paper, we develop a novel deep neural network that provides video QA features obtained from the coded video bit-stream to reduce the complexity. The proposed network includes several deep modules dedicated to both video QA and the video compression system, which is the first such attempt for the video QA task. The proposed network is predominantly model-agnostic. It is integrated into state-of-the-art networks for improved performance without any computationally expensive motion-related deep models. The experimental results demonstrate that the proposed network outperforms previous studies at lower complexity. https://github.com/Nayoung-Kim-ICP/VQAC
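A minimal sketch of how language-guided fusion of compressed-domain features could look (module names, dimensions, and the gating scheme are assumptions, not the paper's architecture): the question embedding gates how much the key-frame appearance stream and the motion-vector stream each contribute.

```python
# Illustrative sketch only: blend appearance features from decoded key frames with
# motion features parsed from the coded bit-stream, with the question embedding
# deciding the per-channel mix. All dimensions and names are assumptions.
import torch
import torch.nn as nn

class LanguageGuidedFusion(nn.Module):
    def __init__(self, vis_dim=512, mv_dim=128, q_dim=300, out_dim=512):
        super().__init__()
        self.vis_proj = nn.Linear(vis_dim, out_dim)
        self.mv_proj = nn.Linear(mv_dim, out_dim)
        self.gate = nn.Sequential(nn.Linear(q_dim, out_dim), nn.Sigmoid())

    def forward(self, key_frame_feat, motion_feat, question_emb):
        g = self.gate(question_emb)              # (B, out_dim), values in [0, 1]
        v = self.vis_proj(key_frame_feat)        # appearance stream
        m = self.mv_proj(motion_feat)            # compressed-domain motion stream
        return g * v + (1.0 - g) * m             # question-guided blend

feat = LanguageGuidedFusion()(torch.randn(2, 512), torch.randn(2, 128), torch.randn(2, 300))
```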
Citations: 9
Aggregation with Feature Detection
Pub Date : 2021-10-01 DOI: 10.1109/ICCV48922.2021.00057
Shuyang Sun, Xiaoyu Yue, Xiaojuan Qi, Wanli Ouyang, V. Prisacariu, Philip H. S. Torr
Aggregating features from different depths of a network is widely adopted to improve network capability. Many modern architectures are equipped with skip connections, which in effect makes feature aggregation happen in all of these networks. Since different features convey different semantic meanings, there are inconsistencies and incompatibilities to be resolved. However, existing works naively blend deep features via element-wise summation or concatenation followed by a convolution. Better feature aggregation methods beyond summation or concatenation are rarely explored. In this paper, given two layers of features to be aggregated, we first detect and identify where and what needs to be updated in one layer, then replace the features at the identified locations with information from the other layer. This process, which we call DEtect-rePLAce (DEPLA), enables us to avoid inconsistent patterns while keeping useful information in the merged outputs. Experimental results demonstrate that our method largely boosts multiple baselines, e.g., ResNet, FishNet and FPN, on three major vision tasks including ImageNet classification, MS COCO object detection and instance segmentation.
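A rough sketch of the detect-replace idea under assumed shapes and module choices (the detector design and names are not the paper's exact modules): one layer's features are overwritten at detected locations with transformed information from the other layer, instead of being summed.

```python
# Hypothetical detect-replace aggregation: a detector predicts where the deep
# feature should be updated, and those locations are replaced with a transformed
# version of the incoming skip feature rather than added to it.
import torch
import torch.nn as nn
import torch.nn.functional as F

class DetectReplace(nn.Module):
    def __init__(self, channels):
        super().__init__()
        self.detector = nn.Conv2d(channels, 1, kernel_size=1)           # where to update
        self.transform = nn.Conv2d(channels, channels, kernel_size=1)   # what to write

    def forward(self, deep_feat, skip_feat):
        # Match the spatial size of the incoming skip feature to the deep feature.
        skip_feat = F.interpolate(skip_feat, size=deep_feat.shape[-2:],
                                  mode="bilinear", align_corners=False)
        mask = torch.sigmoid(self.detector(deep_feat))                   # (B, 1, H, W)
        update = self.transform(skip_feat)
        return (1.0 - mask) * deep_feat + mask * update                  # replace, not add

out = DetectReplace(64)(torch.randn(1, 64, 14, 14), torch.randn(1, 64, 28, 28))
```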
Citations: 0
Light Source Guided Single-Image Flare Removal from Unpaired Data
Pub Date : 2021-10-01 DOI: 10.1109/ICCV48922.2021.00414
Xiaotian Qiao, G. Hancke, Rynson W. H. Lau
Casually-taken images often suffer from flare artifacts, due to the unintended reflections and scattering of light inside the camera. However, as flares may appear in a variety of shapes, positions, and colors, detecting and removing them entirely from an image is very challenging. Existing methods rely on predefined intensity and geometry priors of flares, and may fail to distinguish the difference between light sources and flare artifacts. We observe that the conditions of the light source in the image play an important role in the resulting flares. In this paper, we present a deep framework with light-source-aware guidance for single-image flare removal (SIFR). In particular, we first detect the light source regions and the flare regions separately, and then remove the flare artifacts based on the light-source-aware guidance. By learning the underlying relationships between the two types of regions, our approach can remove different kinds of flares from the image. In addition, instead of using paired training data which are difficult to collect, we propose the first unpaired flare removal dataset and new cycle-consistency constraints to obtain more diverse examples and avoid manual annotations. Extensive experiments demonstrate that our method outperforms the baselines qualitatively and quantitatively. We also show that our model can be applied to flare effect manipulation (e.g., adding or changing image flares).
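A hypothetical sketch of such a light-source-aware pipeline (the backbone, head design, and the way guidance is injected are all assumptions): the network predicts light-source and flare regions, and the removal branch is conditioned on the light-source map.

```python
# Toy light-source-guided flare removal model: a shared encoder, two mask heads
# (light source and flare), and a removal branch that receives the light-source
# map as guidance. All shapes and module choices are assumptions for illustration.
import torch
import torch.nn as nn

class LightSourceGuidedSIFR(nn.Module):
    def __init__(self, ch=32):
        super().__init__()
        self.encoder = nn.Sequential(nn.Conv2d(3, ch, 3, padding=1), nn.ReLU(),
                                     nn.Conv2d(ch, ch, 3, padding=1), nn.ReLU())
        self.light_head = nn.Conv2d(ch, 1, 1)     # light-source region
        self.flare_head = nn.Conv2d(ch, 1, 1)     # flare region
        self.removal = nn.Sequential(nn.Conv2d(ch + 1, ch, 3, padding=1), nn.ReLU(),
                                     nn.Conv2d(ch, 3, 3, padding=1))

    def forward(self, image):
        f = self.encoder(image)
        light = torch.sigmoid(self.light_head(f))
        flare = torch.sigmoid(self.flare_head(f))
        clean = self.removal(torch.cat([f, light], dim=1))  # light-source-aware guidance
        return clean, light, flare

clean, light, flare = LightSourceGuidedSIFR()(torch.randn(1, 3, 64, 64))
```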
Citations: 15
ECS-Net: Improving Weakly Supervised Semantic Segmentation by Using Connections Between Class Activation Maps
Pub Date : 2021-10-01 DOI: 10.1109/ICCV48922.2021.00719
Kunyang Sun, Haoqing Shi, Zhengming Zhang, Yongming Huang
Image-level weakly supervised semantic segmentation is a challenging task. As classification networks tend to capture notable object features and are insensitive to over-activation, the class activation map (CAM) is too sparse and rough to guide segmentation network training. Inspired by the fact that erasing distinguishing features forces networks to collect new ones from non-discriminative object regions, we use relationships between CAMs to propose a novel weakly supervised method. In this work, we apply these features, learned from erased images, as segmentation supervision, driving the network to learn robust representations. Specifically, object regions obtained by CAM techniques are first erased from the images. To provide other regions with segmentation supervision, the Erased CAM Supervision Net (ECS-Net) generates pixel-level labels by predicting segmentation results on those processed images. We also design a noise-suppression rule to select reliable labels. Our experiments on the PASCAL VOC 2012 dataset show that, without any data annotations except ground-truth image-level labels, our ECS-Net achieves 67.6% mIoU on the test set and 66.6% mIoU on the val set, outperforming previous state-of-the-art methods.
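A simplified sketch of the erase-then-supervise step (thresholds, tensor layouts, and the noise-suppression rule are assumptions): regions the CAM already activates on are erased, the segmentation prediction on the erased image is taken, and only confident pixels are kept as pseudo labels.

```python
# Hypothetical sketch: erase high-CAM regions from the image, then turn confident
# segmentation predictions on the erased image into pixel-level pseudo labels.
import torch

def erase_cam_regions(image, cam, erase_thresh=0.6):
    """Zero out pixels whose normalized CAM response exceeds the threshold."""
    cam = (cam - cam.min()) / (cam.max() - cam.min() + 1e-8)
    keep = (cam < erase_thresh).float().unsqueeze(1)          # (B, 1, H, W)
    return image * keep

def make_pseudo_labels(seg_logits, conf_thresh=0.8, ignore_index=255):
    """Confident argmax predictions become labels; the rest are ignored (noise suppression)."""
    prob = torch.softmax(seg_logits, dim=1)
    conf, label = prob.max(dim=1)
    label[conf < conf_thresh] = ignore_index
    return label

image = torch.randn(2, 3, 64, 64)
cam = torch.rand(2, 64, 64)
erased = erase_cam_regions(image, cam)
# A segmentation network applied to `erased` would produce logits of shape
# (B, num_classes, H, W); random logits stand in for that prediction here.
pseudo = make_pseudo_labels(torch.randn(2, 21, 64, 64))
```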
Citations: 64
SemiHand: Semi-supervised Hand Pose Estimation with Consistency
Pub Date : 2021-10-01 DOI: 10.1109/ICCV48922.2021.01117
Linlin Yang, Shicheng Chen, Angela Yao
We present SemiHand, a semi-supervised framework for 3D hand pose estimation from monocular images. We pre-train the model on labelled synthetic data and fine-tune it on unlabelled real-world data by pseudo-labeling with consistency training. By design, we introduce data augmentation of differing difficulties, consistency regularizer, label correction and sample selection for RGB-based 3D hand pose estimation. In particular, by approximating the hand masks from hand poses, we propose a cross-modal consistency and leverage semantic predictions to guide the predicted poses. Meanwhile, we introduce pose registration as label correction to guarantee the biomechanical feasibility of hand bone lengths. Experiments show that our method achieves a favorable improvement on real-world datasets after fine-tuning.
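A minimal sketch of pseudo-labeling with consistency in this setting (the augmentations, the loss, and the toy network are assumptions; pose registration and sample selection are omitted): the model labels a weakly augmented real image, and its prediction on a strongly augmented view is pushed toward that pseudo label.

```python
# Hypothetical consistency step for semi-supervised fine-tuning on unlabelled data:
# a pseudo label comes from the weakly augmented view (no gradients), and the
# prediction on the strongly augmented view is regressed toward it.
import torch
import torch.nn.functional as F

def consistency_step(model, real_batch, weak_aug, strong_aug):
    with torch.no_grad():
        pseudo = model(weak_aug(real_batch))      # pseudo 3D joint predictions
    pred = model(strong_aug(real_batch))
    return F.mse_loss(pred, pseudo)               # agreement between the two views

# Toy usage with a linear stand-in for a pose network and trivial "augmentations".
model = torch.nn.Linear(10, 63)                   # 21 joints x 3 coordinates
batch = torch.randn(4, 10)
loss = consistency_step(model, batch, lambda x: x,
                        lambda x: x + 0.01 * torch.randn_like(x))
loss.backward()
```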
Citations: 18
Polarimetric Helmholtz Stereopsis
Pub Date : 2021-10-01 DOI: 10.1109/ICCV48922.2021.00499
Yuqi Ding, Yu Ji, Mingyuan Zhou, S. B. Kang, Jinwei Ye
Helmholtz stereopsis (HS) exploits the reciprocity principle of light propagation (i.e., the Helmholtz reciprocity) for 3D reconstruction of surfaces with arbitrary reflectance. In this paper, we present the polarimetric Helmholtz stereopsis (polar-HS), which extends the classical HS by considering the polarization state of light in the reciprocal paths. With the additional phase information from polarization, polar-HS requires only one reciprocal image pair. We formulate new reciprocity and diffuse/specular polarimetric constraints to recover surface depths and normals using an optimization framework. Using a hardware prototype, we show that our approach produces high-quality 3D reconstruction for different types of surfaces, ranging from diffuse to highly specular.
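For context, the classical Helmholtz stereopsis constraint that polar-HS extends can be written as follows (notation follows the standard HS literature rather than this paper): with reciprocal camera/light positions $o_l$ and $o_r$, the observed intensities $i_l$ and $i_r$ at the projections of a surface point $p$ with normal $n$ satisfy

```latex
\left( i_l \,\frac{v_l}{\lVert o_l - p \rVert^{2}}
     - i_r \,\frac{v_r}{\lVert o_r - p \rVert^{2}} \right) \cdot n = 0,
\qquad v_d = \frac{o_d - p}{\lVert o_d - p \rVert}, \quad d \in \{l, r\},
```

which lets depth and normal be recovered without modeling the BRDF; the polarimetric variant described above adds phase information so that a single reciprocal pair suffices.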
Citations: 10
Dynamic Context-Sensitive Filtering Network for Video Salient Object Detection
Pub Date : 2021-10-01 DOI: 10.1109/ICCV48922.2021.00158
Miaohui Zhang, Jie Liu, Yifei Wang, Yongri Piao, S. Yao, Wei Ji, Jingjing Li, Huchuan Lu, Zhongxuan Luo
The ability to capture inter-frame dynamics has been critical to the development of video salient object detection (VSOD). While many works have achieved great success in this field, a deeper insight into its dynamic nature should be developed. In this work, we aim to answer the following questions: How can a model adjust itself to dynamic variations as well as perceive fine differences in the real-world environment; How are the temporal dynamics well introduced into spatial information over time? To this end, we propose a dynamic context-sensitive filtering network (DCFNet) equipped with a dynamic context-sensitive filtering module (DCFM) and an effective bidirectional dynamic fusion strategy. The proposed DCFM sheds new light on dynamic filter generation by extracting location-related affinities between consecutive frames. Our bidirectional dynamic fusion strategy encourages the interaction of spatial and temporal information in a dynamic manner. Experimental results demonstrate that our proposed method can achieve state-of-the-art performance on most VSOD datasets while ensuring a real-time speed of 28 fps. The source code is publicly available at https://github.com/OIPLab-DUT/DCFNet.
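One way to picture affinity-driven dynamic fusion between consecutive frames (a sketch under assumed definitions; the actual DCFM generates dynamic filters rather than a scalar gate): per-location cosine affinity between the features of frame t and frame t-1 controls how much temporal context is mixed in, and the same operation can be applied in both temporal directions.

```python
# Hypothetical affinity-gated temporal fusion: locations where consecutive-frame
# features agree receive more temporal context. This is an illustration of the
# idea of location-related affinities, not the paper's filtering module.
import torch
import torch.nn.functional as F

def dynamic_temporal_fusion(feat_t, feat_prev):
    """feat_t, feat_prev: (B, C, H, W) features of consecutive frames."""
    affinity = F.cosine_similarity(feat_t, feat_prev, dim=1, eps=1e-8)   # (B, H, W)
    weight = affinity.clamp(min=0.0).unsqueeze(1)                        # trust similar locations
    return feat_t + weight * feat_prev

fused = dynamic_temporal_fusion(torch.randn(1, 64, 32, 32), torch.randn(1, 64, 32, 32))
```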
Citations: 45
Differentiable Dynamic Wirings for Neural Networks
Pub Date : 2021-10-01 DOI: 10.1109/ICCV48922.2021.00038
Kun Yuan, Quanquan Li, Shaopeng Guo, Dapeng Chen, Aojun Zhou, F. Yu, Ziwei Liu
A standard practice of deploying deep neural networks is to apply the same architecture to all the input instances. However, a fixed architecture may not be suitable for different data with high diversity. To boost the model capacity, existing methods usually employ larger convolutional kernels or deeper network layers, which incurs prohibitive computational costs. In this paper, we address this issue by proposing Differentiable Dynamic Wirings (DDW), which learns the instance-aware connectivity that creates different wiring patterns for different instances. 1) Specifically, the network is initialized as a complete directed acyclic graph, where the nodes represent convolutional blocks and the edges represent the connection paths. 2) We generate edge weights by a learnable module, Router, and select the edges whose weights are larger than a threshold, to adjust the connectivity of the neural network structure. 3) Instead of using the same path of the network, DDW aggregates features dynamically in each node, which allows the network to have more representation power.To facilitate effective training, we further represent the network connectivity of each sample as an adjacency matrix. The matrix is updated to aggregate features in the forward pass, cached in the memory, and used for gradient computing in the backward pass. We validate the effectiveness of our approach with several mainstream architectures, including MobileNetV2, ResNet, ResNeXt, and RegNet. Extensive experiments are performed on ImageNet classification and COCO object detection, which demonstrates the effectiveness and generalization ability of our approach.
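A toy sketch of instance-aware wiring (block design, router architecture, and threshold value are assumptions): convolutional blocks form a complete DAG, a router predicts one weight per edge from the input instance, and only edges whose weight exceeds the threshold contribute to each node's aggregated input.

```python
# Hypothetical dynamic-wiring network: per-instance edge weights gate a complete
# DAG of convolutional blocks; edges below the threshold are dropped.
import torch
import torch.nn as nn

class DynamicWiredNet(nn.Module):
    def __init__(self, channels=32, num_nodes=4, threshold=0.5):
        super().__init__()
        self.threshold = threshold
        self.nodes = nn.ModuleList(
            nn.Sequential(nn.Conv2d(channels, channels, 3, padding=1), nn.ReLU())
            for _ in range(num_nodes))
        # Router predicts one weight per possible edge (j -> i with j < i) from global context.
        self.num_edges = num_nodes * (num_nodes - 1) // 2
        self.router = nn.Sequential(nn.AdaptiveAvgPool2d(1), nn.Flatten(),
                                    nn.Linear(channels, self.num_edges), nn.Sigmoid())

    def forward(self, x):
        edge_w = self.router(x)                       # (B, num_edges), per-instance wiring
        outputs, e = [self.nodes[0](x)], 0
        for i in range(1, len(self.nodes)):
            agg = 0.0
            for j in range(i):
                w = edge_w[:, e].view(-1, 1, 1, 1)
                # Keep only edges whose weight passes the threshold.
                agg = agg + torch.where(w > self.threshold, w, torch.zeros_like(w)) * outputs[j]
                e += 1
            outputs.append(self.nodes[i](agg))
        return outputs[-1]

y = DynamicWiredNet()(torch.randn(2, 32, 16, 16))
```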
Citations: 4
Collaborative Optimization and Aggregation for Decentralized Domain Generalization and Adaptation
Pub Date : 2021-10-01 DOI: 10.1109/ICCV48922.2021.00642
Guile Wu, S. Gong
Contemporary domain generalization (DG) and multisource unsupervised domain adaptation (UDA) methods mostly collect data from multiple domains together for joint optimization. However, this centralized training paradigm poses a threat to data privacy and is not applicable when data are non-shared across domains. In this work, we propose a new approach called Collaborative Optimization and Aggregation (COPA), which aims at optimizing a generalized target model for decentralized DG and UDA, where data from different domains are non-shared and private. Our base model consists of a domain-invariant feature extractor and an ensemble of domain-specific classifiers. In an iterative learning process, we optimize a local model for each domain, and then centrally aggregate local feature extractors and assemble domain-specific classifiers to construct a generalized global model, without sharing data from different domains. To improve generalization of feature extractors, we employ hybrid batch-instance normalization and collaboration of frozen classifiers. For better decentralized UDA, we further introduce a prediction agreement mechanism to overcome local disparities towards central model aggregation. Extensive experiments on five DG and UDA benchmark datasets show that COPA is capable of achieving comparable performance against the state-of-the-art DG and UDA methods without the need for centralized data collection in model training.
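A hypothetical sketch of one aggregation round (the model structure and the plain averaging rule are assumptions): each domain keeps its data private and trains locally, the feature extractors are averaged centrally, and the domain-specific classifier heads are assembled into an ensemble.

```python
# Illustrative decentralized aggregation: FedAvg-style parameter averaging for the
# shared feature extractor plus an ensemble of per-domain classifier heads.
# Raw data never leaves its domain; only model parameters are exchanged.
import copy
import torch

def aggregate_extractors(local_extractors):
    """Average the parameters of per-domain feature extractors."""
    avg_state = copy.deepcopy(local_extractors[0].state_dict())
    for key in avg_state:
        stacked = torch.stack([m.state_dict()[key].float() for m in local_extractors])
        avg_state[key] = stacked.mean(dim=0)
    return avg_state

def ensemble_predict(shared_extractor, classifiers, x):
    """Generalized global model: shared extractor + ensemble of domain classifiers."""
    feat = shared_extractor(x)
    logits = torch.stack([head(feat) for head in classifiers])
    return logits.mean(dim=0)

# Toy usage with linear stand-ins for the extractor and the classifier heads.
extractors = [torch.nn.Linear(8, 16) for _ in range(3)]
classifiers = [torch.nn.Linear(16, 5) for _ in range(3)]
global_extractor = torch.nn.Linear(8, 16)
global_extractor.load_state_dict(aggregate_extractors(extractors))
pred = ensemble_predict(global_extractor, classifiers, torch.randn(4, 8))
```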
Citations: 34