
Latest Publications from the 2022 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV)

Co-Net: A Collaborative Region-Contour-Driven Network for Fine-to-Finer Medical Image Segmentation
Pub Date : 2022-01-01 DOI: 10.1109/WACV51458.2022.00177
Anran Liu, Xiangsheng Huang, Tong Li, Pengcheng Ma
In this paper, a fine-to-finer segmentation task on Glomerular Electron-Dense Deposits (GEDD) is investigated, driven collaboratively by region and contour features in view of the complementary nature of these two types of features. To this end, a novel network (Co-Net) is presented that dynamically uses fine saliency segmentation to guide finer segmentation on boundaries. The architecture contains two mutually boosted decoders sharing one common encoder. Specifically, a new structure named the Global-guided Interaction Module (GIM) is designed to effectively control the information flow and reduce redundancy in the cross-level feature fusion process; the global features it uses give the features of each layer access to richer context, and an initial fine segmentation map is obtained. A Discontinuous Boundary Supervision (DBS) strategy is applied to pay more attention to discontinuity positions and to correct segmentation errors on boundaries. Finally, a Selective Kernel (SK) is used for dynamic aggregation of the region and contour features to obtain a finer segmentation. Our proposed approach is evaluated on an independent GEDD dataset labeled by pathologists and on open polyp datasets to test generalization. Ablation studies show the effectiveness of the different modules. On all datasets, our proposal achieves high segmentation accuracy and surpasses previous methods.
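As a rough illustration of the Selective Kernel (SK) aggregation step mentioned above, the following PyTorch sketch fuses a region branch and a contour branch with channel-wise soft attention. The module name, layer sizes, and two-branch layout are assumptions for illustration, not the actual Co-Net implementation.

```python
import torch
import torch.nn as nn

class SelectiveFusion(nn.Module):
    """Selective-Kernel-style aggregation of two feature branches (region and contour)."""
    def __init__(self, channels: int, reduction: int = 4):
        super().__init__()
        hidden = max(channels // reduction, 8)
        self.squeeze = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),          # global context of the summed branches
            nn.Conv2d(channels, hidden, 1),
            nn.ReLU(inplace=True),
        )
        # one attention head per branch, softmax-normalised across branches
        self.attn_region = nn.Conv2d(hidden, channels, 1)
        self.attn_contour = nn.Conv2d(hidden, channels, 1)

    def forward(self, region_feat: torch.Tensor, contour_feat: torch.Tensor) -> torch.Tensor:
        ctx = self.squeeze(region_feat + contour_feat)                     # (B, hidden, 1, 1)
        logits = torch.stack([self.attn_region(ctx), self.attn_contour(ctx)], dim=0)
        weights = torch.softmax(logits, dim=0)                             # (2, B, C, 1, 1)
        return weights[0] * region_feat + weights[1] * contour_feat


if __name__ == "__main__":
    fuse = SelectiveFusion(channels=64)
    region, contour = torch.randn(2, 64, 32, 32), torch.randn(2, 64, 32, 32)
    print(fuse(region, contour).shape)   # torch.Size([2, 64, 32, 32])
```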
Citations: 6
Modeling Aleatoric Uncertainty for Camouflaged Object Detection
Pub Date : 2022-01-01 DOI: 10.1109/WACV51458.2022.00267
Jiawei Liu, Jing Zhang, N. Barnes
Aleatoric uncertainty captures noise within the observations. For camouflaged object detection, the similar appearance of the camouflaged foreground and the background makes it difficult to obtain highly accurate annotations, especially around object boundaries. We argue that training directly with the "noisy" camouflage map may lead to a model with poor generalization ability. In this paper, we introduce an explicit aleatoric uncertainty estimation technique to represent predictive uncertainty due to noisy labeling. Specifically, we present a confidence-aware camouflaged object detection (COD) framework that uses dynamic supervision to produce both an accurate camouflage map and a reliable "aleatoric uncertainty". Unlike existing techniques that produce deterministic predictions following the point estimation pipeline, our framework formalises aleatoric uncertainty as a probability distribution over the model output and the input image. We claim that, once trained, our confidence estimation network can evaluate the pixel-wise accuracy of the prediction without relying on the ground-truth camouflage map. Extensive results illustrate the superior performance of the proposed model in explaining the camouflage prediction. Our code is available at https://github.com/Carlisle-Liu/OCENet
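For readers unfamiliar with aleatoric uncertainty modeling, the sketch below shows the standard learned-variance trick in the spirit of Kendall and Gal: the network predicts per-pixel logits plus a log-variance map, and the loss averages binary cross-entropy over noise-perturbed logits so that noisy pixels can be explained by high variance. It is a generic illustration of the idea, not the exact objective or architecture of the paper.

```python
import torch
import torch.nn.functional as F

def aleatoric_bce_loss(logits, log_var, target, n_samples: int = 10):
    """BCE averaged over logits perturbed with noise of the predicted per-pixel scale."""
    std = torch.exp(0.5 * log_var)
    total = 0.0
    for _ in range(n_samples):
        noisy_logits = logits + std * torch.randn_like(logits)
        total = total + F.binary_cross_entropy_with_logits(noisy_logits, target)
    return total / n_samples


if __name__ == "__main__":
    logits = torch.randn(2, 1, 64, 64, requires_grad=True)    # predicted camouflage logits
    log_var = torch.zeros(2, 1, 64, 64, requires_grad=True)   # predicted per-pixel log-variance
    target = torch.randint(0, 2, (2, 1, 64, 64)).float()      # (possibly noisy) binary labels
    print(aleatoric_bce_loss(logits, log_var, target).item())
```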
Citations: 22
PICA: Point-wise Instance and Centroid Alignment Based Few-shot Domain Adaptive Object Detection with Loose Annotations
Pub Date : 2022-01-01 DOI: 10.1109/WACV51458.2022.00047
Chaoliang Zhong, Jiexi Wang, Chengang Feng, Ying Zhang, Jun Sun, Yasuto Yokota
In this work, we focus on supervised domain adaptation for object detection in a few-shot, loosely annotated setting, where the source images are sufficient and fully labeled but the target images are few-shot and loosely annotated. As annotated objects exist in the target domain, instance-level alignment can be utilized to improve performance. Traditional methods conduct instance-level alignment by semantically aligning the distributions of paired object features with domain adversarial training. Although point-wise surrogates of distribution alignment have been shown to provide a more effective solution in few-shot classification tasks across domains, this point-wise alignment approach has not yet been extended to object detection. In this work, we propose a method that extends point-wise alignment from classification to object detection. Moreover, in the few-shot loose-annotation setting, the background ROIs of the target domain suffer from a severe label noise problem, which may make point-wise alignment fail. To this end, we exploit moving-average centroids to mitigate the label noise problem of background ROIs. Meanwhile, we exploit point-wise alignment over instances and centroids to tackle the scarcity of labeled target instances. Hence the method is robust not only against label noise in background ROIs but also against the scarcity of labeled target objects. Experimental results show that the proposed instance-level alignment method brings significant improvement over the baseline and is superior to state-of-the-art methods.
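The moving-average centroid idea can be pictured with a short sketch: per-class centroids are updated with an exponential moving average so that individual noisy background ROIs have limited influence, and target ROI features are pulled toward the source centroids of their (loose) labels. The momentum value and the MSE alignment loss are illustrative assumptions rather than the exact PICA formulation.

```python
import torch
import torch.nn.functional as F

class EMACentroids:
    """Per-class feature centroids maintained with an exponential moving average."""
    def __init__(self, num_classes: int, feat_dim: int, momentum: float = 0.9):
        self.centroids = torch.zeros(num_classes, feat_dim)
        self.momentum = momentum

    @torch.no_grad()
    def update(self, feats: torch.Tensor, labels: torch.Tensor) -> None:
        # feats: (N, feat_dim) ROI features, labels: (N,) class ids
        for c in labels.unique():
            mean_c = feats[labels == c].mean(dim=0)
            self.centroids[c] = self.momentum * self.centroids[c] + (1 - self.momentum) * mean_c

def centroid_alignment_loss(target_feats: torch.Tensor,
                            target_labels: torch.Tensor,
                            source_centroids: torch.Tensor) -> torch.Tensor:
    """Pull each labelled target ROI feature toward the source centroid of its class."""
    anchors = source_centroids[target_labels]
    return F.mse_loss(target_feats, anchors)
```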
Citations: 6
Channel Pruning via Lookahead Search Guided Reinforcement Learning
Pub Date : 2022-01-01 DOI: 10.1109/WACV51458.2022.00357
Z. Wang, Chengcheng Li
Channel pruning has become an effective yet still challenging approach to achieving compact neural networks. It aims to prune the optimal set of filters whose removal results in minimal performance degradation of the slimmed network. Due to the prohibitively vast search space of filter combinations, existing approaches usually use various criteria to estimate filter importance while sacrificing some precision. Here we present a new approach to optimizing filter selection in channel pruning with lookahead search guided reinforcement learning (RL). A neural network that takes filter-related features as input is trained with RL to prune the optimal sequence of filters and maximize the performance of the remaining network. In addition, we employ Monte Carlo tree search (MCTS) to provide a lookahead search for filter selection, which increases the sample efficiency of the RL training. Experiments on MNIST, CIFAR-10, and ILSVRC-2012 validate the effectiveness of our approach compared to both traditional and automated existing channel pruning approaches.
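The following sketch conveys the "look before you prune" idea with a much simpler stand-in: a greedy one-step lookahead that temporarily zeroes each candidate filter, measures validation accuracy, and prunes the least harmful one. It deliberately replaces the paper's RL agent and Monte Carlo tree search with a greedy loop, and `eval_fn` is an assumed user-supplied evaluation callable.

```python
def greedy_lookahead_prune(model, layer, eval_fn, n_remove: int):
    """Greedily prune n_remove filters from `layer`, scoring each candidate by eval_fn(model)."""
    pruned = []
    for _ in range(n_remove):
        scores = {}
        weight = layer.weight.data           # shape: (out_channels, in_channels, k, k)
        for f in range(weight.shape[0]):
            if f in pruned:
                continue
            saved = weight[f].clone()
            weight[f].zero_()                # temporarily remove filter f
            scores[f] = eval_fn(model)       # validation accuracy without filter f
            weight[f].copy_(saved)           # restore it
        best = max(scores, key=scores.get)   # filter whose removal hurts least
        layer.weight.data[best].zero_()      # prune it for good
        pruned.append(best)
    return pruned
```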
Citations: 11
Learned Event-based Visual Perception for Improved Space Object Detection
Pub Date : 2022-01-01 DOI: 10.1109/WACV51458.2022.00336
Nikolaus Salvatore, Justin Fletcher
The detection of dim artificial Earth satellites using ground-based electro-optical sensors, particularly in the presence of background light, is technologically challenging. This perceptual task is foundational to our understanding of the space environment, and grows in importance as the number, variety, and dynamism of space objects increases. We present a hybrid image- and event-based architecture that leverages dynamic vision sensing technology to detect resident space objects in geosynchronous Earth orbit. Given the asynchronous, one-dimensional image data supplied by a dynamic vision sensor, our architecture applies conventional image feature extractors to integrated, two-dimensional frames in conjunction with point-cloud feature extractors, such as PointNet, in order to increase detection performance for dim objects in scenes with high background activity. In addition, an end-to-end event-based imaging simulator is developed both to produce data for model training and to approximate the optimal sensor parameters for event-based sensing in the context of electro-optical telescope imagery. Experimental results confirm that the inclusion of point-cloud feature extractors increases recall for dim objects in the high-background regime.
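As a reference point for the point-cloud branch, the sketch below is a minimal PointNet-style extractor: each event is embedded by a shared MLP and a symmetric max-pool yields a permutation-invariant feature for the whole event cloud. The input dimensionality (x, y, t, polarity) and layer widths are assumptions, not the configuration used in the paper.

```python
import torch
import torch.nn as nn

class TinyPointNet(nn.Module):
    """Shared per-point MLP followed by a symmetric max-pool over the event cloud."""
    def __init__(self, in_dim: int = 4, feat_dim: int = 128):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(in_dim, 64), nn.ReLU(inplace=True),
            nn.Linear(64, feat_dim), nn.ReLU(inplace=True),
        )

    def forward(self, points: torch.Tensor) -> torch.Tensor:
        # points: (B, N, in_dim) -> per-point features (B, N, feat_dim)
        per_point = self.mlp(points)
        # permutation-invariant aggregation over the N events
        return per_point.max(dim=1).values   # (B, feat_dim)


if __name__ == "__main__":
    net = TinyPointNet()
    events = torch.randn(2, 1024, 4)          # two clouds of 1024 events (x, y, t, polarity)
    print(net(events).shape)                  # torch.Size([2, 128])
```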
Citations: 7
Inpaint2Learn: A Self-Supervised Framework for Affordance Learning
Pub Date : 2022-01-01 DOI: 10.1109/WACV51458.2022.00383
Lingzhi Zhang, Weiyu Du, Shenghao Zhou, Jiancong Wang, Jianbo Shi
Perceiving affordances, the opportunities for interaction in a scene, is a fundamental ability of humans. It is an equally important skill for AI agents and robots to better understand and interact with the world. However, labeling affordances in the environment is not a trivial task. To address this issue, we propose a task-agnostic framework, named Inpaint2Learn, that generates affordance labels in a fully automatic manner and opens the door for affordance learning in the wild. To demonstrate its effectiveness, we apply it to three different tasks: human affordance prediction, Location2Object, and 6D object pose hallucination. Our experiments and user studies show that our models, trained with the Inpaint2Learn scaffold, are able to generate diverse and visually plausible results in all three scenarios.
Citations: 4
FalCon: Fine-grained Feature Map Sparsity Computing with Decomposed Convolutions for Inference Optimization
Pub Date : 2022-01-01 DOI: 10.1109/WACV51458.2022.00369
Zirui Xu, Fuxun Yu, Chenxi Liu, Zhe Wu, Hongcheng Wang, Xiang Chen
Many works focus on the model's static parameter optimization (e.g., filters and weights) for CNN inference acceleration. Compared to parameter sparsity, feature map sparsity is input-dependent and therefore offers better adaptability. The practical sparsity patterns are non-structural and randomly located on feature maps with non-identical shapes. However, existing feature map sparsity works take computing efficiency as the primary goal, so they can only remove structural sparsity and fail to match the characteristics above. In this paper, we develop a novel sparsity computing scheme called FalCon, which adapts well to practical sparsity patterns while still maintaining efficient computing. Specifically, we first propose a decomposed convolution design that enables a fine-grained computing unit for sparsity. Additionally, a decomposed convolution computing optimization paradigm is proposed to convert the sparse computing units into practical acceleration. Extensive experiments show that FalCon achieves up to 67.30% theoretical computation reduction with a negligible accuracy drop while accelerating CNN inference by 37%.
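The per-input nature of activation sparsity is easy to verify with a small diagnostic: hook every ReLU, run one input, and record the fraction of exact zeros in each output feature map. This probe only measures the kind of sparsity FalCon exploits; it does not implement the decomposed-convolution computation itself.

```python
import torch
import torch.nn as nn

def relu_feature_sparsity(model: nn.Module, x: torch.Tensor) -> dict:
    """Return, per ReLU layer, the fraction of zero activations for input x."""
    stats, hooks = {}, []

    def make_hook(name):
        def hook(_module, _inp, out):
            stats[name] = (out == 0).float().mean().item()
        return hook

    for name, module in model.named_modules():
        if isinstance(module, nn.ReLU):
            hooks.append(module.register_forward_hook(make_hook(name)))
    with torch.no_grad():
        model(x)
    for h in hooks:
        h.remove()
    return stats


if __name__ == "__main__":
    net = nn.Sequential(nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
                        nn.Conv2d(16, 16, 3, padding=1), nn.ReLU())
    print(relu_feature_sparsity(net, torch.randn(1, 3, 32, 32)))
```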
Citations: 4
C-VTON: Context-Driven Image-Based Virtual Try-On Network
Pub Date : 2022-01-01 DOI: 10.1109/WACV51458.2022.00226
Benjamin Fele, Ajda Lampe, P. Peer, Vitomir Štruc
Image-based virtual try-on techniques have shown great promise for enhancing the user-experience and improving customer satisfaction on fashion-oriented e-commerce platforms. However, existing techniques are currently still limited in the quality of the try-on results they are able to produce from input images of diverse characteristics. In this work, we propose a Context-Driven Virtual Try-On Network (C-VTON) that addresses these limitations and convincingly transfers selected clothing items to the target subjects even under challenging pose configurations and in the presence of self-occlusions. At the core of the C-VTON pipeline are: (i) a geometric matching procedure that efficiently aligns the target clothing with the pose of the person in the input images, and (ii) a powerful image generator that utilizes various types of contextual information when synthesizing the final try-on result. C-VTON is evaluated in rigorous experiments on the VITON and MPV datasets and in comparison to state-of-the-art techniques from the literature. Experimental results show that the proposed approach is able to produce photo-realistic and visually convincing results and significantly improves on the existing state-of-the-art.
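To make the geometric-matching step concrete, the sketch below warps a clothing image with a predicted affine transformation using torch's grid-sampling utilities; the 2x3 `theta` would come from a matching regressor that is not shown, and C-VTON's actual transformation model may be richer than a single affine warp.

```python
import torch
import torch.nn.functional as F

def warp_clothing(clothing: torch.Tensor, theta: torch.Tensor) -> torch.Tensor:
    """Resample a clothing image onto the grid defined by an affine transform theta."""
    # clothing: (B, 3, H, W), theta: (B, 2, 3)
    grid = F.affine_grid(theta, clothing.shape, align_corners=False)
    return F.grid_sample(clothing, grid, align_corners=False)


if __name__ == "__main__":
    cloth = torch.randn(1, 3, 256, 192)
    identity = torch.tensor([[[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]])   # no-op warp
    print(warp_clothing(cloth, identity).shape)                     # torch.Size([1, 3, 256, 192])
```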
Citations: 23
3DRefTransformer: Fine-Grained Object Identification in Real-World Scenes Using Natural Language
Pub Date : 2022-01-01 DOI: 10.1109/WACV51458.2022.00068
Ahmed Abdelreheem, Ujjwal Upadhyay, Ivan Skorokhodov, Rawan Al Yahya, Jun Chen, Mohamed Elhoseiny
In this paper, we study fine-grained 3D object identification in real-world scenes described by a textual query. The task aims to discriminatively understand an instance of a particular 3D object described by natural language utterances among other instances of 3D objects of the same class appearing in a visual scene. We introduce the 3DRefTransformer net, a transformer-based neural network that identifies 3D objects described by linguistic utterances in real-world scenes. The network’s input is 3D object segmented point cloud images representing a real-world scene and a language utterance that refers to one of the scene objects. The goal is to identify the referred object. Compared to the state-of-the-art models that are mostly based on graph convolutions and LSTMs, our 3DRefTransformer net offers two key advantages. First, it is an end-to-end transformer model that operates both on language and 3D visual objects. Second, it has a natural ability to ground textual terms in the utterance to the learning representation of 3D objects in the scene. We further incorporate object pairwise spatial relation loss and contrastive learning during model training. We show in our experiments that our model improves the performance upon the current SOTA significantly on Referit3D Nr3D and Sr3D datasets. Code and Models will be made publicly available at https://vision-cair.github.io/3dreftransformer/.
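A minimal way to picture grounding language in per-object 3D features is a single cross-attention layer in which object embeddings attend to the utterance tokens and a linear head scores each object as the referent. The dimensions, the single layer, and the scoring head below are illustrative assumptions, not the 3DRefTransformer architecture (which also uses pairwise spatial-relation and contrastive losses).

```python
import torch
import torch.nn as nn

class ObjectLanguageScorer(nn.Module):
    """Score candidate 3D object features against a referring utterance via cross-attention."""
    def __init__(self, dim: int = 256, heads: int = 4):
        super().__init__()
        self.cross_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.score = nn.Linear(dim, 1)

    def forward(self, obj_feats: torch.Tensor, text_feats: torch.Tensor) -> torch.Tensor:
        # obj_feats: (B, num_objects, dim), text_feats: (B, num_tokens, dim)
        attended, _ = self.cross_attn(query=obj_feats, key=text_feats, value=text_feats)
        return self.score(attended).squeeze(-1)   # (B, num_objects) referent logits


if __name__ == "__main__":
    scorer = ObjectLanguageScorer()
    logits = scorer(torch.randn(2, 8, 256), torch.randn(2, 12, 256))
    print(logits.shape)   # torch.Size([2, 8])
```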
Citations: 12
Leveraging Test-Time Consensus Prediction for Robustness against Unseen Noise
Pub Date : 2022-01-01 DOI: 10.1109/WACV51458.2022.00362
Anindya Sarkar, Anirban Sarkar, V. Balasubramanian
We propose a method to improve DNN robustness against unseen noisy corruptions, such as Gaussian noise, shot noise, impulse noise, and speckle noise at different levels of severity, by leveraging an ensemble technique through a consensus-based prediction method that uses self-supervised learning at inference time. We also propose to enhance model training by considering other aspects of the issue, i.e., noise in the data and better representation learning, which shows even better generalization performance with the consensus-based prediction strategy. We report results for each noisy corruption on the standard CIFAR10-C and ImageNet-C benchmarks, which show a significant boost in performance over previous methods. We also report results for MNIST-C and TinyImagenet-C to show the usefulness of our method across datasets of different complexities in providing robustness against unseen noise. We show results with different architectures to validate our method against other baseline methods, and also conduct experiments to show the usefulness of each part of our method.
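The consensus part of the approach can be sketched as simple test-time ensembling: run the model on several transformed views of the input and average the class probabilities before taking the argmax. The self-supervised component used at inference in the paper is not reproduced here, and `augment_fns` is an assumed list of callables mapping a batch to a transformed batch.

```python
import torch
import torch.nn as nn

@torch.no_grad()
def consensus_predict(model: nn.Module, x: torch.Tensor, augment_fns) -> torch.Tensor:
    """Average softmax outputs over several test-time views and return class ids."""
    model.eval()
    probs = [torch.softmax(model(fn(x)), dim=1) for fn in augment_fns]
    return torch.stack(probs, dim=0).mean(dim=0).argmax(dim=1)


if __name__ == "__main__":
    model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 10))
    views = [lambda t: t, lambda t: torch.flip(t, dims=[-1])]   # identity and horizontal flip
    x = torch.randn(4, 3, 32, 32)
    print(consensus_predict(model, x, views))                   # tensor of 4 predicted labels
```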
Citations: 3