
2022 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV): Latest Publications

Co-Net: A Collaborative Region-Contour-Driven Network for Fine-to-Finer Medical Image Segmentation
Pub Date: 2022-01-01 | DOI: 10.1109/WACV51458.2022.00177
Anran Liu, Xiangsheng Huang, Tong Li, Pengcheng Ma
In this paper, a fine-to-finer segmentation task driven collaboratively by region and contour features is investigated on Glomerular Electron-Dense Deposits (GEDD), in view of the complementary nature of these two types of features. To this end, a novel network (Co-Net) is presented that dynamically uses fine saliency segmentation to guide finer segmentation on boundaries. The whole architecture contains two mutually boosted decoders sharing one common encoder. Specifically, a new structure named the Global-guided Interaction Module (GIM) is designed to effectively control the information flow and reduce redundancy in the cross-level feature fusion process. At the same time, global features are used within it so that the features of each layer gain access to richer context, and an initial fine segmentation map is obtained. A Discontinuous Boundary Supervision (DBS) strategy is applied to pay more attention to discontinuity positions and to correct segmentation errors on boundaries. Finally, Selective Kernel (SK) is used for dynamic aggregation of the region and contour features to obtain a finer segmentation. Our proposed approach is evaluated on an independent GEDD dataset labeled by pathologists and also on open polyp datasets to test its generalization. Ablation studies show the effectiveness of the different modules. On all datasets, our approach achieves high segmentation accuracy and surpasses previous methods.
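As a rough illustration of the final fusion step, the sketch below implements a Selective Kernel (SK)-style module that dynamically weights a region branch against a contour branch. The channel count and reduction ratio are illustrative assumptions, not the paper's configuration.

```python
# A minimal sketch of SK-style fusion of two feature branches (region and
# contour). Channel sizes and the reduction ratio are assumptions.
import torch
import torch.nn as nn

class SKFusion(nn.Module):
    def __init__(self, channels: int, reduction: int = 4):
        super().__init__()
        hidden = max(channels // reduction, 8)
        self.squeeze = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, hidden, kernel_size=1),
            nn.ReLU(inplace=True),
        )
        # One attention head per branch; softmax over branches picks the mix.
        self.attend = nn.Conv2d(hidden, channels * 2, kernel_size=1)

    def forward(self, region: torch.Tensor, contour: torch.Tensor) -> torch.Tensor:
        fused = region + contour                    # joint summary of both branches
        weights = self.attend(self.squeeze(fused))  # (B, 2C, 1, 1)
        b = weights.shape[0]
        weights = weights.view(b, 2, -1, 1, 1).softmax(dim=1)
        return weights[:, 0] * region + weights[:, 1] * contour

# Usage: fuse hypothetical 64-channel region/contour feature maps.
x_region = torch.randn(1, 64, 128, 128)
x_contour = torch.randn(1, 64, 128, 128)
print(SKFusion(64)(x_region, x_contour).shape)  # torch.Size([1, 64, 128, 128])
```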
Citations: 6
Modeling Aleatoric Uncertainty for Camouflaged Object Detection
Pub Date: 2022-01-01 | DOI: 10.1109/WACV51458.2022.00267
Jiawei Liu, Jing Zhang, N. Barnes
Aleatoric uncertainty captures noise within the observations. For camouflaged object detection, due to the similar appearance of the camouflaged foreground and the background, it is difficult to obtain highly accurate annotations, especially around object boundaries. We argue that training directly with the "noisy" camouflage map may lead to a model with poor generalization ability. In this paper, we introduce an explicit aleatoric uncertainty estimation technique to represent predictive uncertainty due to noisy labeling. Specifically, we present a confidence-aware camouflaged object detection (COD) framework using dynamic supervision to produce both an accurate camouflage map and a reliable "aleatoric uncertainty". Different from existing techniques that produce deterministic predictions following the point estimation pipeline, our framework formalises aleatoric uncertainty as a probability distribution over the model output given the input image. We claim that, once trained, our confidence estimation network can evaluate the pixel-wise accuracy of the prediction without relying on the ground-truth camouflage map. Extensive results illustrate the superior performance of the proposed model in explaining the camouflage prediction. Our code is available at https://github.com/Carlisle-Liu/OCENet
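To make the formulation concrete, here is a minimal sketch of one standard way to train with a predicted aleatoric (log-variance) map, in the spirit of Kendall and Gal's learned loss attenuation; it is a generic stand-in, not the exact OCENet objective.

```python
# A generic aleatoric-uncertainty loss for dense binary prediction: the
# network emits a logit map and a log-variance map, and the loss attenuates
# pixels the model flags as noisy. Sample count is an assumption.
import torch
import torch.nn.functional as F

def aleatoric_bce_loss(logits, log_var, target, n_samples: int = 8):
    """Sample logits from N(logits, exp(log_var)) and average the BCE."""
    std = torch.exp(0.5 * log_var)
    total = 0.0
    for _ in range(n_samples):
        noisy_logits = logits + std * torch.randn_like(logits)
        total = total + F.binary_cross_entropy_with_logits(noisy_logits, target)
    return total / n_samples

# Toy tensors standing in for the two network heads and the noisy label map.
logits = torch.zeros(2, 1, 64, 64, requires_grad=True)
log_var = torch.zeros(2, 1, 64, 64, requires_grad=True)
target = torch.randint(0, 2, (2, 1, 64, 64)).float()
loss = aleatoric_bce_loss(logits, log_var, target)
loss.backward()
print(float(loss))
```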
Citations: 22
PICA: Point-wise Instance and Centroid Alignment Based Few-shot Domain Adaptive Object Detection with Loose Annotations
Pub Date: 2022-01-01 | DOI: 10.1109/WACV51458.2022.00047
Chaoliang Zhong, Jiexi Wang, Chengang Feng, Ying Zhang, Jun Sun, Yasuto Yokota
In this work, we focus on supervised domain adaptation for object detection in a few-shot, loose-annotation setting, where the source images are sufficient and fully labeled but the target images are few-shot and loosely annotated. As annotated objects exist in the target domain, instance-level alignment can be utilized to improve performance. Traditional methods conduct instance-level alignment by semantically aligning the distributions of paired object features with domain adversarial training. Although point-wise surrogates of distribution alignment have been shown to provide a more effective solution in few-shot classification tasks across domains, this point-wise alignment approach has not yet been extended to object detection. In this work, we propose a method that extends point-wise alignment from classification to object detection. Moreover, in the few-shot loose-annotation setting, the background ROIs of the target domain suffer from a severe label noise problem, which may make point-wise alignment fail. To this end, we exploit moving-average centroids to mitigate the label noise problem of background ROIs. Meanwhile, we exploit point-wise alignment over instances and centroids to tackle the scarcity of labeled target instances. Hence, this method is robust both against label noise in background ROIs and against the scarcity of labeled target objects. Experimental results show that the proposed instance-level alignment method brings significant improvement over the baseline and is superior to state-of-the-art methods.
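A toy sketch of the two ingredients named above: an exponential moving-average centroid bank that absorbs label noise, and a point-wise loss pulling target instance features toward same-class centroids. The momentum value and feature dimensions are assumptions.

```python
# Moving-average class centroids plus a point-wise centroid alignment loss.
# Momentum, class count, and feature dimension are illustrative assumptions.
import torch

class CentroidBank:
    def __init__(self, num_classes: int, dim: int, momentum: float = 0.9):
        self.centroids = torch.zeros(num_classes, dim)
        self.m = momentum

    @torch.no_grad()
    def update(self, feats: torch.Tensor, labels: torch.Tensor):
        # EMA update per class; noisy single ROIs barely move the centroid.
        for c in labels.unique():
            mean = feats[labels == c].mean(dim=0)
            self.centroids[c] = self.m * self.centroids[c] + (1 - self.m) * mean

def centroid_alignment_loss(target_feats, target_labels, source_bank):
    """Squared distance from each target ROI feature to its class centroid."""
    anchors = source_bank.centroids[target_labels]
    return ((target_feats - anchors) ** 2).sum(dim=1).mean()

bank = CentroidBank(num_classes=3, dim=128)
bank.update(torch.randn(16, 128), torch.randint(0, 3, (16,)))     # source ROIs
loss = centroid_alignment_loss(torch.randn(4, 128),
                               torch.randint(0, 3, (4,)), bank)   # target ROIs
print(float(loss))
```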
Citations: 6
Channel Pruning via Lookahead Search Guided Reinforcement Learning
Pub Date: 2022-01-01 | DOI: 10.1109/WACV51458.2022.00357
Z. Wang, Chengcheng Li
Channel pruning has become an effective yet still challenging approach to achieving compact neural networks. It aims to prune the optimal set of filters whose removal results in minimal performance degradation of the slimmed network. Due to the prohibitively vast search space of filter combinations, existing approaches usually use various criteria to estimate filter importance, sacrificing some precision. Here we present a new approach to optimizing filter selection in channel pruning with lookahead-search-guided reinforcement learning (RL). A neural network that takes filter-related features as input is trained with RL to prune the optimal sequence of filters and maximize the performance of the remaining network. In addition, we employ Monte Carlo tree search (MCTS) to provide a lookahead search for filter selection, which increases the sample efficiency of the RL training. Experiments on MNIST, CIFAR-10, and ILSVRC-2012 validate the effectiveness of our approach compared to both traditional and automated existing channel pruning approaches.
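The sketch below illustrates the underlying search problem in its simplest form: score each candidate filter by the validation loss of the network with that filter masked, then prune the best one. This greedy one-step lookahead is a stand-in for the paper's RL policy guided by MCTS rollouts, not the published algorithm.

```python
# Greedy one-step lookahead for filter selection: mask each filter in turn
# and measure the slimmed network's validation loss. The toy network and
# batch below are hypothetical.
import torch
import torch.nn as nn

def score_filter_removal(model, conv: nn.Conv2d, val_x, val_y, loss_fn):
    """Return, per filter, the validation loss when that filter is zeroed out."""
    scores = []
    for f in range(conv.out_channels):
        saved = conv.weight.data[f].clone()
        conv.weight.data[f].zero_()            # temporarily mask filter f
        with torch.no_grad():
            scores.append(loss_fn(model(val_x), val_y).item())
        conv.weight.data[f] = saved            # restore the filter
    return scores

model = nn.Sequential(nn.Conv2d(3, 8, 3, padding=1, bias=False),
                      nn.ReLU(), nn.AdaptiveAvgPool2d(1),
                      nn.Flatten(), nn.Linear(8, 10))
val_x, val_y = torch.randn(4, 3, 32, 32), torch.randint(0, 10, (4,))
scores = score_filter_removal(model, model[0], val_x, val_y, nn.CrossEntropyLoss())
print("prune filter", min(range(len(scores)), key=scores.__getitem__))
```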
Citations: 11
Learned Event-based Visual Perception for Improved Space Object Detection
Pub Date: 2022-01-01 | DOI: 10.1109/WACV51458.2022.00336
Nikolaus Salvatore, Justin Fletcher
The detection of dim artificial Earth satellites using ground-based electro-optical sensors, particularly in the presence of background light, is technologically challenging. This perceptual task is foundational to our understanding of the space environment, and it grows in importance as the number, variety, and dynamism of space objects increase. We present a hybrid image- and event-based architecture that leverages dynamic vision sensing technology to detect resident space objects in geosynchronous Earth orbit. Given the asynchronous, one-dimensional image data supplied by a dynamic vision sensor, our architecture applies conventional image feature extractors to integrated, two-dimensional frames in conjunction with point-cloud feature extractors, such as PointNet, in order to increase detection performance for dim objects in scenes with high background activity. In addition, an end-to-end event-based imaging simulator is developed both to produce data for model training and to approximate the optimal sensor parameters for event-based sensing in the context of electro-optical telescope imagery. Experimental results confirm that the inclusion of point-cloud feature extractors increases recall for dim objects in the high-background regime.
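For intuition, the snippet below shows one common way to build the "integrated, two-dimensional frames" mentioned above: signed accumulation of an asynchronous event stream into a fixed-size image. The sensor resolution and time window are illustrative assumptions.

```python
# Accumulate an event stream (x, y, t, polarity) into a 2D frame that a
# conventional image feature extractor can consume. Resolution and the
# time window are assumptions.
import numpy as np

def events_to_frame(events: np.ndarray, height: int, width: int,
                    t_start: float, t_end: float) -> np.ndarray:
    """events: (N, 4) array of [x, y, t, polarity in {-1, +1}]."""
    frame = np.zeros((height, width), dtype=np.float32)
    in_window = (events[:, 2] >= t_start) & (events[:, 2] < t_end)
    for x, y, _, p in events[in_window]:
        frame[int(y), int(x)] += p             # signed accumulation per pixel
    return frame

# Synthetic event stream standing in for dynamic vision sensor output.
rng = np.random.default_rng(0)
n = 1000
events = np.column_stack([rng.integers(0, 128, n), rng.integers(0, 128, n),
                          rng.uniform(0.0, 1.0, n), rng.choice([-1, 1], n)])
print(events_to_frame(events, 128, 128, 0.0, 0.5).shape)  # (128, 128)
```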
Citations: 7
3DRefTransformer: Fine-Grained Object Identification in Real-World Scenes Using Natural Language
Pub Date: 2022-01-01 | DOI: 10.1109/WACV51458.2022.00068
Ahmed Abdelreheem, Ujjwal Upadhyay, Ivan Skorokhodov, Rawan Al Yahya, Jun Chen, Mohamed Elhoseiny
In this paper, we study fine-grained 3D object identification in real-world scenes described by a textual query. The task aims to discriminatively understand an instance of a particular 3D object described by natural language utterances among other instances of 3D objects of the same class appearing in a visual scene. We introduce the 3DRefTransformer net, a transformer-based neural network that identifies 3D objects described by linguistic utterances in real-world scenes. The network's input is a set of segmented 3D object point clouds representing a real-world scene, together with a language utterance that refers to one of the scene objects. The goal is to identify the referred object. Compared to the state-of-the-art models, which are mostly based on graph convolutions and LSTMs, our 3DRefTransformer net offers two key advantages. First, it is an end-to-end transformer model that operates on both language and 3D visual objects. Second, it has a natural ability to ground textual terms in the utterance to the learned representations of 3D objects in the scene. We further incorporate an object pairwise spatial relation loss and contrastive learning during model training. Our experiments show that our model significantly improves performance over the current SOTA on the Referit3D Nr3D and Sr3D datasets. Code and models will be made publicly available at https://vision-cair.github.io/3dreftransformer/.
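A minimal sketch of the joint encoding pattern described above: per-object embeddings and utterance token embeddings pass through a single transformer encoder, and each object token is scored as the referent. Dimensions, layer counts, and the scoring head are assumptions, not the published architecture.

```python
# Joint language/3D-object grounding with one transformer encoder: object
# tokens and word tokens attend to each other, then each object is scored.
import torch
import torch.nn as nn

class ToyRefNet(nn.Module):
    def __init__(self, dim: int = 128):
        super().__init__()
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.score = nn.Linear(dim, 1)          # per-object "is referred" logit

    def forward(self, obj_feats, text_feats):
        n_obj = obj_feats.shape[1]
        tokens = torch.cat([obj_feats, text_feats], dim=1)   # objects + words
        encoded = self.encoder(tokens)
        return self.score(encoded[:, :n_obj]).squeeze(-1)    # (B, n_obj)

obj = torch.randn(1, 8, 128)    # 8 segmented objects, embedded (e.g. by PointNet)
txt = torch.randn(1, 12, 128)   # 12 utterance token embeddings
print(ToyRefNet()(obj, txt).argmax(dim=1))   # index of the predicted referent
```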
Citations: 12
Novel Ensemble Diversification Methods for Open-Set Scenarios
Pub Date: 2022-01-01 | DOI: 10.1109/WACV51458.2022.00342
Miriam Farber, Roman Goldenberg, G. Leifman, Gal Novich
We revisit existing ensemble diversification approaches and present two novel diversification methods tailored for open-set scenarios. The first method uses a new loss, designed to encourage model disagreement on outliers only, thus alleviating the intrinsic accuracy-diversity trade-off. The second method achieves diversity via automated feature engineering, by training each model to disregard input features learned by previously trained ensemble models. We conduct an extensive evaluation and analysis of the proposed techniques on seven datasets covering the image classification, re-identification, and recognition domains. We compare against existing state-of-the-art ensemble diversification methods and demonstrate accuracy improvements over them.
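A toy sketch of the first method's loss shape: standard cross-entropy on in-distribution data plus a penalty on pairwise agreement between ensemble members on outlier inputs only. The agreement measure (cosine similarity of softmax outputs) and the weight lambda are assumptions, not the paper's exact loss.

```python
# Supervised loss on in-distribution data plus an agreement penalty that is
# applied only to outlier inputs, so accuracy and diversity do not compete.
import torch
import torch.nn.functional as F

def diversity_loss(models, x_in, y_in, x_out, lam: float = 0.5):
    ce = sum(F.cross_entropy(m(x_in), y_in) for m in models) / len(models)
    probs = [F.softmax(m(x_out), dim=1) for m in models]
    agree, pairs = 0.0, 0
    for i in range(len(probs)):
        for j in range(i + 1, len(probs)):
            agree = agree + F.cosine_similarity(probs[i], probs[j], dim=1).mean()
            pairs += 1
    return ce + lam * agree / pairs   # low agreement on outliers => diversity

# Placeholder linear "ensemble" and synthetic in/out-of-distribution batches.
models = [torch.nn.Linear(16, 5) for _ in range(3)]
x_in, y_in = torch.randn(8, 16), torch.randint(0, 5, (8,))
x_out = torch.randn(8, 16)
print(float(diversity_loss(models, x_in, y_in, x_out)))
```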
Citations: 2
Transferable 3D Adversarial Textures using End-to-end Optimization
Pub Date: 2022-01-01 | DOI: 10.1109/WACV51458.2022.00080
Camilo Pestana, Naveed Akhtar, N. Rahnavard, M. Shah, A. Mian
Deep visual models are known to be vulnerable to adversarial attacks. The last few years have seen numerous techniques for computing adversarial inputs to these models. However, there are still under-explored avenues in this critical research direction. Among them is the estimation of adversarial textures for 3D models in an end-to-end optimization scheme. In this paper, we propose such a scheme to generate adversarial textures for 3D models that are highly transferable and invariant to different camera views and lighting conditions. Our method makes use of neural rendering with explicit control over the model texture and background. We ensure transferability of the adversarial textures by employing an ensemble of robust and non-robust models. In contrast to the conventional use of 2D images for adversarial attacks, our technique utilizes 3D models as a proxy to simulate conditions closer to real life. We show the efficacy of our method with extensive experiments.
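As a loose illustration of the optimization loop only, the sketch below updates a texture tensor by gradient ascent on an ensemble's classification loss through a differentiable compositing step. The `render` function is a crude stand-in for the paper's neural renderer, and the ensemble members are placeholder classifiers, not robust/non-robust model pairs.

```python
# End-to-end texture optimization through a differentiable "rendering" step.
# render() is a stand-in composite, not a neural renderer.
import torch
import torch.nn.functional as F

def render(texture, background):
    """Stand-in differentiable 'renderer': composite the texture onto a scene."""
    img = background.clone()
    img[:, :, 8:24, 8:24] = texture            # fixed placement for simplicity
    return img

texture = torch.rand(1, 3, 16, 16, requires_grad=True)
ensemble = [torch.nn.Sequential(torch.nn.Flatten(),
                                torch.nn.Linear(3 * 32 * 32, 10))
            for _ in range(2)]                 # placeholder surrogate models
true_class = torch.tensor([3])                 # label the attack tries to escape
opt = torch.optim.Adam([texture], lr=0.05)

for step in range(20):
    background = torch.rand(1, 3, 32, 32)      # vary the "scene" per iteration
    loss = -sum(F.cross_entropy(m(render(texture, background)), true_class)
                for m in ensemble)             # ascend the true-class loss
    opt.zero_grad()
    loss.backward()
    opt.step()
    texture.data.clamp_(0.0, 1.0)              # keep the texture a valid image
print("final loss:", float(loss))
```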
Citations: 1
High Dynamic Range Imaging of Dynamic Scenes with Saturation Compensation but without Explicit Motion Compensation
Pub Date: 2022-01-01 | DOI: 10.1109/WACV51458.2022.00014
Haesoo Chung, N. Cho
High dynamic range (HDR) imaging is a highly challenging task, since a large amount of information is lost due to the limitations of camera sensors. For HDR imaging, some methods capture multiple low dynamic range (LDR) images with varying exposures to aggregate more information. However, these approaches introduce ghosting artifacts when significant inter-frame motion is present. Moreover, even when multi-exposure images are given, we have little information in severely over-exposed areas. Most existing methods focus on motion compensation, i.e., alignment of multiple LDR shots to reduce the ghosting artifacts, but they still produce unsatisfying results. These methods also tend to overlook the need to restore saturated areas. In this paper, we generate well-aligned multi-exposure features by reformulating the motion alignment problem as a simple brightness adjustment problem. In addition, we propose a coarse-to-fine merging strategy with explicit saturation compensation. The saturated areas are reconstructed with similar well-exposed content using adaptive contextual attention. We demonstrate that our method outperforms state-of-the-art methods in both qualitative and quantitative evaluations.
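A rough sketch of the brightness-adjustment idea: each LDR frame is mapped to the reference exposure in an approximately linear domain, and a saturation-aware weight drives the merge. The gamma value and hat-shaped weighting are textbook choices, not the authors' learned modules.

```python
# Classical exposure-normalized merge with saturation-aware weighting, as a
# stand-in for the paper's learned brightness adjustment and merging.
import torch

def to_linear(img, gamma: float = 2.2):
    """Undo a display gamma; an assumed approximation of the camera response."""
    return img.clamp(1e-6, 1.0).pow(gamma)

def merge_exposures(ldr_imgs, exposures, ref_idx: int = 1):
    """ldr_imgs: list of (3, H, W) tensors in [0, 1]; exposures: relative times."""
    ref_t = exposures[ref_idx]
    acc, wsum = 0.0, 0.0
    for img, t in zip(ldr_imgs, exposures):
        lin = to_linear(img) * (ref_t / t)      # brightness-adjust to reference
        w = 1.0 - (img - 0.5).abs() * 2.0       # hat weight: distrust extremes
        acc, wsum = acc + w * lin, wsum + w
    return acc / wsum.clamp_min(1e-6)

imgs = [torch.rand(3, 64, 64) for _ in range(3)]   # under-, mid-, over-exposed
hdr = merge_exposures(imgs, exposures=[0.25, 1.0, 4.0])
print(hdr.shape)  # torch.Size([3, 64, 64])
```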
Citations: 3
Multi-Scale Patch-Based Representation Learning for Image Anomaly Detection and Segmentation
Pub Date: 2022-01-01 | DOI: 10.1109/WACV51458.2022.00312
Chin-Chia Tsai, Tsung-Hsuan Wu, S. Lai
Unsupervised representation learning has been proven effective for the challenging tasks of anomaly detection and segmentation. In this paper, we propose a multi-scale patch-based representation learning method to extract critical and representative information from normal images. By taking into account the relative feature similarity between patches at different local distances, we achieve better representation learning. Moreover, we propose a refined self-supervised learning strategy that allows our model to learn better geometric relationships between neighboring patches. By sliding patches of different scales over an image, our model extracts representative features from each patch and compares them with those of the training set of normal images to detect anomalous regions. Our experimental results on the MVTec AD and BTAD datasets demonstrate that the proposed method achieves state-of-the-art accuracy for both anomaly detection and segmentation.
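For intuition, the sketch below shows the inference-time comparison in its simplest form: patch features from a test image are scored by their distance to the nearest feature extracted from normal training images. The random-projection "extractor" is a placeholder for the learned multi-scale representation, and the single patch size is an assumption.

```python
# Patch-feature memory of normal images plus nearest-neighbor anomaly
# scoring. The random projection stands in for a learned extractor.
import torch

def extract_patch_feats(img, proj, patch: int = 8):
    """Cut non-overlapping patches and project each to a feature vector."""
    c = img.shape[0]
    patches = img.unfold(1, patch, patch).unfold(2, patch, patch)
    patches = patches.reshape(c, -1, patch * patch)          # (C, n_patches, p*p)
    flat = patches.permute(1, 0, 2).reshape(-1, c * patch * patch)
    return flat @ proj                                       # (n_patches, feat_dim)

torch.manual_seed(0)
proj = torch.randn(3 * 8 * 8, 64)                 # placeholder "feature extractor"
bank = torch.cat([extract_patch_feats(torch.rand(3, 64, 64), proj)
                  for _ in range(10)])            # memory of normal patch features
test = extract_patch_feats(torch.rand(3, 64, 64), proj)
scores = torch.cdist(test, bank).min(dim=1).values   # per-patch anomaly score
print(scores.max())                               # image-level anomaly score
```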
Citations: 33