
2022 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV): Latest Publications

X-MIR: EXplainable Medical Image Retrieval
Pub Date : 2022-01-01 DOI: 10.1109/WACV51458.2022.00161
Brian Hu, Bhavan Kumar Vasu, Anthony J. Hoogs
Despite significant progress in the past few years, machine learning systems are still often viewed as "black boxes," which lack the ability to explain their output decisions. In high-stakes situations such as healthcare, there is a need for explainable AI (XAI) tools that can help open up this black box. In contrast to approaches which largely tackle classification problems in the medical imaging domain, we address the less-studied problem of explainable image retrieval. We test our approach on a COVID-19 chest X-ray dataset and the ISIC 2017 skin lesion dataset, showing that saliency maps help reveal the image features used by models to determine image similarity. We evaluated three different saliency algorithms, which were either occlusion-based, attention-based, or relied on a form of activation mapping. We also develop quantitative evaluation metrics that allow us to go beyond simple qualitative comparisons of the different saliency algorithms. Our results have the potential to aid clinicians when viewing medical images and address an urgent need for interventional tools in response to COVID-19. The source code is publicly available at: https://gitlab.kitware.com/brianhhu/x-mir.
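As a rough illustration of how an occlusion-based saliency map can be adapted from classification to retrieval, the sketch below perturbs patches of the query image and records the drop in embedding similarity to a retrieved image. This is a generic sketch, not the authors' released implementation; `embed`, `query`, and `retrieved` are assumed placeholders for an embedding model and two preprocessed image tensors.

```python
# Minimal sketch of occlusion-based similarity saliency (one of the three
# saliency families mentioned in the abstract). Not the authors' code:
# `embed` is any model mapping a (C, H, W) tensor batch to feature vectors.
import torch
import torch.nn.functional as F

def occlusion_similarity_saliency(embed, query, retrieved, patch=16, stride=8):
    """Return a coarse map of the similarity drop when each patch of `query`
    is occluded, relative to the unoccluded query-retrieved similarity."""
    with torch.no_grad():
        q_feat = embed(query.unsqueeze(0))
        r_feat = embed(retrieved.unsqueeze(0))
        base_sim = F.cosine_similarity(q_feat, r_feat).item()

        _, H, W = query.shape
        heat = torch.zeros((H - patch) // stride + 1, (W - patch) // stride + 1)
        for i, y in enumerate(range(0, H - patch + 1, stride)):
            for j, x in enumerate(range(0, W - patch + 1, stride)):
                occluded = query.clone()
                occluded[:, y:y + patch, x:x + patch] = 0.5  # gray occluder
                sim = F.cosine_similarity(embed(occluded.unsqueeze(0)), r_feat).item()
                heat[i, j] = base_sim - sim  # larger drop => more important region
    return heat
```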
Citations: 28
Self-supervised Test-time Adaptation on Video Data
Pub Date : 2022-01-01 DOI: 10.1109/WACV51458.2022.00266
Fatemeh Azimi, Sebastián M. Palacio, Federico Raue, Jörn Hees, Luca Bertinetto, A. Dengel
In typical computer vision problems revolving around video data, pre-trained models are simply evaluated at test time, without adaptation. This general approach clearly cannot capture the shifts that will likely arise between the distributions from which training and test data have been sampled. Adapting a pre-trained model to a new video encountered at test time could be essential to avoid the potentially catastrophic effects of such shifts. However, given the inherent impossibility of labeling data only available at test-time, traditional "fine-tuning" techniques cannot be leveraged in this highly practical scenario. This paper explores whether the recent progress in test-time adaptation in the image domain and self-supervised learning can be leveraged to adapt a model to previously unseen and unlabelled videos presenting both mild (but arbitrary) and severe covariate shifts. In our experiments, we show that test-time adaptation approaches applied to self-supervised methods are always beneficial, but also that the extent of their effectiveness largely depends on the specific combination of the algorithms used for adaptation and self-supervision, and also on the type of covariate shift taking place.
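A minimal sketch of what a test-time adaptation loop of this kind can look like is shown below: the pre-trained model is updated on unlabelled test frames with a self-supervised objective (here, TENT-style entropy minimization of its own predictions, one common choice rather than the paper's exact algorithm). `model` and `test_frames` are assumed to exist and to yield classification logits.

```python
# Illustrative test-time adaptation loop: adapt only the affine parameters of
# normalization layers by minimizing prediction entropy on unlabelled frames.
import torch

def adapt_on_test_video(model, test_frames, lr=1e-4, steps=1):
    params = [p for m in model.modules()
              if isinstance(m, (torch.nn.BatchNorm2d, torch.nn.LayerNorm))
              for p in m.parameters()]
    optimizer = torch.optim.Adam(params, lr=lr)
    model.train()
    for frame_batch in test_frames:            # unlabelled batches from the test video
        for _ in range(steps):
            logits = model(frame_batch)
            probs = logits.softmax(dim=1)
            entropy = -(probs * probs.clamp_min(1e-8).log()).sum(dim=1).mean()
            optimizer.zero_grad()
            entropy.backward()                  # no labels needed at test time
            optimizer.step()
    return model
```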
Citations: 14
Weakly-Supervised Convolutional Neural Networks for Vessel Segmentation in Cerebral Angiography
Pub Date : 2022-01-01 DOI: 10.1109/WACV51458.2022.00328
Arvind Vepa, Andy Choi, Noor Nakhaei, Wonjun Lee, Noah Stier, Andrew Vu, Greyson Jenkins, Xiaoyan Yang, Manjot Shergill, Moira Desphy, K. Delao, M. Levy, Cristopher Garduno, Lacy Nelson, Wan-Ching Liu, Fan Hung, F. Scalzo
Automated vessel segmentation in cerebral digital subtraction angiography (DSA) has significant clinical utility in the management of cerebrovascular diseases. Although deep learning has become the foundation for state-of-the-art image segmentation, a significant amount of labeled data is needed for training. Furthermore, due to domain differences, pre-trained networks cannot be applied to DSA data out-of-the-box. To address this, we propose a novel learning framework, which utilizes an active contour model for weak supervision and low-cost human-in-the-loop strategies to improve weak label quality. Our study produces several significant results, including state-of-the-art results for cerebral DSA vessel segmentation, which exceed human annotator quality, and an analysis of annotation cost and model performance trade-offs when utilizing weak supervision strategies. For comparison purposes, we also demonstrate our approach on the Digital Retinal Images for Vessel Extraction (DRIVE) dataset. Additionally, we will be publicly releasing code to reproduce our methodology and our dataset, the largest known high-quality annotated cerebral DSA vessel segmentation dataset.
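To make the weak-supervision idea concrete, the sketch below derives a weak vessel label by initializing from a cheap vesselness filter and refining it with a morphological active-contour model from scikit-image. This is only a hedged approximation of the kind of step described, not the authors' pipeline; `dsa_frame`, the threshold, and the iteration count are assumed values.

```python
# Rough sketch of generating a weak vessel label with an active-contour model.
# Assumes `dsa_frame` is a 2-D grayscale angiography image scaled to [0, 1].
import numpy as np
from skimage.filters import frangi
from skimage.segmentation import morphological_chan_vese

def weak_vessel_label(dsa_frame, vesselness_thresh=0.05, iterations=35):
    # Vesselness filtering gives a cheap, noisy initial vessel estimate.
    vesselness = frangi(dsa_frame)
    init = vesselness > vesselness_thresh
    # The morphological Chan-Vese active contour smooths and completes the
    # noisy initialization into a cleaner weak label.
    refined = morphological_chan_vese(dsa_frame, iterations,
                                      init_level_set=init, smoothing=2)
    return refined.astype(np.uint8)
```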
Citations: 5
SWAG-V: Explanations for Video using Superpixels Weighted by Average Gradients
Pub Date : 2022-01-01 DOI: 10.1109/WACV51458.2022.00164
Thomas Hartley, K. Sidorov, Christopher Willis, A. D. Marshall
CNN architectures that take videos as an input are often overlooked when it comes to the development of explanation techniques, despite their use in critical domains such as surveillance and healthcare. Explanation techniques developed for these networks must take into account the additional temporal domain if they are to be successful. In this paper we introduce SWAG-V, an extension of SWAG for use with networks that take video as an input. In addition we show how these explanations can be created in such a way that they are balanced between fine and coarse explanations. By creating superpixels that incorporate the frames of the input video we are able to create explanations that better locate regions of the input that are important to the network's prediction. We compare SWAG-V against a number of similar techniques using metrics such as insertion and deletion, and weak localisation. We compute these using Kinetics-400 with both the C3D and R(2+1)D network architectures and find that SWAG-V is able to outperform multiple techniques.
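The core idea in the title can be sketched for a single frame as follows: compute the input gradient of the target class score, segment the frame into superpixels, and weight each superpixel by the average gradient magnitude inside it. `model`, `frame`, and `target_class` are assumed inputs, and the real method operates on video volumes, so this is a simplified single-frame illustration only.

```python
# Simplified sketch: score each superpixel by the average input-gradient
# magnitude falling inside it. `frame` is an HxWx3 float image in [0, 1].
import numpy as np
import torch
from skimage.segmentation import slic

def superpixel_gradient_weights(model, frame, target_class, n_segments=200):
    x = torch.from_numpy(frame).permute(2, 0, 1).unsqueeze(0).float()
    x.requires_grad_(True)
    score = model(x)[0, target_class]
    score.backward()
    grad_mag = x.grad.abs().sum(dim=1).squeeze(0).numpy()   # HxW gradient magnitude

    segments = slic(frame, n_segments=n_segments, start_label=0)
    weights = np.zeros(segments.max() + 1)
    for s in range(segments.max() + 1):
        mask = segments == s
        weights[s] = grad_mag[mask].mean()                   # average gradient per superpixel
    return segments, weights
```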
Citations: 4
Fast and Efficient Restoration of Extremely Dark Light Fields
Pub Date : 2022-01-01 DOI: 10.1109/WACV51458.2022.00321
Mohit Lamba, K. Mitra
The ability of Light Field (LF) cameras to capture the 3D geometry of a scene in a single photographic exposure has become central to several applications ranging from passive depth estimation to post-capture refocusing and view synthesis. But these LF applications break down in extreme low-light conditions due to excessive noise and poor image photometry. Existing low-light restoration techniques are inappropriate because they either do not leverage LF's multi-view perspective or have enormous time and memory complexity. We propose a three-stage network that is simultaneously fast and accurate for real world applications. Our accuracy comes from the fact that our three stage architecture utilizes global, local and view-specific information present in low-light LFs and fuses them using an RNN inspired feedforward network. We are fast because we restore multiple views simultaneously and so require fewer forward passes. Besides these advantages, our network is flexible enough to restore an m × m LF during inference even if trained for a smaller n × n (n < m) LF without any finetuning. Extensive experiments on real low-light LF demonstrate that compared to the current state-of-the-art, our model can achieve up to 1 dB higher restoration PSNR, with 9× speedup, 23% smaller model size and about 5× fewer floating-point operations.
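For reference, the PSNR figure quoted above follows the standard definition; a minimal sketch (assuming images scaled to [0, 1]) is:

```python
# Standard peak signal-to-noise ratio between a restored image (or light-field
# view) and its reference, as used to report restoration quality in dB.
import numpy as np

def psnr(restored, reference, max_val=1.0):
    mse = np.mean((restored.astype(np.float64) - reference.astype(np.float64)) ** 2)
    return 10.0 * np.log10((max_val ** 2) / mse)
```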
Citations: 4
SC-UDA: Style and Content Gaps aware Unsupervised Domain Adaptation for Object Detection
Pub Date : 2022-01-01 DOI: 10.1109/WACV51458.2022.00113
Fuxun Yu, Di Wang, Yinpeng Chen, Nikolaos Karianakis, Tong Shen, Pei Yu, Dimitrios Lymberopoulos, Sidi Lu, Weisong Shi, Xiang Chen
Current state-of-the-art object detectors can suffer a significant performance drop when deployed in the wild due to domain gaps with the training data. Unsupervised Domain Adaptation (UDA) is a promising approach to adapt detectors to new domains/environments without any expensive labeling cost. Previous mainstream UDA works for object detection usually focused on image-level and/or feature-level adaptation by using adversarial learning methods. In this work, we show that such adversarial-based methods can only reduce the domain style gap, but cannot address the domain content gap that is also important for object detectors. To overcome this limitation, we propose the SC-UDA framework to concurrently reduce both gaps: We propose fine-grained domain style transfer to reduce the style gaps with finer image details preserved for detecting small objects; Then we leverage pseudo label-based self-training to reduce content gaps; To address pseudo label error accumulation during self-training, novel optimizations are proposed, including uncertainty-based pseudo labeling and an imbalanced mini-batch sampling strategy. Experiment results show that our approach consistently outperforms prior state-of-the-art methods (up to 8.6%, 2.7% and 2.5% mAP on three UDA benchmarks).
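A minimal sketch of the pseudo-label selection step in such self-training is shown below: only detections whose confidence clears a threshold become pseudo ground truth for the target domain. The paper's uncertainty-based criterion and imbalanced mini-batch sampling are more involved; this is only the common baseline form, with `detections` and the threshold as assumed inputs.

```python
# Generic confidence-threshold pseudo-labeling for a detector's predictions on
# unlabeled target-domain images. `detections` is assumed to be a list of
# dicts with 'box', 'label', and 'score' entries.
def select_pseudo_labels(detections, score_thresh=0.8):
    pseudo = []
    for det in detections:
        if det['score'] >= score_thresh:      # keep only confident predictions
            pseudo.append({'box': det['box'], 'label': det['label']})
    return pseudo
```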
Citations: 17
Deep Feature Prior Guided Face Deblurring
Pub Date : 2022-01-01 DOI: 10.1109/WACV51458.2022.00096
S. Jung, Tae Bok Lee, Y. S. Heo
Most recent face deblurring methods have focused on utilizing facial shape priors such as face landmarks and parsing maps. While these priors can provide facial geometric cues effectively, they lack the local texture details that act as important clues for solving the face deblurring problem. To deal with this, we focus on estimating the deep features of pre-trained face recognition networks (e.g., the VGGFace network), which include rich information about sharp faces, as a prior, and adopt a generative adversarial network (GAN) to learn it. To this end, we propose a deep feature prior guided network (DFPGnet) that restores facial details using the deep feature prior estimated from a blurred image. In our DFPGnet, the generator is divided into two streams: a prior estimation stream and a deblurring stream. Since the estimated deep features of the prior estimation stream are learned from the VGGFace network, which is trained for face recognition rather than deblurring, we need to alleviate the discrepancy between the feature distributions of the two streams. Therefore, we present feature transform modules at the connecting points of the two streams. In addition, we propose a channel-attention feature discriminator and a prior loss, which encourage the generator to focus, during training, on the channels of the deep feature prior that are most important for deblurring. Experimental results show that our method achieves state-of-the-art performance both qualitatively and quantitatively.
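The deep-feature prior can be pictured as a feature-space loss under a frozen recognition-style network, as in the hedged sketch below. The paper uses VGGFace; torchvision's generic VGG16 is substituted here only because it is readily available, so this approximates the idea rather than reproducing the method.

```python
# Sketch of a deep-feature prior loss: penalize the distance between features
# of the restored face and the sharp face under a frozen pre-trained network.
import torch
import torch.nn.functional as F
import torchvision.models as models

class FeaturePriorLoss(torch.nn.Module):
    def __init__(self, layer_index=16):
        super().__init__()
        vgg = models.vgg16(weights=models.VGG16_Weights.DEFAULT)
        self.features = vgg.features[:layer_index].eval()
        for p in self.features.parameters():
            p.requires_grad_(False)           # the prior network stays frozen

    def forward(self, restored, sharp):
        # L1 distance in the frozen network's feature space.
        return F.l1_loss(self.features(restored), self.features(sharp))
```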
Citations: 6
Lane-Level Street Map Extraction from Aerial Imagery
Pub Date : 2022-01-01 DOI: 10.1109/WACV51458.2022.00156
Songtao He, Harinarayanan Balakrishnan
Digital maps with lane-level details are the foundation of many applications. However, creating and maintaining digital maps, especially maps with lane-level details, is labor-intensive and expensive. In this work, we propose a mapping pipeline to extract lane-level street maps from aerial imagery automatically. Our mapping pipeline first extracts lanes at non-intersection areas; then it enumerates all the possible turning lanes at intersections, validates their connectivity, and extracts the valid turning lanes to complete the map. We evaluate the accuracy of our mapping pipeline on a dataset consisting of four U.S. cities, demonstrating the effectiveness of our proposed mapping pipeline and the potential of scalable mapping solutions based on aerial imagery.
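As a toy illustration of the connectivity constraint being validated (the paper validates candidates with a learned model from imagery, not a hand-written rule), a plausible turning lane should start near the end of some incoming lane and end near the start of some outgoing lane. All names and the radius below are assumptions for the sketch.

```python
# Toy geometric check of turning-lane connectivity; purely illustrative.
import numpy as np

def is_connected(candidate, incoming_ends, outgoing_starts, radius=2.0):
    """`candidate` is an (N, 2) polyline in meters; `incoming_ends` and
    `outgoing_starts` are (M, 2) arrays of lane endpoints."""
    start, end = np.asarray(candidate[0]), np.asarray(candidate[-1])
    near_in = np.any(np.linalg.norm(incoming_ends - start, axis=1) < radius)
    near_out = np.any(np.linalg.norm(outgoing_starts - end, axis=1) < radius)
    return near_in and near_out
```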
Citations: 16
Learning Temporal Video Procedure Segmentation from an Automatically Collected Large Dataset
Pub Date : 2022-01-01 DOI: 10.1109/WACV51458.2022.00279
Lei Ji, Chenfei Wu, Daisy Zhou, Kun Yan, Edward Cui, Xilin Chen, Nan Duan
Temporal Video Segmentation (TVS) is a fundamental video understanding task and has been widely researched in recent years. There are two subtasks of TVS: Video Action Segmentation (VAS) and Video Procedure Segmentation (VPS). VAS aims to recognize what actions happen inside the video, while VPS aims to segment the video into a sequence of video clips as a procedure. The VAS task inevitably relies on pre-defined action labels and is thus hard to scale to various open-domain videos. To overcome this limitation, the VPS task tries to divide a video into several category-independent procedure segments. However, the existing dataset for the VPS task is small (2k videos) and lacks diversity (only the cooking domain). To tackle these problems, we collect a large and diverse dataset called TIPS, specifically for the VPS task. TIPS contains 63k videos including more than 300k procedure segments from instructional videos on YouTube, which cover plenty of how-to areas such as cooking, health, beauty, parenting, gardening, etc. We then propose a multi-modal Transformer with Gaussian Boundary Detection (MT-GBD) model for VPS, with a backbone that combines Transformer and convolution layers. Furthermore, we propose a new EIOU metric for the VPS task, which helps evaluate VPS quality in a more comprehensive way. Experimental results show the effectiveness of our proposed model and metric.
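The segment-level building block behind such metrics is the plain temporal IoU between a predicted and a ground-truth segment; the exact EIOU definition is given in the paper, so the sketch below shows only the standard IoU part.

```python
# Standard temporal IoU between two segments given as (start, end) pairs in
# seconds with start < end; the basic quantity segment-level metrics build on.
def temporal_iou(pred, gt):
    inter = max(0.0, min(pred[1], gt[1]) - max(pred[0], gt[0]))
    union = (pred[1] - pred[0]) + (gt[1] - gt[0]) - inter
    return inter / union if union > 0 else 0.0
```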
Citations: 3
SporeAgent: Reinforced Scene-level Plausibility for Object Pose Refinement
Pub Date : 2022-01-01 DOI: 10.1109/WACV51458.2022.00027
Dominik Bauer, T. Patten, M. Vincze
Observational noise, inaccurate segmentation and ambiguity due to symmetry and occlusion lead to inaccurate object pose estimates. While depth- and RGB-based pose refinement approaches increase the accuracy of the resulting pose estimates, they are susceptible to ambiguity in the observation as they consider visual alignment. We propose to leverage the fact that we often observe static, rigid scenes. Thus, the objects therein need to be under physically plausible poses. We show that considering plausibility reduces ambiguity and, in consequence, allows poses to be more accurately predicted in cluttered environments. To this end, we extend a recent RL-based registration approach towards iterative refinement of object poses. Experiments on the LINEMOD and YCB-VIDEO datasets demonstrate the state-of-the-art performance of our depth-based refinement approach. Code is available at github.com/dornik/sporeagent.
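The generic registration-refinement building block that iterative approaches of this kind rely on can be sketched as a single Kabsch/Procrustes update from tentative point correspondences; SporeAgent's RL policy and plausibility terms are not reflected here, this is only the underlying rigid-alignment step.

```python
# Minimal Kabsch/Procrustes step: given row-wise correspondences between model
# points and observed points, compute the rigid transform that best aligns them.
import numpy as np

def rigid_update(model_pts, observed_pts):
    """Both arguments are (N, 3) arrays with corresponding rows."""
    mu_m, mu_o = model_pts.mean(axis=0), observed_pts.mean(axis=0)
    H = (model_pts - mu_m).T @ (observed_pts - mu_o)
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))        # guard against reflections
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = mu_o - R @ mu_m
    return R, t                                    # observed ≈ R @ model + t
```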
Citations: 6