
Latest publications: 2022 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV)

Efficient Counterfactual Debiasing for Visual Question Answering
Pub Date : 2022-01-01 DOI: 10.1109/WACV51458.2022.00263
Camila Kolling, Martin D. Móre, Nathan Gavenski, E. Pooch, Otávio Parraga, Rodrigo C. Barros
Despite the success of neural architectures for Visual Question Answering (VQA), several recent studies have shown that VQA models are mostly driven by superficial correlations that are learned by exploiting undesired priors within training datasets. They often lack sufficient image grounding or tend to overly rely on textual information, failing to capture knowledge from the images. This affects their generalization to test sets with slight changes in the distribution of facts. To address such an issue, some bias mitigation methods have relied on new training procedures that are capable of synthesizing counterfactual samples by masking critical objects within the images, and words within the questions, while also changing the corresponding ground truth. We propose a novel model-agnostic counterfactual training procedure, namely Efficient Counterfactual Debiasing (ECD), in which we introduce a new negative answer-assignment mechanism that exploits the probability distribution of the answers based on their frequencies, as well as an improved counterfactual sample synthesizer. Our experiments demonstrate that ECD is a simple, computationally efficient counterfactual sample-synthesizer training procedure that establishes itself as the new state of the art for unbiased VQA.
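The frequency-based negative answer assignment can be pictured with a small sketch. The snippet below is an illustrative guess at that step, not the authors' code: it assumes an `answer_freq` table of training-set answer counts and samples a replacement answer, other than the ground truth, with probability proportional to frequency.

```python
import numpy as np

def sample_negative_answer(answer_freq, ground_truth, rng=None):
    """Pick a negative answer for a counterfactual sample, weighting each
    candidate by how often it appears in the training set."""
    rng = rng or np.random.default_rng(0)
    candidates = [a for a in answer_freq if a != ground_truth]
    probs = np.array([answer_freq[a] for a in candidates], dtype=np.float64)
    probs /= probs.sum()
    return rng.choice(candidates, p=probs)

# Toy frequency table (illustrative numbers only).
answer_freq = {"yes": 120000, "no": 110000, "2": 30000, "red": 8000}
print(sample_negative_answer(answer_freq, ground_truth="yes"))
```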
Citations: 8
On Black-Box Explanation for Face Verification
Pub Date : 2022-01-01 DOI: 10.1109/WACV51458.2022.00126
D. Mery, Bernardita Morris
Given a facial matcher, the task in explainable face verification is to determine how relevant the parts of a probe image are to establishing the match with an enrolled image. In many cases, however, the trained models cannot be manipulated and must be treated as "black-boxes". In this paper, we present six different saliency maps that can be used to explain any face verification algorithm with no manipulation inside of the face recognition model. The key idea of the methods is based on how the matching score of the two face images changes when the probe is perturbed. The proposed methods remove and aggregate different parts of the face, and measure the contributions of these parts both individually and in collaboration. We test and compare our proposed methods in three different scenarios: synthetic images with different qualities and occlusions; real face images with different facial expressions, poses, and occlusions; and faces from different demographic groups. In our experiments, five different face verification algorithms are used: ArcFace, Dlib, FaceNet (trained on VGGface2 and CasiaWebFace), and LBP. We conclude that one of the proposed methods achieves saliency maps that are stable and interpretable to humans. In addition, our method, in combination with a new contour-based visualization of saliency maps, shows promising results in comparison with other state-of-the-art methods. This paper provides useful insight into any face verification algorithm, making clear which face areas an algorithm weighs most heavily when carrying out the recognition process.
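The perturbation idea behind such saliency maps is easy to sketch. Below is a minimal occlusion-style version, assuming a black-box `match_score(probe, enrolled)` function (any matcher such as ArcFace or FaceNet could stand in); it illustrates the general principle, not the six proposed maps themselves.

```python
import numpy as np

def occlusion_saliency(probe, enrolled, match_score, patch=16, stride=8):
    """Occlude patches of the probe, re-score against the enrolled image, and
    record how much the matching score drops for each covered region."""
    h, w = probe.shape[:2]
    base = match_score(probe, enrolled)
    sal = np.zeros((h, w), dtype=np.float32)
    cnt = np.zeros((h, w), dtype=np.float32)
    for y in range(0, h - patch + 1, stride):
        for x in range(0, w - patch + 1, stride):
            perturbed = probe.copy()
            perturbed[y:y + patch, x:x + patch] = 0   # black out one region
            drop = base - match_score(perturbed, enrolled)
            sal[y:y + patch, x:x + patch] += drop
            cnt[y:y + patch, x:x + patch] += 1
    return sal / np.maximum(cnt, 1)   # average drop per pixel = saliency
```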
Citations: 10
3D Modeling Beneath Ground: Plant Root Detection and Reconstruction Based on Ground-Penetrating Radar
Pub Date : 2022-01-01 DOI: 10.1109/WACV51458.2022.00077
Yawen Lu, G. Lu
3D object reconstruction based on deep neural networks has been gaining attention in recent years. However, recovering the 3D shapes of hidden and buried objects remains a challenge. Ground Penetrating Radar (GPR) is among the most powerful and widely used instruments for detecting and locating underground objects such as plant roots and pipes, with affordable prices and continually evolving technology. This paper first proposes a deep convolutional neural network-based, anchor-free GPR curve signal detection network that utilizes B-scans from a GPR sensor. The detection results help obtain precisely fitted parabola curves. Furthermore, a graph neural network-based root shape reconstruction network is designed to progressively recover the major taproot and then the geometry of fine root branches. Our results on gprMax-simulated root data as well as real-world GPR data collected from apple orchards demonstrate the potential of the proposed framework as a new approach for fine-grained underground object shape reconstruction in a non-destructive way.
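As a rough illustration of the curve-fitting step that follows detection, the sketch below fits a parabola to a handful of (position, travel-time) points picked from a detected reflection in a B-scan; the detection network itself is not reproduced, and all variable names and values are made up.

```python
import numpy as np

def fit_parabola(xs, ts):
    """Least-squares fit of t = a*x^2 + b*x + c to points along a GPR reflection."""
    a, b, c = np.polyfit(xs, ts, deg=2)
    apex_x = -b / (2.0 * a)                   # horizontal position of the apex
    apex_t = np.polyval([a, b, c], apex_x)    # two-way travel time at the apex
    return (a, b, c), (apex_x, apex_t)

xs = np.array([0.0, 0.1, 0.2, 0.3, 0.4])      # antenna positions (m), toy values
ts = np.array([9.1, 8.3, 8.0, 8.2, 9.0])      # travel times (ns), toy values
coeffs, apex = fit_parabola(xs, ts)
print(coeffs, apex)
```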
Citations: 2
Detecting Tear Gas Canisters With Limited Training Data
Pub Date : 2022-01-01 DOI: 10.1109/WACV51458.2022.00135
Ashwin D. D’Cruz, Christopher Tegho, Sean Greaves, Lachlan Kermode
Human rights investigations often entail triaging large volumes of open-source images and video in order to find moments that are relevant to a given investigation and warrant further inspection. Manually searching for instances of tear gas usage online is laborious and time-consuming. In this paper, we study various object detection models for their potential use in the discovery and identification of tear gas canisters for human rights monitors. CNN-based object detection typically requires large volumes of training data, and prior to our work, an appropriate dataset of tear gas canisters did not exist. We benchmark methods for training object detectors using limited labelled data: we fine-tune different object detection models on the limited labelled data and compare their performance to few-shot detectors and to augmentation strategies using synthetic data. We provide a dataset for evaluating and training tear gas canister detectors and indicate how such detectors can be deployed in real-world contexts for investigating human rights violations. Our experiments show that various techniques can improve results, including fine-tuning state-of-the-art detectors, using few-shot detectors, and including synthetic data as part of the training set.
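A typical way to fine-tune an off-the-shelf detector on a small labelled set, as described above, is to swap its classification head for one with the new classes and train briefly. The sketch below uses torchvision's Faster R-CNN as a stand-in, with an assumed class count and a generic training loop; it is not the exact models or settings benchmarked in the paper.

```python
import torch
import torchvision
from torchvision.models.detection.faster_rcnn import FastRCNNPredictor

num_classes = 2  # background + "tear gas canister" (assumed label set)
model = torchvision.models.detection.fasterrcnn_resnet50_fpn(weights="DEFAULT")

# Replace the box predictor so the pretrained detector outputs our classes.
in_features = model.roi_heads.box_predictor.cls_score.in_features
model.roi_heads.box_predictor = FastRCNNPredictor(in_features, num_classes)

def finetune(model, data_loader, epochs=10, lr=0.005):
    """Brief fine-tuning loop; data_loader must yield (images, targets) in
    torchvision detection format."""
    optimizer = torch.optim.SGD([p for p in model.parameters() if p.requires_grad],
                                lr=lr, momentum=0.9, weight_decay=5e-4)
    model.train()
    for _ in range(epochs):
        for images, targets in data_loader:
            loss = sum(model(images, targets).values())  # dict of detection losses
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
    return model
```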
Citations: 0
Extracting Vignetting and Grain Filter Effects from Photos
Pub Date : 2022-01-01 DOI: 10.1109/WACV51458.2022.00013
A. Abdelhamed, Jonghwa Yim, Abhijith Punnappurath, Michael S. Brown, Jihwan Choe, Kihwan Kim
Most smartphones support the use of real-time camera filters to impart visual effects to captured images. Currently, such filters come preinstalled on-device or need to be downloaded and installed before use (e.g., Instagram filters). Recent work [24] proposed a method to extract a camera filter directly from an example photo that has already had a filter applied. The work in [24] focused only on the color and tonal aspects of the underlying filter. In this paper, we introduce a method to extract two spatially varying effects commonly used by on-device camera filters—namely, image vignetting and image grain. Specifically, we show how to extract the parameters for vignetting and image grain present in an example image and replicate these effects as an on-device filter. We use lightweight CNNs to estimate the filter parameters and employ efficient techniques—isotropic Gaussian filters and simplex noise—for regenerating the filters. Our design achieves a reasonable trade-off between efficiency and realism. We show that our method can extract vignetting and image grain filters from stylized photos and replicate the filters on captured images more faithfully, as compared to color and style transfer methods. Our method is highly efficient and has already been deployed to millions of flagship smartphones.
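To make the vignetting half concrete, here is a hedged sketch of how an isotropic Gaussian falloff parameterised by two scalars (the kind of parameters a lightweight CNN could predict) can be re-applied to an image. Parameter names and value ranges are assumptions, not the paper's formulation, and the grain/simplex-noise part is omitted.

```python
import numpy as np

def apply_vignette(img, sigma=0.6, strength=0.8):
    """Darken an HxWx3 uint8 image toward its borders with an isotropic
    Gaussian mask. sigma sets how quickly the falloff decays; strength in
    [0, 1] blends between no vignetting (0) and the full mask (1)."""
    h, w = img.shape[:2]
    yy, xx = np.mgrid[0:h, 0:w]
    cy, cx = (h - 1) / 2.0, (w - 1) / 2.0
    r2 = ((yy - cy) / h) ** 2 + ((xx - cx) / w) ** 2   # normalised squared radius
    falloff = np.exp(-r2 / (2.0 * sigma ** 2))
    mask = (1.0 - strength) + strength * falloff
    out = img.astype(np.float32) * mask[..., None]
    return np.clip(out, 0, 255).astype(np.uint8)

toy = np.full((240, 320, 3), 200, dtype=np.uint8)
vignetted = apply_vignette(toy, sigma=0.5, strength=0.7)
```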
Citations: 0
MoESR: Blind Super-Resolution using Kernel-Aware Mixture of Experts
Pub Date : 2022-01-01 DOI: 10.1109/WACV51458.2022.00406
Mohammad Emad, Maurice Peemen, H. Corporaal
Modern deep learning super-resolution approaches have achieved remarkable performance when the low-resolution (LR) input is a high-resolution (HR) image degraded by a fixed, known kernel, i.e., kernel-specific super-resolution (SR). However, real images often vary in their degradation kernels, so a single kernel-specific SR approach often fails to produce accurate HR results. Recently, degradation-aware networks have been introduced to generate blind SR results under unknown kernel conditions. They can restore images for multiple blur kernels, but they compromise on quality compared to their kernel-specific counterparts. To address this issue, we propose a novel blind SR method called Mixture of Experts Super-Resolution (MoESR), which uses different experts for different degradation kernels. A broad space of degradation kernels is covered by kernel-specific SR networks (experts). We present an accurate kernel prediction method (gating mechanism) that evaluates the sharpness of the images generated by the experts. Based on the predicted kernel, the best-suited expert network is selected for the input image. Finally, we fine-tune the selected network on the test image itself to leverage the advantage of internal learning. Our experimental results on standard synthetic datasets and real images demonstrate that MoESR outperforms state-of-the-art methods both quantitatively and qualitatively. Especially for the challenging ×4 SR task, our PSNR improvement of 0.93 dB on the DIV2KRK dataset is substantial.
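The gating step can be pictured as picking whichever expert produces the sharpest output. The sketch below scores sharpness with the variance of a Laplacian response; this is a stand-in criterion, and `experts` is assumed to be a list of pretrained kernel-specific SR callables, so it illustrates the selection idea rather than the paper's exact predictor.

```python
import numpy as np
from scipy.ndimage import laplace

def select_expert(lr_image, experts):
    """Run every kernel-specific expert and keep the sharpest super-resolved output."""
    def sharpness(img):
        gray = img.mean(axis=-1) if img.ndim == 3 else img
        return laplace(gray.astype(np.float64)).var()

    outputs = [expert(lr_image) for expert in experts]
    scores = [sharpness(out) for out in outputs]
    best = int(np.argmax(scores))
    return best, outputs[best]

# Usage (illustrative): experts = [expert_bicubic, expert_gauss_aniso, ...]
# idx, sr_image = select_expert(lr_image, experts)
```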
Citations: 12
Latent to Latent: A Learned Mapper for Identity Preserving Editing of Multiple Face Attributes in StyleGAN-generated Images
Pub Date : 2022-01-01 DOI: 10.1109/WACV51458.2022.00373
Siavash Khodadadeh, S. Ghadar, Saeid Motiian, Wei-An Lin, Ladislau Bölöni, R. Kalarot
Several recent papers have introduced techniques to adjust the attributes of human faces generated by unconditional GANs such as StyleGAN. Despite efforts to disentangle the attributes, a request to change one attribute often triggers unwanted changes to other attributes as well. More importantly, in some cases, a human observer would not recognize the edited face as belonging to the same person. We propose an approach where a neural network takes as input the latent encoding of a face and the desired attribute changes and outputs the latent space encoding of the edited image. The network is trained offline using unsupervised data, with training labels generated by an off-the-shelf attribute classifier. The desired attribute changes and conservation laws, such as identity maintenance, are encoded in the training loss. The number of attributes the mapper can simultaneously modify is only limited by the attributes available to the classifier – we trained a network that handles 35 attributes, more than any previous approach. As no optimization is performed at deployment time, the computation time is negligible, allowing real-time attribute editing. Qualitative and quantitative comparisons with the current state-of-the-art show our method is better at conserving the identity of the face and restricting changes to the requested attributes.
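A minimal PyTorch sketch of a latent-to-latent mapper of this general kind is shown below: an MLP that takes a latent code together with a vector of requested attribute changes and predicts an offset, so the code is unchanged when no edits are requested. The 512-dimensional latent, 35 attributes, and layer sizes are assumptions for illustration, not the authors' architecture.

```python
import torch
import torch.nn as nn

class LatentMapper(nn.Module):
    def __init__(self, latent_dim=512, num_attrs=35, hidden=1024):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(latent_dim + num_attrs, hidden), nn.LeakyReLU(0.2),
            nn.Linear(hidden, hidden), nn.LeakyReLU(0.2),
            nn.Linear(hidden, latent_dim),
        )

    def forward(self, w, delta_attrs):
        # Predict an offset so the latent is left untouched when delta_attrs == 0.
        return w + self.net(torch.cat([w, delta_attrs], dim=-1))

mapper = LatentMapper()
w = torch.randn(4, 512)                      # batch of latent codes
delta = torch.zeros(4, 35)
delta[:, 3] = 1.0                            # request one attribute change (index arbitrary)
w_edited = mapper(w, delta)                  # edited latents, same shape as w
```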
Citations: 13
SIGNAV: Semantically-Informed GPS-Denied Navigation and Mapping in Visually-Degraded Environments
Pub Date : 2022-01-01 DOI: 10.1109/WACV51458.2022.00192
Alex Krasner, Mikhail Sizintsev, Abhinav Rajvanshi, Han-Pang Chiu, Niluthpol Chowdhury Mithun, Kevin Kaighn, Philip Miller, R. Villamil, S. Samarasekera
Understanding the perceived scene during navigation enables intelligent robot behaviors. Current vision-based semantic SLAM (Simultaneous Localization and Mapping) systems provide these capabilities. However, their performance decreases in visually-degraded environments, which are common in critical robotic applications such as search and rescue missions. In this paper, we present SIGNAV, a real-time semantic SLAM system designed to operate in perceptually-challenging situations. To improve the robustness of navigation in dark environments, SIGNAV leverages a multi-sensor navigation architecture to fuse vision with additional sensing modalities, including an inertial measurement unit (IMU), LiDAR, and wheel odometry. A new 2.5D semantic segmentation method is also developed that combines images and LiDAR depth maps to generate semantic labels for 3D mapped points in real time. We demonstrate the navigation accuracy of SIGNAV in a variety of indoor environments under both normal lighting and dark conditions. SIGNAV also provides semantic scene understanding capabilities in visually-degraded environments. We also show the benefits of semantic information to SIGNAV's performance.
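As a loose illustration of the 2.5D fusion idea, the sketch below stacks an RGB frame with an aligned LiDAR depth map into a four-channel input for a tiny segmentation network. It only shows input-level fusion under assumed shapes and class counts, not SIGNAV's actual architecture.

```python
import torch
import torch.nn as nn

class RGBDSegNet(nn.Module):
    def __init__(self, num_classes=12):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(4, 32, 3, stride=2, padding=1), nn.ReLU(),   # 3 RGB + 1 depth channel
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
        )
        self.head = nn.Conv2d(64, num_classes, 1)

    def forward(self, rgb, depth):
        x = torch.cat([rgb, depth], dim=1)          # (B, 4, H, W)
        logits = self.head(self.encoder(x))
        return nn.functional.interpolate(logits, size=rgb.shape[-2:],
                                         mode="bilinear", align_corners=False)

net = RGBDSegNet()
labels = net(torch.randn(1, 3, 240, 320), torch.randn(1, 1, 240, 320)).argmax(1)
```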
Citations: 3
Information Bottlenecked Variational Autoencoder for Disentangled 3D Facial Expression Modelling
Pub Date : 2022-01-01 DOI: 10.1109/WACV51458.2022.00239
Hao Sun, Nick E. Pears, Yajie Gu
Learning a disentangled representation is essential for building 3D face models that accurately capture identity and expression. We propose a novel variational autoencoder (VAE) framework to disentangle identity and expression from 3D input faces that have a wide variety of expressions. Specifically, we design a system that has two decoders: one for neutral-expression faces (i.e. identity-only faces) and one for the original (expressive) input faces. Crucially, we apply an additional mutual-information regulariser to the identity part to solve the issue of imbalanced information between the expressive input faces and the reconstructed neutral faces. Our evaluations on two public datasets (CoMA and BU-3DFE) show that this model achieves competitive results on the 3D face reconstruction task and state-of-the-art results on identity-expression disentanglement. We also show that by updating to a conditional VAE, we have a system that generates different levels of expressions from semantically meaningful variables.
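A minimal sketch of the two-decoder layout, assuming flattened mesh vertices as input and made-up latent sizes (not the paper's model): the encoder splits the latent code into identity and expression parts, one decoder reconstructs a neutral face from identity alone, and the other reconstructs the expressive input from both. The mutual-information regulariser is omitted.

```python
import torch
import torch.nn as nn

class DisentangledFaceVAE(nn.Module):
    def __init__(self, n_verts=5023, id_dim=64, exp_dim=16):
        super().__init__()
        d_in = n_verts * 3
        self.encoder = nn.Sequential(nn.Linear(d_in, 512), nn.ReLU(),
                                     nn.Linear(512, 2 * (id_dim + exp_dim)))
        self.dec_neutral = nn.Sequential(nn.Linear(id_dim, 512), nn.ReLU(),
                                         nn.Linear(512, d_in))
        self.dec_full = nn.Sequential(nn.Linear(id_dim + exp_dim, 512), nn.ReLU(),
                                      nn.Linear(512, d_in))
        self.id_dim = id_dim

    def forward(self, x):
        mu, logvar = self.encoder(x).chunk(2, dim=-1)
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)   # reparameterisation
        z_id, z_exp = z[:, :self.id_dim], z[:, self.id_dim:]
        neutral = self.dec_neutral(z_id)                          # identity-only face
        full = self.dec_full(torch.cat([z_id, z_exp], dim=-1))    # expressive face
        return neutral, full, mu, logvar
```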
Citations: 5
To miss-attend is to misalign! Residual Self-Attentive Feature Alignment for Adapting Object Detectors
Pub Date : 2022-01-01 DOI: 10.1109/WACV51458.2022.00045
Vaishnavi Khindkar, Chetan Arora, V. Balasubramanian, A. Subramanian, Rohit Saluja, C.V. Jawahar
Advancements in adaptive object detection can lead to tremendous improvements in applications like autonomous navigation, as they alleviate the distributional shifts along the detection pipeline. Prior works adopt adversarial learning to align image features at global and local levels, yet the instance-specific misalignment persists. Also, adaptive object detection remains challenging due to visual diversity in background scenes and intricate combinations of objects. Motivated by structural importance, we aim to attend to prominent instance-specific regions, overcoming the feature misalignment issue. We propose a novel resIduaL seLf-attentive featUre alignMEnt (ILLUME) method for adaptive object detection. ILLUME comprises a Self-Attention Feature Map (SAFM) module that enhances structural attention to object-related regions and thereby generates domain invariant features. Our approach significantly reduces the domain distance with the improved feature alignment of the instances. Qualitative results demonstrate the ability of ILLUME to attend to important object instances required for alignment. Experimental results on several benchmark datasets show that our method outperforms the existing state-of-the-art approaches.
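To ground the attention mechanism, here is a hedged sketch of a residual spatial self-attention block in the spirit described: standard query/key/value attention over backbone feature maps with a learnable residual weight. It follows the generic non-local-attention recipe and is not the authors' exact SAFM module.

```python
import torch
import torch.nn as nn

class SelfAttentiveResidualBlock(nn.Module):
    def __init__(self, channels):
        super().__init__()
        self.q = nn.Conv2d(channels, channels // 8, 1)
        self.k = nn.Conv2d(channels, channels // 8, 1)
        self.v = nn.Conv2d(channels, channels, 1)
        self.gamma = nn.Parameter(torch.zeros(1))   # learnable residual weight

    def forward(self, x):
        b, c, h, w = x.shape
        q = self.q(x).flatten(2).transpose(1, 2)             # (B, HW, C//8)
        k = self.k(x).flatten(2)                              # (B, C//8, HW)
        attn = torch.softmax(q @ k / (k.shape[1] ** 0.5), dim=-1)  # (B, HW, HW)
        v = self.v(x).flatten(2).transpose(1, 2)              # (B, HW, C)
        out = (attn @ v).transpose(1, 2).reshape(b, c, h, w)
        return x + self.gamma * out                           # residual self-attention

block = SelfAttentiveResidualBlock(256)
features = block(torch.randn(1, 256, 32, 32))   # attended backbone features
```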
Citations: 4