
Latest publications: 2023 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV)

Robust and Efficient Alignment of Calcium Imaging Data through Simultaneous Low Rank and Sparse Decomposition
Pub Date: 2023-01-01 DOI: 10.1109/WACV56688.2023.00198
Junmo Cho, Seungjae Han, Eun-Seo Cho, Kijung Shin, Young-Gyu Yoon
Accurate alignment of calcium imaging data, which is critical for extracting neuronal activity signals, is often hindered by image noise and by the neuronal activity itself. To address this problem, we propose an algorithm named REALS for robust and efficient batch image alignment through simultaneous transformation and low-rank and sparse decomposition. REALS is built on our finding that the low-rank subspace can be recovered via linear projection, which allows us to perform simultaneous image alignment and decomposition with gradient-based updates. REALS achieves orders-of-magnitude improvements in accuracy and speed over state-of-the-art robust image alignment algorithms.
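To make the optimization concrete, here is a minimal sketch (not the authors' implementation) of joint alignment and low-rank-plus-sparse decomposition with gradient-based updates; the translation-only warp model, rank, step count, and loss weights are all illustrative assumptions:

```python
import torch
import torch.nn.functional as F

def reals_sketch(frames, rank=4, lam=0.1, steps=500, lr=1e-2):
    """frames: (T, H, W) tensor of raw frames; returns per-frame shifts,
    the low-rank component, and the sparse component (all assumptions)."""
    T, H, W = frames.shape
    theta = torch.zeros(T, 2, requires_grad=True)     # per-frame translation
    U = torch.randn(T, rank, requires_grad=True)      # low-rank factors
    V = torch.randn(rank, H * W, requires_grad=True)
    S = torch.zeros(T, H * W, requires_grad=True)     # sparse residual (activity, noise)
    opt = torch.optim.Adam([theta, U, V, S], lr=lr)
    for _ in range(steps):
        eye = torch.eye(2).expand(T, 2, 2)
        A = torch.cat([eye, theta.unsqueeze(-1)], dim=2)       # (T, 2, 3) affine matrices
        grid = F.affine_grid(A, (T, 1, H, W), align_corners=False)
        warped = F.grid_sample(frames.unsqueeze(1), grid, align_corners=False)
        Y = warped.reshape(T, H * W)
        # the data term keeps the warped stack close to low-rank + sparse;
        # the L1 penalty pushes activity and noise into S
        loss = ((Y - U @ V - S) ** 2).mean() + lam * S.abs().mean()
        opt.zero_grad()
        loss.backward()
        opt.step()
    return theta.detach(), (U @ V).detach(), S.detach()
```

Because every step is differentiable, the alignment parameters and the decomposition factors are updated jointly, which is the property the abstract highlights.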
Citations: 0
Representation Disentanglement in Generative Models with Contrastive Learning
Pub Date: 2023-01-01 DOI: 10.1109/WACV56688.2023.00158
Shentong Mo, Zhun Sun, Chao Li
Contrastive learning has shown its effectiveness in image classification and generation. Recent works apply contrastive learning to the discriminator of Generative Adversarial Networks. However, there is little work exploring whether contrastive learning can be applied to the encoder-decoder structure to learn disentangled representations. In this work, we propose a simple yet effective method, ContraLORD, that incorporates contrastive learning into latent optimization. Specifically, we first use a generator to learn discriminative and disentangled embeddings via latent optimization. An encoder and two momentum encoders are then applied to dynamically learn disentangled information across a large number of samples with content-level and residual-level contrastive losses. Meanwhile, we tune the encoder with the learned embeddings in an amortized manner. We evaluate our approach on ten benchmarks for representation disentanglement and linear classification. Extensive experiments demonstrate the effectiveness of ContraLORD in learning both discriminative and generative representations.
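As background for the contrastive component, a generic InfoNCE-style loss of the kind such methods build on is sketched below; the shapes and temperature are assumptions, and the paper's content-level and residual-level losses are not reproduced here:

```python
import torch
import torch.nn.functional as F

def info_nce(query, positive_key, negative_keys, temperature=0.07):
    """query, positive_key: (B, D); negative_keys: (N, D)."""
    q = F.normalize(query, dim=-1)
    pos = F.normalize(positive_key, dim=-1)
    neg = F.normalize(negative_keys, dim=-1)
    l_pos = (q * pos).sum(dim=-1, keepdim=True)               # (B, 1) positive similarity
    l_neg = q @ neg.t()                                       # (B, N) negative similarities
    logits = torch.cat([l_pos, l_neg], dim=1) / temperature
    labels = torch.zeros(q.size(0), dtype=torch.long, device=q.device)  # positive at index 0
    return F.cross_entropy(logits, labels)
```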
Citations: 6
SHARDS: Efficient SHAdow Removal using Dual Stage Network for High-Resolution Images
Pub Date: 2023-01-01 DOI: 10.1109/WACV56688.2023.00185
Mrinmoy Sen, Sai Pradyumna Chermala, Nazrinbanu Nurmohammad Nagori, V. Peddigari, Praful Mathur, B. H. P. Prasad, Moonsik Jeong
Shadow removal is an important and widely researched topic in computer vision. Recent advances in deep learning have led to this problem being addressed with convolutional neural networks (CNNs), as in other vision tasks. However, these existing works are limited to low-resolution images. Furthermore, existing methods rely on heavy network architectures which cannot be deployed on resource-constrained platforms like smartphones. In this paper, we propose SHARDS, a shadow removal method for high-resolution images. The proposed method solves shadow removal for high-resolution images in two stages using two lightweight networks: a Low-resolution Shadow Removal Network (LSRNet) followed by a Detail Refinement Network (DRNet). LSRNet operates at low resolution and computes a low-resolution, shadow-free output. It achieves state-of-the-art results on standard datasets with 65× fewer network parameters than existing methods. This is followed by DRNet, which refines the low-resolution output into a high-resolution output using the high-resolution input shadow image as guidance. We construct high-resolution shadow removal datasets and, through our experiments, prove the effectiveness of our proposed method on them. We then demonstrate that this method can be deployed on modern-day smartphones and is the first solution of its kind that can efficiently (2.4 s) perform shadow removal for high-resolution (12 MP) images on these devices. Like many existing approaches, our shadow removal network relies on a shadow region mask as input to the network. To complement the lightweight shadow removal network, we also propose a lightweight shadow detector.
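The two-stage inference could be wired roughly as below; `lsr_net` and `dr_net` are hypothetical stand-ins for LSRNet and DRNet (whose architectures are not reproduced here), and guidance-by-concatenation is an assumption:

```python
import torch
import torch.nn.functional as F

def shards_inference(hr_image, hr_mask, lsr_net, dr_net, low_res=(512, 512)):
    """hr_image: (B, 3, H, W) shadow image; hr_mask: (B, 1, H, W) shadow mask."""
    # Stage 1: remove the shadow at low resolution
    lr_image = F.interpolate(hr_image, size=low_res, mode='bilinear', align_corners=False)
    lr_mask = F.interpolate(hr_mask, size=low_res, mode='nearest')
    lr_free = lsr_net(lr_image, lr_mask)              # low-res shadow-free estimate
    # Stage 2: upsample and refine details, guided by the high-res input
    up = F.interpolate(lr_free, size=hr_image.shape[-2:], mode='bilinear', align_corners=False)
    return dr_net(torch.cat([up, hr_image], dim=1))   # high-res shadow-free output
```

Running the heavy removal step only at low resolution is what keeps the pipeline light enough for smartphone deployment.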
Citations: 2
D2F2WOD: Learning Object Proposals for Weakly-Supervised Object Detection via Progressive Domain Adaptation
Pub Date: 2023-01-01 DOI: 10.1109/WACV56688.2023.00011
Yuting Wang, Ricardo Guerrero, V. Pavlovic
Weakly-supervised object detection (WSOD) models attempt to leverage image-level annotations in lieu of accurate but costly-to-obtain object localization labels. This oftentimes leads to substandard object detection and localization at inference time. To tackle this issue, we propose D2F2WOD, a Dual-Domain Fully-to-Weakly Supervised Object Detection framework that leverages synthetic data, annotated with precise object localization, to supplement a natural image target domain, where only image-level labels are available. In its warm-up domain adaptation stage, the model learns a fully-supervised object detector (FSOD) to improve the precision of the object proposals in the target domain, and at the same time learns target-domain-specific and detection-aware proposal features. In its main WSOD stage, a WSOD model is specifically tuned to the target domain. The feature extractor and the object proposal generator of the WSOD model are built upon the fine-tuned FSOD model. We test D2F2WOD on five dual-domain image benchmarks. The results show that our method consistently improves object detection and localization compared with state-of-the-art methods.
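Schematically, the progressive schedule might look like the sketch below, where `fsod`, `wsod`, and the loaders are hypothetical objects; it illustrates only the two-stage structure, not the actual detection losses:

```python
def train_d2f2wod(fsod, wsod, synthetic_loader, target_loader,
                  warmup_epochs=10, main_epochs=20):
    # Warm-up stage: fully-supervised detection on synthetic images with box labels
    for _ in range(warmup_epochs):
        for images, boxes in synthetic_loader:
            fsod.train_step(images, boxes)
    # Main stage: weakly-supervised detection on the target domain, reusing
    # the warmed-up feature extractor and proposal generator
    wsod.init_from(fsod)
    for _ in range(main_epochs):
        for images, image_labels in target_loader:
            proposals = wsod.propose(images)
            wsod.train_step(images, proposals, image_labels)
```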
Citations: 1
Expert-defined Keywords Improve Interpretability of Retinal Image Captioning
Pub Date: 2023-01-01 DOI: 10.1109/WACV56688.2023.00190
Ting-Wei Wu, Jia-Hong Huang, Joseph Lin, M. Worring
Automatic machine learning-based (ML-based) medical report generation systems for retinal images suffer from a relative lack of interpretability. Hence, such ML-based systems are still not widely accepted. The main reason is that trust is one of the important motivating aspects of interpretability, and humans do not trust blindly. Precise technical definitions of interpretability still lack consensus. As a result, it is difficult to build a human-comprehensible ML-based medical report generation system. Heat maps/saliency maps, i.e., post-hoc explanation approaches, are widely used to improve the interpretability of ML-based medical systems. However, they are well known to be problematic. From an ML-based medical model's perspective, the highlighted areas of an image are considered important for making a prediction. However, from a doctor's perspective, even the hottest regions of a heat map contain both useful and non-useful information. Simply localizing the region, therefore, does not reveal exactly what it was in that area that the model considered useful. Hence, post-hoc explanation-based methods rely on humans, who are prone to bias, to decide what a given heat map might mean. Interpretability boosters, in particular expert-defined keywords, are effective carriers of expert domain knowledge, and they are human-comprehensible. In this work, we propose to exploit such keywords and a specialized attention-based strategy to build a more human-comprehensible medical report generation system for retinal images. Both the keywords and the proposed strategy effectively improve interpretability. The proposed method achieves state-of-the-art performance under the commonly used text evaluation metrics BLEU, ROUGE, CIDEr, and METEOR. Project website: https://github.com/Jhhuangkay/Expert-defined-Keywords-Improve-Interpretability-of-Retinal-Image-Captioning.
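One plausible realization of a keyword-conditioned attention step, sketched with assumed shapes (the paper's exact attention strategy is not reproduced here):

```python
import torch
import torch.nn as nn

class KeywordAttention(nn.Module):
    """Cross-attention from the report decoder state onto keyword embeddings."""
    def __init__(self, dim, num_heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)

    def forward(self, decoder_state, keyword_embeddings):
        # decoder_state: (B, 1, D); keyword_embeddings: (B, K, D)
        fused, weights = self.attn(decoder_state, keyword_embeddings, keyword_embeddings)
        # the attention weights expose which expert keywords drove each
        # generated word, which is where the interpretability gain comes from
        return fused, weights
```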
Citations: 7
A Quality Aware Sample-to-Sample Comparison for Face Recognition
Pub Date: 2023-01-01 DOI: 10.1109/WACV56688.2023.00607
Mohammad Saeed Ebrahimi Saadabadi, Sahar Rahimi Malakshan, A. Zafari, Moktari Mostofa, N. Nasrabadi
Currently available face datasets mainly consist of a large number of high-quality samples and a small number of low-quality samples. As a result, a Face Recognition (FR) network fails to learn the distribution of low-quality samples since they are less frequent during training (underrepresented). Moreover, current state-of-the-art FR training paradigms are based on the sample-to-center comparison (i.e., a Softmax-based classifier), which results in a lack of uniformity between training and test metrics. This work integrates a quality-aware learning process at the sample level into the classification training paradigm (QAFace). In this regard, Softmax centers are adaptively guided to pay more attention to low-quality samples by using a quality-aware function. Accordingly, QAFace adds a quality-based adjustment to the updating procedure of the Softmax-based classifier to improve performance on underrepresented low-quality samples. Our method adaptively finds and assigns more attention to the recognizable low-quality samples in the training datasets. In addition, QAFace ignores the unrecognizable low-quality samples by using the feature magnitude as a proxy for quality. As a result, QAFace prevents class centers from being pulled away from the optimal direction. Extensive experiments on the CFP-FP, LFW, CPLFW, CALFW, AgeDB, IJB-B, and IJB-C datasets show that the proposed method outperforms state-of-the-art algorithms.
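In the spirit of the description above, a hedged sketch of a quality-aware Softmax head is given below; the feature norm serves as the quality proxy, while the exact weighting function, scale, and floor are stand-in assumptions rather than the paper's formulation:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class QualityAwareSoftmax(nn.Module):
    def __init__(self, feat_dim, num_classes, norm_floor=1.0, scale=64.0):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(num_classes, feat_dim))
        self.norm_floor = norm_floor   # below this, a sample counts as unrecognizable
        self.scale = scale

    def forward(self, features, labels):
        norms = features.norm(dim=1).detach()   # quality proxy, used as a fixed weight
        logits = self.scale * F.normalize(features) @ F.normalize(self.weight).t()
        per_sample = F.cross_entropy(logits, labels, reduction='none')
        # up-weight recognizable low-quality samples, zero out unrecognizable ones
        quality = torch.sigmoid(norms.mean() - norms)
        quality = quality * (norms > self.norm_floor).float()
        return (quality * per_sample).mean()
```

Detaching the norm treats quality as a fixed per-sample weight, so the reweighting steers the class centers without feeding back into the magnitude itself.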
Citations: 9
Heightfields for Efficient Scene Reconstruction for AR
Pub Date: 2023-01-01 DOI: 10.1109/WACV56688.2023.00580
Jamie Watson, S. Vicente, Oisin Mac Aodha, Clément Godard, G. Brostow, Michael Firman
3D scene reconstruction from a sequence of posed RGB images is a cornerstone task for computer vision and augmented reality (AR). While depth-based fusion is the foundation of most real-time approaches for 3D reconstruction, recent learning-based methods that operate directly on RGB images can achieve higher quality reconstructions, but at the cost of increased runtime and memory requirements, making them unsuitable for AR applications. We propose an efficient learning-based method that refines the 3D reconstruction obtained by a traditional fusion approach. By leveraging a top-down heightfield representation, our method remains real-time while approaching the quality of other learning-based methods. Despite being a simplification, our heightfield is perfectly appropriate for robotic path planning or augmented reality character placement. We outline several innovations that push the performance beyond existing top-down prediction baselines, and we present an evaluation framework on the challenging ScanNetV2 dataset, targeting AR tasks.
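As a toy illustration of why a top-down heightfield is convenient for AR character placement, grounding a character reduces to a single grid lookup; the grid extents and resolution below are assumptions:

```python
import numpy as np

def ground_height(heightfield, origin_xy, cell_size, world_x, world_y):
    """heightfield: (rows, cols) array of heights on a regular top-down grid."""
    col = int((world_x - origin_xy[0]) / cell_size)
    row = int((world_y - origin_xy[1]) / cell_size)
    row = int(np.clip(row, 0, heightfield.shape[0] - 1))
    col = int(np.clip(col, 0, heightfield.shape[1] - 1))
    return float(heightfield[row, col])

hf = np.zeros((100, 100))
hf[40:60, 40:60] = 0.5   # a raised platform in the reconstruction
z = ground_height(hf, origin_xy=(0.0, 0.0), cell_size=0.05, world_x=2.5, world_y=2.5)
# place the AR character at (2.5, 2.5, z); here z == 0.5, on top of the platform
```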
Citations: 0
SIUNet: Sparsity Invariant U-Net for Edge-Aware Depth Completion
Pub Date: 2023-01-01 DOI: 10.1109/WACV56688.2023.00577
A. Ramesh, F. Giovanneschi, M. González-Huici
Depth completion is the task of generating dense depth images from sparse depth measurements, e.g., LiDARs. Existing unguided approaches fail to recover dense depth images with sharp object boundaries due to depth bleeding, especially from extremely sparse measurements. State-of-the-art guided approaches require additional processing for spatial and temporal alignment of multi-modal inputs, and sophisticated architectures for data fusion, making them non-trivial for customized sensor setups. To address these limitations, we propose an unguided approach based on U-Net that is invariant to the sparsity of inputs. Boundary consistency in reconstruction is explicitly enforced through auxiliary learning on a synthetic dataset with dense depth and depth contour images as targets, followed by fine-tuning on a real-world dataset. With our network architecture and simple implementation approach, we achieve competitive results among unguided approaches on the KITTI benchmark and show that the reconstructed image has sharp boundaries and is robust even to extremely sparse LiDAR measurements.
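For reference, here is a sparsity-invariant convolution in the style of Uhrig et al., the kind of building block unguided sparse-depth methods rely on; this sketch is illustrative rather than the paper's exact layer:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SparseConv(nn.Module):
    """Convolution normalized by the number of valid pixels in each window."""
    def __init__(self, in_ch, out_ch, kernel_size=3):
        super().__init__()
        pad = kernel_size // 2
        self.conv = nn.Conv2d(in_ch, out_ch, kernel_size, padding=pad, bias=False)
        self.bias = nn.Parameter(torch.zeros(out_ch))
        self.pool = nn.MaxPool2d(kernel_size, stride=1, padding=pad)
        self.kernel_size = kernel_size

    def forward(self, x, mask):
        # x: (B, C, H, W) sparse depth; mask: (B, 1, H, W) validity in {0, 1}
        x = self.conv(x * mask)
        valid = F.avg_pool2d(mask, self.kernel_size, stride=1,
                             padding=self.kernel_size // 2) * self.kernel_size ** 2
        x = x / valid.clamp(min=1e-5) + self.bias.view(1, -1, 1, 1)
        return x, self.pool(mask)   # propagate the validity mask to the next layer
```

Normalizing by the valid-pixel count is what makes the response independent of how sparse the input happens to be.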
Citations: 1
Multi-scale Contrastive Learning for Complex Scene Generation
Pub Date: 2023-01-01 DOI: 10.1109/WACV56688.2023.00083
Hanbit Lee, Youna Kim, Sang-goo Lee
Recent advances in Generative Adversarial Networks (GANs) have enabled photo-realistic synthesis of single-object images. Yet, modeling more complex distributions, such as scenes with multiple objects, remains challenging. The difficulty stems from the vast variety of scene configurations, which contain multiple objects of different categories placed at various locations. In this paper, we aim to alleviate the difficulty by enhancing the discriminative ability of the discriminator through a locally defined self-supervised pretext task. To this end, we design a discriminator that leverages multi-scale local feedback to guide the generator to better model local semantic structures in the scene. Then, we require the discriminator to carry out pixel-level contrastive learning at multiple scales to enhance its discriminative capability on local regions. Experimental results on several challenging scene datasets show that our method improves synthesis quality by a substantial margin compared to state-of-the-art baselines.
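A sketch of what pixel-level contrastive learning across scales can look like; the positive-pair construction (same spatial location in two views) and the shapes are simplified assumptions rather than the paper's formulation:

```python
import torch
import torch.nn.functional as F

def multiscale_pixel_contrast(feats_a, feats_b, temperature=0.1):
    """feats_a, feats_b: lists of (B, C, H, W) discriminator features from
    two views of the same image, one entry per scale."""
    total = 0.0
    for fa, fb in zip(feats_a, feats_b):
        B, C, H, W = fa.shape
        qa = F.normalize(fa.flatten(2).transpose(1, 2), dim=-1)   # (B, HW, C)
        qb = F.normalize(fb.flatten(2).transpose(1, 2), dim=-1)
        logits = torch.bmm(qa, qb.transpose(1, 2)) / temperature  # (B, HW, HW)
        # each pixel's positive is the same location in the other view
        labels = torch.arange(H * W, device=fa.device).expand(B, -1)
        total = total + F.cross_entropy(logits.flatten(0, 1), labels.flatten())
    return total / len(feats_a)
```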
Citations: 1
Fast Differentiable Transient Rendering for Non-Line-of-Sight Reconstruction
Pub Date: 2023-01-01 DOI: 10.1109/WACV56688.2023.00308
Markus Plack, C. Callenberg, M. Schneider, M. Hullin
Research into non-line-of-sight imaging problems has gained momentum in recent years, motivated by intriguing prospective applications in, e.g., medicine and autonomous driving. While transient image formation is well understood and there exist various reconstruction approaches for non-line-of-sight scenes that combine efficient forward renderers with optimization schemes, those approaches suffer from runtimes on the order of hours even for moderately sized scenes. Furthermore, the ill-posedness of the inverse problem often leads to instabilities in the optimization. Inspired by the latest advances in direct-line-of-sight inverse rendering that have led to stunning results for reconstructing scene geometry and appearance, we present a fast differentiable transient renderer that accelerates the inverse rendering runtime to minutes on consumer hardware, making it possible to apply inverse transient imaging to a wider range of tasks and in more time-critical scenarios. We demonstrate its effectiveness on a series of applications using various datasets and show that it can be used for self-supervised learning.
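To illustrate the forward direction only, a toy differentiable transient model bins returned intensity by total path length; the radiometry and the linear binning below are simplifications and assumptions, not the paper's renderer:

```python
import torch

def transient_histogram(points, albedo, laser, sensor, n_bins=256, bin_width=0.01):
    """points: (N, 3) scene samples; albedo: (N,); laser, sensor: (3,) positions.
    The time axis is expressed as path length (speed of light folded in)."""
    d1 = (points - laser).norm(dim=1)
    d2 = (points - sensor).norm(dim=1)
    intensity = albedo / ((d1 * d2) ** 2 + 1e-8)   # inverse-square falloff per leg
    bins = (d1 + d2) / bin_width                   # total path length picks the bin
    lo = bins.floor().long().clamp(0, n_bins - 1)
    frac = bins - bins.floor()
    hist = torch.zeros(n_bins)
    # linear (triangular) binning keeps the histogram differentiable,
    # so gradients flow from measured transients back to scene parameters
    hist.scatter_add_(0, lo, intensity * (1 - frac))
    hist.scatter_add_(0, (lo + 1).clamp(max=n_bins - 1), intensity * frac)
    return hist
```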
Citations: 4