
Latest publications: 2023 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV)

SIUNet: Sparsity Invariant U-Net for Edge-Aware Depth Completion
Pub Date : 2023-01-01 DOI: 10.1109/WACV56688.2023.00577
A. Ramesh, F. Giovanneschi, M. González-Huici
Depth completion is the task of generating dense depth images from sparse depth measurements, e.g., LiDARs. Existing unguided approaches fail to recover dense depth images with sharp object boundaries due to depth bleeding, especially from extremely sparse measurements. State-of-the-art guided approaches require additional processing for spatial and temporal alignment of multi-modal inputs, and sophisticated architectures for data fusion, making them non-trivial for customized sensor setup. To address these limitations, we propose an unguided approach based on U-Net that is invariant to sparsity of inputs. Boundary consistency in reconstruction is explicitly enforced through auxiliary learning on a synthetic dataset with dense depth and depth contour images as targets, followed by fine-tuning on a real-world dataset. With our network architecture and simple implementation approach, we achieve competitive results among unguided approaches on KITTI benchmark and show that the reconstructed image has sharp boundaries and is robust even towards extremely sparse LiDAR measurements.
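The auxiliary learning described above uses dense depth and depth-contour images as joint targets. A minimal sketch of such a dual-head training step on a sparse depth input follows; layer sizes, the loss weight, and the simplified architecture (no skip connections, no sparsity-invariant convolutions) are illustrative assumptions rather than the SIUNet implementation.

```python
# Hedged sketch of auxiliary (depth + contour) supervision for unguided depth
# completion. Layer sizes, loss weights, and module names are illustrative
# assumptions; skip connections and the sparsity-invariant design are omitted.
import torch
import torch.nn as nn
import torch.nn.functional as F

class DualHeadNet(nn.Module):
    """Toy encoder-decoder with a dense-depth head and a depth-contour head."""
    def __init__(self):
        super().__init__()
        self.enc = nn.Sequential(
            nn.Conv2d(1, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
        )
        self.dec = nn.Sequential(
            nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(32, 16, 4, stride=2, padding=1), nn.ReLU(),
        )
        self.depth_head = nn.Conv2d(16, 1, 3, padding=1)    # dense depth
        self.contour_head = nn.Conv2d(16, 1, 3, padding=1)  # boundary logits

    def forward(self, sparse_depth):
        feat = self.dec(self.enc(sparse_depth))
        return self.depth_head(feat), self.contour_head(feat)

def training_step(model, sparse_depth, gt_depth, gt_contour, w_contour=0.5):
    pred_depth, pred_contour = model(sparse_depth)
    depth_loss = F.l1_loss(pred_depth, gt_depth)
    contour_loss = F.binary_cross_entropy_with_logits(pred_contour, gt_contour)
    return depth_loss + w_contour * contour_loss

model = DualHeadNet()
sparse = torch.zeros(2, 1, 64, 64)                # mostly-empty LiDAR projection
sparse[:, :, ::8, ::8] = torch.rand(2, 1, 8, 8)   # a few valid depth samples
loss = training_step(model, sparse, torch.rand(2, 1, 64, 64),
                     (torch.rand(2, 1, 64, 64) > 0.9).float())
loss.backward()
```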
Citations: 1
AdvisIL - A Class-Incremental Learning Advisor
Pub Date : 2023-01-01 DOI: 10.1109/WACV56688.2023.00243
Eva Feillet, Grégoire Petit, Adrian-Stefan Popescu, M. Reyboz, C. Hudelot
Recent class-incremental learning methods combine deep neural architectures and learning algorithms to handle streaming data under memory and computational constraints. The performance of existing methods varies depending on the characteristics of the incremental process. To date, there is no other approach than to test all pairs of learning algorithms and neural architectures on the training data available at the start of the learning process to select a suited algorithm-architecture combination. To tackle this problem, in this article, we introduce AdvisIL, a method which takes as input the main characteristics of the incremental process (memory budget for the deep model, initial number of classes, size of incremental steps) and recommends an adapted pair of learning algorithm and neural architecture. The recommendation is based on a similarity between the user-provided settings and a large set of pre-computed experiments. AdvisIL makes class-incremental learning easier, since users do not need to run cumbersome experiments to design their system. We evaluate our method on four datasets under six incremental settings and three deep model sizes. We compare six algorithms and three deep neural architectures. Results show that AdvisIL has better overall performance than any of the individual combinations of a learning algorithm and a neural architecture. AdvisIL’s code is available at https://github.com/EvaJF/AdvisIL.
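The recommendation step described in the abstract, matching the user-provided settings against a bank of pre-computed experiments, can be pictured as a nearest-neighbour lookup. The field names, normalization, and database entries in the sketch below are illustrative assumptions, not AdvisIL's actual pre-computed results.

```python
# Hedged sketch of recommending an (algorithm, architecture) pair by matching
# the user's incremental-learning settings against pre-computed experiments.
# Field names, normalization, and database entries are illustrative assumptions.
import numpy as np

experiments = [
    {"memory_mb": 6.0, "initial_classes": 50, "step_size": 10,
     "best": ("LUCIR", "ResNet18")},
    {"memory_mb": 1.5, "initial_classes": 10, "step_size": 10,
     "best": ("DeeSIL", "MobileNetV2")},
    {"memory_mb": 3.0, "initial_classes": 100, "step_size": 25,
     "best": ("SPB-M", "ShuffleNetV2")},
]

def settings_vector(s, scale=np.array([10.0, 100.0, 50.0])):
    """Normalize (memory budget, initial classes, step size) to comparable ranges."""
    return np.array([s["memory_mb"], s["initial_classes"], s["step_size"]]) / scale

def recommend(user_settings):
    """Return the best pair from the pre-computed experiment closest to the user."""
    u = settings_vector(user_settings)
    dists = [np.linalg.norm(u - settings_vector(e)) for e in experiments]
    return experiments[int(np.argmin(dists))]["best"]

print(recommend({"memory_mb": 2.0, "initial_classes": 20, "step_size": 10}))
```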
Citations: 1
Heightfields for Efficient Scene Reconstruction for AR
Pub Date : 2023-01-01 DOI: 10.1109/WACV56688.2023.00580
Jamie Watson, S. Vicente, Oisin Mac Aodha, Clément Godard, G. Brostow, Michael Firman
3D scene reconstruction from a sequence of posed RGB images is a cornerstone task for computer vision and augmented reality (AR). While depth-based fusion is the foundation of most real-time approaches for 3D reconstruction, recent learning based methods that operate directly on RGB images can achieve higher quality reconstructions, but at the cost of increased runtime and memory requirements, making them unsuitable for AR applications. We propose an efficient learning-based method that refines the 3D reconstruction obtained by a traditional fusion approach. By leveraging a top-down heightfield representation, our method remains real-time while approaching the quality of other learning-based methods. Despite being a simplification, our heightfield is perfectly appropriate for robotic path planning or augmented reality character placement. We outline several innovations that push the performance beyond existing top-down prediction baselines, and we present an evaluation framework on the challenging ScanNetV2 dataset, targeting AR tasks.
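As context for the top-down heightfield representation mentioned above, the sketch below shows how a single height value per ground-plane cell already yields a coarse 3D surface; the cell size and scene contents are illustrative assumptions, not the paper's pipeline.

```python
# Hedged illustration of a top-down heightfield: one height value per ground
# cell is enough to recover a coarse 3D surface. Cell size and scene contents
# are illustrative assumptions, not the paper's pipeline.
import numpy as np

def heightfield_to_points(heights, cell_size=0.05):
    """Convert an (H, W) heightfield (metres) to an (H*W, 3) point cloud."""
    h, w = heights.shape
    ys, xs = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")
    return np.stack([xs * cell_size, ys * cell_size, heights], axis=-1).reshape(-1, 3)

heights = np.zeros((64, 64))
heights[20:30, 20:40] = 0.8          # a table-like block standing on the floor
points = heightfield_to_points(heights)
print(points.shape)                   # (4096, 3): one surface sample per cell
```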
Citations: 0
Fine Gaze Redirection Learning with Gaze Hardness-aware Transformation
Pub Date : 2023-01-01 DOI: 10.1109/WACV56688.2023.00346
Sangjin Park, D. Kim, B. Song
Gaze redirection is the task of adjusting the gaze of a given face or eye image toward a desired direction; it aims to learn the gaze direction of a face image through a neural network-based generator. Considering that prior work has learned coarse gaze directions, learning fine gaze directions is very challenging. In addition, explicit discriminative learning of high-dimensional gaze features has not been reported yet. This paper presents solutions to overcome the above limitations. First, we propose a feature-level transformation that provides gaze features corresponding to various gaze directions in the latent feature space. Second, we propose a novel loss function for discriminative learning of gaze features. Specifically, features with insignificant or irrelevant effects on gaze (e.g., head pose and appearance) are set as negative pairs, and important gaze features are set as positive pairs, and then pair-wise similarity learning is performed. As a result, the proposed method showed a redirection error of only 2° for the Gaze-Capture dataset. This is a 10% better performance than a state-of-the-art method, i.e., STED. Additionally, the rationale for why latent features of various attributes should be discriminated is presented through activation visualization. Code is available at https://github.com/san9569/Gaze-Redir-Learning
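A minimal sketch of the pair-wise similarity learning mentioned above follows: positive pairs (shared gaze direction) are pulled together and negative pairs (gaze-irrelevant factors such as head pose or appearance) are pushed apart. The margin, feature dimension, and triplet sampling are illustrative assumptions, not the paper's exact loss.

```python
# Hedged sketch of discriminative pair-wise similarity learning on gaze
# features: positive pairs (same gaze direction) are pulled together, negative
# pairs (gaze-irrelevant factors such as head pose or appearance) are pushed
# apart. Margin, feature dimension, and sampling are illustrative assumptions.
import torch
import torch.nn.functional as F

def pairwise_similarity_loss(anchor, positive, negative, margin=0.5):
    """Cosine-similarity loss over (anchor, positive, negative) feature triplets."""
    pos_sim = F.cosine_similarity(anchor, positive, dim=-1)
    neg_sim = F.cosine_similarity(anchor, negative, dim=-1)
    # Encourage pos_sim toward 1 and push neg_sim below the margin.
    return ((1.0 - pos_sim) + F.relu(neg_sim - margin)).mean()

anchor = torch.randn(8, 128, requires_grad=True)  # gaze features of a batch
positive = torch.randn(8, 128)                    # features sharing the gaze direction
negative = torch.randn(8, 128)                    # features differing in head pose etc.
loss = pairwise_similarity_loss(anchor, positive, negative)
loss.backward()
```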
Citations: 0
Fast Differentiable Transient Rendering for Non-Line-of-Sight Reconstruction
Pub Date : 2023-01-01 DOI: 10.1109/WACV56688.2023.00308
Markus Plack, C. Callenberg, M. Schneider, M. Hullin
Research into non-line-of-sight imaging problems has gained momentum in recent years motivated by intriguing prospective applications in e.g. medicine and autonomous driving. While transient image formation is well understood and there exist various reconstruction approaches for non-line-of-sight scenes that combine efficient forward renderers with optimization schemes, those approaches suffer from runtimes in the order of hours even for moderately sized scenes. Furthermore, the ill-posedness of the inverse problem often leads to instabilities in the optimization. Inspired by the latest advances in direct-line-of-sight inverse rendering that have led to stunning results for reconstructing scene geometry and appearance, we present a fast differentiable transient renderer that accelerates the inverse rendering runtime to minutes on consumer hardware, making it possible to apply inverse transient imaging on a wider range of tasks and in more time-critical scenarios. We demonstrate its effectiveness on a series of applications using various datasets and show that it can be used for self-supervised learning.
Citations: 4
D2F2WOD: Learning Object Proposals for Weakly-Supervised Object Detection via Progressive Domain Adaptation
Pub Date : 2023-01-01 DOI: 10.1109/WACV56688.2023.00011
Yuting Wang, Ricardo Guerrero, V. Pavlovic
Weakly-supervised object detection (WSOD) models attempt to leverage image-level annotations in lieu of accurate but costly-to-obtain object localization labels. This oftentimes leads to substandard object detection and localization at inference time. To tackle this issue, we propose D2F2WOD, a Dual-Domain Fully-to-Weakly Supervised Object Detection framework that leverages synthetic data, annotated with precise object localization, to supplement a natural image target domain, where only image-level labels are available. In its warm-up domain adaptation stage, the model learns a fully-supervised object detector (FSOD) to improve the precision of the object proposals in the target domain, and at the same time learns target-domain-specific and detection-aware proposal features. In its main WSOD stage, a WSOD model is specifically tuned to the target domain. The feature extractor and the object proposal generator of the WSOD model are built upon the fine-tuned FSOD model. We test D2F2WOD on five dual-domain image benchmarks. The results show that our method results in consistently improved object detection and localization compared with state-of-the-art methods.
Citations: 1
Generative Alignment of Posterior Probabilities for Source-free Domain Adaptation
Pub Date : 2023-01-01 DOI: 10.1109/WACV56688.2023.00411
S. Chhabra, Hemanth Venkateswara, Baoxin Li
Existing domain adaptation literature comprises multiple techniques that align the labeled source and unlabeled target domains at different stages, and predict the target labels. In a source-free domain adaptation setting, the source data is not available for alignment. We present a source-free generative paradigm that captures the relations between the source categories and enforces them onto the unlabeled target data, thereby circumventing the need for source data without introducing any new hyper-parameters. The adaptation is performed through the adversarial alignment of the posterior probabilities of the source and target categories. The proposed approach demonstrates competitive performance against other source-free domain adaptation techniques and can also be used for source-present settings.
Citations: 2
Computer Vision for Ocean Eddy Detection in Infrared Imagery
Pub Date : 2023-01-01 DOI: 10.1109/WACV56688.2023.00633
Evangelos Moschos, Alisa Kugusheva, Paul Coste, A. Stegner
Reliable and precise detection of ocean eddies can significantly improve the monitoring of the ocean surface and subsurface dynamics, besides the characterization of local hydrographical and biological properties, or the concentration of pelagic species. Today, most of the eddy detection algorithms operate on satellite altimetry gridded observations, which provide daily maps of sea surface height and surface geostrophic velocity. However, the reliability and the spatial resolution of altimetry products are limited by the strong spatio-temporal averaging of the mapping procedure. Yet, the availability of high-resolution satellite imagery makes real-time object detection possible at a much finer scale, via advanced computer vision methods. We propose a novel eddy detection method via a transfer learning schema, using the ground truth of high-resolution ocean numerical models to link the characteristic streamlines of eddies with their signature (gradients, swirls, and filaments) on Sea Surface Temperature (SST). A trained, multi-task convolutional neural network is then employed to segment infrared satellite imagery of SST in order to retrieve the accurate position, size, and form of each detected eddy. The EddyScan-SST is an operational oceanographic module that provides, in real-time, key information on the ocean dynamics to maritime stakeholders.
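To make the final step concrete, the sketch below shows one simple way to read per-eddy position and size out of a binary segmentation mask using connected components; it is an illustrative assumption, not the EddyScan-SST post-processing itself.

```python
# Hedged illustration of reading per-eddy position and size out of a binary
# segmentation mask with connected components; this post-processing is an
# assumption for illustration, not the EddyScan-SST module itself.
import numpy as np
from scipy import ndimage

def eddies_from_mask(mask, pixel_km=4.0):
    """mask: (H, W) binary eddy segmentation; returns [(center_yx, area_km2), ...]."""
    labels, n = ndimage.label(mask)
    eddies = []
    for idx in range(1, n + 1):
        ys, xs = np.nonzero(labels == idx)
        center = (float(ys.mean()), float(xs.mean()))   # centroid in pixel coordinates
        area_km2 = float(len(ys)) * pixel_km ** 2       # pixel count times cell area
        eddies.append((center, area_km2))
    return eddies

mask = np.zeros((128, 128), dtype=bool)
mask[40:60, 50:75] = True                # a toy eddy blob
print(eddies_from_mask(mask))            # [((49.5, 62.0), 8000.0)]
```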
Citations: 0
Couplformer: Rethinking Vision Transformer with Coupling Attention
Pub Date : 2023-01-01 DOI: 10.1109/WACV56688.2023.00641
Hai Lan, Xihao Wang, Hao Shen, Peidong Liang, Xian Wei
With the development of the self-attention mechanism, the Transformer model has demonstrated outstanding performance in the computer vision domain. However, the massive computation brought by the full attention mechanism places a heavy burden on memory consumption. In turn, this memory limitation hinders the deployment of the Transformer model on embedded systems where computing resources are limited. To remedy this problem, we propose a novel memory-economical attention mechanism named Couplformer, which decouples the attention map into two sub-matrices and generates the alignment scores from spatial information. Our method enables the Transformer model to improve time and memory efficiency while maintaining expressive power. A series of image classification tasks at different scales are used to evaluate the effectiveness of our model. Experimental results show that on the ImageNet-1K classification task, the Couplformer can decrease memory consumption by 42% compared with the regular Transformer. Meanwhile, it satisfies the accuracy requirements, outperforming by 0.56% on Top-1 accuracy while occupying the same memory footprint. Besides, the Couplformer achieves state-of-the-art performance on MS COCO 2017 object detection and instance segmentation tasks. As a result, the Couplformer can serve as an efficient backbone for visual tasks and provides researchers with a novel perspective on deploying attention mechanisms.
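The key idea in the abstract is to avoid materializing the full attention map. One generic way to do this is to attend along the two spatial axes separately, so that only an H x H and a W x W score matrix are ever formed; the sketch below shows this axial-style factorization purely for illustration and is not necessarily the Couplformer coupling scheme.

```python
# Hedged sketch: one generic way to avoid materializing the full (H*W)x(H*W)
# attention map is to attend along the two spatial axes separately, so only an
# H x H and a W x W map are ever formed. This axial-style factorization is an
# assumption for illustration, not necessarily the Couplformer coupling scheme.
import torch

def axial_attention(x):
    """x: (B, H, W, C) feature map; returns a tensor of the same shape."""
    scale = x.shape[-1] ** -0.5
    # Row attention: a (W x W) score matrix per image row.
    row = torch.softmax(torch.einsum("bhwc,bhvc->bhwv", x, x) * scale, dim=-1)
    x = torch.einsum("bhwv,bhvc->bhwc", row, x)
    # Column attention: an (H x H) score matrix per image column.
    col = torch.softmax(torch.einsum("bhwc,bgwc->bwhg", x, x) * scale, dim=-1)
    x = torch.einsum("bwhg,bgwc->bhwc", col, x)
    return x

feats = torch.randn(2, 14, 14, 64)       # a 14x14 token grid with 64 channels
print(axial_attention(feats).shape)      # torch.Size([2, 14, 14, 64])
```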
Citations: 3
Robust and Efficient Alignment of Calcium Imaging Data through Simultaneous Low Rank and Sparse Decomposition
Pub Date : 2023-01-01 DOI: 10.1109/WACV56688.2023.00198
Junmo Cho, Seungjae Han, Eun-Seo Cho, Kijung Shin, Young-Gyu Yoon
Accurate alignment of calcium imaging data, which is critical for the extraction of neuronal activity signals, is often hindered by the image noise and the neuronal activity itself. To address the problem, we propose an algorithm named REALS for robust and efficient batch image alignment through simultaneous transformation and low rank and sparse decomposition. REALS is constructed upon our finding that the low rank subspace can be recovered via linear projection, which allows us to perform simultaneous image alignment and decomposition with gradient-based updates. REALS achieves orders-of-magnitude improvement in terms of accuracy and speed compared to the state-of-the-art robust image alignment algorithms.
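The simultaneous low-rank and sparse decomposition can be sketched as a small gradient-based procedure that alternates gradient steps on a rank-r factorization with soft-thresholding of the residual. The joint spatial-transformation (alignment) estimation that REALS performs is omitted here, and the rank, step size, and sparsity weight are illustrative assumptions.

```python
# Hedged sketch of a gradient-based low-rank-plus-sparse decomposition of a
# frame stack: gradient steps on a rank-r factorization alternate with
# soft-thresholding of the residual. The joint spatial-transformation
# (alignment) estimation of REALS is omitted; rank, step size, and the
# sparsity weight are illustrative assumptions.
import torch

def low_rank_sparse(frames, rank=3, lam=0.1, steps=200, lr=1e-2):
    """frames: (T, P) matrix of T flattened frames; returns (low_rank, sparse)."""
    t, p = frames.shape
    u = torch.randn(t, rank, requires_grad=True)
    v = torch.randn(rank, p, requires_grad=True)
    s = torch.zeros(t, p)
    opt = torch.optim.Adam([u, v], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        loss = ((frames - u @ v - s) ** 2).mean()
        loss.backward()
        opt.step()
        with torch.no_grad():
            r = frames - u @ v
            s = torch.sign(r) * torch.clamp(r.abs() - lam, min=0.0)  # soft-threshold
    return (u @ v).detach(), s

frames = torch.randn(20, 1024)            # e.g. 20 frames of a 32x32 movie, flattened
low_rank, sparse = low_rank_sparse(frames)
print(low_rank.shape, sparse.shape)       # torch.Size([20, 1024]) twice
```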
Citations: 0