2022 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV): Latest Papers

Matching and Recovering 3D People from Multiple Views
Pub Date : 2022-01-01 DOI: 10.1109/WACV51458.2022.00125
Alejandro Pérez-Yus, Antonio Agudo
This paper introduces an approach to simultaneously match and recover 3D people from multiple calibrated cameras. To this end, we present an affinity measure between 2D detections across different views that enforces an uncertainty geometric consistency. This similarity is then exploited by a novel multi-view matching algorithm to cluster the detections, being robust against partial observations as well as bad detections and without assuming any prior about the number of people in the scene. After that, the multi-view correspondences are used in order to efficiently infer the 3D pose of each body by means of a 3D pictorial structure model in combination with physico-geometric constraints. Our algorithm is thoroughly evaluated on challenging scenarios where several human bodies are performing different activities which involve complex motions, producing large occlusions in some views and noisy observations. We outperform state-of-the-art results in terms of matching and 3D reconstruction.
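To make the idea of a geometric affinity between 2D detections concrete, here is a minimal sketch of a generic epipolar-consistency score. It is not the authors' uncertainty-aware measure, which the abstract does not specify. The function names, the Gaussian weighting, and the `sigma` parameter are assumptions made for illustration, and the fundamental matrix `F` is assumed to be known from calibration.

```python
import math

def epipolar_distance(F, x1, x2):
    """Pixel distance from x2 (view 2) to the epipolar line of x1 (view 1)."""
    p1 = (x1[0], x1[1], 1.0)
    # Epipolar line l = F p1, written as (a, b, c) with a*u + b*v + c = 0.
    a, b, c = (sum(F[r][k] * p1[k] for k in range(3)) for r in range(3))
    return abs(a * x2[0] + b * x2[1] + c) / math.hypot(a, b)

def affinity(F, joints1, joints2, sigma=10.0):
    """Mean Gaussian-weighted epipolar consistency over corresponding joints.

    sigma (pixels) is a hypothetical tolerance, not a value from the paper.
    """
    scores = [
        math.exp(-epipolar_distance(F, p, q) ** 2 / (2.0 * sigma ** 2))
        for p, q in zip(joints1, joints2)
    ]
    return sum(scores) / len(scores)

# Fundamental matrix of an ideal rectified pair (pure horizontal baseline):
# corresponding points must lie on the same image row.
F_rectified = [[0.0, 0.0, 0.0], [0.0, 0.0, -1.0], [0.0, 1.0, 0.0]]
```

Two detections whose joints lie on matching rows score close to 1 under `F_rectified`, while detections of different people tend to score lower. A clustering step could then group detections across views using these pairwise scores.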
Citations: 3
Coupled Training for Multi-Source Domain Adaptation
Pub Date : 2022-01-01 DOI: 10.1109/WACV51458.2022.00114
Ohad Amosy, Gal Chechik
Unsupervised domain adaptation is often addressed by learning a joint representation of labeled samples from a source domain and unlabeled samples from a target domain. Unfortunately, hard sharing of representation may hurt adaptation because of negative transfer, where features that are useful for source domains are learned even if they hurt inference on the target domain. Here, we propose an alternative, soft sharing scheme. We train separate but weakly-coupled models for the source and the target data, while encouraging their predictions to agree. Jointly training the two coupled models effectively exploits the distribution over unlabeled target data and achieves high accuracy on the target. Specifically, we show analytically and empirically that the decision boundaries of the target model converge to low-density "valleys" of the target distribution. We evaluate our approach on four multi-source domain adaptation (MSDA) benchmarks: digits, Amazon text reviews, Office-Caltech, and images (DomainNet). We find that it consistently outperforms current MSDA SoTA, sometimes by a very large margin.
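The soft-sharing idea, two separate models encouraged to agree on target data, can be sketched with a toy loss. This is only an illustration under stated assumptions: the abstract does not give the actual objective, so the squared-difference agreement term, the `weight` parameter, and all function names below are hypothetical. A single labeled source sample stands in for the multi-source setting.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(logits, label):
    """Supervised loss for one labeled sample."""
    return -math.log(softmax(logits)[label])

def agreement_penalty(logits_a, logits_b):
    """Squared L2 distance between the two predicted class distributions."""
    pa, pb = softmax(logits_a), softmax(logits_b)
    return sum((x - y) ** 2 for x, y in zip(pa, pb))

def coupled_loss(src_logits, src_label, src_on_target, tgt_on_target, weight=1.0):
    """Source-model supervised loss plus an agreement term on one unlabeled target sample.

    src_on_target / tgt_on_target are the logits of the source and target
    models on the same target input; weight is a hypothetical coupling strength.
    """
    supervised = cross_entropy(src_logits, src_label)
    return supervised + weight * agreement_penalty(src_on_target, tgt_on_target)
```

When both models produce the same distribution on a target sample, the agreement term vanishes and only the supervised source loss remains.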
Citations: 2
An experimental comparison of multi-view stereo approaches on satellite images
Pub Date : 2022-01-01 DOI: 10.1109/WACV51458.2022.00078
Álvaro Gómez, G. Randall, G. Facciolo, Rafael Grompone von Gioi
Different methods can be applied to satellite images to derive an altitude map from a set of images. In this article we evaluate a set of representative methods from different approaches. We consider true multi-view stereo methods as well as pair-wise ones, classic methods and deep learning based ones, methods already in use on satellite images and others that were originally devised for close-range imaging and are adapted to satellite imagery. While deep learning (DL) methods have taken over multi-view stereo reconstruction in recent years, this tendency has not fully reached satellite stereo pipelines, which still largely rely on pair-wise classic algorithms. For the comparison, we set up a framework that makes it possible to interface a DL-based stereo method taken from the computer vision literature with a satellite stereo pipeline. For multi-view stereo algorithms we build on a recently proposed framework originally devised to apply the COLMAP method to satellite images. Methods are compared on several datasets that include sets of images taken within a few days and sets of images taken months apart. Results show that DL methods generalize well overall. In particular, the use of the GANet DL method as the matching step in a pair-wise stereo pipeline is promising, as it already performs better than the classic counterpart, even without specific training.
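For readers unfamiliar with the "matching step" of a pair-wise stereo pipeline, the sketch below shows one of the simplest classic variants: sum-of-absolute-differences (SAD) block matching along a single rectified scanline. It is neither GANet nor the pipeline evaluated in the paper. The function name, window size, and border clamping are illustrative assumptions.

```python
def scanline_disparity(left, right, max_disp, half_win=1):
    """For each pixel of `left`, return the disparity d in [0, max_disp]
    that minimizes the SAD cost against `right` (left[x] ~ right[x - d])."""
    n = len(left)
    disparities = []
    for x in range(n):
        best_d, best_cost = 0, float("inf")
        for d in range(max_disp + 1):
            cost = 0.0
            for k in range(-half_win, half_win + 1):
                xl = min(max(x + k, 0), n - 1)      # clamp window at the borders
                xr = min(max(x + k - d, 0), n - 1)
                cost += abs(left[xl] - right[xr])
            if cost < best_cost:                    # keep the first minimum
                best_d, best_cost = d, cost
        disparities.append(best_d)
    return disparities

# Toy scanlines: the bright feature appears two pixels further right in `left`.
right_row = [0, 0, 0, 9, 5, 0, 0, 0, 0, 0]
left_row = [0, 0, 0, 0, 0, 9, 5, 0, 0, 0]
feature_disparity = scanline_disparity(left_row, right_row, max_disp=3)[5]
```

In a full pipeline the disparity map would then be converted to altitude using the sensor geometry. A learned matcher would replace the hand-crafted SAD cost while keeping the same role in the pipeline.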
Citations: 12
Deep Feature Prior Guided Face Deblurring
Pub Date : 2022-01-01 DOI: 10.1109/WACV51458.2022.00096
S. Jung, Tae Bok Lee, Y. S. Heo
Most recent face deblurring methods have focused on utilizing facial shape priors such as face landmarks and parsing maps. While these priors can provide facial geometric cues effectively, they are insufficient to capture local texture details that act as important clues for solving the face deblurring problem. To deal with this, we focus on estimating the deep features of pre-trained face recognition networks (e.g., VGGFace network) that include rich information about sharp faces as a prior, and adopt a generative adversarial network (GAN) to learn it. To this end, we propose a deep feature prior guided network (DFPGnet) that restores facial details using the deep feature prior estimated from a blurred image. In our DFPGnet, the generator is divided into two streams: a prior estimation stream and a deblurring stream. Since the estimated deep features of the prior estimation stream are learned from the VGGFace network, which is trained for face recognition rather than deblurring, we need to alleviate the discrepancy of feature distributions between the two streams. Therefore, we present feature transform modules at the connecting points of the two streams. In addition, we propose a channel-attention feature discriminator and prior loss, which encourage the generator to focus on the channels of the deep feature prior that are more important for deblurring during training. Experimental results show that our method achieves state-of-the-art performance both qualitatively and quantitatively.
Citations: 6
Lane-Level Street Map Extraction from Aerial Imagery
Pub Date : 2022-01-01 DOI: 10.1109/WACV51458.2022.00156
Songtao He, Harinarayanan Balakrishnan
Digital maps with lane-level details are the foundation of many applications. However, creating and maintaining digital maps, especially maps with lane-level details, is labor-intensive and expensive. In this work, we propose a mapping pipeline to extract lane-level street maps from aerial imagery automatically. Our mapping pipeline first extracts lanes in non-intersection areas; it then enumerates all possible turning lanes at intersections, validates their connectivity, and extracts the valid turning lanes to complete the map. We evaluate the accuracy of our mapping pipeline on a dataset consisting of four U.S. cities, demonstrating the effectiveness of our proposed mapping pipeline and the potential of scalable mapping solutions based on aerial imagery.
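The enumerate-then-validate step at intersections can be sketched as follows. This is a simplified illustration, not the authors' implementation. The lane identifiers and the `is_connected` callback are hypothetical, and in the paper the validation is presumably driven by the aerial imagery rather than by a lookup table.

```python
from itertools import product

def enumerate_turning_lanes(incoming, outgoing, is_connected):
    """Enumerate every (incoming, outgoing) lane pair at an intersection
    and keep only the pairs accepted by the connectivity validator."""
    candidates = product(incoming, outgoing)
    return [(i, o) for i, o in candidates if is_connected(i, o)]

# Hypothetical intersection with a lookup table standing in for the validator.
allowed = {("north_in", "south_out"), ("east_in", "west_out")}
turns = enumerate_turning_lanes(
    ["north_in", "east_in"],
    ["south_out", "west_out"],
    lambda i, o: (i, o) in allowed,
)
```

Enumerating all pairs keeps the candidate set exhaustive, so the quality of the final map depends mainly on how reliable the validator is.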
Citations: 16
LEAD: Self-Supervised Landmark Estimation by Aligning Distributions of Feature Similarity
Pub Date : 2022-01-01 DOI: 10.1109/WACV51458.2022.00310
Tejan Karmali, Abhinav Atrishi, Sai Sree Harsha, Susmit Agrawal, Varun Jampani, R. Venkatesh Babu
In this work, we introduce LEAD, an approach to discover landmarks from an unannotated collection of category-specific images. Existing works in self-supervised landmark detection are based on learning dense (pixel-level) feature representations from an image, which are further used to learn landmarks in a semi-supervised manner. While there have been advances in self-supervised learning of image features for instance-level tasks like classification, these methods do not ensure dense equivariant representations. The property of equivariance is of interest for dense prediction tasks like landmark estimation. In this work, we introduce an approach to enhance the learning of dense equivariant representations in a self-supervised fashion. We follow a two-stage training approach: first, we train a network using the BYOL [13] objective, which operates at an instance level. The correspondences obtained through this network are further used to train a dense and compact representation of the image using a lightweight network. We show that having such a prior in the feature extractor helps in landmark detection, even with a drastically limited number of annotations, while also improving generalization across scale variations.
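As background for the first training stage, the BYOL objective for one pair of views reduces to a normalized mean-squared error, which equals 2 minus twice the cosine similarity between the online network's prediction and the target network's projection. The sketch below computes only that per-pair term. The function name is an assumption, and the target-network update and the symmetrization over the two views are omitted.

```python
import math

def byol_loss(prediction, target):
    """Normalized MSE between two vectors: 2 - 2 * cosine_similarity.

    `prediction` comes from the online network and `target` from the
    slowly updated target network (treated as a constant here).
    """
    dot = sum(p * t for p, t in zip(prediction, target))
    norm_p = math.sqrt(sum(p * p for p in prediction))
    norm_t = math.sqrt(sum(t * t for t in target))
    return 2.0 - 2.0 * dot / (norm_p * norm_t)
```

The loss is 0 when the two vectors point in the same direction and reaches its maximum of 4 when they point in opposite directions. Because it compares one vector per image, it operates at the instance level, which is why the abstract adds a second stage for dense representations.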
Citations: 4
Visualizing Paired Image Similarity in Transformer Networks
Pub Date : 2022-01-01 DOI: 10.1109/WACV51458.2022.00160
Samuel Black, Abby Stylianou, Robert Pless, Richard Souvenir
Transformer architectures have shown promise for a wide range of computer vision tasks, including image embedding. As was the case with convolutional neural networks and other models, explainability of the predictions is a key concern, but visualization approaches tend to be architecture-specific. In this paper, we introduce a new method for producing interpretable visualizations that, given a pair of images encoded with a Transformer, show which regions contributed to their similarity. Additionally, for the task of image retrieval, we compare the performance of Transformer and ResNet models of similar capacity and show that while they have similar performance in aggregate, the retrieved results and the visual explanations for those results are quite different. Code is available at https://github.com/vidarlab/xformer-paired-viz.
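One way to see how a similarity score can be attributed to image regions is the following sketch, which assumes that each image embedding is the mean of its patch-token features and that similarity is a plain dot product. Under those assumptions the score splits exactly into one contribution per token. This illustrates the general decomposition idea and is not necessarily the method of the paper, and real Transformer embeddings may instead rely on a class token. All names are hypothetical.

```python
def mean_pool(tokens):
    """Average a list of equally sized feature vectors."""
    dim = len(tokens[0])
    return [sum(t[k] for t in tokens) / len(tokens) for k in range(dim)]

def similarity_map(tokens_a, tokens_b):
    """Contribution of each token of image A to <mean(A), mean(B)>.

    The contributions sum to the dot product of the pooled embeddings.
    """
    mean_b = mean_pool(tokens_b)
    n = len(tokens_a)
    return [sum(a_k * b_k for a_k, b_k in zip(a, mean_b)) / n for a in tokens_a]

def pooled_similarity(tokens_a, tokens_b):
    """Dot product between the two mean-pooled embeddings."""
    return sum(x * y for x, y in zip(mean_pool(tokens_a), mean_pool(tokens_b)))
```

Reshaping the per-token contributions back onto the patch grid gives a heat map of the regions that raised or lowered the similarity between the two images.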
Citations: 2
Trading-off Information Modalities in Zero-shot Classification
Pub Date : 2022-01-01 DOI: 10.1109/WACV51458.2022.00174
Jorge Sánchez, Matías Molina
Zero-shot classification is the task of learning predictors for classes not seen during training. A practical way to deal with the lack of annotations for the target categories is to encode not only the inputs (images) but also the outputs (object classes) into a suitable representation space. We can use these representations to measure the degree to which images and categories agree by fitting a compatibility measure using the information available during training. One way to define such a measure is by a two-step process in which we first project the elements of either space (visual or semantic) onto the other and then compute a similarity score in the target space. Although projections onto the visual space have shown better general performance, little attention has been paid to the degree to which the visual and semantic information contribute to the final predictions. In this paper, we build on this observation and propose two different formulations that allow us to explicitly trade off the relative importance of the visual and semantic spaces for classification in a zero-shot setting. Our formulations are based on a redefinition of the similarity scoring and loss function used to learn the projections. Experiments on six different datasets show that our approach leads to improved performance compared to similar methods. Moreover, combined with synthetic features, our approach competes favorably with the state of the art in both the standard and generalized settings.
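The notion of trading off the two spaces can be illustrated with a toy convex combination of per-class similarity scores. The abstract does not state the paper's two formulations, so this sketch is purely hypothetical: the mixing parameter `lam`, the function names, and the example scores are assumptions, and the scores themselves would come from learned projections in practice.

```python
def blended_scores(visual_scores, semantic_scores, lam):
    """Mix per-class similarities computed in the visual and semantic spaces.

    lam = 1.0 uses only the visual-space score, lam = 0.0 only the semantic one.
    """
    return {
        c: lam * visual_scores[c] + (1.0 - lam) * semantic_scores[c]
        for c in visual_scores
    }

def predict(visual_scores, semantic_scores, lam):
    """Return the class with the highest blended score."""
    scores = blended_scores(visual_scores, semantic_scores, lam)
    return max(scores, key=scores.get)

# Hypothetical scores for one test image and two unseen classes.
visual = {"zebra": 0.9, "horse": 0.2}
semantic = {"zebra": 0.1, "horse": 0.8}
```

With these example scores the prediction flips from one class to the other as `lam` moves from 1.0 to 0.0, which is why the relative weight of the two spaces matters for the final decision.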
Citations: 1
Towards Durability Estimation of Bioprosthetic Heart Valves Via Motion Symmetry Analysis
Pub Date : 2022-01-01 DOI: 10.1109/WACV51458.2022.00176
M. Alizadeh, Melissa Cote, A. Albu
This paper addresses bioprosthetic heart valve (BHV) durability estimation via computer vision (CV)-based analyses of the visual symmetry of valve leaflet motion. BHVs are routinely implanted in patients suffering from valvular heart diseases. Valve designs are rigorously tested using cardiovascular equipment, but once implanted, more than 50% of BHVs encounter a structural failure within 15 years. We investigate the correlation between the visual dynamic symmetry of BHV leaflets and the functional symmetry of the valves. We hypothesize that an asymmetry in the valve leaflet motion will generate an asymmetry in the flow patterns, resulting in added local stress and forces on some of the leaflets, which can accelerate the failure of the valve. We propose two different pair-wise leaflet symmetry scores, one based on the diagonals of orthogonal projection matrices (DOPM) and one on dynamic time warping (DTW), both computed from videos recorded during pulsatile flow tests. We compare the symmetry score profiles with those of fluid dynamic parameters (velocity and vorticity values) at the leaflet borders, obtained from valve-specific numerical simulations. Experiments on four cases that include three different tricuspid BHVs yielded promising results, with the DTW scores showing good coherence with the simulations. With a link between visual and functional symmetries established, this approach paves the way towards BHV durability estimation using CV techniques.
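Dynamic time warping itself is a standard algorithm, and a minimal version for two 1D signals is sketched below. How the paper turns leaflet motion into signals and turns DTW costs into a symmetry score is not described in the abstract. The function name and the absolute-difference local cost are therefore illustrative assumptions.

```python
def dtw_distance(a, b):
    """Classic dynamic time warping cost between two 1D sequences,
    using the absolute difference as the local cost."""
    inf = float("inf")
    n, m = len(a), len(b)
    # cost[i][j] = best alignment cost of a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            cost[i][j] = local + min(
                cost[i - 1][j],      # a advances, b repeats
                cost[i][j - 1],      # b advances, a repeats
                cost[i - 1][j - 1],  # both advance
            )
    return cost[n][m]
```

Because the alignment may stretch or compress time, two leaflets that follow the same opening and closing profile with a small delay would still obtain a low cost, while genuinely different motion profiles would not.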
本文通过基于计算机视觉(CV)的瓣膜叶运动视觉对称性分析,研究了生物人工心脏瓣膜(BHV)的耐久性估计。bhv通常被植入患有瓣膜性心脏病的患者。阀门设计经过了心血管设备的严格测试,但一旦植入,超过50%的bhv会在15年内出现结构故障。我们研究了BHV小叶的视觉动态对称性与瓣膜功能对称性之间的相关性。我们假设阀瓣运动的不对称会产生流动模式的不对称,导致部分阀瓣上的局部应力和力增加,从而加速阀瓣的失效。我们提出了基于正交投影矩阵对角线(DOPM)和动态时间规整(DTW)的两种不同的成对小叶对称分数,从脉动流测试期间记录的视频中计算得到。我们比较了对称分数分布与那些流体动力学参数(速度和涡量值)在小叶边界,从阀门特定的数值模拟获得。在包括三种不同三尖瓣bhv的四种情况下进行的实验取得了令人满意的结果,DTW分数与模拟结果显示出良好的一致性。通过建立视觉对称性和功能对称性之间的联系,该方法为使用CV技术进行BHV耐久性评估铺平了道路。
Citations: 0
Novel-View Synthesis of Human Tourist Photos
Pub Date : 2022-01-01 DOI: 10.1109/WACV51458.2022.00093
Jonathan Freer, K. M. Yi, Wei Jiang, Jongwon Choi, H. Chang
We present a novel framework for performing novel-view synthesis on human tourist photos. Given a tourist photo from a known scene, we reconstruct the photo in 3D space through modeling the human and the background independently. We generate a deep buffer from a novel viewpoint of the reconstruction and utilize a deep network to translate the buffer into a photo-realistic rendering of the novel view. We additionally present a method to relight the renderings, allowing for relighting of both human and background to match either the provided input image or any other. The key contributions of our paper are: 1) a framework for performing novel view synthesis on human tourist photos, 2) an appearance transfer method for relighting of humans to match synthesized backgrounds, and 3) a method for estimating lighting properties from a single human photo. We demonstrate the proposed framework on photos from two different scenes of various tourists.
Citations: 3