
Latest publications — 2022 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV)

Trading-off Information Modalities in Zero-shot Classification
Pub Date: 2022-01-01 DOI: 10.1109/WACV51458.2022.00174
Jorge Sánchez, Matías Molina
Zero-shot classification is the task of learning predictors for classes not seen during training. A practical way to deal with the lack of annotations for the target categories is to encode not only the inputs (images) but also the outputs (object classes) into a suitable representation space. We can use these representations to measure the degree to which images and categories agree by fitting a compatibility measure using the information available during training. One way to define such a measure is a two-step process in which we first project the elements of either space (visual or semantic) onto the other and then compute a similarity score in the target space. Although projections onto the visual space have shown better general performance, little attention has been paid to the degree to which the visual and semantic information contribute to the final predictions. In this paper, we build on this observation and propose two different formulations that allow us to explicitly trade off the relative importance of the visual and semantic spaces for classification in a zero-shot setting. Our formulations are based on a redefinition of the similarity score and of the loss function used to learn the projections. Experiments on six different datasets show that our approach leads to improved performance compared to similar methods. Moreover, combined with synthetic features, our approach competes favorably with the state of the art in both the standard and generalized settings.
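To make the trade-off idea concrete, here is a minimal sketch of one plausible scoring function: a convex combination, with weight `alpha`, of a similarity computed in the visual space and one computed in the semantic space. The projection matrices `W_vs`/`W_sv` and the convex-combination form are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def l2norm(x, axis=-1):
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + 1e-8)

def zsl_scores(img_feats, cls_embs, W_vs, W_sv, alpha=0.5):
    """Toy trade-off score for zero-shot classification.

    img_feats: (N, dv) visual features; cls_embs: (C, ds) class embeddings.
    W_vs: (dv, ds) projects visual -> semantic; W_sv: (ds, dv) semantic -> visual.
    alpha weights the visual-space score against the semantic-space score
    (alpha and both projections are assumptions for illustration only).
    """
    s_sem = l2norm(img_feats @ W_vs) @ l2norm(cls_embs).T       # compare in semantic space
    s_vis = l2norm(img_feats) @ l2norm(cls_embs @ W_sv).T       # compare in visual space
    return alpha * s_vis + (1 - alpha) * s_sem                  # (N, C) compatibility scores

# Usage: predict the unseen class with the highest compatibility, e.g.
# preds = zsl_scores(x, a, W_vs, W_sv, alpha=0.7).argmax(axis=1)
```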
Citations: 1
Visualizing Paired Image Similarity in Transformer Networks
Pub Date: 2022-01-01 DOI: 10.1109/WACV51458.2022.00160
Samuel Black, Abby Stylianou, Robert Pless, Richard Souvenir
Transformer architectures have shown promise for a wide range of computer vision tasks, including image embedding. As was the case with convolutional neural networks and other models, explainability of the predictions is a key concern, but visualization approaches tend to be architecture-specific. In this paper, we introduce a new method for producing interpretable visualizations that, given a pair of images encoded with a Transformer, show which regions contributed to their similarity. Additionally, for the task of image retrieval, we compare the performance of Transformer and ResNet models of similar capacity and show that while they have similar performance in aggregate, the retrieved results and the visual explanations for those results are quite different. Code is available at https://github.com/vidarlab/xformer-paired-viz.
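For readers who want a starting point before consulting the released code, one generic way to expose region-level contributions (an assumption on our part, not necessarily the authors' method) is to decompose a pooled dot-product similarity between two images' patch tokens into per-token terms:

```python
import torch

def pairwise_contribution_maps(tokens_a, tokens_b):
    """Decompose a pooled-embedding similarity into per-token contributions.

    tokens_a, tokens_b: (Na, D) and (Nb, D) patch-token embeddings from a
    Transformer (CLS token excluded). With mean pooling and a dot-product
    similarity, sim = (1/(Na*Nb)) * sum_ij <a_i, b_j>, so each token's
    contribution is the summed affinity to all tokens of the other image.
    This decomposition is a generic sketch, not the paper's visualization.
    """
    cross = tokens_a @ tokens_b.T                    # (Na, Nb) token-pair affinities
    contrib_a = cross.sum(dim=1) / cross.numel()     # contribution of each token in image A
    contrib_b = cross.sum(dim=0) / cross.numel()     # contribution of each token in image B
    return contrib_a, contrib_b                      # reshape to the patch grid to visualize
```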
Citations: 2
How and What to Learn: Taxonomizing Self-Supervised Learning for 3D Action Recognition
Pub Date: 2022-01-01 DOI: 10.1109/WACV51458.2022.00294
Amor Ben Tanfous, Aimen Zerroug, Drew A. Linsley, Thomas Serre
There are two competing standards for self-supervised learning in action recognition from 3D skeletons. Su et al., 2020 [31] used an auto-encoder architecture and an image-reconstruction objective function to achieve state-of-the-art performance on the NTU60 C-View benchmark. Rao et al., 2020 [23] used contrastive learning in the latent space to achieve state-of-the-art performance on the NTU60 C-Sub benchmark. Here, we reconcile these disparate approaches by developing a taxonomy of self-supervised learning for action recognition. We observe that leading approaches generally use one of two types of objective functions: those that seek to reconstruct the input from a latent representation ("Attractive" learning) versus those that also try to maximize the representations' distinctiveness ("Contrastive" learning). Independently, leading approaches also differ in how they implement these objective functions: there are those that optimize representations in the decoder output space and those that optimize representations in the network's latent space (encoder output). We find that combining these approaches leads to larger gains in performance and tolerance to transformation than is achievable by any individual method, leading to state-of-the-art performance on three standard action recognition datasets. We include links to our code and data.
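The two branches of the taxonomy can be written down concretely. Below is a minimal sketch of a generic "Attractive" (reconstruction) loss and a generic "Contrastive" (InfoNCE-style) loss; these are textbook forms with assumed tensor shapes, not the exact objectives of [31] or [23]:

```python
import torch
import torch.nn.functional as F

def attractive_loss(decoder_out, target):
    """'Attractive' objective: reconstruct the input from a latent code."""
    return F.mse_loss(decoder_out, target)

def contrastive_loss(z_anchor, z_positive, temperature=0.1):
    """'Contrastive' (InfoNCE-style) objective: pull two augmented views of
    the same skeleton sequence together while pushing apart the other
    sequences in the batch. z_anchor, z_positive: (B, D) latent embeddings."""
    z_a = F.normalize(z_anchor, dim=1)
    z_p = F.normalize(z_positive, dim=1)
    logits = z_a @ z_p.T / temperature               # (B, B) similarity matrix
    labels = torch.arange(z_a.size(0), device=z_a.device)
    return F.cross_entropy(logits, labels)           # diagonal entries are the positives
```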
Citations: 9
Coupled Training for Multi-Source Domain Adaptation
Pub Date: 2022-01-01 DOI: 10.1109/WACV51458.2022.00114
Ohad Amosy, Gal Chechik
Unsupervised domain adaptation is often addressed by learning a joint representation of labeled samples from a source domain and unlabeled samples from a target domain. Unfortunately, hard sharing of representations may hurt adaptation because of negative transfer, where features that are useful for the source domains are learned even if they hurt inference on the target domain. Here, we propose an alternative, soft sharing scheme. We train separate but weakly-coupled models for the source and the target data, while encouraging their predictions to agree. Training the two coupled models jointly effectively exploits the distribution over the unlabeled target data and achieves high accuracy on the target. Specifically, we show analytically and empirically that the decision boundaries of the target model converge to low-density "valleys" of the target distribution. We evaluate our approach on four multi-source domain adaptation (MSDA) benchmarks: digits, Amazon text reviews, Office-Caltech, and images (DomainNet). We find that it consistently outperforms the current MSDA state of the art, sometimes by a very large margin.
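A minimal sketch of the soft-sharing idea follows, assuming a symmetric-KL agreement term on unlabeled target batches; the coupling term, the weight `lam`, and the single-step structure are illustrative assumptions rather than the paper's exact training procedure.

```python
import torch
import torch.nn.functional as F

def coupled_loss(model_s, model_t, x_src, y_src, x_tgt, lam=1.0):
    """Loss for one step of training two weakly-coupled models.

    model_s sees labeled source data; model_t is the target model. Both are
    encouraged to agree on unlabeled target inputs x_tgt via a symmetric KL
    divergence (an assumed form of the coupling).
    """
    sup = F.cross_entropy(model_s(x_src), y_src)          # supervised source loss
    log_p_s = F.log_softmax(model_s(x_tgt), dim=1)        # both models' target predictions
    log_p_t = F.log_softmax(model_t(x_tgt), dim=1)
    agree = 0.5 * (F.kl_div(log_p_s, log_p_t, reduction="batchmean", log_target=True)
                   + F.kl_div(log_p_t, log_p_s, reduction="batchmean", log_target=True))
    return sup + lam * agree                              # backprop through both models
```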
Citations: 2
FASSST: Fast Attention Based Single-Stage Segmentation Net for Real-Time Instance Segmentation
Pub Date: 2022-01-01 DOI: 10.1109/WACV51458.2022.00277
Yuan Cheng, Rui Lin, Peining Zhen, Tianshu Hou, C. Ng, Hai-Bao Chen, Hao Yu, Ngai Wong
Real-time instance segmentation is crucial in various AI applications. This work designs a network named Fast Attention based Single-Stage Segmentation NeT (FASSST) that performs instance segmentation at video-grade speed. Using an instance attention module (IAM), FASSST quickly locates target instances and segments them, with region-of-interest (ROI) feature fusion (RFF) aggregating ROI features from pyramid mask layers. The module employs an efficient single-stage feature regression, straight from features to instance coordinates and class probabilities. Experiments on the COCO and CityScapes datasets show that FASSST achieves state-of-the-art speed at competitive accuracy: real-time inference at 47.5 FPS on a GTX 1080Ti GPU and 5.3 FPS on a Jetson Xavier NX board, with only 71.6 GFLOPs.
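The single-stage regression described above maps fused ROI features directly to box coordinates and class probabilities; a minimal sketch of such a head follows. All layer sizes are hypothetical, and the IAM/RFF modules themselves are not reproduced here.

```python
import torch
import torch.nn as nn

class SingleStageHead(nn.Module):
    """Toy single-stage regression head: ROI features -> coordinates + classes.

    Dimensions are assumptions for illustration; FASSST's actual head layout
    is described in the paper, not reproduced here.
    """
    def __init__(self, feat_dim=256, num_classes=80):
        super().__init__()
        self.shared = nn.Sequential(nn.Linear(feat_dim, 256), nn.ReLU())
        self.box = nn.Linear(256, 4)             # instance coordinates (x, y, w, h)
        self.cls = nn.Linear(256, num_classes)   # class logits

    def forward(self, roi_feats):                # roi_feats: (num_rois, feat_dim)
        h = self.shared(roi_feats)
        return self.box(h), self.cls(h).softmax(dim=-1)
```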
Citations: 1
Matching and Recovering 3D People from Multiple Views
Pub Date: 2022-01-01 DOI: 10.1109/WACV51458.2022.00125
Alejandro Pérez-Yus, Antonio Agudo
This paper introduces an approach to simultaneously match and recover 3D people from multiple calibrated cameras. To this end, we present an affinity measure between 2D detections across different views that enforces geometric consistency under uncertainty. This similarity is then exploited by a novel multi-view matching algorithm to cluster the detections, remaining robust to partial observations and bad detections without assuming any prior on the number of people in the scene. After that, the multi-view correspondences are used to efficiently infer the 3D pose of each body by means of a 3D pictorial structure model in combination with physico-geometric constraints. Our algorithm is thoroughly evaluated on challenging scenarios where several human bodies perform different activities involving complex motions, producing large occlusions in some views and noisy observations. We outperform state-of-the-art results in terms of matching and 3D reconstruction.
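A common way to build such a cross-view affinity (a generic sketch; the paper's uncertainty-aware measure is more elaborate) is to score pairs of 2D detections by their symmetric epipolar distance under the known calibration and pass it through a kernel:

```python
import numpy as np

def epipolar_affinity(pts1, pts2, F, sigma=10.0):
    """Affinity between 2D detections in two calibrated views.

    pts1: (N, 2) and pts2: (M, 2) pixel coordinates; F: (3, 3) fundamental
    matrix from the calibration. Uses the symmetric point-to-epipolar-line
    distance; the Gaussian kernel width `sigma` (in pixels) is an assumed
    stand-in for the paper's uncertainty model.
    """
    h1 = np.hstack([pts1, np.ones((len(pts1), 1))])   # homogeneous coordinates
    h2 = np.hstack([pts2, np.ones((len(pts2), 1))])
    l2 = h1 @ F.T                                     # epipolar lines in view 2
    l1 = h2 @ F                                       # epipolar lines in view 1
    d12 = np.abs(l2 @ h2.T) / np.linalg.norm(l2[:, :2], axis=1, keepdims=True)
    d21 = np.abs(l1 @ h1.T) / np.linalg.norm(l1[:, :2], axis=1, keepdims=True)
    dist = 0.5 * (d12 + d21.T)                        # (N, M) symmetric distance
    return np.exp(-dist**2 / (2 * sigma**2))          # high value = likely same person
```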
Citations: 3
LEAD: Self-Supervised Landmark Estimation by Aligning Distributions of Feature Similarity
Pub Date: 2022-01-01 DOI: 10.1109/WACV51458.2022.00310
Tejan Karmali, Abhinav Atrishi, Sai Sree Harsha, Susmit Agrawal, Varun Jampani, R. Venkatesh Babu
In this work, we introduce LEAD, an approach to discover landmarks from an unannotated collection of category-specific images. Existing works in self-supervised landmark detection are based on learning dense (pixel-level) feature representations from an image, which are further used to learn landmarks in a semi-supervised manner. While there have been advances in self-supervised learning of image features for instance-level tasks like classification, these methods do not ensure dense equivariant representations. The property of equivariance is of interest for dense prediction tasks like landmark estimation. In this work, we introduce an approach to enhance the learning of dense equivariant representations in a self-supervised fashion. We follow a two-stage training approach: first, we train a network using the BYOL [13] objective, which operates at an instance level. The correspondences obtained through this network are further used to train a dense and compact representation of the image using a lightweight network. We show that having such a prior in the feature extractor helps in landmark detection, even with a drastically limited number of annotations, while also improving generalization across scale variations.
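Stage one relies on the BYOL objective; its core loss has a standard published form (a regression between the online network's prediction and a stop-gradient target projection), sketched below for reference:

```python
import torch.nn.functional as F

def byol_loss(online_pred, target_proj):
    """BYOL regression loss between the online network's prediction and the
    target network's projection, both (B, D). After L2 normalization this is
    2 - 2 * cosine similarity, as in the original BYOL paper; the target
    branch receives no gradient."""
    p = F.normalize(online_pred, dim=1)
    z = F.normalize(target_proj.detach(), dim=1)   # stop-gradient on the target branch
    return (2 - 2 * (p * z).sum(dim=1)).mean()
```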
Citations: 4
Generative Adversarial Attack on Ensemble Clustering
Pub Date: 2022-01-01 DOI: 10.1109/WACV51458.2022.00389
Chetan Kumar, Deepak Kumar, Ming Shao
Adversarial attacks on learning tasks have attracted substantial attention in recent years; however, most existing works focus on supervised learning. Recently, research has shown that unsupervised learning, such as clustering, tends to be vulnerable to adversarial attacks. In this paper, we focus on a clustering algorithm widely used in real-world environments, namely, ensemble clustering (EC). EC algorithms usually leverage basic partitions (BPs) and ensemble techniques to collaboratively improve clustering performance. Each BP may stem from one trial of clustering, a feature segment, or part of the data stored on the cloud. We have observed that the attack tends to be less perceivable when only a few BPs are compromised. To explore plausible attack strategies, we propose a novel generative adversarial attack (GA2) model for EC, titled GA2EC. First, we show that not all BPs are equally important, and some of them are more vulnerable under adversarial attack. Second, we develop a generative adversarial model to mimic the attack on EC. In particular, the generative model simulates the behaviors of both clean BPs and perturbed key BPs, and their derived graphs, and can thus launch effective attacks while drawing less attention. We have conducted extensive experiments on eleven clustering benchmarks and have demonstrated that our approach is effective in attacking EC under both transductive and inductive settings.
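The observation that compromising only a few BPs is hard to perceive can be made concrete with the standard co-association consensus used in ensemble clustering (the GA2EC attack model itself is not reproduced here); a minimal sketch:

```python
import numpy as np

def coassociation(partitions):
    """Co-association matrix for ensemble clustering.

    partitions: list of label arrays, one basic partition (BP) per entry,
    each of shape (N,). Entry (i, j) is the fraction of BPs that place
    samples i and j in the same cluster; consensus clustering then operates
    on this matrix. Perturbing k of the BPs shifts each entry by at most
    k / len(partitions), which illustrates why attacks that compromise only
    a few BPs are hard to perceive.
    """
    n = len(partitions[0])
    co = np.zeros((n, n))
    for labels in partitions:
        co += (labels[:, None] == labels[None, :]).astype(float)
    return co / len(partitions)
```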
Citations: 0
Towards Durability Estimation of Bioprosthetic Heart Valves Via Motion Symmetry Analysis
Pub Date: 2022-01-01 DOI: 10.1109/WACV51458.2022.00176
M. Alizadeh, Melissa Cote, A. Albu
This paper addresses bioprosthetic heart valve (BHV) durability estimation via computer vision (CV)-based analysis of the visual symmetry of valve leaflet motion. BHVs are routinely implanted in patients suffering from valvular heart diseases. Valve designs are rigorously tested using cardiovascular equipment, but once implanted, more than 50% of BHVs encounter a structural failure within 15 years. We investigate the correlation between the visual dynamic symmetry of BHV leaflets and the functional symmetry of the valves. We hypothesize that an asymmetry in the valve leaflet motion will generate an asymmetry in the flow patterns, resulting in added local stress and forces on some of the leaflets, which can accelerate the failure of the valve. We propose two different pairwise leaflet symmetry scores, based on the diagonals of orthogonal projection matrices (DOPM) and on dynamic time warping (DTW), computed from videos recorded during pulsatile flow tests. We compare the symmetry score profiles with those of fluid dynamic parameters (velocity and vorticity values) at the leaflet borders, obtained from valve-specific numerical simulations. Experiments on four cases covering three different tricuspid BHVs yielded promising results, with the DTW scores showing good agreement with the simulations. With a link between visual and functional symmetries established, this approach paves the way towards BHV durability estimation using CV techniques.
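The DTW-based symmetry score can be sketched directly: align two leaflets' motion traces with dynamic time warping and use the normalized alignment cost as an asymmetry measure. A minimal implementation, assuming 1-D motion traces and a simple length normalization (the paper's exact features and normalization may differ):

```python
import numpy as np

def dtw_distance(a, b):
    """Classic O(len(a)*len(b)) dynamic time warping between two 1-D motion
    traces (e.g., per-frame leaflet opening extent measured from video)."""
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]

def symmetry_score(trace_a, trace_b):
    """Pairwise leaflet symmetry: a cost of 0 means perfectly symmetric
    motion. The length normalization is an illustrative assumption."""
    return dtw_distance(trace_a, trace_b) / (len(trace_a) + len(trace_b))
```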
Citations: 0
Novel-View Synthesis of Human Tourist Photos
Pub Date: 2022-01-01 DOI: 10.1109/WACV51458.2022.00093
Jonathan Freer, K. M. Yi, Wei Jiang, Jongwon Choi, H. Chang
We present a novel framework for performing novel-view synthesis on human tourist photos. Given a tourist photo from a known scene, we reconstruct the photo in 3D space by modeling the human and the background independently. We generate a deep buffer from a novel viewpoint of the reconstruction and utilize a deep network to translate the buffer into a photo-realistic rendering of the novel view. We additionally present a method to relight the renderings, allowing both human and background to be relit to match either the provided input image or any other image. The key contributions of our paper are: 1) a framework for performing novel-view synthesis on human tourist photos, 2) an appearance transfer method for relighting humans to match synthesized backgrounds, and 3) a method for estimating lighting properties from a single human photo. We demonstrate the proposed framework on photos of various tourists from two different scenes.
Citations: 3