
2022 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV): Latest Publications

Non-local Attention Improves Description Generation for Retinal Images
Pub Date : 2022-01-01 DOI: 10.1109/WACV51458.2022.00331
Jia-Hong Huang, Ting-Wei Wu, C. Yang, Zenglin Shi, I-Hung Lin, J. Tegnér, M. Worring
Automatically generating medical reports from retinal images is a difficult task in which an algorithm must generate semantically coherent descriptions for a given retinal image. Existing methods mainly rely on the input image to generate descriptions. However, many abstract medical concepts or descriptions cannot be generated based on image information only. In this work, we integrate additional information to help solve this task; we observe that early in the diagnosis process, ophthalmologists usually write down a small set of keywords denoting important information. These keywords are subsequently used to aid the creation of medical reports for a patient. Since these keywords commonly exist and are useful for generating medical reports, we incorporate them into automatic report generation. Since we have two types of inputs - expert-defined unordered keywords and images - effectively fusing features from these different modalities is challenging. To that end, we propose a new keyword-driven medical report generation method based on a non-local attention-based multi-modal feature fusion approach, TransFuser, which is capable of fusing features from different types of inputs based on such attention. Our experiments show the proposed method successfully captures the mutual information of keywords and image content. We further show our proposed keyword-driven generation model reinforced by the TransFuser is superior to baselines under the popular text evaluation metrics BLEU, CIDEr, and ROUGE. TransFuser GitHub: https://github.com/Jhhuangkay/Non-local-Attention-ImprovesDescription-Generation-for-Retinal-Images.
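As a rough illustration of the fusion step described above, the sketch below (PyTorch) lets keyword embeddings attend over image-region features with plain non-local (scaled dot-product) attention and concatenates the attended context back onto the keywords for a downstream report decoder. The shapes, the single attention head, and the final concatenation are assumptions for illustration, not the authors' exact TransFuser design.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class KeywordImageFusion(nn.Module):
    """Toy non-local attention fusing keyword and image-region features."""
    def __init__(self, dim=256):
        super().__init__()
        self.q = nn.Linear(dim, dim)  # queries from keyword embeddings
        self.k = nn.Linear(dim, dim)  # keys from image-region features
        self.v = nn.Linear(dim, dim)  # values from image-region features

    def forward(self, keywords, image_regions):
        # keywords:      (B, K, dim) - K expert-defined, unordered keywords
        # image_regions: (B, R, dim) - R spatial regions from a visual backbone
        q, k, v = self.q(keywords), self.k(image_regions), self.v(image_regions)
        attn = F.softmax(q @ k.transpose(1, 2) / q.size(-1) ** 0.5, dim=-1)
        context = attn @ v                             # keyword-grounded visual context
        return torch.cat([keywords, context], dim=-1)  # input to the report decoder

fusion = KeywordImageFusion(dim=256)
out = fusion(torch.randn(2, 5, 256), torch.randn(2, 49, 256))
print(out.shape)  # torch.Size([2, 5, 512])
```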
Citations: 6
High Dynamic Range Imaging of Dynamic Scenes with Saturation Compensation but without Explicit Motion Compensation
Pub Date : 2022-01-01 DOI: 10.1109/WACV51458.2022.00014
Haesoo Chung, N. Cho
High dynamic range (HDR) imaging is a highly challenging task since a large amount of information is lost due to the limitations of camera sensors. For HDR imaging, some methods capture multiple low dynamic range (LDR) images with varying exposures to aggregate more information. However, these approaches introduce ghosting artifacts when significant inter-frame motions are present. Moreover, although multi-exposure images are given, we have little information in severely over-exposed areas. Most existing methods focus on motion compensation, i.e., alignment of multiple LDR shots to reduce the ghosting artifacts, but they still produce unsatisfactory results. These methods also largely overlook the need to restore the saturated areas. In this paper, we generate well-aligned multi-exposure features by reformulating the motion alignment problem as a simple brightness adjustment problem. In addition, we propose a coarse-to-fine merging strategy with explicit saturation compensation. The saturated areas are reconstructed with similar well-exposed content using adaptive contextual attention. We demonstrate that our method outperforms state-of-the-art methods in both qualitative and quantitative evaluations.
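To make the "brightness adjustment instead of motion alignment" idea concrete, here is a small NumPy sketch that maps non-reference LDR exposures to the reference brightness level in the linear domain and marks the saturated pixels that the merging stage must fill from neighbouring exposures. The gamma value, the clipping, and the 0.95 saturation threshold are assumptions for illustration, not the paper's exact pipeline.

```python
import numpy as np

def adjust_to_reference(ldr, exposure, ref_exposure, gamma=2.2):
    """Map an LDR shot to the reference exposure's brightness (no warping)."""
    linear = np.power(ldr, gamma)                    # undo display gamma
    scaled = linear * (ref_exposure / exposure)      # match the reference exposure
    return np.clip(np.power(scaled, 1.0 / gamma), 0.0, 1.0)

exposures = [0.25, 1.0, 4.0]                         # short / reference / long shots
ldr_stack = [np.random.rand(64, 64, 3) for _ in exposures]
aligned = [adjust_to_reference(im, t, ref_exposure=1.0)
           for im, t in zip(ldr_stack, exposures)]

# Saturated pixels of the reference frame: these must be reconstructed from the
# brightness-adjusted neighbours during coarse-to-fine merging.
saturated = (ldr_stack[1] > 0.95).any(axis=-1)
print(aligned[0].shape, saturated.mean())
```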
Citations: 3
Multi-Scale Patch-Based Representation Learning for Image Anomaly Detection and Segmentation
Pub Date : 2022-01-01 DOI: 10.1109/WACV51458.2022.00312
Chin-Chia Tsai, Tsung-Hsuan Wu, S. Lai
Unsupervised representation learning has been proven to be effective for the challenging anomaly detection and segmentation tasks. In this paper, we propose a multi-scale patch-based representation learning method to extract critical and representative information from normal images. By taking the relative feature similarity between patches of different local distances into account, we can achieve better representation learning. Moreover, we propose a refined way to improve the self-supervised learning strategy, thus allowing our model to learn better geometric relationship between neighboring patches. Through sliding patches of different scales all over an image, our model extracts representative features from each patch and compares them with those in the training set of normal images to detect the anomalous regions. Our experimental results on MVTec AD dataset and BTAD dataset demonstrate the proposed method achieves the state-of-the-art accuracy for both anomaly detection and segmentation.
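A minimal sketch of the inference side of such an approach: multi-scale patch features from a test image are compared against a bank of features from normal training patches, and a large nearest-neighbour distance marks an anomalous region. The flattened-pixel "features", the single scale, and the nearest-neighbour scoring are placeholders for illustration; the paper learns patch representations and aggregates multiple scales.

```python
import torch
import torch.nn.functional as F

def patch_features(img, patch, stride):
    # img: (1, C, H, W) -> one flattened, L2-normalised vector per patch location
    patches = F.unfold(img, kernel_size=patch, stride=stride)  # (1, C*p*p, N)
    return F.normalize(patches.squeeze(0).t(), dim=1)          # (N, C*p*p)

def anomaly_scores(test_img, normal_bank, patch, stride):
    feats = patch_features(test_img, patch, stride)    # (N, D)
    dists = torch.cdist(feats, normal_bank)            # distance to every normal patch
    return dists.min(dim=1).values                     # nearest-normal distance per patch

normal_images = torch.rand(8, 3, 64, 64)               # defect-free training images
bank = torch.cat([patch_features(im.unsqueeze(0), 16, 8) for im in normal_images])
scores = anomaly_scores(torch.rand(1, 3, 64, 64), bank, 16, 8)
print(scores.shape)  # one anomaly score per patch location (repeat per scale and merge)
```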
Citations: 33
Transferable 3D Adversarial Textures using End-to-end Optimization
Pub Date : 2022-01-01 DOI: 10.1109/WACV51458.2022.00080
Camilo Pestana, Naveed Akhtar, N. Rahnavard, M. Shah, A. Mian
Deep visual models are known to be vulnerable to adversarial attacks. The last few years have seen numerous techniques to compute adversarial inputs for these models. However, there are still under-explored avenues in this critical research direction. Among those is the estimation of adversarial textures for 3D models in an end-to-end optimization scheme. In this paper, we propose such a scheme to generate adversarial textures for 3D models that are highly transferable and invariant to different camera views and lighting conditions. Our method makes use of neural rendering with explicit control over the model texture and background. We ensure transferability of the adversarial textures by employing an ensemble of robust and non-robust models. Our technique utilizes 3D models as a proxy to simulate closer to real-life conditions, in contrast to conventional use of 2D images for adversarial attacks. We show the efficacy of our method with extensive experiments.
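The sketch below shows the end-to-end optimisation pattern in a toy form: the texture is a trainable tensor, a differentiable stand-in "renderer" composites it into a scene under random lighting, and the loss pushes an ensemble of classifiers away from the true label so the texture transfers across models. The pasting renderer and the tiny CNN classifiers are placeholders for illustration; the paper uses neural rendering of textured 3D models and a mix of robust and non-robust networks.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

texture = torch.rand(3, 32, 32, requires_grad=True)        # adversarial texture (trainable)
ensemble = [nn.Sequential(nn.Conv2d(3, 8, 3, stride=2), nn.ReLU(), nn.Flatten(),
                          nn.Linear(8 * 31 * 31, 10)) for _ in range(3)]
optimizer = torch.optim.Adam([texture], lr=0.01)
true_label = torch.tensor([2])

mask = torch.zeros(1, 1, 64, 64)
mask[:, :, 16:48, 16:48] = 1.0                              # where the texture appears

for step in range(10):
    background = torch.rand(1, 3, 64, 64)                   # random scene
    lighting = 0.5 + torch.rand(1)                          # random lighting condition
    patch = F.pad(texture.clamp(0, 1) * lighting, (16, 16, 16, 16)).unsqueeze(0)
    scene = background * (1 - mask) + patch * mask          # toy differentiable "rendering"
    # maximise the classification loss of every ensemble member
    loss = -sum(F.cross_entropy(model(scene), true_label) for model in ensemble)
    optimizer.zero_grad(); loss.backward(); optimizer.step()
print(float(loss))
```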
Citations: 1
Novel Ensemble Diversification Methods for Open-Set Scenarios
Pub Date : 2022-01-01 DOI: 10.1109/WACV51458.2022.00342
Miriam Farber, Roman Goldenberg, G. Leifman, Gal Novich
We revisit existing ensemble diversification approaches and present two novel diversification methods tailored for open-set scenarios. The first method uses a new loss, designed to encourage disagreement between models on outliers only, thus alleviating the intrinsic accuracy-diversity trade-off. The second method achieves diversity via automated feature engineering, by training each model to disregard input features learned by previously trained ensemble models. We conduct an extensive evaluation and analysis of the proposed techniques on seven datasets that cover the image classification, re-identification, and recognition domains. We compare to and demonstrate accuracy improvements over existing state-of-the-art ensemble diversification methods.
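A minimal sketch of the first method's loss: standard cross-entropy on known-class samples plus a term that rewards disagreement between ensemble members on outlier samples only. Using negative pairwise KL divergence between member predictions as the disagreement measure, and the weight alpha, are assumptions for illustration rather than the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def diversified_loss(member_logits, labels, is_outlier, alpha=0.1):
    # member_logits: list of (B, C) logits, one tensor per ensemble member
    ce = sum(F.cross_entropy(lg[~is_outlier], labels[~is_outlier])
             for lg in member_logits)
    probs = [F.softmax(lg[is_outlier], dim=1) for lg in member_logits]
    disagreement = 0.0
    for i in range(len(probs)):
        for j in range(i + 1, len(probs)):                 # pairwise divergence on outliers only
            disagreement = disagreement + F.kl_div(probs[i].log(), probs[j],
                                                   reduction='batchmean')
    return ce - alpha * disagreement                        # reward disagreement on outliers

member_logits = [torch.randn(8, 5, requires_grad=True) for _ in range(3)]
labels = torch.randint(0, 5, (8,))
is_outlier = torch.tensor([0, 0, 0, 0, 1, 1, 1, 1], dtype=torch.bool)
loss = diversified_loss(member_logits, labels, is_outlier)
loss.backward()
print(float(loss))
```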
Citations: 2
FLUID: Few-Shot Self-Supervised Image Deraining
Pub Date : 2022-01-01 DOI: 10.1109/WACV51458.2022.00049
Shyam Nandan Rai, Rohit Saluja, Chetan Arora, V. Balasubramanian, A. Subramanian, C.V. Jawahar
Self-supervised methods have shown promising results in denoising and dehazing tasks, where the collection of a paired dataset is challenging and expensive. However, we find that these methods fail to remove the rain streaks when applied to image deraining tasks. Their poor performance is due to two explicit assumptions: (i) the distribution of noise or haze is uniform, and (ii) the value of a noisy or hazy pixel is independent of its neighbors. Rainy pixels are non-uniformly distributed, and a rainy pixel's value does not necessarily depend on its neighboring pixels. Hence, we conclude that a self-supervised method needs some prior knowledge about the rain distribution to perform the deraining task. To provide this knowledge, we hypothesize a network trained with minimal supervision to estimate the likelihood of rainy pixels. This leads us to our proposed method called FLUID: Few-Shot Self-Supervised Image Deraining. We perform extensive experiments and comparisons with existing image deraining and few-shot image-to-image translation methods on the Rain 100L and DDN-SIRR datasets containing real and synthetic rainy images. In addition, we use the Rainy Cityscapes dataset to show that our method trained in a few-shot setting can improve semantic segmentation and object detection in rainy conditions. Our approach obtains a mIoU gain of 51.20 over the current best-performing deraining method. [Project Page]
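A heavily simplified sketch of how an estimated rain-likelihood map could supply that prior during self-supervised training: pixels flagged as likely rain are masked out, and the deraining network learns to reconstruct them from the surrounding context, in the spirit of blind-spot self-supervision. The tiny network, the 0.8 threshold, and the masked-reconstruction loss are assumptions for illustration, not the exact FLUID objective.

```python
import torch
import torch.nn as nn

derainer = nn.Sequential(nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
                         nn.Conv2d(16, 3, 3, padding=1))
optimizer = torch.optim.Adam(derainer.parameters(), lr=1e-3)

rainy = torch.rand(4, 3, 64, 64)               # rainy input images
rain_likelihood = torch.rand(4, 1, 64, 64)     # from the minimally supervised likelihood network
rain_mask = (rain_likelihood > 0.8).float()    # pixels believed to be rain streaks

masked_input = rainy * (1.0 - rain_mask)       # hide the suspected rain pixels
pred = derainer(masked_input)
# reconstruct the hidden pixels from the surrounding (mostly clean) context
loss = ((pred - rainy) ** 2 * rain_mask).sum() / rain_mask.sum().clamp(min=1.0)
optimizer.zero_grad(); loss.backward(); optimizer.step()
print(float(loss))
```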
Citations: 5
Less Can Be More: Sound Source Localization With a Classification Model
Pub Date : 2022-01-01 DOI: 10.1109/WACV51458.2022.00065
Arda Senocak, H. Ryu, Junsik Kim, In-So Kweon
In this paper, we tackle sound localization as a natural outcome of the audio-visual video classification problem. Differently from existing sound localization approaches, we do not use any explicit sub-modules or training mechanisms but use simple cross-modal attention on top of the representations learned by a classification loss. Our key contribution is to show that a simple audio-visual classification model has the ability to localize sound sources accurately and to perform on par with state-of-the-art methods, proving that indeed "less is more". Furthermore, we propose potential applications that can be built on top of our model. First, we introduce informative moment selection to enhance localization task learning in existing approaches, compared to mid-frame usage. Then, we introduce a pseudo bounding box generation procedure that can significantly boost the performance of existing methods in semi-supervised settings or be used for large-scale automatic annotation from any video dataset with minimal effort.
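A small sketch of the cross-modal attention used at localization time: the clip-level audio embedding is compared against every spatial position of the visual feature map with cosine similarity, and the normalised map marks the sounding region. The feature dimensions and the temperature of 0.07 are illustrative assumptions, not the paper's exact settings.

```python
import torch
import torch.nn.functional as F

B, D, H, W = 2, 128, 14, 14
visual = torch.randn(B, D, H, W)        # spatial features from the visual backbone
audio = torch.randn(B, D)               # clip-level feature from the audio backbone

v = F.normalize(visual.flatten(2), dim=1)      # (B, D, H*W), unit-norm per location
a = F.normalize(audio, dim=1).unsqueeze(1)     # (B, 1, D)
heatmap = torch.bmm(a, v).view(B, H, W)        # cosine similarity per spatial location
heatmap = F.softmax(heatmap.view(B, -1) / 0.07, dim=1).view(B, H, W)
print(heatmap.shape, heatmap.view(B, -1).sum(dim=1))  # attention sums to 1 per clip
```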
Citations: 14
Few-Shot Open-Set Recognition of Hyperspectral Images with Outlier Calibration Network
Pub Date : 2022-01-01 DOI: 10.1109/WACV51458.2022.00215
Debabrata Pal, Valay Bundele, Renuka Sharma, Biplab Banerjee, Y. Jeppu
We tackle the few-shot open-set recognition (FSOSR) problem in the context of remote sensing hyperspectral image (HSI) classification. Prior research on OSR mainly considers an empirical threshold on the class prediction scores to reject outlier samples. Further, recent endeavors in few-shot HSI classification fail to recognize outliers due to the 'closed-set' nature of the problem and the fact that the entire class distributions are unknown during training. To this end, we propose to optimize a novel outlier calibration network (OCN) together with a feature extraction module during the meta-training phase. The feature extractor is equipped with a novel residual 3D convolutional block attention network (R3CBAM) for enhanced spectral-spatial feature learning from HSI. Our method rejects outliers based on OCN prediction scores, without the need for manual thresholding. Finally, we propose to augment the query set with synthesized support set features during the similarity learning stage in order to combat the data scarcity issue of few-shot learning. The superiority of the proposed model is showcased on four benchmark HSI datasets.
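The sketch below illustrates the rejection mechanism in a prototype-style few-shot setting: queries are matched to class prototypes built from the support set, and a small calibration network maps each query's similarity vector to an outlier probability, so no hand-tuned threshold on the class scores is required. The MLP architecture and the 0.5 decision rule are assumptions for illustration, not the exact OCN design.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

n_way, k_shot, d = 5, 3, 64
support = torch.randn(n_way, k_shot, d)        # embedded support samples (R3CBAM features in the paper)
queries = torch.randn(10, d)                   # embedded query samples

prototypes = F.normalize(support.mean(dim=1), dim=1)      # one prototype per known class
sims = F.normalize(queries, dim=1) @ prototypes.t()       # (10, n_way) cosine similarities

ocn = nn.Sequential(nn.Linear(n_way, 32), nn.ReLU(), nn.Linear(32, 1))
outlier_prob = torch.sigmoid(ocn(sims)).squeeze(1)        # learned outlier score per query

pred = sims.argmax(dim=1)
pred[outlier_prob > 0.5] = -1                  # -1 marks "unknown" (rejected) queries
print(pred)
```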
Citations: 7
Unsupervised Sounding Object Localization with Bottom-Up and Top-Down Attention
Pub Date : 2022-01-01 DOI: 10.1109/WACV51458.2022.00222
Jiaying Shi, Chao Ma
Learning to localize sounding objects in visual scenes without manual annotations has drawn increasing attention recently. In this paper, we propose an unsupervised sounding object localization algorithm that uses bottom-up and top-down attention in visual scenes. The bottom-up attention module generates an objectness confidence map, while the top-down attention is derived from the similarity between sound and visual regions. Moreover, we propose a bottom-up attention loss function, which models the correlation between bottom-up and top-down attention. Extensive experimental results demonstrate that our proposed unsupervised method significantly outperforms state-of-the-art unsupervised methods. The source code is available at https://github.com/VISION-SJTU/USOL.
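A minimal sketch of how the two attention maps could interact: the bottom-up objectness map is fused with the top-down audio-visual similarity map, and an auxiliary loss encourages the two maps to agree. The element-wise-product fusion and the negative-cosine correlation loss are assumptions for illustration, not the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

B, H, W = 2, 14, 14
objectness = torch.rand(B, H, W, requires_grad=True)   # bottom-up objectness confidence map
similarity = torch.rand(B, H, W, requires_grad=True)   # top-down audio-visual similarity map

localization = objectness * similarity                  # fused sounding-object map

# bottom-up attention loss: push the two maps to be correlated
o = F.normalize(objectness.view(B, -1), dim=1)
s = F.normalize(similarity.view(B, -1), dim=1)
bu_loss = -(o * s).sum(dim=1).mean()
bu_loss.backward()
print(localization.shape, float(bu_loss))
```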
Citations: 8
Learning to Generate the Unknowns as a Remedy to the Open-Set Domain Shift
Pub Date : 2022-01-01 DOI: 10.1109/WACV51458.2022.00379
Mahsa Baktash, Tianle Chen, M. Salzmann
In many situations, the data one has access to at test time follows a different distribution from the training data. Over the years, this problem has been tackled by closed-set domain adaptation techniques. Recently, open-set domain adaptation has emerged to address the more realistic scenario where additional unknown classes are present in the target data. In this setting, existing techniques focus on the challenging task of isolating the unknown target samples, so as to avoid the negative transfer resulting from aligning the source feature distributions with the broader target one that encompasses the additional unknown classes. Here, we propose a simpler and more effective solution consisting of complementing the source data distribution and making it comparable to the target one by enabling the model to generate source samples corresponding to the unknown target classes. We formulate this as a general module that can be incorporated into any existing closed-set approach and show that this strategy allows us to outperform the state of the art on open-set domain adaptation benchmark datasets.
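A toy sketch of the core mechanism: a generator synthesises source-domain features for an extra "unknown" class, and the classifier is trained over C known classes plus that unknown class, so the source distribution is complemented before any closed-set alignment. The feature-level generator, the (C+1)-way linear classifier, and the joint objective are assumptions for illustration, not the paper's exact architecture or losses.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

C, d, z_dim = 5, 64, 16
classifier = nn.Linear(d, C + 1)                        # C known classes + one "unknown" class
generator = nn.Sequential(nn.Linear(z_dim, 64), nn.ReLU(), nn.Linear(64, d))
optimizer = torch.optim.Adam(list(classifier.parameters()) + list(generator.parameters()), lr=1e-3)

source_feats = torch.randn(16, d)                       # features of labelled source samples
source_labels = torch.randint(0, C, (16,))

for step in range(5):
    fake_unknown = generator(torch.randn(16, z_dim))    # generated "unknown" source samples
    fake_labels = torch.full((16,), C, dtype=torch.long)
    logits = classifier(torch.cat([source_feats, fake_unknown]))
    labels = torch.cat([source_labels, fake_labels])
    loss = F.cross_entropy(logits, labels)              # (C+1)-way objective
    optimizer.zero_grad(); loss.backward(); optimizer.step()
print(float(loss))
```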
Citations: 4