
Latest publications in BMVC: Proceedings of the British Machine Vision Conference

Learning Anatomically Consistent Embedding for Chest Radiography
Ziyu Zhou, Haozhe Luo, Jiaxuan Pang, Xiaowei Ding, Michael Gotway, Jianming Liang

Self-supervised learning (SSL) approaches have recently shown substantial success in learning visual representations from unannotated images. Compared with photographic images, medical images acquired with the same imaging protocol exhibit high consistency in anatomy. To exploit this anatomical consistency, this paper introduces a novel SSL approach, called PEAC (patch embedding of anatomical consistency), for medical image analysis. Specifically, we propose to learn global and local consistencies via stable grid-based matching, transfer pre-trained PEAC models to diverse downstream tasks, and extensively demonstrate that (1) PEAC achieves significantly better performance than the existing state-of-the-art fully/self-supervised methods, and (2) PEAC captures the anatomical structure consistency across views of the same patient and across patients of different genders, weights, and health statuses, which enhances the interpretability of our method for medical image analysis. All code and pretrained models are available at GitHub.com/JLiangLab/PEAC.
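
The combined global and grid-wise local consistency objective described in the abstract can be pictured with a toy PyTorch loss. This is a minimal sketch, assuming two augmented views whose patch grids already correspond cell-for-cell; PEAC's actual stable grid-based matching and model heads are not reproduced, and all tensor shapes are hypothetical.

```python
import torch
import torch.nn.functional as F

def toy_global_local_consistency(patch_a, patch_b, global_a, global_b,
                                 w_local=1.0, w_global=1.0):
    """Toy consistency loss: align corresponding grid patches (local term)
    and pooled view embeddings (global term) via cosine similarity."""
    local_term = 1 - F.cosine_similarity(patch_a, patch_b, dim=-1).mean()
    global_term = 1 - F.cosine_similarity(global_a, global_b, dim=-1).mean()
    return w_local * local_term + w_global * global_term

# Random stand-ins for two views of the same radiograph: (batch, patches, dim) and (batch, dim).
B, N, D = 4, 196, 768
loss = toy_global_local_consistency(torch.randn(B, N, D), torch.randn(B, N, D),
                                    torch.randn(B, D), torch.randn(B, D))
print(loss.item())
```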

{"title":"Learning Anatomically Consistent Embedding for Chest Radiography.","authors":"Ziyu Zhou, Haozhe Luo, Jiaxuan Pang, Xiaowei Ding, Michael Gotway, Jianming Liang","doi":"","DOIUrl":"","url":null,"abstract":"<p><p>Self-supervised learning (SSL) approaches have recently shown substantial success in learning visual representations from unannotated images. Compared with photographic images, medical images acquired with the same imaging protocol exhibit high consistency in anatomy. To exploit this anatomical consistency, this paper introduces a novel SSL approach, called PEAC (patch embedding of anatomical consistency), for medical image analysis. Specifically, in this paper, we propose to learn global and local consistencies via stable grid-based matching, transfer pre-trained PEAC models to diverse downstream tasks, and extensively demonstrate that (1) PEAC achieves significantly better performance than the existing state-of-the-art fully/self-supervised methods, and (2) PEAC captures the anatomical structure consistency across views of the same patient and across patients of different genders, weights, and healthy statuses, which enhances the interpretability of our method for medical image analysis. All code and pretrained models are available at GitHub.com/JLiangLab/PEAC.</p>","PeriodicalId":72437,"journal":{"name":"BMVC : proceedings of the British Machine Vision Conference. British Machine Vision Conference","volume":"2023 ","pages":""},"PeriodicalIF":0.0,"publicationDate":"2023-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11135486/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141174538","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Single Pixel Spectral Color Constancy
S. Koskinen, Erman Acar, Joni-Kristian Kämäräinen
{"title":"Single Pixel Spectral Color Constancy","authors":"S. Koskinen, Erman Acar, Joni-Kristian Kämäräinen","doi":"10.1007/s11263-023-01867-x","DOIUrl":"https://doi.org/10.1007/s11263-023-01867-x","url":null,"abstract":"","PeriodicalId":72437,"journal":{"name":"BMVC : proceedings of the British Machine Vision Conference. British Machine Vision Conference","volume":"58 1","pages":"326"},"PeriodicalIF":0.0,"publicationDate":"2023-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"91172866","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
DiffSketching: Sketch Control Image Synthesis with Diffusion Models
Qiang Wang, Di Kong, Fengyin Lin, Yonggang Qi
Creative sketching is a universal form of visual expression, but translating images from an abstract sketch is very challenging. Traditionally, creating a deep learning model for sketch-to-image synthesis needs to overcome distorted input sketches that lack visual details, and requires collecting large-scale sketch-image datasets. We first study this task using diffusion models. Our model matches sketches through cross-domain constraints and uses a classifier to guide the image synthesis more accurately. Extensive experiments confirm that our method is not only faithful to the user's input sketches but also maintains the diversity and imagination of the synthesized results. Our model beats GAN-based methods in terms of generation quality and human evaluation, and does not rely on massive sketch-image datasets. Additionally, we present applications of our method in image editing and interpolation.
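
The "classifier to guide the image synthesis" idea can be pictured with a generic classifier-guidance step in noise-prediction space. This is a hedged illustration only: `eps_model` and `classifier` are hypothetical interfaces, the usual noise-scaling factor is omitted, and the paper's sketch-specific cross-domain constraints are not shown.

```python
import torch
import torch.nn as nn

def guided_noise(eps_model, classifier, x_t, t, y, scale=1.0):
    """Generic classifier-guidance step: nudge the predicted noise against the
    gradient of the classifier's log-probability for the target class y."""
    eps = eps_model(x_t, t)
    with torch.enable_grad():
        x_in = x_t.detach().requires_grad_(True)
        log_probs = torch.log_softmax(classifier(x_in, t), dim=-1)
        selected = log_probs[torch.arange(x_t.shape[0]), y].sum()
        grad = torch.autograd.grad(selected, x_in)[0]
    return eps - scale * grad

# Dummy stand-ins so the sketch runs end to end (not the paper's networks).
eps_model = lambda x, t: torch.zeros_like(x)
classifier = lambda x, t: nn.Flatten()(x) @ torch.randn(3 * 8 * 8, 10)
x_t = torch.randn(2, 3, 8, 8)
out = guided_noise(eps_model, classifier, x_t,
                   t=torch.tensor([10, 10]), y=torch.tensor([1, 4]))
print(out.shape)  # torch.Size([2, 3, 8, 8])
```
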
{"title":"DiffSketching: Sketch Control Image Synthesis with Diffusion Models","authors":"Qiang Wang, Di Kong, Fengyin Lin, Yonggang Qi","doi":"10.48550/arXiv.2305.18812","DOIUrl":"https://doi.org/10.48550/arXiv.2305.18812","url":null,"abstract":"Creative sketch is a universal way of visual expression, but translating images from an abstract sketch is very challenging. Traditionally, creating a deep learning model for sketch-to-image synthesis needs to overcome the distorted input sketch without visual details, and requires to collect large-scale sketch-image datasets. We first study this task by using diffusion models. Our model matches sketches through the cross domain constraints, and uses a classifier to guide the image synthesis more accurately. Extensive experiments confirmed that our method can not only be faithful to user's input sketches, but also maintain the diversity and imagination of synthetic image results. Our model can beat GAN-based method in terms of generation quality and human evaluation, and does not rely on massive sketch-image datasets. Additionally, we present applications of our method in image editing and interpolation.","PeriodicalId":72437,"journal":{"name":"BMVC : proceedings of the British Machine Vision Conference. British Machine Vision Conference","volume":"129 1","pages":"67"},"PeriodicalIF":0.0,"publicationDate":"2023-05-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"76692642","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 4
Defect Transfer GAN: Diverse Defect Synthesis for Data Augmentation
Ruyu Wang, Sabrina Hoppe, Eduardo Monari, Marco F. Huber
Data hunger and data imbalance are two major pitfalls in many deep learning approaches. For example, on highly optimized production lines, defective samples are rarely acquired while non-defective samples come almost for free. The defects, however, often resemble each other; e.g., scratches on different products may differ only in a few characteristics. In this work, we introduce a framework, Defect Transfer GAN (DT-GAN), which learns to represent defect types independently of and across various background products and yet can apply defect-specific styles to generate realistic defective images. An empirical study on MVTec AD and two additional datasets shows that DT-GAN outperforms state-of-the-art image synthesis methods w.r.t. sample fidelity and diversity in defect generation. We further demonstrate benefits for a critical downstream task in manufacturing: defect classification. Results show that the augmented data from DT-GAN provide consistent gains even in the few-samples regime and reduce the error rate by up to 51% compared to both traditional and advanced data augmentation methods.
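
The "apply defect-specific styles" idea can be illustrated with a small AdaIN-style modulation layer, where a defect-style code rescales instance-normalized background-product features. This is only a hedged mechanism sketch under assumed tensor shapes; DT-GAN's actual generator, discriminators, and losses are not reproduced.

```python
import torch
import torch.nn as nn

class DefectStyleAdaIN(nn.Module):
    """Toy AdaIN-style injection of a defect-style code into content features."""
    def __init__(self, style_dim, num_channels):
        super().__init__()
        self.to_scale_shift = nn.Linear(style_dim, 2 * num_channels)

    def forward(self, content_feat, defect_style):
        # content_feat: (B, C, H, W) background-product features, defect_style: (B, style_dim)
        b, c, _, _ = content_feat.shape
        scale, shift = self.to_scale_shift(defect_style).chunk(2, dim=1)
        normalized = nn.functional.instance_norm(content_feat)
        return normalized * (1 + scale.view(b, c, 1, 1)) + shift.view(b, c, 1, 1)

feat = torch.randn(2, 64, 32, 32)   # features of a defect-free product image
style = torch.randn(2, 16)          # hypothetical defect-style code (e.g. "scratch")
print(DefectStyleAdaIN(16, 64)(feat, style).shape)  # torch.Size([2, 64, 32, 32])
```
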
{"title":"Defect Transfer GAN: Diverse Defect Synthesis for Data Augmentation","authors":"Ruyu Wang, Sabrina Hoppe, Eduardo Monari, Marco F. Huber","doi":"10.48550/arXiv.2302.08366","DOIUrl":"https://doi.org/10.48550/arXiv.2302.08366","url":null,"abstract":"Data-hunger and data-imbalance are two major pitfalls in many deep learning approaches. For example, on highly optimized production lines, defective samples are hardly acquired while non-defective samples come almost for free. The defects however often seem to resemble each other, e.g., scratches on different products may only differ in a few characteristics. In this work, we introduce a framework, Defect Transfer GAN (DT-GAN), which learns to represent defect types independent of and across various background products and yet can apply defect-specific styles to generate realistic defective images. An empirical study on the MVTec AD and two additional datasets showcase DT-GAN outperforms state-of-the-art image synthesis methods w.r.t. sample fidelity and diversity in defect generation. We further demonstrate benefits for a critical downstream task in manufacturing -- defect classification. Results show that the augmented data from DT-GAN provides consistent gains even in the few samples regime and reduces the error rate up to 51% compared to both traditional and advanced data augmentation methods.","PeriodicalId":72437,"journal":{"name":"BMVC : proceedings of the British Machine Vision Conference. British Machine Vision Conference","volume":"7 1","pages":"445"},"PeriodicalIF":0.0,"publicationDate":"2023-02-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"89106587","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 8
Mitigating Bias in Visual Transformers via Targeted Alignment
Sruthi Sudhakar, Viraj Prabhu, Arvindkumar Krishnakumar, Judy Hoffman
As transformer architectures become increasingly prevalent in computer vision, it is critical to understand their fairness implications. We perform the first study of the fairness of transformers applied to computer vision and benchmark several bias mitigation approaches from prior work. We visualize the feature space of the transformer self-attention modules and discover that a significant portion of the bias is encoded in the query matrix. With this knowledge, we propose TADeT, a targeted alignment strategy for debiasing transformers that aims to discover and remove bias primarily from query matrix features. We measure performance using Balanced Accuracy and Standard Accuracy, and fairness using Equalized Odds and Balanced Accuracy Difference. TADeT consistently leads to improved fairness over prior work on multiple attribute prediction tasks on the CelebA dataset, without compromising performance.
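
A simple way to picture a "targeted alignment" penalty on query-matrix features is to pull the mean query features of two protected groups together. The sketch below is a hedged stand-in under that assumption, not TADeT's actual objective; `query_feats` would come from a chosen self-attention query projection.

```python
import torch

def query_alignment_penalty(query_feats, group_labels):
    """Toy alignment penalty: squared distance between the mean query features
    of two protected groups (binary attribute)."""
    g0 = query_feats[group_labels == 0]
    g1 = query_feats[group_labels == 1]
    if len(g0) == 0 or len(g1) == 0:
        return query_feats.sum() * 0.0  # no penalty if a batch lacks one group
    return (g0.mean(dim=0) - g1.mean(dim=0)).pow(2).sum()

feats = torch.randn(8, 64, requires_grad=True)   # hypothetical query features
labels = torch.randint(0, 2, (8,))               # protected-attribute labels
print(query_alignment_penalty(feats, labels))
```
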
{"title":"Mitigating Bias in Visual Transformers via Targeted Alignment","authors":"Sruthi Sudhakar, Viraj Prabhu, Arvindkumar Krishnakumar, Judy Hoffman","doi":"10.48550/arXiv.2302.04358","DOIUrl":"https://doi.org/10.48550/arXiv.2302.04358","url":null,"abstract":"As transformer architectures become increasingly prevalent in computer vision, it is critical to understand their fairness implications. We perform the first study of the fairness of transformers applied to computer vision and benchmark several bias mitigation approaches from prior work. We visualize the feature space of the transformer self-attention modules and discover that a significant portion of the bias is encoded in the query matrix. With this knowledge, we propose TADeT, a targeted alignment strategy for debiasing transformers that aims to discover and remove bias primarily from query matrix features. We measure performance using Balanced Accuracy and Standard Accuracy, and fairness using Equalized Odds and Balanced Accuracy Difference. TADeT consistently leads to improved fairness over prior work on multiple attribute prediction tasks on the CelebA dataset, without compromising performance.","PeriodicalId":72437,"journal":{"name":"BMVC : proceedings of the British Machine Vision Conference. British Machine Vision Conference","volume":"25 1","pages":"138"},"PeriodicalIF":0.0,"publicationDate":"2023-02-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"86601650","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 4
Program Generation from Diverse Video Demonstrations
Anthony Manchin, J. Sherrah, Qi Wu, A. Hengel
The ability to use inductive reasoning to extract general rules from multiple observations is a vital indicator of intelligence. As humans, we use this ability not only to interpret the world around us, but also to predict the outcomes of the various interactions we experience. Generalising over multiple observations is a task that has historically been difficult for machines, especially when it requires computer vision. In this paper, we propose a model that can extract general rules from video demonstrations by simultaneously performing summarisation and translation. Our approach differs from prior works by framing the problem as a multi-sequence-to-sequence task, wherein summarisation is learnt by the model. This allows our model to utilise edge cases that would otherwise be suppressed or discarded by traditional summarisation techniques. Additionally, we show that our approach can handle noisy specifications without the need for additional filtering methods. We evaluate our model by synthesising programs from video demonstrations in the Vizdoom environment, achieving state-of-the-art results with a relative increase of 11.75% in program accuracy over prior works.
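
The "multi-sequence-to-sequence" framing, where several demonstrations are encoded and a single program is decoded, can be sketched with stock PyTorch transformer modules. All dimensions, the shared per-demonstration encoder, and the plain concatenation of demonstration memories are assumptions for illustration; the paper's summarisation mechanism is not reproduced here.

```python
import torch
import torch.nn as nn

class MultiDemoToProgram(nn.Module):
    """Minimal multi-sequence-to-sequence sketch: several demonstration feature
    sequences in, one program token sequence out. Illustrative only."""
    def __init__(self, feat_dim=128, vocab_size=50, d_model=128):
        super().__init__()
        self.proj = nn.Linear(feat_dim, d_model)
        self.encoder = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True), num_layers=2)
        self.decoder = nn.TransformerDecoder(
            nn.TransformerDecoderLayer(d_model, nhead=4, batch_first=True), num_layers=2)
        self.tok_emb = nn.Embedding(vocab_size, d_model)
        self.out = nn.Linear(d_model, vocab_size)

    def forward(self, demos, program_tokens):
        # demos: (B, num_demos, T, feat_dim) frame features, one sequence per demonstration.
        b, k, t, f = demos.shape
        enc = self.encoder(self.proj(demos.view(b * k, t, f)))   # shared encoder per demo
        memory = enc.reshape(b, k * t, -1)                       # concatenate demo memories
        causal = nn.Transformer.generate_square_subsequent_mask(program_tokens.shape[1])
        dec = self.decoder(self.tok_emb(program_tokens), memory, tgt_mask=causal)
        return self.out(dec)                                     # (B, prog_len, vocab_size)

model = MultiDemoToProgram()
logits = model(torch.randn(2, 3, 10, 128), torch.randint(0, 50, (2, 7)))
print(logits.shape)  # torch.Size([2, 7, 50])
```
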
{"title":"Program Generation from Diverse Video Demonstrations","authors":"Anthony Manchin, J. Sherrah, Qi Wu, A. Hengel","doi":"10.48550/arXiv.2302.00178","DOIUrl":"https://doi.org/10.48550/arXiv.2302.00178","url":null,"abstract":"The ability to use inductive reasoning to extract general rules from multiple observations is a vital indicator of intelligence. As humans, we use this ability to not only interpret the world around us, but also to predict the outcomes of the various interactions we experience. Generalising over multiple observations is a task that has historically presented difficulties for machines to grasp, especially when requiring computer vision. In this paper, we propose a model that can extract general rules from video demonstrations by simultaneously performing summarisation and translation. Our approach differs from prior works by framing the problem as a multi-sequence-to-sequence task, wherein summarisation is learnt by the model. This allows our model to utilise edge cases that would otherwise be suppressed or discarded by traditional summarisation techniques. Additionally, we show that our approach can handle noisy specifications without the need for additional filtering methods. We evaluate our model by synthesising programs from video demonstrations in the Vizdoom environment achieving state-of-the-art results with a relative increase of 11.75% program accuracy on prior works","PeriodicalId":72437,"journal":{"name":"BMVC : proceedings of the British Machine Vision Conference. British Machine Vision Conference","volume":"82 1","pages":"1039"},"PeriodicalIF":0.0,"publicationDate":"2023-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"79962771","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Local Feature Extraction from Salient Regions by Feature Map Transformation
Yerim Jung, Nur Suriza Syazwany, Sang-Chul Lee
Local feature matching is essential for many applications, such as localization and 3D reconstruction. However, it is challenging to match feature points accurately across varying camera viewpoints and illumination conditions. In this paper, we propose a framework that robustly extracts and describes salient local features regardless of changing light and viewpoints. The framework suppresses illumination variations and encourages structural information to ignore the noise from light and to focus on edges. We classify the elements of the feature covariance matrix, an implicit form of feature-map information, into two components. Our model extracts feature points from salient regions, leading to fewer incorrect matches. In our experiments, the proposed method achieves higher accuracy than state-of-the-art methods on public datasets such as HPatches, Aachen Day-Night, and ETH, which in particular exhibit highly varied viewpoints and illumination.
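
To make the idea of splitting the feature covariance matrix into two components more concrete, the hedged sketch below computes a channel covariance from a feature map and separates its eigencomponents into a dominant group and a residual group. The paper's actual classification rule and descriptor pipeline are not reproduced, and `k` is an arbitrary illustrative choice.

```python
import torch

def split_feature_covariance(feat_map, k):
    """Toy covariance-based split: channel covariance of a CNN feature map,
    decomposed into top-k eigendirections and the remainder."""
    b, c, h, w = feat_map.shape
    x = feat_map.view(b, c, h * w)
    x = x - x.mean(dim=2, keepdim=True)
    cov = x @ x.transpose(1, 2) / (h * w - 1)   # (B, C, C) channel covariance
    eigvals, eigvecs = torch.linalg.eigh(cov)   # eigenvalues in ascending order
    dominant = eigvecs[..., -k:]                # top-k directions
    residual = eigvecs[..., :-k]                # remaining directions
    return dominant, residual

dom, res = split_feature_covariance(torch.randn(2, 32, 16, 16), k=4)
print(dom.shape, res.shape)  # torch.Size([2, 32, 4]) torch.Size([2, 32, 28])
```
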
{"title":"Local Feature Extraction from Salient Regions by Feature Map Transformation","authors":"Yerim Jung, Nur Suriza Syazwany, Sang-Chul Lee","doi":"10.48550/arXiv.2301.10413","DOIUrl":"https://doi.org/10.48550/arXiv.2301.10413","url":null,"abstract":"Local feature matching is essential for many applications, such as localization and 3D reconstruction. However, it is challenging to match feature points accurately in various camera viewpoints and illumination conditions. In this paper, we propose a framework that robustly extracts and describes salient local features regardless of changing light and viewpoints. The framework suppresses illumination variations and encourages structural information to ignore the noise from light and to focus on edges. We classify the elements in the feature covariance matrix, an implicit feature map information, into two components. Our model extracts feature points from salient regions leading to reduced incorrect matches. In our experiments, the proposed method achieved higher accuracy than the state-of-the-art methods in the public dataset, such as HPatches, Aachen Day-Night, and ETH, which especially show highly variant viewpoints and illumination.","PeriodicalId":72437,"journal":{"name":"BMVC : proceedings of the British Machine Vision Conference. British Machine Vision Conference","volume":"24 1","pages":"552"},"PeriodicalIF":0.0,"publicationDate":"2023-01-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"86380460","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 2
Few-shot Semantic Segmentation with Support-induced Graph Convolutional Network
Jie Liu, Yanqi Bao, Wenzhe Yin, Haochen Wang, Yang Gao, J. Sonke, E. Gavves
Few-shot semantic segmentation (FSS) aims to achieve novel object segmentation with only a few annotated samples and has made great progress recently. Most existing FSS models focus on the feature matching between support and query to tackle FSS. However, the appearance variations between objects from the same category can be extremely large, leading to unreliable feature matching and query mask prediction. To this end, we propose a Support-induced Graph Convolutional Network (SiGCN) to explicitly excavate latent context structure in query images. Specifically, we propose a Support-induced Graph Reasoning (SiGR) module to capture salient query object parts at different semantic levels with a Support-induced GCN. Furthermore, an instance association (IA) module is designed to capture high-order instance context from both support and query instances. By integrating the two proposed modules, SiGCN learns rich query context representations and is thus more robust to appearance variations. Extensive experiments on PASCAL-5i and COCO-20i demonstrate that SiGCN achieves state-of-the-art performance.
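
One way to picture "support-induced graph reasoning" is a single graph-convolution-like layer in which query locations act as nodes and edge weights are derived from each node's affinity with a support foreground prototype. The sketch below is a loose, hedged stand-in with assumed shapes; it is not the SiGR or IA module from the paper.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SupportInducedGCNLayer(nn.Module):
    """Toy graph-reasoning layer: adjacency built from support-query affinity,
    followed by one graph convolution with a residual connection."""
    def __init__(self, dim):
        super().__init__()
        self.weight = nn.Linear(dim, dim)

    def forward(self, query_feat, support_proto):
        # query_feat: (B, N, D) flattened query features, support_proto: (B, D)
        affinity = torch.einsum('bnd,bd->bn', query_feat, support_proto)            # support-induced scores
        adj = torch.softmax(affinity.unsqueeze(1) * affinity.unsqueeze(2), dim=-1)  # (B, N, N) edge weights
        return F.relu(self.weight(torch.bmm(adj, query_feat))) + query_feat

layer = SupportInducedGCNLayer(64)
out = layer(torch.randn(2, 100, 64), torch.randn(2, 64))
print(out.shape)  # torch.Size([2, 100, 64])
```
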
{"title":"Few-shot Semantic Segmentation with Support-induced Graph Convolutional Network","authors":"Jie Liu, Yanqi Bao, Wenzhe Yin, Haochen Wang, Yang Gao, J. Sonke, E. Gavves","doi":"10.48550/arXiv.2301.03194","DOIUrl":"https://doi.org/10.48550/arXiv.2301.03194","url":null,"abstract":"Few-shot semantic segmentation (FSS) aims to achieve novel objects segmentation with only a few annotated samples and has made great progress recently. Most of the existing FSS models focus on the feature matching between support and query to tackle FSS. However, the appearance variations between objects from the same category could be extremely large, leading to unreliable feature matching and query mask prediction. To this end, we propose a Support-induced Graph Convolutional Network (SiGCN) to explicitly excavate latent context structure in query images. Specifically, we propose a Support-induced Graph Reasoning (SiGR) module to capture salient query object parts at different semantic levels with a Support-induced GCN. Furthermore, an instance association (IA) module is designed to capture high-order instance context from both support and query instances. By integrating the proposed two modules, SiGCN can learn rich query context representation, and thus being more robust to appearance variations. Extensive experiments on PASCAL-5i and COCO-20i demonstrate that our SiGCN achieves state-of-the-art performance.","PeriodicalId":72437,"journal":{"name":"BMVC : proceedings of the British Machine Vision Conference. British Machine Vision Conference","volume":"40 1","pages":"126"},"PeriodicalIF":0.0,"publicationDate":"2023-01-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"74616690","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 1
RGB-T Multi-Modal Crowd Counting Based on Transformer
Zhengyi Liu, liuzywen
Crowd counting aims to estimate the number of persons in a scene. Most state-of-the-art crowd counting methods based on color images cannot work well in poor illumination conditions because objects become invisible. With the widespread use of infrared cameras, crowd counting based on color and thermal images has been studied. Existing methods only achieve multi-modal fusion without a count-objective constraint. To better excavate multi-modal information, we use count-guided multi-modal fusion and modal-guided count enhancement to achieve impressive performance. The proposed count-guided multi-modal fusion module utilizes a multi-scale token transformer to let the two modalities interact under the guidance of count information and to perceive different scales from the token perspective. The proposed modal-guided count enhancement module employs a multi-scale deformable transformer decoder structure to enhance one modality's features and count information with the other modality. Experiments on the public RGBT-CC dataset show that our method refreshes the state-of-the-art results. https://github.com/liuzywen/RGBTCC
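
The cross-modal interaction between RGB and thermal tokens can be sketched with two stock multi-head cross-attention blocks, one per direction. This is a generic, hedged stand-in; the paper's count guidance, multi-scale tokens, and deformable decoder are not shown, and all shapes are illustrative.

```python
import torch
import torch.nn as nn

class CrossModalFusion(nn.Module):
    """Toy bidirectional cross-attention fusion of RGB and thermal tokens."""
    def __init__(self, dim=64, heads=4):
        super().__init__()
        self.rgb_to_t = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.t_to_rgb = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, rgb_tokens, thermal_tokens):
        # Each modality attends to the other, then the enhanced tokens are summed.
        rgb_enh, _ = self.rgb_to_t(rgb_tokens, thermal_tokens, thermal_tokens)
        t_enh, _ = self.t_to_rgb(thermal_tokens, rgb_tokens, rgb_tokens)
        return rgb_enh + t_enh

fusion = CrossModalFusion()
fused = fusion(torch.randn(2, 196, 64), torch.randn(2, 196, 64))
print(fused.shape)  # torch.Size([2, 196, 64])
```
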
{"title":"RGB-T Multi-Modal Crowd Counting Based on Transformer","authors":"Zhengyi Liu, liuzywen","doi":"10.48550/arXiv.2301.03033","DOIUrl":"https://doi.org/10.48550/arXiv.2301.03033","url":null,"abstract":"Crowd counting aims to estimate the number of persons in a scene. Most state-of-the-art crowd counting methods based on color images can't work well in poor illumination conditions due to invisible objects. With the widespread use of infrared cameras, crowd counting based on color and thermal images is studied. Existing methods only achieve multi-modal fusion without count objective constraint. To better excavate multi-modal information, we use count-guided multi-modal fusion and modal-guided count enhancement to achieve the impressive performance. The proposed count-guided multi-modal fusion module utilizes a multi-scale token transformer to interact two-modal information under the guidance of count information and perceive different scales from the token perspective. The proposed modal-guided count enhancement module employs multi-scale deformable transformer decoder structure to enhance one modality feature and count information by the other modality. Experiment in public RGBT-CC dataset shows that our method refreshes the state-of-the-art results. https://github.com/liuzywen/RGBTCC","PeriodicalId":72437,"journal":{"name":"BMVC : proceedings of the British Machine Vision Conference. British Machine Vision Conference","volume":"35 1","pages":"427"},"PeriodicalIF":0.0,"publicationDate":"2023-01-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"75632368","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 2
MoBYv2AL: Self-supervised Active Learning for Image Classification
Razvan Caramalau, Binod Bhattarai, D. Stoyanov, Tae-Kyun Kim
Active learning (AL) has recently gained popularity for deep learning (DL) models. This is due to efficient and informative sampling, especially when the learner requires large-scale labelled datasets. Commonly, the sampling and training happen in stages while more batches are added. One main bottleneck of this strategy is the narrow representation learned by the model, which affects the overall AL selection. We present MoBYv2AL, a novel self-supervised active learning framework for image classification. Our contribution lies in lifting MoBY, one of the most successful self-supervised learning algorithms, into the AL pipeline. Thus, we add a downstream task-aware objective function and optimize it jointly with the contrastive loss. Further, we derive a data-distribution selection function from labelling the new examples. Finally, we test and study our pipeline's robustness and performance for image classification tasks. We achieve state-of-the-art results when compared to recent AL methods. Code available: https://github.com/razvancaramalau/MoBYv2AL
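
The "task-aware objective optimized jointly with contrastive loss" can be illustrated with a toy joint loss: supervised cross-entropy plus a symmetric InfoNCE term between two augmented-view projections. This is a hedged sketch, not MoBY's momentum-based loss; the weight `lam` and the temperature are arbitrary illustrative values.

```python
import torch
import torch.nn.functional as F

def joint_al_loss(logits, labels, z1, z2, temperature=0.2, lam=1.0):
    """Toy joint objective: downstream cross-entropy plus a symmetric InfoNCE
    term between two augmented-view projections of the same images."""
    task = F.cross_entropy(logits, labels)
    z1, z2 = F.normalize(z1, dim=1), F.normalize(z2, dim=1)
    sim = z1 @ z2.t() / temperature            # (B, B) view-to-view similarities
    targets = torch.arange(z1.shape[0])        # matching view pairs lie on the diagonal
    contrastive = 0.5 * (F.cross_entropy(sim, targets) + F.cross_entropy(sim.t(), targets))
    return task + lam * contrastive

B, C, D = 8, 10, 128
loss = joint_al_loss(torch.randn(B, C), torch.randint(0, C, (B,)),
                     torch.randn(B, D), torch.randn(B, D))
print(loss.item())
```
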
{"title":"MoBYv2AL: Self-supervised Active Learning for Image Classification","authors":"Razvan Caramalau, Binod Bhattarai, D. Stoyanov, Tae-Kyun Kim","doi":"10.48550/arXiv.2301.01531","DOIUrl":"https://doi.org/10.48550/arXiv.2301.01531","url":null,"abstract":"Active learning(AL) has recently gained popularity for deep learning(DL) models. This is due to efficient and informative sampling, especially when the learner requires large-scale labelled datasets. Commonly, the sampling and training happen in stages while more batches are added. One main bottleneck in this strategy is the narrow representation learned by the model that affects the overall AL selection. We present MoBYv2AL, a novel self-supervised active learning framework for image classification. Our contribution lies in lifting MoBY, one of the most successful self-supervised learning algorithms, to the AL pipeline. Thus, we add the downstream task-aware objective function and optimize it jointly with contrastive loss. Further, we derive a data-distribution selection function from labelling the new examples. Finally, we test and study our pipeline robustness and performance for image classification tasks. We successfully achieved state-of-the-art results when compared to recent AL methods. Code available: https://github.com/razvancaramalau/MoBYv2AL","PeriodicalId":72437,"journal":{"name":"BMVC : proceedings of the British Machine Vision Conference. British Machine Vision Conference","volume":"358 1","pages":"674"},"PeriodicalIF":0.0,"publicationDate":"2023-01-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"76518513","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0