CNN2Graph: Building Graphs for Image Classification

Vivek Trivedy, Longin Jan Latecki
{"title":"CNN2Graph: Building Graphs for Image Classification","authors":"Vivek Trivedy, Longin Jan Latecki","doi":"10.1109/WACV56688.2023.00009","DOIUrl":null,"url":null,"abstract":"Neural Network classifiers generally operate via the i.i.d. assumption where examples are passed through independently during training. We propose CNN2GNN and CNN2Transformer which instead leverage inter-example information for classification. We use Graph Neural Networks (GNNs) to build a latent space bipartite graph and compute cross-attention scores between input images and a proxy set. Our approach addresses several challenges of existing methods. Firstly, it is end-to-end differentiable despite the generally discrete nature of graph construction. Secondly, it allows inductive inference at no extra cost. Thirdly, it presents a simple method to construct graphs from arbitrary datasets that captures both example level and class level information. Finally, it addresses the proxy collapse problem by combining contrastive and cross-entropy losses rather than separate clustering algorithms. Our results increase classification performance over baseline experiments and outperform other methods. We also conduct an empirical investigation showing that Transformer style attention scales better than GAT attention with dataset size.","PeriodicalId":270631,"journal":{"name":"2023 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV)","volume":"122 6 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2023-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2023 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/WACV56688.2023.00009","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 0

Abstract

Neural network classifiers generally operate under the i.i.d. assumption, in which examples are passed through the network independently during training. We propose CNN2GNN and CNN2Transformer, which instead leverage inter-example information for classification. We use Graph Neural Networks (GNNs) to build a latent-space bipartite graph and compute cross-attention scores between input images and a proxy set. Our approach addresses several challenges of existing methods. First, it is end-to-end differentiable despite the generally discrete nature of graph construction. Second, it allows inductive inference at no extra cost. Third, it presents a simple method to construct graphs from arbitrary datasets that captures both example-level and class-level information. Finally, it addresses the proxy-collapse problem by combining contrastive and cross-entropy losses rather than relying on separate clustering algorithms. Our results improve classification performance over baseline experiments and outperform other methods. We also conduct an empirical investigation showing that Transformer-style attention scales better with dataset size than GAT attention.
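
To make the high-level description above concrete, the following is a minimal, hypothetical PyTorch sketch of the cross-attention idea: a CNN backbone embeds each image, a learnable proxy set forms the other side of a latent-space bipartite graph, Transformer-style scaled dot-product attention between image embeddings and proxies yields class logits, and training combines cross-entropy with a contrastive term. The ResNet-18 backbone, the per-class proxy pooling, the specific contrastive formulation, and all names and dimensions are assumptions for illustration, not the authors' implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
import torchvision.models as models


class CNN2TransformerSketch(nn.Module):
    """Hypothetical sketch: CNN image embeddings cross-attend to a learnable proxy set."""

    def __init__(self, num_classes: int, proxies_per_class: int = 1, dim: int = 512):
        super().__init__()
        # CNN backbone (ResNet-18 chosen purely for illustration).
        backbone = models.resnet18(weights=None)
        backbone.fc = nn.Identity()          # expose the 512-d pooled features
        self.backbone = backbone
        self.proj = nn.Linear(512, dim)
        # Learnable proxy set: one latent node per (class, proxy) pair.
        self.num_classes = num_classes
        self.proxies = nn.Parameter(torch.randn(num_classes * proxies_per_class, dim))
        self.register_buffer(
            "proxy_labels",
            torch.arange(num_classes).repeat_interleave(proxies_per_class),
        )
        self.scale = dim ** -0.5

    def forward(self, images):
        # Image embeddings form one side of the bipartite graph (the "queries").
        q = self.proj(self.backbone(images))                 # (B, dim)
        # Transformer-style scaled dot-product cross-attention against the proxies.
        attn = (q @ self.proxies.t()) * self.scale           # (B, P)
        # Pool attention mass per class to obtain classification logits.
        logits = torch.zeros(q.size(0), self.num_classes, device=q.device)
        logits = logits.index_add(1, self.proxy_labels, attn)
        return logits, q


def combined_loss(logits, embeddings, labels, temperature=0.1, alpha=0.5):
    """Cross-entropy plus a simple supervised contrastive term (a stand-in for the
    paper's exact formulation) intended to discourage proxy collapse."""
    ce = F.cross_entropy(logits, labels)
    z = F.normalize(embeddings, dim=1)
    sim = z @ z.t() / temperature                            # pairwise similarities
    pos = labels.unsqueeze(0).eq(labels.unsqueeze(1)).float()
    pos.fill_diagonal_(0)                                    # exclude self-pairs
    log_prob = sim - torch.logsumexp(sim, dim=1, keepdim=True)
    contrastive = -(pos * log_prob).sum(1) / pos.sum(1).clamp(min=1)
    return ce + alpha * contrastive.mean()


# Hypothetical usage: a 10-class problem with a batch of 8 RGB images.
model = CNN2TransformerSketch(num_classes=10)
x = torch.randn(8, 3, 224, 224)
y = torch.randint(0, 10, (8,))
logits, emb = model(x)
loss = combined_loss(logits, emb, y)
loss.backward()
```

The combined objective reflects the abstract's claim that contrastive and cross-entropy losses are trained jointly rather than relying on a separate clustering step; how the proxies are initialized and how many are allocated per class are left as open choices in this sketch.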