
Computer vision - ACCV ... : ... Asian Conference on Computer Vision : proceedings. Asian Conference on Computer Vision: Latest Publications

ST-CoNAL: Consistency-Based Acquisition Criterion Using Temporal Self-Ensemble for Active Learning
J. Baik, In Young Yoon, J. Choi
Modern deep learning has achieved great success in various fields. However, it requires the labeling of huge amounts of data, which is expensive and labor-intensive. Active learning (AL), which identifies the most informative samples to be labeled, is becoming increasingly important to maximize the efficiency of the training process. The existing AL methods mostly use only a single final fixed model for acquiring the samples to be labeled. This strategy may not be good enough in that the structural uncertainty of a model for given training data is not considered to acquire the samples. In this study, we propose a novel acquisition criterion based on temporal self-ensemble generated by conventional stochastic gradient descent (SGD) optimization. These self-ensemble models are obtained by capturing the intermediate network weights obtained through SGD iterations. Our acquisition function relies on a consistency measure between the student and teacher models. The student models are given a fixed number of temporal self-ensemble models, and the teacher model is constructed by averaging the weights of the student models. Using the proposed acquisition criterion, we present an AL algorithm, namely student-teacher consistency-based AL (ST-CoNAL). Experiments conducted for image classification tasks on CIFAR-10, CIFAR-100, Caltech-256, and Tiny ImageNet datasets demonstrate that the proposed ST-CoNAL achieves significantly better performance than the existing acquisition methods. Furthermore, extensive experiments show the robustness and effectiveness of our methods.
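The acquisition step described above (a weight-averaged teacher plus student-teacher consistency scoring) can be illustrated with a minimal PyTorch sketch. The helper names (`average_state_dicts`, `consistency_score`), the KL-based consistency measure, and the top-k query rule are assumptions chosen for illustration, not the authors' released code.

```python
# Minimal sketch of consistency-based acquisition from temporal self-ensembles
# (assumed details; not the paper's reference implementation).
import copy
import torch
import torch.nn.functional as F

def average_state_dicts(state_dicts):
    """Teacher weights as the element-wise mean of student checkpoints."""
    avg = copy.deepcopy(state_dicts[0])
    for key in avg:
        avg[key] = torch.stack([sd[key].float() for sd in state_dicts]).mean(dim=0).to(avg[key].dtype)
    return avg

@torch.no_grad()
def consistency_score(students, teacher, x):
    """Higher score = larger student/teacher disagreement on an unlabeled batch x."""
    p_teacher = F.softmax(teacher(x), dim=1)
    kl_sum = torch.zeros(x.size(0), device=x.device)
    for student in students:
        log_p_student = F.log_softmax(student(x), dim=1)
        # KL(teacher || student), summed over classes, one value per sample
        kl_sum += F.kl_div(log_p_student, p_teacher, reduction="none").sum(dim=1)
    return kl_sum / len(students)

# Usage sketch: students = snapshots captured along the SGD trajectory;
# teacher = a model loaded with average_state_dicts([s.state_dict() for s in students]).
# scores = consistency_score(students, teacher, unlabeled_batch)
# query_indices = scores.topk(k=budget).indices   # samples sent for labeling
```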
{"title":"ST-CoNAL: Consistency-Based Acquisition Criterion Using Temporal Self-Ensemble for Active Learning","authors":"J. Baik, In Young Yoon, J. Choi","doi":"10.48550/arXiv.2207.02182","DOIUrl":"https://doi.org/10.48550/arXiv.2207.02182","url":null,"abstract":"Modern deep learning has achieved great success in various fields. However, it requires the labeling of huge amounts of data, which is expensive and labor-intensive. Active learning (AL), which identifies the most informative samples to be labeled, is becoming increasingly important to maximize the efficiency of the training process. The existing AL methods mostly use only a single final fixed model for acquiring the samples to be labeled. This strategy may not be good enough in that the structural uncertainty of a model for given training data is not considered to acquire the samples. In this study, we propose a novel acquisition criterion based on temporal self-ensemble generated by conventional stochastic gradient descent (SGD) optimization. These self-ensemble models are obtained by capturing the intermediate network weights obtained through SGD iterations. Our acquisition function relies on a consistency measure between the student and teacher models. The student models are given a fixed number of temporal self-ensemble models, and the teacher model is constructed by averaging the weights of the student models. Using the proposed acquisition criterion, we present an AL algorithm, namely student-teacher consistency-based AL (ST-CoNAL). Experiments conducted for image classification tasks on CIFAR-10, CIFAR-100, Caltech-256, and Tiny ImageNet datasets demonstrate that the proposed ST-CoNAL achieves significantly better performance than the existing acquisition methods. Furthermore, extensive experiments show the robustness and effectiveness of our methods.","PeriodicalId":87238,"journal":{"name":"Computer vision - ACCV ... : ... Asian Conference on Computer Vision : proceedings. Asian Conference on Computer Vision","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2022-07-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"83987885","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Learning Using Privileged Information for Zero-Shot Action Recognition
Zhiyi Gao, Wanqing Li, Zihui Guo, Ting Yu, Yonghong Hou
Zero-Shot Action Recognition (ZSAR) aims to recognize video actions that have never been seen during training. Most existing methods assume a shared semantic space between seen and unseen actions and intend to directly learn a mapping from a visual space to the semantic space. This approach has been challenged by the semantic gap between the visual space and semantic space. This paper presents a novel method that uses object semantics as privileged information to narrow the semantic gap and, hence, effectively, assist the learning. In particular, a simple hallucination network is proposed to implicitly extract object semantics during testing without explicitly extracting objects and a cross-attention module is developed to augment visual feature with the object semantics. Experiments on the Olympic Sports, HMDB51 and UCF101 datasets have shown that the proposed method outperforms the state-of-the-art methods by a large margin.
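A cross-attention block that augments visual features with object semantics, as described above, might look like the following sketch. The module name, feature dimensions, and residual fusion are assumptions; the paper's hallucination network and exact attention design are not reproduced here.

```python
# Hedged sketch: visual tokens attend to object-semantic tokens
# (assumed dimensions; not the paper's exact module).
import torch
import torch.nn as nn

class SemanticCrossAttention(nn.Module):
    def __init__(self, dim=512, num_heads=8):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, visual_feats, object_semantics):
        # visual_feats:     (B, N_vis, dim)  video clip features
        # object_semantics: (B, N_obj, dim)  e.g. hallucinated object embeddings
        attended, _ = self.attn(query=visual_feats,
                                key=object_semantics,
                                value=object_semantics)
        # residual fusion keeps the original visual evidence
        return self.norm(visual_feats + attended)

# x = torch.randn(2, 16, 512); sem = torch.randn(2, 5, 512)
# fused = SemanticCrossAttention()(x, sem)   # -> (2, 16, 512)
```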
{"title":"Learning Using Privileged Information for Zero-Shot Action Recognition","authors":"Zhiyi Gao, Wanqing Li, Zihui Guo, Ting Yu, Yonghong Hou","doi":"10.48550/arXiv.2206.08632","DOIUrl":"https://doi.org/10.48550/arXiv.2206.08632","url":null,"abstract":"Zero-Shot Action Recognition (ZSAR) aims to recognize video actions that have never been seen during training. Most existing methods assume a shared semantic space between seen and unseen actions and intend to directly learn a mapping from a visual space to the semantic space. This approach has been challenged by the semantic gap between the visual space and semantic space. This paper presents a novel method that uses object semantics as privileged information to narrow the semantic gap and, hence, effectively, assist the learning. In particular, a simple hallucination network is proposed to implicitly extract object semantics during testing without explicitly extracting objects and a cross-attention module is developed to augment visual feature with the object semantics. Experiments on the Olympic Sports, HMDB51 and UCF101 datasets have shown that the proposed method outperforms the state-of-the-art methods by a large margin.","PeriodicalId":87238,"journal":{"name":"Computer vision - ACCV ... : ... Asian Conference on Computer Vision : proceedings. Asian Conference on Computer Vision","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2022-06-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"82891807","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 2
3D-C2FT: Coarse-to-fine Transformer for Multi-view 3D Reconstruction
L. Tiong, Dick Sigmund, A. Teoh
Recently, the transformer model has been successfully employed for the multi-view 3D reconstruction problem. However, challenges remain on designing an attention mechanism to explore the multiview features and exploit their relations for reinforcing the encoding-decoding modules. This paper proposes a new model, namely 3D coarse-to-fine transformer (3D-C2FT), by introducing a novel coarse-to-fine(C2F) attention mechanism for encoding multi-view features and rectifying defective 3D objects. C2F attention mechanism enables the model to learn multi-view information flow and synthesize 3D surface correction in a coarse to fine-grained manner. The proposed model is evaluated by ShapeNet and Multi-view Real-life datasets. Experimental results show that 3D-C2FT achieves notable results and outperforms several competing models on these datasets.
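One way to picture a coarse-to-fine attention pass over multi-view tokens is the sketch below. The per-view pooling, token shapes, and two-stage layout are assumptions chosen for illustration and are not the 3D-C2FT architecture itself.

```python
# Hedged sketch of a coarse-to-fine attention pass over multi-view tokens
# (assumed token layout and pooling; not the 3D-C2FT model).
import torch
import torch.nn as nn

class CoarseToFineAttention(nn.Module):
    def __init__(self, dim=256, num_heads=4):
        super().__init__()
        self.coarse_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.fine_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)

    def forward(self, view_tokens):
        # view_tokens: (B, V, P, D) = V views, P patch tokens per view
        B, V, P, D = view_tokens.shape
        coarse = view_tokens.mean(dim=2)                        # (B, V, D): one token per view
        coarse, _ = self.coarse_attn(coarse, coarse, coarse)    # cross-view mixing at coarse level
        fine = view_tokens.reshape(B, V * P, D)
        # fine patch tokens query the coarse, view-level summary for refinement
        refined, _ = self.fine_attn(fine, coarse, coarse)
        return (fine + refined).reshape(B, V, P, D)

# tokens = torch.randn(2, 8, 49, 256)    # 8 views, 7x7 patches each
# out = CoarseToFineAttention()(tokens)  # -> (2, 8, 49, 256)
```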
{"title":"3D-C2FT: Coarse-to-fine Transformer for Multi-view 3D Reconstruction","authors":"L. Tiong, Dick Sigmund, A. Teoh","doi":"10.48550/arXiv.2205.14575","DOIUrl":"https://doi.org/10.48550/arXiv.2205.14575","url":null,"abstract":"Recently, the transformer model has been successfully employed for the multi-view 3D reconstruction problem. However, challenges remain on designing an attention mechanism to explore the multiview features and exploit their relations for reinforcing the encoding-decoding modules. This paper proposes a new model, namely 3D coarse-to-fine transformer (3D-C2FT), by introducing a novel coarse-to-fine(C2F) attention mechanism for encoding multi-view features and rectifying defective 3D objects. C2F attention mechanism enables the model to learn multi-view information flow and synthesize 3D surface correction in a coarse to fine-grained manner. The proposed model is evaluated by ShapeNet and Multi-view Real-life datasets. Experimental results show that 3D-C2FT achieves notable results and outperforms several competing models on these datasets.","PeriodicalId":87238,"journal":{"name":"Computer vision - ACCV ... : ... Asian Conference on Computer Vision : proceedings. Asian Conference on Computer Vision","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2022-05-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"74834622","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 3
Exemplar Free Class Agnostic Counting
Viresh Ranjan, Minh Hoai
We tackle the task of Class Agnostic Counting, which aims to count objects in a novel object category at test time without any access to labeled training data for that category. All previous class agnostic counting methods cannot work in a fully automated setting, and require computationally expensive test time adaptation. To address these challenges, we propose a visual counter which operates in a fully automated setting and does not require any test time adaptation. Our proposed approach first identifies exemplars from repeating objects in an image, and then counts the repeating objects. We propose a novel region proposal network for identifying the exemplars. After identifying the exemplars, we obtain the corresponding count by using a density estimation based Visual Counter. We evaluate our proposed approach on FSC-147 dataset, and show that it achieves superior performance compared to the existing approaches.
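The overall pipeline of exemplar discovery followed by density-based counting can be sketched roughly as follows. The backbone, the fixed exemplar crop standing in for the proposal network, and the correlation-based density head are all illustrative assumptions rather than the paper's implementation.

```python
# Hedged pipeline sketch: exemplar feature extraction, correlation with the
# image feature map, and a density map whose sum is the predicted count.
import torch
import torch.nn as nn
import torch.nn.functional as F

class DensityCounter(nn.Module):
    def __init__(self, feat_dim=256):
        super().__init__()
        self.backbone = nn.Conv2d(3, feat_dim, kernel_size=7, stride=4, padding=3)
        self.density_head = nn.Conv2d(1, 1, kernel_size=3, padding=1)

    def forward(self, image, exemplar_boxes):
        feats = self.backbone(image)                               # (B, C, H, W)
        # pool one exemplar region (a fixed crop here, standing in for the
        # region proposal network that discovers repeating objects)
        y0, x0, y1, x1 = exemplar_boxes[0]
        exemplar = feats[:, :, y0:y1, x0:x1].mean(dim=(2, 3), keepdim=True)  # (B, C, 1, 1)
        # correlation between exemplar and image features -> similarity map
        similarity = (feats * exemplar).sum(dim=1, keepdim=True)             # (B, 1, H, W)
        density = F.relu(self.density_head(similarity))
        count = density.sum(dim=(1, 2, 3))                         # predicted object count
        return count, density

# model = DensityCounter()
# count, density_map = model(torch.randn(1, 3, 256, 256), [(4, 4, 12, 12)])
```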
{"title":"Exemplar Free Class Agnostic Counting","authors":"Viresh Ranjan, Minh Hoai","doi":"10.48550/arXiv.2205.14212","DOIUrl":"https://doi.org/10.48550/arXiv.2205.14212","url":null,"abstract":"We tackle the task of Class Agnostic Counting, which aims to count objects in a novel object category at test time without any access to labeled training data for that category. All previous class agnostic counting methods cannot work in a fully automated setting, and require computationally expensive test time adaptation. To address these challenges, we propose a visual counter which operates in a fully automated setting and does not require any test time adaptation. Our proposed approach first identifies exemplars from repeating objects in an image, and then counts the repeating objects. We propose a novel region proposal network for identifying the exemplars. After identifying the exemplars, we obtain the corresponding count by using a density estimation based Visual Counter. We evaluate our proposed approach on FSC-147 dataset, and show that it achieves superior performance compared to the existing approaches.","PeriodicalId":87238,"journal":{"name":"Computer vision - ACCV ... : ... Asian Conference on Computer Vision : proceedings. Asian Conference on Computer Vision","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2022-05-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"86472185","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 16
Spotlights: Probing Shapes from Spherical Viewpoints
Jiaxin Wei, Lige Liu, Ran Cheng, W. Jiang, Minghao Xu, Xinyu Jiang, Tao Sun, S. Schwertfeger, L. Kneip
Recent years have witnessed the surge of learned representations that directly build upon point clouds. Though becoming increasingly expressive, most existing representations still struggle to generate ordered point sets. Inspired by spherical multi-view scanners, we propose a novel sampling model called Spotlights to represent a 3D shape as a compact 1D array of depth values. It simulates the configuration of cameras evenly distributed on a sphere, where each virtual camera casts light rays from its principal point through sample points on a small concentric spherical cap to probe for the possible intersections with the object surrounded by the sphere. The structured point cloud is hence given implicitly as a function of depths. We provide a detailed geometric analysis of this new sampling scheme and prove its effectiveness in the context of the point cloud completion task. Experimental results on both synthetic and real data demonstrate that our method achieves competitive accuracy and consistency while having a significantly reduced computational cost. Furthermore, we show superior performance on the downstream point cloud registration task over state-of-the-art completion methods.
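The ray layout implied by the abstract (cameras on an outer sphere probing toward a small concentric cap) can be sketched with NumPy as below, with the object replaced by an analytic unit sphere so the ray-object intersection stays in closed form. The point counts, radii, and Fibonacci-lattice placement are assumptions for illustration.

```python
# Hedged sketch of a Spotlights-style ray layout and the resulting depth array.
import numpy as np

def fibonacci_sphere(n, radius):
    """n roughly uniform points on a sphere of the given radius (assumed layout)."""
    i = np.arange(n)
    phi = np.arccos(1 - 2 * (i + 0.5) / n)
    theta = np.pi * (1 + 5 ** 0.5) * i
    return radius * np.stack([np.sin(phi) * np.cos(theta),
                              np.sin(phi) * np.sin(theta),
                              np.cos(phi)], axis=1)

def spotlight_depths(num_cams=64, rays_per_cam=32, cam_radius=2.0, cap_radius=0.3):
    cams = fibonacci_sphere(num_cams, cam_radius)            # virtual camera centers
    targets = fibonacci_sphere(rays_per_cam, cap_radius)     # sample points on the small cap
    depths = np.full((num_cams, rays_per_cam), np.nan)
    for c, origin in enumerate(cams):
        dirs = targets - origin
        dirs /= np.linalg.norm(dirs, axis=1, keepdims=True)
        # ray / unit-sphere intersection: ||origin + t * d||^2 = 1
        b = 2 * dirs @ origin
        disc = b ** 2 - 4 * (origin @ origin - 1.0)
        hit = disc >= 0
        depths[c, hit] = (-b[hit] - np.sqrt(disc[hit])) / 2   # nearest intersection
    return depths   # compact per-camera depth array; NaN marks a ray miss

# depths = spotlight_depths(); a structured point is origin + depth * direction
```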
{"title":"Spotlights: Probing Shapes from Spherical Viewpoints","authors":"Jiaxin Wei, Lige Liu, Ran Cheng, W. Jiang, Minghao Xu, Xinyu Jiang, Tao Sun, S. Schwertfeger, L. Kneip","doi":"10.48550/arXiv.2205.12564","DOIUrl":"https://doi.org/10.48550/arXiv.2205.12564","url":null,"abstract":"Recent years have witnessed the surge of learned representations that directly build upon point clouds. Though becoming increasingly expressive, most existing representations still struggle to generate ordered point sets. Inspired by spherical multi-view scanners, we propose a novel sampling model called Spotlights to represent a 3D shape as a compact 1D array of depth values. It simulates the configuration of cameras evenly distributed on a sphere, where each virtual camera casts light rays from its principal point through sample points on a small concentric spherical cap to probe for the possible intersections with the object surrounded by the sphere. The structured point cloud is hence given implicitly as a function of depths. We provide a detailed geometric analysis of this new sampling scheme and prove its effectiveness in the context of the point cloud completion task. Experimental results on both synthetic and real data demonstrate that our method achieves competitive accuracy and consistency while having a significantly reduced computational cost. Furthermore, we show superior performance on the downstream point cloud registration task over state-of-the-art completion methods.","PeriodicalId":87238,"journal":{"name":"Computer vision - ACCV ... : ... Asian Conference on Computer Vision : proceedings. Asian Conference on Computer Vision","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2022-05-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"82306037","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Active Domain Adaptation with Multi-level Contrastive Units for Semantic Segmentation
Hao Zhang, Ruimao Zhang, Zhanglin Peng, Junle Wang, Yanqing Jing
To further reduce the cost of semi-supervised domain adaptation (SSDA) labeling, a more effective way is to use active learning (AL) to annotate a selected subset with specific properties. However, domain adaptation tasks are always addressed in two interactive aspects: domain transfer and the enhancement of discrimination, which requires the selected data to be both uncertain under the model and diverse in feature space. Contrary to active learning in classification tasks, it is usually challenging to select pixels that contain both the above properties in segmentation tasks, leading to the complex design of pixel selection strategy. To address such an issue, we propose a novel Active Domain Adaptation scheme with Multi-level Contrastive Units (ADA-MCU) for semantic image segmentation. A simple pixel selection strategy followed with the construction of multi-level contrastive units is introduced to optimize the model for both domain adaptation and active supervised learning. In practice, MCUs are constructed from intra-image, cross-image, and cross-domain levels by using both labeled and unlabeled pixels. At each level, we define contrastive losses from center-to-center and pixel-to-pixel manners, with the aim of jointly aligning the category centers and reducing outliers near the decision boundaries. In addition, we also introduce a categories correlation matrix to implicitly describe the relationship between categories, which are used to adjust the weights of the losses for MCUs. Extensive experimental results on standard benchmarks show that the proposed method achieves competitive performance against state-of-the-art SSDA methods with 50% fewer labeled pixels and significantly outperforms state-of-the-art with a large margin by using the same level of annotation cost.
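Of the multi-level contrastive units described above, the center-to-center term is the easiest to sketch; a hedged version follows. The pixel-to-pixel term and the correlation-matrix weighting are omitted, and the temperature and shapes are assumptions, not the paper's settings.

```python
# Hedged sketch of a center-to-center contrastive term: per-class feature
# centers from two domains are pulled together, other classes pushed apart.
import torch
import torch.nn.functional as F

def class_centers(features, labels, num_classes):
    # features: (N, D) pixel embeddings, labels: (N,) class ids
    centers = torch.zeros(num_classes, features.size(1), device=features.device)
    for c in range(num_classes):
        mask = labels == c
        if mask.any():
            centers[c] = features[mask].mean(dim=0)
    return F.normalize(centers, dim=1)

def center_contrastive_loss(src_feats, src_labels, tgt_feats, tgt_labels,
                            num_classes, temperature=0.1):
    src_c = class_centers(src_feats, src_labels, num_classes)
    tgt_c = class_centers(tgt_feats, tgt_labels, num_classes)
    logits = src_c @ tgt_c.t() / temperature            # (C, C) cross-domain similarity
    target = torch.arange(num_classes, device=logits.device)
    # the matching class across domains is the positive pair
    return F.cross_entropy(logits, target)

# loss = center_contrastive_loss(src_f, src_y, tgt_f, tgt_pseudo_y, num_classes=19)
# (target labels would come from pseudo-labels or actively acquired annotations)
```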
{"title":"Active Domain Adaptation with Multi-level Contrastive Units for Semantic Segmentation","authors":"Hao Zhang, Ruimao Zhang, Zhanglin Peng, Junle Wang, Yanqing Jing","doi":"10.48550/arXiv.2205.11192","DOIUrl":"https://doi.org/10.48550/arXiv.2205.11192","url":null,"abstract":"To further reduce the cost of semi-supervised domain adaptation (SSDA) labeling, a more effective way is to use active learning (AL) to annotate a selected subset with specific properties. However, domain adaptation tasks are always addressed in two interactive aspects: domain transfer and the enhancement of discrimination, which requires the selected data to be both uncertain under the model and diverse in feature space. Contrary to active learning in classification tasks, it is usually challenging to select pixels that contain both the above properties in segmentation tasks, leading to the complex design of pixel selection strategy. To address such an issue, we propose a novel Active Domain Adaptation scheme with Multi-level Contrastive Units (ADA-MCU) for semantic image segmentation. A simple pixel selection strategy followed with the construction of multi-level contrastive units is introduced to optimize the model for both domain adaptation and active supervised learning. In practice, MCUs are constructed from intra-image, cross-image, and cross-domain levels by using both labeled and unlabeled pixels. At each level, we define contrastive losses from center-to-center and pixel-to-pixel manners, with the aim of jointly aligning the category centers and reducing outliers near the decision boundaries. In addition, we also introduce a categories correlation matrix to implicitly describe the relationship between categories, which are used to adjust the weights of the losses for MCUs. Extensive experimental results on standard benchmarks show that the proposed method achieves competitive performance against state-of-the-art SSDA methods with 50% fewer labeled pixels and significantly outperforms state-of-the-art with a large margin by using the same level of annotation cost.","PeriodicalId":87238,"journal":{"name":"Computer vision - ACCV ... : ... Asian Conference on Computer Vision : proceedings. Asian Conference on Computer Vision","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2022-05-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"77501476","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
CV4Code: Sourcecode Understanding via Visual Code Representations
Ruibo Shi, Lili Tao, Rohan Saphal, Fran Silavong, S. Moran
We present CV4Code, a compact and effective computer vision method for sourcecode understanding. Our method leverages the contextual and the structural information available from the code snippet by treating each snippet as a two-dimensional image, which naturally encodes the context and retains the underlying structural information through an explicit spatial representation. To codify snippets as images, we propose an ASCII codepoint-based image representation that facilitates fast generation of sourcecode images and eliminates redundancy in the encoding that would arise from an RGB pixel representation. Furthermore, as sourcecode is treated as images, neither lexical analysis (tokenisation) nor syntax tree parsing is required, which makes the proposed method agnostic to any particular programming language and lightweight from the application pipeline point of view. CV4Code can even featurise syntactically incorrect code which is not possible from methods that depend on the Abstract Syntax Tree (AST). We demonstrate the effectiveness of CV4Code by learning Convolutional and Transformer networks to predict the functional task, i.e. the problem it solves, of the source code directly from its two-dimensional representation, and using an embedding from its latent space to derive a similarity score of two code snippets in a retrieval setup. Experimental results show that our approach achieves state-of-the-art performance in comparison to other methods with the same task and data configurations. For the first time we show the benefits of treating sourcecode understanding as a form of image processing task.
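The ASCII codepoint-based image representation is straightforward to sketch: one row per source line, one column per character, with non-ASCII characters and padding mapped to a fill value. The canvas size and padding value below are assumptions, not the paper's settings.

```python
# Hedged sketch of an ASCII-codepoint "image" for a code snippet.
import numpy as np

def snippet_to_codepoint_image(snippet, height=64, width=128, pad_value=0):
    image = np.full((height, width), pad_value, dtype=np.uint8)
    for row, line in enumerate(snippet.splitlines()[:height]):
        for col, ch in enumerate(line[:width]):
            cp = ord(ch)
            image[row, col] = cp if cp < 128 else pad_value   # keep the ASCII range only
    return image   # single-channel input for a ConvNet or Transformer encoder

# img = snippet_to_codepoint_image("def add(a, b):\n    return a + b\n")
# img.shape == (64, 128); img[0, :3] holds the codepoints of 'd', 'e', 'f'
```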
{"title":"CV4Code: Sourcecode Understanding via Visual Code Representations","authors":"Ruibo Shi, Lili Tao, Rohan Saphal, Fran Silavong, S. Moran","doi":"10.48550/arXiv.2205.08585","DOIUrl":"https://doi.org/10.48550/arXiv.2205.08585","url":null,"abstract":"We present CV4Code, a compact and effective computer vision method for sourcecode understanding. Our method leverages the contextual and the structural information available from the code snippet by treating each snippet as a two-dimensional image, which naturally encodes the context and retains the underlying structural information through an explicit spatial representation. To codify snippets as images, we propose an ASCII codepoint-based image representation that facilitates fast generation of sourcecode images and eliminates redundancy in the encoding that would arise from an RGB pixel representation. Furthermore, as sourcecode is treated as images, neither lexical analysis (tokenisation) nor syntax tree parsing is required, which makes the proposed method agnostic to any particular programming language and lightweight from the application pipeline point of view. CV4Code can even featurise syntactically incorrect code which is not possible from methods that depend on the Abstract Syntax Tree (AST). We demonstrate the effectiveness of CV4Code by learning Convolutional and Transformer networks to predict the functional task, i.e. the problem it solves, of the source code directly from its two-dimensional representation, and using an embedding from its latent space to derive a similarity score of two code snippets in a retrieval setup. Experimental results show that our approach achieves state-of-the-art performance in comparison to other methods with the same task and data configurations. For the first time we show the benefits of treating sourcecode understanding as a form of image processing task.","PeriodicalId":87238,"journal":{"name":"Computer vision - ACCV ... : ... Asian Conference on Computer Vision : proceedings. Asian Conference on Computer Vision","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2022-05-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"80523458","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Temporal Cross-Attention for Action Recognition
Ryota Hashiguchi, Toru Tamaki
{"title":"Temporal Cross-Attention for Action Recognition","authors":"Ryota Hashiguchi, Toru Tamaki","doi":"10.1007/978-3-031-27066-6_20","DOIUrl":"https://doi.org/10.1007/978-3-031-27066-6_20","url":null,"abstract":"","PeriodicalId":87238,"journal":{"name":"Computer vision - ACCV ... : ... Asian Conference on Computer Vision : proceedings. Asian Conference on Computer Vision","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2022-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"85888216","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 4
Diffusion Models for Counterfactual Explanations
Guillaume Jeanneret, Loïc Simon, F. Jurie
Counterfactual explanations have shown promising results as a post-hoc framework to make image classifiers more explainable. In this paper, we propose DiME, a method allowing the generation of counterfactual images using the recent diffusion models. By leveraging the guided generative diffusion process, our proposed methodology shows how to use the gradients of the target classifier to generate counterfactual explanations of input instances. Further, we analyze current approaches to evaluate spurious correlations and extend the evaluation measurements by proposing a new metric: Correlation Difference. Our experimental validations show that the proposed algorithm surpasses previous State-of-the-Art results on 5 out of 6 metrics on CelebA.
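The core mechanism, steering a reverse-diffusion step with the gradient of the target classifier, can be sketched as below. The denoiser interface, noise schedule, and guidance scale are assumptions for illustration and do not reproduce DiME's exact algorithm.

```python
# Hedged sketch of one classifier-guided reverse-diffusion step, the basic
# mechanism behind gradient-guided counterfactual generation.
import torch
import torch.nn.functional as F

def guided_step(x_t, t, denoiser, classifier, target_class, sigma_t, guidance_scale=3.0):
    # gradient of log p(target_class | x_t) w.r.t. the current noisy image
    x_in = x_t.detach().requires_grad_(True)
    log_probs = F.log_softmax(classifier(x_in), dim=1)
    # target_class: an int class id or a length-B tensor of class ids
    selected = log_probs[range(x_in.size(0)), target_class].sum()
    grad = torch.autograd.grad(selected, x_in)[0]

    with torch.no_grad():
        mean = denoiser(x_t, t)                     # assumed: denoiser returns the predicted mean
        # shift the mean toward the counterfactual class, then re-add noise
        mean = mean + guidance_scale * sigma_t ** 2 * grad
        return mean + sigma_t * torch.randn_like(x_t)
```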
{"title":"Diffusion Models for Counterfactual Explanations","authors":"Guillaume Jeanneret, Loïc Simon, F. Jurie","doi":"10.48550/arXiv.2203.15636","DOIUrl":"https://doi.org/10.48550/arXiv.2203.15636","url":null,"abstract":"Counterfactual explanations have shown promising results as a post-hoc framework to make image classifiers more explainable. In this paper, we propose DiME, a method allowing the generation of counterfactual images using the recent diffusion models. By leveraging the guided generative diffusion process, our proposed methodology shows how to use the gradients of the target classifier to generate counterfactual explanations of input instances. Further, we analyze current approaches to evaluate spurious correlations and extend the evaluation measurements by proposing a new metric: Correlation Difference. Our experimental validations show that the proposed algorithm surpasses previous State-of-the-Art results on 5 out of 6 metrics on CelebA.","PeriodicalId":87238,"journal":{"name":"Computer vision - ACCV ... : ... Asian Conference on Computer Vision : proceedings. Asian Conference on Computer Vision","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2022-03-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"76474249","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 19
Learning to Adapt to Unseen Abnormal Activities Under Weak Supervision
Jaeyoo Park, Junha Kim, Bohyung Han
{"title":"Learning to Adapt to Unseen Abnormal Activities Under Weak Supervision","authors":"Jaeyoo Park, Junha Kim, Bohyung Han","doi":"10.1007/978-3-030-69541-5_31","DOIUrl":"https://doi.org/10.1007/978-3-030-69541-5_31","url":null,"abstract":"","PeriodicalId":87238,"journal":{"name":"Computer vision - ACCV ... : ... Asian Conference on Computer Vision : proceedings. Asian Conference on Computer Vision","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2022-03-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"80105854","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 5