Advancing Causal Intervention in Image Captioning With Causal Prompt

IF 8.9 | CAS Zone 1 (Computer Science) | JCR Q1 (Computer Science, Artificial Intelligence) | IEEE Transactions on Neural Networks and Learning Systems | Pub Date: 2024-11-14 | DOI: 10.1109/TNNLS.2024.3487200
Youngjoon Yu, Yeonju Kim, and Yong Man Ro
IEEE Transactions on Neural Networks and Learning Systems, vol. 36, no. 7, pp. 12631-12642. Published online 2024-11-14. Not open access. Article page: https://ieeexplore.ieee.org/document/10753054/
Citations: 0

Abstract

This article introduces the causal prompting network (CPNet), a novel approach that strengthens causal intervention for image captioning. By leveraging visual prompt engineering in the feature space, the method aims to improve performance on causal intervention tasks. Because CPNet is highly flexible and adaptable, it can be incorporated into any existing causal intervention-based image captioning framework. Specifically, two types of visual prompts, the causal region of interest (RoI) prompt (CRP) and the causal matching prompt (CMP), are employed to refine feature representations. CRP is applied to the RoI features of object features to enrich them with deconfounded causal features, while CMP strengthens the contextual representation of confounders linked to the image captioning task. To evaluate CPNet's effectiveness, extensive experiments are conducted on the popular Microsoft Common Objects in Context (MS-COCO) and Flickr30k datasets, and the results are validated using the Karpathy split. Experimental results demonstrate that CPNet surpasses other state-of-the-art (SOTA) image captioning methods.
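The abstract does not specify CRP and CMP at the code level, but the general pattern it describes, injecting a learnable prompt into RoI features and then attending over a confounder dictionary to build deconfounded context, can be sketched. Everything below (names, shapes, the additive prompt injection, and the dictionary-attention step) is an illustrative assumption in the style of backdoor-adjustment captioning models, not the authors' implementation:

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

# Hypothetical shapes: N RoI features of dimension d from a detector,
# and a confounder dictionary of K entries (e.g., class-averaged features).
N, d, K = 5, 8, 10
roi = rng.normal(size=(N, d))          # RoI features of detected objects
confounders = rng.normal(size=(K, d))  # confounder dictionary
crp = rng.normal(size=(d,))            # learnable prompt vector (CRP-like)

# CRP-like step: inject the learnable causal prompt into each RoI feature.
roi_prompted = roi + crp

# CMP-like step: attend over the confounder dictionary to build a
# per-region context vector (a backdoor-adjustment-style weighted sum).
attn = softmax(roi_prompted @ confounders.T)   # (N, K) attention weights
context = attn @ confounders                   # (N, d) confounder context

# Refined, "deconfounded" features passed on to the caption decoder.
refined = roi_prompted + context
print(refined.shape)  # (5, 8)
```

In a trained model, `crp` and `confounders` would be learned parameters optimized end to end with the captioning loss; here they are random placeholders that only demonstrate the data flow.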
Source Journal

IEEE Transactions on Neural Networks and Learning Systems
Categories: Computer Science, Artificial Intelligence; Computer Science, Hardware & Architecture
CiteScore: 23.80
Self-citation rate: 9.60%
Annual article count: 2102
Review time: 3-8 weeks
Journal Description: The focus of IEEE Transactions on Neural Networks and Learning Systems is to present scholarly articles discussing the theory, design, and applications of neural networks as well as other learning systems. The journal primarily highlights technical and scientific research in this domain.
Latest Articles in This Journal

A Dual-Network Framework With Adversarial GMM Augmentation and Frequency-Mamba Fusion for Hyperspectral Target Detection.
Disentangled Generative Graph Representation Learning.
Adaptive Prototype-Guided Personalized Propagation for Heterophilic Graphs With Missing Data.
Causal Counterfactual Inference Network for Video Object State Changes in Open-World Scenarios.
Attribute-Topology Cross-Frequency Aligned Graph Neural Networks for Homophilic and Heterophilic Graphs in Node Classification.