Advancing Causal Intervention in Image Captioning With Causal Prompt

IF 8.9 | CAS Zone 1 (Computer Science) | JCR Q1 (Computer Science, Artificial Intelligence) | IEEE Transactions on Neural Networks and Learning Systems | Pub Date: 2024-11-14 | DOI: 10.1109/TNNLS.2024.3487200
Youngjoon Yu, Yeonju Kim, and Yong Man Ro
IEEE Transactions on Neural Networks and Learning Systems, vol. 36, no. 7, pp. 12631-12642. Published online 2024-11-14. Not open access. Article page: https://ieeexplore.ieee.org/document/10753054/
Citations: 0

Abstract

This article introduces the causal prompting network (CPNet), a novel approach that strengthens causal intervention for image captioning. By leveraging visual prompt engineering in the feature space, the method aims to improve performance on causal intervention tasks. Because CPNet is highly flexible and adaptable, it can be incorporated into any existing causal intervention-based image captioning framework. Specifically, two types of visual prompts, the causal region of interest (RoI) prompt (CRP) and the causal matching prompt (CMP), are employed to refine feature representations. CRP is applied to the RoI features of object features to enrich them with deconfounded causal features, while CMP strengthens the contextual representation of confounders linked to the image captioning task. To evaluate CPNet's effectiveness, extensive experiments are conducted on the popular Microsoft Common Objects in Context (MS-COCO) and Flickr30k datasets, and the results are validated using the Karpathy split. Experimental results demonstrate that CPNet surpasses other state-of-the-art (SOTA) image captioning methods.
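The abstract does not specify CRP and CMP at the code level, but the general pattern it describes, injecting a learnable prompt into RoI features and then attending over a confounder dictionary to build deconfounded context, can be sketched. Everything below (names, shapes, the additive prompt injection, and the dictionary-attention step) is an illustrative assumption in the style of backdoor-adjustment captioning models, not the authors' implementation:

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

# Hypothetical shapes: N RoI features of dimension d from a detector,
# and a confounder dictionary of K entries (e.g., class-averaged features).
N, d, K = 5, 8, 10
roi = rng.normal(size=(N, d))          # RoI features of detected objects
confounders = rng.normal(size=(K, d))  # confounder dictionary
crp = rng.normal(size=(d,))            # learnable prompt vector (CRP-like)

# CRP-like step: inject the learnable causal prompt into each RoI feature.
roi_prompted = roi + crp

# CMP-like step: attend over the confounder dictionary to build a
# per-region context vector (a backdoor-adjustment-style weighted sum).
attn = softmax(roi_prompted @ confounders.T)   # (N, K) attention weights
context = attn @ confounders                   # (N, d) confounder context

# Refined, "deconfounded" features passed on to the caption decoder.
refined = roi_prompted + context
print(refined.shape)  # (5, 8)
```

In a trained model, `crp` and `confounders` would be learned parameters optimized end to end with the captioning loss; here they are random placeholders that only demonstrate the data flow.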
Source Journal

IEEE Transactions on Neural Networks and Learning Systems
Categories: Computer Science, Artificial Intelligence; Computer Science, Hardware & Architecture
CiteScore: 23.80
Self-citation rate: 9.60%
Annual article count: 2102
Review time: 3-8 weeks
Journal Description: The focus of IEEE Transactions on Neural Networks and Learning Systems is to present scholarly articles discussing the theory, design, and applications of neural networks as well as other learning systems. The journal primarily highlights technical and scientific research in this domain.
Latest Articles in This Journal

A Dual-Network Framework With Adversarial GMM Augmentation and Frequency-Mamba Fusion for Hyperspectral Target Detection.
Disentangled Generative Graph Representation Learning.
Adaptive Prototype-Guided Personalized Propagation for Heterophilic Graphs With Missing Data.
Causal Counterfactual Inference Network for Video Object State Changes in Open-World Scenarios.
Attribute-Topology Cross-Frequency Aligned Graph Neural Networks for Homophilic and Heterophilic Graphs in Node Classification.