Shapley visual transformers for image-to-text generation

IF 7.2 | Zone 1 (Computer Science) | JCR Q1 (Computer Science, Artificial Intelligence) | Applied Soft Computing | Pub Date: 2024-09-02 | DOI: 10.1016/j.asoc.2024.112205
Citations: 0

Abstract

On today's web, image-to-text generation is an important information service, and deep learning has recently become the leading methodology for building such systems. However, these models are typically constructed using domain knowledge specific to the application at hand and a very particular data distribution, so data scientists must be well-versed in the relevant subject. In this work, we target a new foundation for image-to-text generation systems by introducing a consensus method that supports self-adaptation and the flexibility to handle different learning tasks and diverse data distributions. This paper presents I2T-SP (Image-to-Text Generation for Shapley Pruning), a consensus method for general-purpose intelligence that does not require the assistance of a domain expert. The model is built with a general deep-learning approach that investigates the contribution of each model to the training process. Multiple deep learning models are trained on each set of historical data, and the Shapley value is used to compute the contribution of each subset of models to the training. The models are then pruned according to their contribution to the learning process. We evaluate the generality of I2T-SP on different datasets with varying shapes and complexities. The results show that I2T-SP is effective compared to baseline image-to-text generation solutions. This research marks a significant step towards a more adaptable and broadly applicable foundation for image-to-text generation systems.
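The abstract does not give the details of the contribution computation, so the following is only a minimal sketch of the general idea: compute an exact Shapley value for each model in a small ensemble, then prune the models whose value falls below a threshold. The model names, the `toy_score` coalition function, and the threshold are all hypothetical and stand in for whatever validation metric and pruning rule the paper actually uses.

```python
from itertools import combinations
from math import factorial


def shapley_values(players, value):
    """Exact Shapley value of each player under the coalition score `value`."""
    n = len(players)
    phi = {}
    for p in players:
        others = [q for q in players if q != p]
        total = 0.0
        for k in range(n):  # k = size of the coalition that p joins
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                s = frozenset(subset)
                # Marginal gain from adding p to coalition s.
                total += weight * (value(s | {p}) - value(s))
        phi[p] = total
    return phi


def prune(phi, threshold):
    """Keep only the players whose Shapley value reaches the threshold."""
    return [p for p, v in phi.items() if v >= threshold]


def toy_score(coalition):
    """Hypothetical ensemble score (an assumption, not the paper's metric).

    captioner_a and captioner_b are redundant: either one alone adds 0.5.
    captioner_c adds 0.2 on its own. captioner_d adds nothing.
    """
    score = 0.0
    if "captioner_a" in coalition or "captioner_b" in coalition:
        score += 0.5
    if "captioner_c" in coalition:
        score += 0.2
    return score


# Hypothetical model names, used only for illustration.
models = ["captioner_a", "captioner_b", "captioner_c", "captioner_d"]
phi = shapley_values(models, toy_score)
kept = prune(phi, threshold=0.1)
print(phi)
print(kept)
```

Under this toy score the two redundant models split their shared credit equally, the model that adds nothing receives a value of zero and is pruned, and the values sum to the score of the full ensemble. Exact enumeration visits every coalition, so it is only practical for a handful of models; larger ensembles would need a sampling-based approximation.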

Source journal: Applied Soft Computing
Category: Engineering and Technology, Computer Science: Interdisciplinary Applications
CiteScore: 15.80
Self-citation rate: 6.90%
Articles published: 874
Review time: 10.9 months
Journal description: Applied Soft Computing is an international journal promoting an integrated view of soft computing to solve real-life problems. Its focus is to publish the highest-quality research on the application and convergence of Fuzzy Logic, Neural Networks, Evolutionary Computing, Rough Sets, and similar techniques to address real-world complexities. Applied Soft Computing is a rolling publication: articles are published as soon as the editor-in-chief has accepted them, so the web site is continuously updated with new articles and the publication time is short.