Shapley visual transformers for image-to-text generation

Applied Soft Computing. Impact Factor: 6.6. Zone 1 (Computer Science). JCR Q1 (Computer Science, Artificial Intelligence). Pub Date: 2024-09-02. DOI: 10.1016/j.asoc.2024.112205

Asma Belhadi, Youcef Djenouri, Ahmed Nabil Belbachir, Tomasz Michalak, Gautam Srivastava

Applied Soft Computing, volume 166, Article 112205. https://www.sciencedirect.com/science/article/pii/S1568494624009797

Citations: 0

Abstract

In the contemporary landscape of the web, image-to-text generation stands out as a crucial information service. Recently, deep learning has emerged as the leading methodology for advancing image-to-text generation systems. However, these models are typically built with domain knowledge specific to the application at hand and for a very particular data distribution, so data scientists must be well-versed in the relevant subject. In this work, we target a new foundation for image-to-text generation systems by introducing a consensus method that facilitates self-adaptation and flexibility across different learning tasks and diverse data distributions. This paper presents I2T-SP (Image-to-Text Generation for Shapley Pruning), a consensus method for general-purpose intelligence that requires no assistance from a domain expert. The model is developed using a general deep-learning approach that examines the contribution of each model to the training process. Multiple deep learning models are trained on each set of historical data, and the Shapley value is used to compute the contribution of each subset of models to the training. The models are then pruned according to their contribution to the learning process. We evaluate the generality of I2T-SP on different datasets with varying shapes and complexities. The results show the effectiveness of I2T-SP compared with baseline image-to-text generation solutions. This research marks a significant step towards a more adaptable and broadly applicable foundation for image-to-text generation systems.
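
The pruning idea in the abstract can be illustrated with a minimal sketch. This is not the authors' I2T-SP implementation: the model names, the `CORRECT` table, the coverage-based `value` function, and the `keep` parameter are all hypothetical stand-ins chosen so the example runs on its own. It computes exact Shapley values for a small pool of models and keeps the highest-scoring ones.

```python
from itertools import combinations
from math import factorial

# Hypothetical toy data: indices of validation samples that each model
# captions correctly. None of this comes from the paper.
CORRECT = {
    "model_a": {0, 1, 2, 3},
    "model_b": {2, 3, 4, 5},
    "model_c": {0, 1},
}
N_SAMPLES = 8


def value(coalition):
    """Assumed value function: share of samples that at least one
    model in the coalition captions correctly."""
    covered = set().union(*(CORRECT[m] for m in coalition))
    return len(covered) / N_SAMPLES


def shapley_values(players, value_fn):
    """Exact Shapley value of each player; enumerates every coalition,
    so it is only practical for a small number of models."""
    n = len(players)
    phi = {p: 0.0 for p in players}
    for p in players:
        others = [q for q in players if q != p]
        for k in range(n):
            # Weight of a coalition of size k that excludes p.
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                s = frozenset(subset)
                # Marginal contribution of p when joining coalition s.
                phi[p] += weight * (value_fn(s | {p}) - value_fn(s))
    return phi


def prune(players, value_fn, keep):
    """Rank models by Shapley value and keep the top `keep` of them."""
    phi = shapley_values(players, value_fn)
    ranked = sorted(players, key=lambda p: phi[p], reverse=True)
    return ranked[:keep], phi


models = list(CORRECT)
kept, phi = prune(models, value, keep=2)
print(kept, phi)
```

In this toy setting `model_c` is pruned because every sample it handles is already covered by `model_a`. Exact enumeration grows exponentially with the number of models, so a larger pool would need a sampling-based approximation instead.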

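Source journal

Applied Soft Computing (Engineering and Technology; Computer Science, Interdisciplinary Applications)

CiteScore: 15.80
Self-citation rate: 6.90%
Articles published: 874
Review time: 10.9 months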

About the journal: Applied Soft Computing is an international journal promoting an integrated view of soft computing to solve real-life problems. The focus is to publish the highest quality research in the application and convergence of Fuzzy Logic, Neural Networks, Evolutionary Computing, Rough Sets and other similar techniques to address real-world complexities. Applied Soft Computing is a rolling publication: articles are published as soon as the editor-in-chief has accepted them. The web site is therefore continuously updated with new articles, and the publication time is short.