用风格控制:基于风格嵌入的变异自动编码器用于受控风格化标题生成框架

IF 5 3区 计算机科学 Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE IEEE Transactions on Cognitive and Developmental Systems Pub Date : 2024-03-30 DOI:10.1109/TCDS.2024.3405573
Dhruv Sharma;Chhavi Dhiman;Dinesh Kumar
{"title":"用风格控制:基于风格嵌入的变异自动编码器用于受控风格化标题生成框架","authors":"Dhruv Sharma;Chhavi Dhiman;Dinesh Kumar","doi":"10.1109/TCDS.2024.3405573","DOIUrl":null,"url":null,"abstract":"Automatic image captioning is a computationally intensive and structurally complicated task that describes the contents of an image in the form of a natural language sentence. Methods developed in the recent past focused mainly on the description of factual content in images thereby ignoring the different emotions and styles (romantic, humorous, angry, etc.) associated with the image. To overcome this, few works incorporated style-based caption generation that captures the variability in the generated descriptions. This article presents a style embedding-based variational autoencoder for controlled stylized caption generation framework (RFCG+SE-VAE-CSCG). It generates controlled text-based stylized descriptions of images. It works in two phases, i.e., \n<inline-formula><tex-math>$ 1)$</tex-math></inline-formula>\n refined factual caption generation (RFCG); and \n<inline-formula><tex-math>$ 2)$</tex-math></inline-formula>\n SE-VAE-CSCG. The former defines an encoder–decoder model for the generation of refined factual captions. Whereas, the latter presents a SE-VAE for controlled stylized caption generation. The overall proposed framework generates style-based descriptions of images by leveraging bag of captions (BoCs). More so, with the use of a controlled text generation model, the proposed work efficiently learns disentangled representations and generates realistic stylized descriptions of images. Experiments on MSCOCO, Flickr30K, and FlickrStyle10K provide state-of-the-art results for both refined and style-based caption generation, supported with an ablation study.","PeriodicalId":54300,"journal":{"name":"IEEE Transactions on Cognitive and Developmental Systems","volume":"16 6","pages":"2032-2042"},"PeriodicalIF":5.0000,"publicationDate":"2024-03-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Control With Style: Style Embedding-Based Variational Autoencoder for Controlled Stylized Caption Generation Framework\",\"authors\":\"Dhruv Sharma;Chhavi Dhiman;Dinesh Kumar\",\"doi\":\"10.1109/TCDS.2024.3405573\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Automatic image captioning is a computationally intensive and structurally complicated task that describes the contents of an image in the form of a natural language sentence. Methods developed in the recent past focused mainly on the description of factual content in images thereby ignoring the different emotions and styles (romantic, humorous, angry, etc.) associated with the image. To overcome this, few works incorporated style-based caption generation that captures the variability in the generated descriptions. This article presents a style embedding-based variational autoencoder for controlled stylized caption generation framework (RFCG+SE-VAE-CSCG). It generates controlled text-based stylized descriptions of images. It works in two phases, i.e., \\n<inline-formula><tex-math>$ 1)$</tex-math></inline-formula>\\n refined factual caption generation (RFCG); and \\n<inline-formula><tex-math>$ 2)$</tex-math></inline-formula>\\n SE-VAE-CSCG. The former defines an encoder–decoder model for the generation of refined factual captions. Whereas, the latter presents a SE-VAE for controlled stylized caption generation. The overall proposed framework generates style-based descriptions of images by leveraging bag of captions (BoCs). More so, with the use of a controlled text generation model, the proposed work efficiently learns disentangled representations and generates realistic stylized descriptions of images. Experiments on MSCOCO, Flickr30K, and FlickrStyle10K provide state-of-the-art results for both refined and style-based caption generation, supported with an ablation study.\",\"PeriodicalId\":54300,\"journal\":{\"name\":\"IEEE Transactions on Cognitive and Developmental Systems\",\"volume\":\"16 6\",\"pages\":\"2032-2042\"},\"PeriodicalIF\":5.0000,\"publicationDate\":\"2024-03-30\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"IEEE Transactions on Cognitive and Developmental Systems\",\"FirstCategoryId\":\"94\",\"ListUrlMain\":\"https://ieeexplore.ieee.org/document/10542089/\",\"RegionNum\":3,\"RegionCategory\":\"计算机科学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q1\",\"JCRName\":\"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE Transactions on Cognitive and Developmental Systems","FirstCategoryId":"94","ListUrlMain":"https://ieeexplore.ieee.org/document/10542089/","RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}
引用次数: 0

摘要

自动图像字幕是一项计算密集型和结构复杂的任务,它以自然语言句子的形式描述图像的内容。最近发展起来的方法主要集中在描述图像中的事实内容,从而忽略了与图像相关的不同情感和风格(浪漫,幽默,愤怒等)。为了克服这个问题,很少有作品采用基于样式的标题生成来捕获生成描述中的可变性。本文提出了一种基于样式嵌入的变分自编码器,用于控制样式化标题生成框架(RFCG+SE-VAE-CSCG)。它生成受控的基于文本的图像风格化描述。它分为两个阶段,即:$ 1)$精炼事实标题生成(RFCG);$ 2)$ SE-VAE-CSCG。前者定义了一个编码器-解码器模型,用于生成精炼的事实说明。而后者则提出了一种用于受控风格化标题生成的SE-VAE。整个提议的框架通过利用字幕包(boc)生成基于样式的图像描述。更重要的是,通过使用受控文本生成模型,所提出的工作有效地学习解纠缠表示并生成逼真的图像风格化描述。在MSCOCO、Flickr30K和FlickrStyle10K上的实验为精炼和基于样式的标题生成提供了最先进的结果,并得到了烧消研究的支持。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
查看原文
分享 分享
微信好友 朋友圈 QQ好友 复制链接
本刊更多论文
Control With Style: Style Embedding-Based Variational Autoencoder for Controlled Stylized Caption Generation Framework
Automatic image captioning is a computationally intensive and structurally complicated task that describes the contents of an image in the form of a natural language sentence. Methods developed in the recent past focused mainly on the description of factual content in images thereby ignoring the different emotions and styles (romantic, humorous, angry, etc.) associated with the image. To overcome this, few works incorporated style-based caption generation that captures the variability in the generated descriptions. This article presents a style embedding-based variational autoencoder for controlled stylized caption generation framework (RFCG+SE-VAE-CSCG). It generates controlled text-based stylized descriptions of images. It works in two phases, i.e., $ 1)$ refined factual caption generation (RFCG); and $ 2)$ SE-VAE-CSCG. The former defines an encoder–decoder model for the generation of refined factual captions. Whereas, the latter presents a SE-VAE for controlled stylized caption generation. The overall proposed framework generates style-based descriptions of images by leveraging bag of captions (BoCs). More so, with the use of a controlled text generation model, the proposed work efficiently learns disentangled representations and generates realistic stylized descriptions of images. Experiments on MSCOCO, Flickr30K, and FlickrStyle10K provide state-of-the-art results for both refined and style-based caption generation, supported with an ablation study.
求助全文
通过发布文献求助,成功后即可免费获取论文全文。 去求助
来源期刊
CiteScore
7.20
自引率
10.00%
发文量
170
期刊介绍: The IEEE Transactions on Cognitive and Developmental Systems (TCDS) focuses on advances in the study of development and cognition in natural (humans, animals) and artificial (robots, agents) systems. It welcomes contributions from multiple related disciplines including cognitive systems, cognitive robotics, developmental and epigenetic robotics, autonomous and evolutionary robotics, social structures, multi-agent and artificial life systems, computational neuroscience, and developmental psychology. Articles on theoretical, computational, application-oriented, and experimental studies as well as reviews in these areas are considered.
期刊最新文献
Table of Contents IEEE Transactions on Cognitive and Developmental Systems Information for Authors IEEE Computational Intelligence Society Information Editorial: 2025 New Year Message From the Editor-in-Chief IEEE Transactions on Cognitive and Developmental Systems Publication Information
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
现在去查看 取消
×
提示
确定
0
微信
客服QQ
Book学术公众号 扫码关注我们
反馈
×
意见反馈
请填写您的意见或建议
请填写您的手机或邮箱
已复制链接
已复制链接
快去分享给好友吧!
我知道了
×
扫码分享
扫码分享
Book学术官方微信
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术
文献互助 智能选刊 最新文献 互助须知 联系我们:info@booksci.cn
Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。
Copyright © 2023 Book学术 All rights reserved.
ghs 京公网安备 11010802042870号 京ICP备2023020795号-1