Memory-Based Learning and Fusion Attention for Few-Shot Food Image Generation Method

Applied Sciences (JCR Q1, Mathematics) Pub Date: 2024-09-17 DOI: 10.3390/app14188347
Jinlin Ma, Yuetong Wan, Ziping Ma
Citations: 0

Abstract

Food image generation converts textual ingredient lists into corresponding images, supporting the visualization of color and shape adjustments, dietary guidance, and the creation of new dishes. It has a wide range of applications, including food recommendation, recipe development, and health management. However, existing food image generation models, which are predominantly based on GANs (Generative Adversarial Networks), struggle to maintain semantic consistency between image and text and to achieve visual realism in the generated images. These limitations are attributed to the constrained representational capacity of sparse ingredient embeddings and the lack of diversity in GAN-based food image generation models. To alleviate these problems, this paper proposes a food image generation network, named MLA-Diff, in which ingredient and image features are learned and integrated as ingredient-image pairs to generate initial images, after which image details are refined by an attention fusion module. The main contributions are as follows: (1) An enhanced CLIP (Contrastive Language-Image Pre-Training) module is constructed by transforming sparse ingredient embeddings into compact embeddings and capturing multi-scale image features, which helps alleviate semantic consistency issues. (2) A Memory module is proposed that embeds a pre-trained diffusion model to generate diverse and realistic initial images. (3) An attention fusion module is proposed that integrates features from different modalities to strengthen the interaction between ingredient and image features. Extensive experiments on the Mini-food dataset demonstrate the superiority of MLA-Diff in terms of semantic consistency and visual realism, generating high-quality food images.
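The abstract gives no implementation details, so the following is only a minimal sketch of two ideas it names: turning a sparse (multi-hot) ingredient vector into a compact embedding, and fusing ingredient and image features with attention. Everything here is an assumption made for illustration: the function names `compact_embedding` and `cross_attention`, mean pooling over an embedding table, and single-head scaled dot-product attention with a residual connection. It is not MLA-Diff's actual architecture.

```python
import math
import random


def compact_embedding(multi_hot, table):
    """Map a sparse multi-hot ingredient vector to a dense vector.

    Assumption: mean-pool the embedding-table rows of the active ingredients.
    """
    active = [i for i, flag in enumerate(multi_hot) if flag]
    if not active:
        raise ValueError("at least one ingredient must be active")
    dim = len(table[0])
    return [sum(table[i][d] for i in active) / len(active) for d in range(dim)]


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def cross_attention(image_feats, ingredient_feats):
    """Fuse modalities: each image feature (query) attends over the
    ingredient features (keys and values), then adds the result back
    to the query as a residual."""
    dim = len(ingredient_feats[0])
    scale = math.sqrt(dim)
    fused = []
    for q in image_feats:
        weights = softmax([dot(q, k) / scale for k in ingredient_feats])
        attended = [
            sum(w * v[d] for w, v in zip(weights, ingredient_feats))
            for d in range(dim)
        ]
        fused.append([qi + ai for qi, ai in zip(q, attended)])
    return fused


if __name__ == "__main__":
    rng = random.Random(0)
    vocab_size, dim = 6, 4  # hypothetical sizes
    table = [[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(vocab_size)]
    recipe = [1, 0, 1, 0, 0, 1]  # hypothetical multi-hot ingredient vector
    pooled = compact_embedding(recipe, table)
    tokens = [table[i] for i, flag in enumerate(recipe) if flag]
    patches = [[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(3)]
    fused = cross_attention(patches, tokens)
    print(len(pooled), len(fused), len(fused[0]))
```

In this sketch the pooled vector plays the role of a compact ingredient representation, while the per-ingredient rows serve as the keys and values that image features attend to. A real system would learn the embedding table and attention projections rather than use random values.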
Source journal
Applied Sciences
Applied Sciences Mathematics-Applied Mathematics
CiteScore: 6.40
Self-citation rate: 0.00%
Articles published: 0
Review time: 11 weeks
Journal description: APPS is an international journal. APPS covers a wide spectrum of pure and applied mathematics in science and technology, especially promoting papers presented at Carpato-Balkan meetings. The Editorial Board of APPS takes a very active role in selecting and refereeing papers, ensuring the best quality of contemporary mathematics and its applications. APPS is abstracted in Zentralblatt für Mathematik. The APPS journal uses double-blind peer review.