Advancing Image Generation with Denoising Diffusion Probabilistic Model and ConvNeXt-V2: A novel approach for enhanced diversity and quality

Computer Vision and Image Understanding · Published: 2024-07-14 · DOI: 10.1016/j.cviu.2024.104077
Journal impact factor: 4.3 · CAS Zone 3 (Computer Science) · JCR Q2 (Computer Science, Artificial Intelligence)
Article page: https://www.sciencedirect.com/science/article/pii/S1077314224001589
Citations: 0

Abstract

In the rapidly evolving domain of image generation, sufficient data is crucial for effective model training, yet obtaining a large dataset is often difficult. Applications such as medical imaging, industrial monitoring, and self-driving cars require high-fidelity image generation from limited data or even a single image. This paper proposes a novel approach for increasing the diversity of images generated from a single input image by combining a Denoising Diffusion Probabilistic Model (DDPM) with the ConvNeXt-V2 architecture. The technique addresses limited data availability by using single images from the BSD and Places365 datasets, which substantially improves the model's performance under different conditions. Incorporating Global Response Normalization (GRN) and Sigmoid-Weighted Linear Units (SiLU) into the DDPM further improves image quality. In-depth analyses and comparisons with existing state-of-the-art (SOTA) models demonstrate the model's effectiveness, with higher scores in the reported experiments. Results include a Pixel Diversity score of 0.87±0.1, an LPIPS Diversity score of 0.42±0.03, and a SIFID for patch distribution of 0.046±0.02, along with notable NIQE and RECO scores. These findings indicate that the model can generate a wide range of high-quality images, a significant advance over existing state-of-the-art image generation models.
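The abstract names three ingredients (a DDPM, GRN from ConvNeXt-V2, and SiLU) without showing how they fit together. The NumPy sketch below is an illustration under stated assumptions, not the authors' implementation. GRN follows the formulation published with ConvNeXt V2: for a channels-last tensor X, compute the per-channel spatial L2 norm G_c, normalize it by its mean across channels, N_c = G_c / (mean_c G_c + ε), and return γ ⊙ (X ⊙ N) + β + X. The pointwise MLP that places SiLU where ConvNeXt V2 normally uses GELU is a hypothetical arrangement (the depthwise convolution and LayerNorm of a full block are omitted), and the linear β schedule (1e-4 to 0.02 over 1000 steps) is the default from the original DDPM paper, not a value reported here. All function names are illustrative.

```python
import numpy as np


def silu(x: np.ndarray) -> np.ndarray:
    """SiLU (sigmoid-weighted linear unit): x * sigmoid(x)."""
    return x / (1.0 + np.exp(-x))


def grn(x: np.ndarray, gamma: np.ndarray, beta: np.ndarray, eps: float = 1e-6) -> np.ndarray:
    """Global Response Normalization for a channels-last tensor of shape (N, H, W, C)."""
    gx = np.sqrt(np.sum(x ** 2, axis=(1, 2), keepdims=True))  # per-channel L2 norm over H and W
    nx = gx / (np.mean(gx, axis=-1, keepdims=True) + eps)     # divide by the mean norm across channels
    return gamma * (x * nx) + beta + x                         # learnable affine term plus residual


def pointwise_mlp(x, w1, b1, w2, b2, gamma, beta):
    """Hypothetical ConvNeXt-V2-style pointwise MLP with SiLU in place of GELU.

    Assumption: the depthwise convolution and LayerNorm of a full block are omitted.
    """
    h = silu(x @ w1 + b1)     # expand C -> 4C, then apply SiLU
    h = grn(h, gamma, beta)   # GRN on the expanded features
    return x + (h @ w2 + b2)  # project back to C and add the block residual


def linear_alpha_bars(num_steps: int = 1000, beta_start: float = 1e-4,
                      beta_end: float = 0.02) -> np.ndarray:
    """Cumulative products abar_t = prod_{s<=t} (1 - beta_s) for a linear beta schedule."""
    betas = np.linspace(beta_start, beta_end, num_steps)
    return np.cumprod(1.0 - betas)


def q_sample(x0: np.ndarray, t: int, alpha_bars: np.ndarray, rng: np.random.Generator):
    """DDPM forward process: x_t = sqrt(abar_t) * x0 + sqrt(1 - abar_t) * eps."""
    noise = rng.standard_normal(x0.shape)
    abar = alpha_bars[t]
    return np.sqrt(abar) * x0 + np.sqrt(1.0 - abar) * noise, noise


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n, h, w, c = 2, 8, 8, 16
    x = rng.standard_normal((n, h, w, c))
    # Zero-initialised GRN parameters make GRN an identity map at the start of training.
    gamma = np.zeros(4 * c)
    beta = np.zeros(4 * c)
    w1 = rng.standard_normal((c, 4 * c)) * 0.02
    w2 = rng.standard_normal((4 * c, c)) * 0.02
    y = pointwise_mlp(x, w1, np.zeros(4 * c), w2, np.zeros(c), gamma, beta)
    print(y.shape)  # (2, 8, 8, 16)

    alpha_bars = linear_alpha_bars()
    xt, eps = q_sample(x, t=500, alpha_bars=alpha_bars, rng=rng)
    print(xt.shape)  # (2, 8, 8, 16)
```

Initializing γ and β to zero, as ConvNeXt V2 does, means GRN starts as an identity mapping and only learns to reweight channels during training. In a denoiser, blocks like `pointwise_mlp` would sit inside the network that predicts the noise `eps` from `xt` and `t`.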

Source journal: Computer Vision and Image Understanding (Engineering & Technology: Engineering, Electrical & Electronic)
CiteScore: 7.80
Self-citation rate: 4.40%
Articles published: 112
Review time: 79 days
About the journal: The central focus of this journal is the computer analysis of pictorial information. Computer Vision and Image Understanding publishes papers covering all aspects of image analysis, from the low-level, iconic processes of early vision to the high-level, symbolic processes of recognition and interpretation. It covers a wide range of topics in image understanding, including papers offering insights that differ from predominant views. Research areas include:
• Theory
• Early vision
• Data structures and representations
• Shape
• Range
• Motion
• Matching and recognition
• Architecture and languages
• Vision systems