Advancing Image Generation with Denoising Diffusion Probabilistic Model and ConvNeXt-V2: A novel approach for enhanced diversity and quality

Computer Vision and Image Understanding (IF 3.5; Zone 3, Computer Science; JCR Q2, Computer Science, Artificial Intelligence). Pub Date: 2024-10-01. Epub Date: 2024-07-14. DOI: 10.1016/j.cviu.2024.104077

Ayushi Verma, Tapas Badal, Abhay Bansal

Computer Vision and Image Understanding, Volume 247, Article 104077. https://www.sciencedirect.com/science/article/pii/S1077314224001589

Citations: 0

Abstract

In the rapidly evolving domain of image generation, sufficient data is crucial for effective model training, yet large datasets are often hard to obtain. Medical imaging, industrial monitoring, and self-driving cars are among the applications that require high-fidelity image generation from limited data or even a single image. The paper proposes an approach for increasing the diversity of images generated from a single input image by combining a Denoising Diffusion Probabilistic Model (DDPM) with the ConvNeXt-V2 architecture. The technique addresses limited data availability by using single images from the BSD and Places365 datasets, significantly improving the model's capability across different conditions. Image quality is further enhanced by including Global Response Normalization (GRN) and Sigmoid-Weighted Linear Units (SiLU) in the DDPM. In-depth analyses and comparisons with existing state-of-the-art (SOTA) models show that the proposed model achieves better experimental results. Reported results include a Pixel Diversity score of 0.87±0.1, an LPIPS Diversity score of 0.42±0.03, and a SIFID for Patch Distribution of 0.046±0.02, along with notable NIQE and RECO scores. These findings indicate that the model can generate a wide range of high-quality images and represents a significant advance over existing state-of-the-art models in image generation.
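The abstract names three building blocks without giving their formulas: the DDPM forward (noising) process, Global Response Normalization, and SiLU. The following is a minimal NumPy sketch of each, assuming the standard definitions (the closed-form DDPM forward step with a linear beta schedule, GRN as defined for ConvNeXt V2 on channels-last tensors, and SiLU as x * sigmoid(x)). The function names, the schedule endpoints, and the way the pieces are chained in the demo are illustrative assumptions, not the authors' implementation.

```python
import numpy as np


def silu(x):
    """SiLU activation: x * sigmoid(x), written as x / (1 + exp(-x))."""
    return x / (1.0 + np.exp(-x))


def grn(x, gamma, beta, eps=1e-6):
    """Global Response Normalization on a channels-last tensor (N, H, W, C).

    gamma and beta are per-channel parameters of shape (C,).
    """
    # Global feature aggregation: L2 norm over the spatial dims, per channel.
    gx = np.sqrt(np.sum(x ** 2, axis=(1, 2), keepdims=True))  # (N, 1, 1, C)
    # Divisive normalization: each channel's norm relative to the channel mean.
    nx = gx / (np.mean(gx, axis=-1, keepdims=True) + eps)
    # Calibrate the input and keep a residual connection.
    return gamma * (x * nx) + beta + x


def linear_beta_schedule(num_steps=1000, beta_start=1e-4, beta_end=0.02):
    """Linear noise schedule (assumed values; the abstract does not state them)."""
    return np.linspace(beta_start, beta_end, num_steps)


def q_sample(x0, t, alpha_bar, rng):
    """Closed-form forward step: x_t = sqrt(a_bar_t) x_0 + sqrt(1 - a_bar_t) eps."""
    noise = rng.standard_normal(x0.shape)
    xt = np.sqrt(alpha_bar[t]) * x0 + np.sqrt(1.0 - alpha_bar[t]) * noise
    return xt, noise


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Hypothetical stand-in for a single training image scaled to [-1, 1].
    image = rng.uniform(-1.0, 1.0, size=(1, 32, 32, 3))
    alpha_bar = np.cumprod(1.0 - linear_beta_schedule())
    noisy, noise = q_sample(image, 500, alpha_bar, rng)
    # Illustrative chaining only: GRN followed by SiLU on the noisy input.
    features = silu(grn(noisy, gamma=np.ones(3), beta=np.zeros(3)))
    print(features.shape)
```

In a real denoiser these operations would sit inside learned convolutional blocks, with gamma and beta trained; here they are fixed so the arithmetic is easy to inspect.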

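Source journal: Computer Vision and Image Understanding (Engineering & Technology - Engineering: Electrical and Electronic)

CiteScore: 7.80

Self-citation rate: 4.40%

Articles published: 112

Review time: 79 days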

Journal description: The central focus of this journal is the computer analysis of pictorial information. Computer Vision and Image Understanding publishes papers covering all aspects of image analysis from the low-level, iconic processes of early vision to the high-level, symbolic processes of recognition and interpretation. A wide range of topics in the image understanding area is covered, including papers offering insights that differ from predominant views. Research Areas Include: • Theory • Early vision • Data structures and representations • Shape • Range • Motion • Matching and recognition • Architecture and languages • Vision systems