{"title":"Advancing Image Generation with Denoising Diffusion Probabilistic Model and ConvNeXt-V2: A novel approach for enhanced diversity and quality","authors":"","doi":"10.1016/j.cviu.2024.104077","DOIUrl":null,"url":null,"abstract":"<div><p>In the rapidly evolving domain of image generation, the availability of sufficient data is crucial for effective model training. However, obtaining a large dataset is often challenging. Medical imaging, industrial monitoring, and self-driving cars are among the applications that require high-fidelity image generation from limited or single data points. The paper proposes a novel approach for increasing the diversity of images generated from a single input image by combining a Denoising Diffusion Probabilistic Model (DDPM) with the ConvNeXt-V2 architecture. This technique addresses the issue of limited data availability by utilizing single images using the BSD and Places365 datasets, significantly increasing the ability of the model through different conditions. The research greatly enhances the image quality by including Global Response Normalization (GRN) and Sigmoid-Weighted Linear Units (SiLU) in the DDPM. In-depth analyses and comparisons with the existing State-of-the-art (SOTA) models highlight the model’s effectiveness, which shows higher experimental results. Achievements include a Pixel Diversity score of 0.87±0.1, an LPIPS Diversity score of 0.42±0.03, and a SIFID for Patch Distribution of 0.046±0.02, along with notable NIQE and RECO scores. These findings indicate the exceptional ability of the model to generate a wide range of high-quality images, exhibiting significant advancement over existing State-of-the-art models in the field of image generation.</p></div>","PeriodicalId":50633,"journal":{"name":"Computer Vision and Image Understanding","volume":null,"pages":null},"PeriodicalIF":4.3000,"publicationDate":"2024-07-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Computer Vision and Image Understanding","FirstCategoryId":"94","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S1077314224001589","RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}
引用次数: 0
Abstract
In the rapidly evolving domain of image generation, the availability of sufficient data is crucial for effective model training. However, obtaining a large dataset is often challenging. Medical imaging, industrial monitoring, and self-driving cars are among the applications that require high-fidelity image generation from limited or single data points. The paper proposes a novel approach for increasing the diversity of images generated from a single input image by combining a Denoising Diffusion Probabilistic Model (DDPM) with the ConvNeXt-V2 architecture. This technique addresses the issue of limited data availability by utilizing single images using the BSD and Places365 datasets, significantly increasing the ability of the model through different conditions. The research greatly enhances the image quality by including Global Response Normalization (GRN) and Sigmoid-Weighted Linear Units (SiLU) in the DDPM. In-depth analyses and comparisons with the existing State-of-the-art (SOTA) models highlight the model’s effectiveness, which shows higher experimental results. Achievements include a Pixel Diversity score of 0.87±0.1, an LPIPS Diversity score of 0.42±0.03, and a SIFID for Patch Distribution of 0.046±0.02, along with notable NIQE and RECO scores. These findings indicate the exceptional ability of the model to generate a wide range of high-quality images, exhibiting significant advancement over existing State-of-the-art models in the field of image generation.
期刊介绍:
The central focus of this journal is the computer analysis of pictorial information. Computer Vision and Image Understanding publishes papers covering all aspects of image analysis from the low-level, iconic processes of early vision to the high-level, symbolic processes of recognition and interpretation. A wide range of topics in the image understanding area is covered, including papers offering insights that differ from predominant views.
Research Areas Include:
• Theory
• Early vision
• Data structures and representations
• Shape
• Range
• Motion
• Matching and recognition
• Architecture and languages
• Vision systems