Patch Inverter: A Novel Block-Wise GAN Inversion Method for Arbitrary Image Resolutions

IF 3.2 · Zone 2 (Engineering & Technology) · JCR Q2, ENGINEERING, ELECTRICAL & ELECTRONIC
IEEE Signal Processing Letters, vol. 32, pp. 171-175 · Published: 2024-11-27 · DOI: 10.1109/LSP.2024.3506859

Yifei Li, Mai Xu, Shengxi Li, Jialu Zhang, Zhenyu Guan

Citations: 0

Abstract

Generative adversarial networks (GANs) have achieved remarkable progress in generating realistic images from low-dimensional latent codes, which essentially establishes a latent generating space rich in semantics. GAN inversion thus aims to map real-world images back into this latent space, giving access to the semantics of those images. However, existing GAN inversion methods can only invert images of a fixed resolution, which significantly restricts their representation capability in real-world scenarios. To address this issue, we propose to invert images patch by patch, hence the name patch inverter; this is the first attempt at block-wise inversion for arbitrary resolutions. More specifically, we develop a padding-free operation to ensure continuity across patches, and we analyse the intrinsic mismatch within the inversion procedure. To relieve this mismatch, we propose a shifted convolution operation, which retains continuity across image patches while enlarging the receptive field of each convolution layer. We further propose a reciprocal loss that regularizes the inverted latent codes to reside in the original latent generating space, so that the rich semantics are maximally preserved. Experimental results demonstrate that our patch inverter accurately inverts images of arbitrary resolutions while representing precise and rich image semantics in real-world scenarios.
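Why a padding-free operation keeps patches continuous can be illustrated on a single convolution layer. The Python sketch below is a generic illustration under stated assumptions, not the authors' implementation: the helper names (conv2d_valid, conv2d_zero_pad, patchwise_conv), the toy 8x8 image, the 3x3 kernel and the tile sizes are all hypothetical. It checks that a stride-1 convolution without padding, applied to patches that each carry a (k-1)-pixel overlap, stitches into exactly the full-image result, whereas zero-padding each disjoint patch separately introduces seams at the patch borders.

```python
# Hypothetical illustration (not the paper's code): padding-free convolution
# lets overlapping patches be processed independently and stitched seamlessly.

def conv2d_valid(img, kernel):
    """2-D cross-correlation with no padding: output shrinks by k-1 per axis."""
    kh, kw = len(kernel), len(kernel[0])
    h, w = len(img), len(img[0])
    return [
        [
            sum(img[i + u][j + v] * kernel[u][v] for u in range(kh) for v in range(kw))
            for j in range(w - kw + 1)
        ]
        for i in range(h - kh + 1)
    ]


def conv2d_zero_pad(img, kernel):
    """'Same'-size cross-correlation with zero padding (odd kernel assumed)."""
    ph, pw = len(kernel) // 2, len(kernel[0]) // 2
    w = len(img[0])
    padded = [[0] * (w + 2 * pw) for _ in range(ph)]
    padded += [[0] * pw + list(row) + [0] * pw for row in img]
    padded += [[0] * (w + 2 * pw) for _ in range(ph)]
    return conv2d_valid(padded, kernel)


def patchwise_conv(img, kernel, tile):
    """Compute the valid convolution in tile x tile output blocks. Each block
    reads an input patch enlarged by a (k-1)-pixel overlap; blocks are stitched."""
    kh, kw = len(kernel), len(kernel[0])
    out_h, out_w = len(img) - kh + 1, len(img[0]) - kw + 1
    out = [[0] * out_w for _ in range(out_h)]
    for r in range(0, out_h, tile):
        for c in range(0, out_w, tile):
            r_end, c_end = min(r + tile, out_h), min(c + tile, out_w)
            # Output rows r..r_end-1 need input rows r..r_end+kh-2 (same for columns).
            patch = [row[c:c_end + kw - 1] for row in img[r:r_end + kh - 1]]
            for i, row in enumerate(conv2d_valid(patch, kernel)):
                out[r + i][c:c_end] = row
    return out


# Toy data (arbitrary, for illustration only).
img = [[(3 * i + 5 * j) % 7 for j in range(8)] for i in range(8)]
kernel = [[1, 2, 1], [0, 1, 0], [1, 0, 1]]

full = conv2d_valid(img, kernel)
tiled = patchwise_conv(img, kernel, tile=3)
print(full == tiled)  # True: padding-free patches stitch without seams

# Naive alternative: split into disjoint 4x4 patches and zero-pad each one.
naive = [[0] * 8 for _ in range(8)]
for r in (0, 4):
    for c in (0, 4):
        patch = [row[c:c + 4] for row in img[r:r + 4]]
        for i, row in enumerate(conv2d_zero_pad(patch, kernel)):
            naive[r + i][c:c + 4] = row
reference = conv2d_zero_pad(img, kernel)
print(naive == reference)  # False: zero padding at patch borders creates seams
```

The sketch covers one stride-1 layer only; stacking padding-free layers accumulates the required overlap (the sum of k-1 over the layers), and it does not model the shifted convolution or the reciprocal loss described in the abstract.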

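Source journal

IEEE Signal Processing Letters (Engineering & Technology: Engineering, Electrical & Electronic)

CiteScore: 7.40
Self-citation rate: 12.80%
Articles published: 339
Review time: 2.8 months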

Journal description: The IEEE Signal Processing Letters is a monthly, archival publication designed to provide rapid dissemination of original, cutting-edge ideas and timely, significant contributions in signal, image, speech, language and audio processing. Papers published in the Letters can be presented within one year of their appearance in signal processing conferences such as ICASSP, GlobalSIP and ICIP, and also in several workshops organized by the Signal Processing Society.

Latest articles in this journal

Heterogeneous Dual-Branch Emotional Consistency Network for Facial Expression Recognition
Adaptive Superpixel-Guided Non-Homogeneous Image Dehazing
Video Inpainting Localization With Contrastive Learning
Cross-View Fusion for Multi-View Clustering
Piecewise Student's t-distribution Mixture Model-Based Estimation for NAND Flash Memory Channels