Patch Inverter: A Novel Block-Wise GAN Inversion Method for Arbitrary Image Resolutions

IF 3.2 2区 工程技术 Q2 ENGINEERING, ELECTRICAL & ELECTRONIC IEEE Signal Processing Letters Pub Date : 2024-11-27 DOI:10.1109/LSP.2024.3506859
Yifei Li;Mai Xu;Shengxi Li;Jialu Zhang;Zhenyu Guan
{"title":"Patch Inverter: A Novel Block-Wise GAN Inversion Method for Arbitrary Image Resolutions","authors":"Yifei Li;Mai Xu;Shengxi Li;Jialu Zhang;Zhenyu Guan","doi":"10.1109/LSP.2024.3506859","DOIUrl":null,"url":null,"abstract":"Generative adversarial networks (GANs) have achieved remarkable progress in generating realistic images from merely small dimensions, which essentially establishes the latent generating space by rich semantics. GAN inversion thus aims at mapping real-world images back into the latent space, allowing for the access of semantics from images. However, existing GAN inversion methods can only invert images with fixed resolutions; this significantly restricts the representation capability in real-world scenarios. To address this issue, we propose to invert images by patches, thus named as patch inverter, which is the first attempt in terms of block-wise inversion for arbitrary resolutions. More specifically, we develop the padding-free operation to ensure the continuity across patches, and analyse the intrinsic mismatch within the inversion procedure. To relieve the mismatch, we propose a shifted convolution operation, which retains the continuity across image patches and simultaneously enlarges the receptive field for each convolution layer. We further propose the reciprocal loss to regularize the inverted latent codes to reside on the original latent generating space, such that the rich semantics can be maximally preserved. Experimental results have demonstrated that our patch inverter is able to accurately invert images with arbitrary resolutions, whilst representing precise and rich image semantics in real-world scenarios.","PeriodicalId":13154,"journal":{"name":"IEEE Signal Processing Letters","volume":"32 ","pages":"171-175"},"PeriodicalIF":3.2000,"publicationDate":"2024-11-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE Signal Processing Letters","FirstCategoryId":"5","ListUrlMain":"https://ieeexplore.ieee.org/document/10767750/","RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"ENGINEERING, ELECTRICAL & ELECTRONIC","Score":null,"Total":0}
引用次数: 0

Abstract

Generative adversarial networks (GANs) have achieved remarkable progress in generating realistic images from merely small dimensions, which essentially establishes the latent generating space by rich semantics. GAN inversion thus aims at mapping real-world images back into the latent space, allowing for the access of semantics from images. However, existing GAN inversion methods can only invert images with fixed resolutions; this significantly restricts the representation capability in real-world scenarios. To address this issue, we propose to invert images by patches, thus named as patch inverter, which is the first attempt in terms of block-wise inversion for arbitrary resolutions. More specifically, we develop the padding-free operation to ensure the continuity across patches, and analyse the intrinsic mismatch within the inversion procedure. To relieve the mismatch, we propose a shifted convolution operation, which retains the continuity across image patches and simultaneously enlarges the receptive field for each convolution layer. We further propose the reciprocal loss to regularize the inverted latent codes to reside on the original latent generating space, such that the rich semantics can be maximally preserved. Experimental results have demonstrated that our patch inverter is able to accurately invert images with arbitrary resolutions, whilst representing precise and rich image semantics in real-world scenarios.
查看原文
分享 分享
微信好友 朋友圈 QQ好友 复制链接
本刊更多论文
贴片逆变器:一种适用于任意图像分辨率的GAN分块反演方法
生成式对抗网络(GAN)在从小维度生成逼真图像方面取得了显著进展,这从根本上通过丰富的语义建立了潜在生成空间。因此,GAN 反演旨在将现实世界的图像映射回潜在空间,从而从图像中获取语义。然而,现有的 GAN 反演方法只能反演具有固定分辨率的图像,这大大限制了真实世界场景中的表示能力。为了解决这个问题,我们提出了通过补丁反转图像的方法,因此被命名为补丁反转,这是首次尝试针对任意分辨率的分块反转。更具体地说,我们开发了无填充操作,以确保跨补丁的连续性,并分析了反转过程中的内在不匹配问题。为了缓解这种不匹配,我们提出了一种移位卷积操作,它既能保持图像斑块间的连续性,又能同时扩大每个卷积层的感受野。我们还进一步提出了倒易损失法,将反转潜码正则化,使其驻留在原始潜码生成空间,从而最大限度地保留了丰富的语义。实验结果表明,我们的补丁反相器能够准确反相任意分辨率的图像,同时在真实世界场景中呈现精确而丰富的图像语义。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 去求助
来源期刊
IEEE Signal Processing Letters
IEEE Signal Processing Letters 工程技术-工程:电子与电气
CiteScore
7.40
自引率
12.80%
发文量
339
审稿时长
2.8 months
期刊介绍: The IEEE Signal Processing Letters is a monthly, archival publication designed to provide rapid dissemination of original, cutting-edge ideas and timely, significant contributions in signal, image, speech, language and audio processing. Papers published in the Letters can be presented within one year of their appearance in signal processing conferences such as ICASSP, GlobalSIP and ICIP, and also in several workshop organized by the Signal Processing Society.
期刊最新文献
Heterogeneous Dual-Branch Emotional Consistency Network for Facial Expression Recognition Adaptive Superpixel-Guided Non-Homogeneous Image Dehazing Video Inpainting Localization With Contrastive Learning Cross-View Fusion for Multi-View Clustering Piecewise Student's t-distribution Mixture Model-Based Estimation for NAND Flash Memory Channels
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
现在去查看 取消
×
提示
确定
0
微信
客服QQ
Book学术公众号 扫码关注我们
反馈
×
意见反馈
请填写您的意见或建议
请填写您的手机或邮箱
已复制链接
已复制链接
快去分享给好友吧!
我知道了
×
扫码分享
扫码分享
Book学术官方微信
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术
文献互助 智能选刊 最新文献 互助须知 联系我们:info@booksci.cn
Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。
Copyright © 2023 Book学术 All rights reserved.
ghs 京公网安备 11010802042870号 京ICP备2023020795号-1