Transformer-based image and video inpainting: current challenges and future directions

Artificial Intelligence Review · IF 10.7 · Q1 (COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE) · CAS Tier 2 (Computer Science) · Pub Date: 2025-02-05 · DOI: 10.1007/s10462-024-11075-9
Omar Elharrouss, Rafat Damseh, Abdelkader Nasreddine Belkacem, Elarbi Badidi, Abderrahmane Lakas
Journal: Artificial Intelligence Review, Vol. 58, No. 4
Publication type: Journal Article · Citations: 0
Open-access PDF: https://link.springer.com/content/pdf/10.1007/s10462-024-11075-9.pdf
Article page: https://link.springer.com/article/10.1007/s10462-024-11075-9

Abstract

Image inpainting is currently a hot topic within the field of computer vision. It offers a viable solution for various applications, including photographic restoration, video editing, and medical imaging. Deep learning advancements, notably convolutional neural networks (CNNs) and generative adversarial networks (GANs), have significantly enhanced inpainting, improving the capability to fill missing or damaged regions in an image or a video with contextually appropriate details. These advancements have also improved other aspects, including efficiency, information preservation, and the realism of generated textures and structures. More recently, Vision Transformers (ViTs) have been applied to image and video inpainting, offering further improvements. Transformer-based architectures, initially designed for natural language processing, have since been integrated into computer vision tasks. These methods utilize self-attention mechanisms that excel at capturing long-range dependencies within data; they are therefore particularly effective for tasks requiring a comprehensive understanding of the global context of an image or video. In this paper, we provide a comprehensive review of current image/video inpainting approaches, with a specific focus on Vision Transformer (ViT) techniques, with the goal of highlighting the significant improvements and providing guidance for new researchers in the field of image/video inpainting using vision transformers. We categorize the transformer-based techniques by their architectural configurations, types of damage, and performance metrics. Furthermore, we present an organized synthesis of the current challenges and suggest directions for future research in the field of image or video inpainting.
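The abstract's key technical claim is that self-attention lets every image patch aggregate information from all other patches, which is what gives ViT-based inpainting its global context. A minimal sketch of single-head scaled dot-product self-attention over patch embeddings (simplified: no learned query/key/value projections, which real ViTs do use) illustrates this global mixing:

```python
import numpy as np

def self_attention(x):
    """Scaled dot-product self-attention over a sequence of patch embeddings.

    x: (num_patches, dim) array. Each output row is a weighted sum over ALL
    input rows, so information flows between arbitrarily distant patches --
    the long-range dependency property the survey attributes to ViTs.
    """
    d = x.shape[-1]
    scores = x @ x.T / np.sqrt(d)                    # (N, N) pairwise similarities
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)   # row-wise softmax
    return weights @ x                               # every patch attends globally

rng = np.random.default_rng(0)
patches = rng.standard_normal((16, 8))               # e.g. a 4x4 grid of 8-d patch tokens
out = self_attention(patches)
print(out.shape)  # (16, 8)
```

In an inpainting setting, tokens from masked regions can attend to valid tokens anywhere in the image, unlike a CNN whose receptive field grows only with depth; this sketch omits the multi-head structure, positional encodings, and masking strategies that actual transformer inpainting models add on top.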

Source journal
Artificial Intelligence Review
Category: Engineering & Technology, Computer Science: Artificial Intelligence
CiteScore: 22.00
Self-citation rate: 3.30%
Articles per year: 194
Review time: 5.3 months
Journal description: Artificial Intelligence Review, a fully open access journal, publishes cutting-edge research in artificial intelligence and cognitive science. It features critical evaluations of applications, techniques, and algorithms, providing a platform for both researchers and application developers. The journal includes refereed survey and tutorial articles, along with reviews and commentary on significant developments in the field.