Leveraging vision-language prompts for real-world image restoration and enhancement

IF 4.3 · CAS Tier 3 (Computer Science) · Q2 (Computer Science, Artificial Intelligence) · Computer Vision and Image Understanding · Pub Date: 2024-11-16 · DOI: 10.1016/j.cviu.2024.104222
Yanyan Wei, Yilin Zhang, Kun Li, Fei Wang, Shengeng Tang, Zhao Zhang
Computer Vision and Image Understanding, Volume 250, Article 104222. Full text: https://www.sciencedirect.com/science/article/pii/S1077314224003035
Citations: 0

Abstract

Significant advancements have been made in image restoration methods aimed at removing adverse weather effects. However, due to natural constraints, it is challenging to collect real-world datasets for adverse weather removal tasks. Consequently, existing methods predominantly rely on synthetic datasets, which struggle to generalize to real-world data, thereby limiting their practical utility. While some real-world adverse weather removal datasets have emerged, their design, which involves capturing ground truths at a different moment, inevitably introduces interfering discrepancies between the degraded images and the ground truths. These discrepancies include variations in brightness, color, contrast, and minor misalignments. Meanwhile, real-world datasets typically involve complex rather than singular degradation types. In many samples, degradation features are not overt, which poses immense challenges to real-world adverse weather removal methodologies. To tackle these issues, we introduce the recently prominent vision-language model, CLIP, to aid in the image restoration process. An expanded and fine-tuned CLIP model acts as an ‘expert’, leveraging the image priors acquired through large-scale pre-training to guide the operation of the image restoration model. Additionally, we generate a set of pseudo-ground-truths on sequences of degraded images to further alleviate the difficulty for the model in fitting the data. To imbue the model with more prior knowledge about degradation characteristics, we also incorporate additional synthetic training data. Lastly, the progressive learning and fine-tuning strategies employed during training enhance the model’s final performance, enabling our method to surpass existing approaches in both visual quality and objective image quality assessment metrics.
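The abstract mentions generating pseudo-ground-truths "on sequences of degraded images" to ease fitting real data, without specifying the construction. A minimal sketch of one plausible construction, assuming the frames depict the same static scene: a per-pixel temporal median, which suppresses transient degradations (such as rain streaks) that rarely hit the same pixel in every frame. The function name and the toy data are illustrative, not the paper's actual procedure.

```python
import numpy as np

def pseudo_ground_truth(frames: np.ndarray) -> np.ndarray:
    """Fuse a sequence of degraded frames of one static scene into a
    pseudo-ground-truth via the per-pixel temporal median.

    frames: array of shape (T, H, W, C), float values in [0, 1].
    Transient degradations seldom cover the same pixel in every frame,
    so the median keeps the static scene and discards the streaks.
    """
    return np.median(frames, axis=0)

# Toy demo: a flat gray scene corrupted by sparse bright streaks.
rng = np.random.default_rng(0)
clean = np.full((8, 16, 16, 3), 0.5)
degraded = clean.copy()
for t in range(8):
    mask = rng.random((16, 16)) < 0.1   # ~10% of pixels hit per frame
    degraded[t][mask] = 1.0

pgt = pseudo_ground_truth(degraded)
# The fused image is far closer to the clean scene than any single frame.
print(float(np.abs(pgt - clean[0]).mean()),
      float(np.abs(degraded[0] - clean[0]).mean()))
```

With 8 frames and ~10% independent corruption per frame, a pixel must be corrupted in at least 4 frames before the median is affected at all, which is why the fused error is near zero.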
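The paper's central idea is a fine-tuned CLIP acting as an "expert" whose vision-language priors guide restoration. The scoring step behind such guidance can be sketched as cosine similarity between an image embedding and a bank of degradation text-prompt embeddings; here small random vectors stand in for the real CLIP encoders so the logic is runnable, and the prompt strings are hypothetical examples, not the paper's prompts.

```python
import numpy as np

# Hypothetical degradation prompts; in the paper a fine-tuned CLIP
# would embed both these texts and the input image.
PROMPTS = ["a photo degraded by rain", "a hazy photo", "a clean photo"]

def prompt_scores(image_emb: np.ndarray, text_embs: np.ndarray) -> np.ndarray:
    """Cosine similarity between one image embedding and each text-prompt
    embedding -- the 'expert' signal used to condition the restorer."""
    img = image_emb / np.linalg.norm(image_emb)
    txt = text_embs / np.linalg.norm(text_embs, axis=1, keepdims=True)
    return txt @ img

rng = np.random.default_rng(1)
text_embs = rng.standard_normal((3, 64))        # stand-in text embeddings
# An image embedding nudged toward the 'rain' prompt.
image_emb = text_embs[0] + 0.1 * rng.standard_normal(64)

scores = prompt_scores(image_emb, text_embs)
print(PROMPTS[int(np.argmax(scores))])
```

In the full method these scores (or the CLIP features behind them) would condition the restoration network, steering it toward the degradation type the expert detects.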
Source Journal: Computer Vision and Image Understanding (Engineering/Technology – Engineering: Electrical & Electronic)
CiteScore: 7.80 · Self-citation rate: 4.40% · Articles per year: 112 · Review time: 79 days
Journal Description: The central focus of this journal is the computer analysis of pictorial information. Computer Vision and Image Understanding publishes papers covering all aspects of image analysis, from the low-level, iconic processes of early vision to the high-level, symbolic processes of recognition and interpretation. A wide range of topics in the image understanding area is covered, including papers offering insights that differ from predominant views. Research areas include: theory; early vision; data structures and representations; shape; range; motion; matching and recognition; architecture and languages; vision systems.