VSRDiff: Learning Inter-Frame Temporal Coherence in Diffusion Model for Video Super-Resolution

IF 3.4 · CAS Tier 3 (Computer Science) · Q2 (Computer Science, Information Systems) · IEEE Access · Pub Date: 2025-01-14 · DOI: 10.1109/ACCESS.2025.3529758
Linlin Liu, Lele Niu, Jun Tang, Yong Ding
{"title":"视频超分辨率扩散模型中学习帧间时间相干","authors":"Linlin Liu;Lele Niu;Jun Tang;Yong Ding","doi":"10.1109/ACCESS.2025.3529758","DOIUrl":null,"url":null,"abstract":"Video Super-Resolution (VSR) aims to reconstruct high-quality high-resolution (HR) videos from low-resolution (LR) inputs. Recent studies have explored diffusion models (DMs) for VSR by exploiting their generative priors to produce realistic details. However, the inherent randomness of diffusion models presents significant challenges for controlling content. In particular, current DM-based VSR methods often neglect inter-frame temporal coherence and reconstruction-oriented objectives, leading to visual distortion and temporal inconsistency. In this paper, we introduce VSRDiff, a DM-based framework for VSR that emphasizes inter-frame temporal coherence and adopts a novel reconstruction perspective. Specifically, the Inter-Frame Aggregation Guidance (IFAG) module is developed to learn contextual inter-frame aggregation guidance, alleviating visual distortion caused by the randomness of diffusion models. Furthermore, the Progressive Reconstruction Sampling (PRS) approach is employed to generate reconstruction-oriented latents, balancing fidelity and detail richness. Additionally, temporal consistency is enhanced through second-order bidirectional latent propagation using the Flow-guided Latent Correction (FLC) module. Extensive experiments on the REDS4 and Vid4 datasets demonstrate that VSRDiff achieves highly competitive VSR performance with more realistic details, surpassing existing state-of-the-art methods in both visual fidelity and temporal consistency. Specifically, VSRDiff achieves the best scores on the REDS4 dataset in LPIPS, DISTS, and NIQE, with values of 0.1137, 0.0445, and 2.970, respectively. The result will be released at <uri>https://github.com/aigcvsr/VSRDiff</uri>.","PeriodicalId":13079,"journal":{"name":"IEEE Access","volume":"13 ","pages":"11447-11462"},"PeriodicalIF":3.4000,"publicationDate":"2025-01-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10840194","citationCount":"0","resultStr":"{\"title\":\"VSRDiff: Learning Inter-Frame Temporal Coherence in Diffusion Model for Video Super-Resolution\",\"authors\":\"Linlin Liu;Lele Niu;Jun Tang;Yong Ding\",\"doi\":\"10.1109/ACCESS.2025.3529758\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Video Super-Resolution (VSR) aims to reconstruct high-quality high-resolution (HR) videos from low-resolution (LR) inputs. Recent studies have explored diffusion models (DMs) for VSR by exploiting their generative priors to produce realistic details. However, the inherent randomness of diffusion models presents significant challenges for controlling content. In particular, current DM-based VSR methods often neglect inter-frame temporal coherence and reconstruction-oriented objectives, leading to visual distortion and temporal inconsistency. In this paper, we introduce VSRDiff, a DM-based framework for VSR that emphasizes inter-frame temporal coherence and adopts a novel reconstruction perspective. Specifically, the Inter-Frame Aggregation Guidance (IFAG) module is developed to learn contextual inter-frame aggregation guidance, alleviating visual distortion caused by the randomness of diffusion models. Furthermore, the Progressive Reconstruction Sampling (PRS) approach is employed to generate reconstruction-oriented latents, balancing fidelity and detail richness. 
Additionally, temporal consistency is enhanced through second-order bidirectional latent propagation using the Flow-guided Latent Correction (FLC) module. Extensive experiments on the REDS4 and Vid4 datasets demonstrate that VSRDiff achieves highly competitive VSR performance with more realistic details, surpassing existing state-of-the-art methods in both visual fidelity and temporal consistency. Specifically, VSRDiff achieves the best scores on the REDS4 dataset in LPIPS, DISTS, and NIQE, with values of 0.1137, 0.0445, and 2.970, respectively. The result will be released at <uri>https://github.com/aigcvsr/VSRDiff</uri>.\",\"PeriodicalId\":13079,\"journal\":{\"name\":\"IEEE Access\",\"volume\":\"13 \",\"pages\":\"11447-11462\"},\"PeriodicalIF\":3.4000,\"publicationDate\":\"2025-01-14\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10840194\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"IEEE Access\",\"FirstCategoryId\":\"94\",\"ListUrlMain\":\"https://ieeexplore.ieee.org/document/10840194/\",\"RegionNum\":3,\"RegionCategory\":\"计算机科学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q2\",\"JCRName\":\"COMPUTER SCIENCE, INFORMATION SYSTEMS\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE Access","FirstCategoryId":"94","ListUrlMain":"https://ieeexplore.ieee.org/document/10840194/","RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"COMPUTER SCIENCE, INFORMATION SYSTEMS","Score":null,"Total":0}
Citations: 0

Abstract

Video Super-Resolution (VSR) aims to reconstruct high-quality high-resolution (HR) videos from low-resolution (LR) inputs. Recent studies have explored diffusion models (DMs) for VSR by exploiting their generative priors to produce realistic details. However, the inherent randomness of diffusion models presents significant challenges for controlling content. In particular, current DM-based VSR methods often neglect inter-frame temporal coherence and reconstruction-oriented objectives, leading to visual distortion and temporal inconsistency. In this paper, we introduce VSRDiff, a DM-based framework for VSR that emphasizes inter-frame temporal coherence and adopts a novel reconstruction perspective. Specifically, the Inter-Frame Aggregation Guidance (IFAG) module is developed to learn contextual inter-frame aggregation guidance, alleviating visual distortion caused by the randomness of diffusion models. Furthermore, the Progressive Reconstruction Sampling (PRS) approach is employed to generate reconstruction-oriented latents, balancing fidelity and detail richness. Additionally, temporal consistency is enhanced through second-order bidirectional latent propagation using the Flow-guided Latent Correction (FLC) module. Extensive experiments on the REDS4 and Vid4 datasets demonstrate that VSRDiff achieves highly competitive VSR performance with more realistic details, surpassing existing state-of-the-art methods in both visual fidelity and temporal consistency. Specifically, VSRDiff achieves the best scores on the REDS4 dataset in LPIPS, DISTS, and NIQE, with values of 0.1137, 0.0445, and 2.970, respectively. The results will be released at https://github.com/aigcvsr/VSRDiff.
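The abstract only names the modules, so the following is a minimal sketch, not the authors' implementation, of the generic operation underlying flow-guided latent correction: warping a neighboring frame's latent into the current frame with a dense optical-flow field via torch.nn.functional.grid_sample. The function name warp_latent and the zero-flow toy input are illustrative assumptions.

```python
# Hedged sketch of flow-guided latent warping, the basic operation behind
# modules like FLC; this is NOT the VSRDiff authors' code.
import torch
import torch.nn.functional as F

def warp_latent(latent: torch.Tensor, flow: torch.Tensor) -> torch.Tensor:
    """Warp a latent map (N, C, H, W) with a dense flow field (N, 2, H, W)."""
    n, _, h, w = latent.shape
    # Base sampling grid of pixel coordinates (x along width, y along height).
    ys, xs = torch.meshgrid(
        torch.arange(h, device=latent.device, dtype=latent.dtype),
        torch.arange(w, device=latent.device, dtype=latent.dtype),
        indexing="ij",
    )
    base = torch.stack((xs, ys), dim=0).unsqueeze(0)   # (1, 2, H, W)
    coords = base + flow                               # displaced coordinates
    # Normalize to [-1, 1], the range grid_sample expects.
    coords_x = 2.0 * coords[:, 0] / max(w - 1, 1) - 1.0
    coords_y = 2.0 * coords[:, 1] / max(h - 1, 1) - 1.0
    grid = torch.stack((coords_x, coords_y), dim=-1)   # (N, H, W, 2)
    return F.grid_sample(latent, grid, mode="bilinear",
                         padding_mode="border", align_corners=True)

# Toy usage: a zero flow field should reproduce the input latent exactly.
prev_latent = torch.randn(1, 4, 64, 64)
flow = torch.zeros(1, 2, 64, 64)
aligned = warp_latent(prev_latent, flow)
assert torch.allclose(aligned, prev_latent, atol=1e-5)
```

In a bidirectional propagation scheme, a warp of this kind would be applied in both the forward and backward directions so each frame's latent can be corrected by its aligned neighbors; the paper's second-order variant presumably also draws on the frame two steps away.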
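For the reported numbers, here is a hedged sketch of how LPIPS, DISTS, and NIQE scores are commonly computed, using the pyiqa (IQA-PyTorch) package; the paper does not state which implementation it used, and the random tensors below are placeholders for decoded SR and ground-truth HR frames.

```python
# Hedged sketch: computing the three metrics reported on REDS4 with pyiqa.
import torch
import pyiqa

device = "cuda" if torch.cuda.is_available() else "cpu"
lpips = pyiqa.create_metric("lpips", device=device)  # full-reference, lower is better
dists = pyiqa.create_metric("dists", device=device)  # full-reference, lower is better
niqe = pyiqa.create_metric("niqe", device=device)    # no-reference, lower is better

# Placeholder frames in [0, 1]; in practice these would be SR output and HR ground truth.
sr = torch.rand(1, 3, 256, 256, device=device)
hr = torch.rand(1, 3, 256, 256, device=device)

print("LPIPS:", float(lpips(sr, hr)))
print("DISTS:", float(dists(sr, hr)))
print("NIQE: ", float(niqe(sr)))
```

For video, these per-frame scores would typically be averaged over all frames of each REDS4/Vid4 clip and then over clips.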
Source journal
IEEE Access
Categories: Computer Science, Information Systems · Engineering, Electrical & Electronic
CiteScore: 9.80
Self-citation rate: 7.70%
Publication volume: 6673
Review time: 6 weeks
Journal description: IEEE Access® is a multidisciplinary, open access (OA), applications-oriented, all-electronic archival journal that continuously presents the results of original research or development across all of IEEE's fields of interest. IEEE Access publishes articles that are of high interest to readers: original, technically correct, and clearly presented. Supported by author publication charges (APC), its hallmarks are a rapid peer review and publication process with open access to all readers. Unlike IEEE's traditional Transactions or Journals, reviews are "binary": reviewers either Accept or Reject an article in the form it is submitted, in order to achieve rapid turnaround. Especially encouraged are submissions on: multidisciplinary topics, or applications-oriented articles and negative results that do not fit within the scope of IEEE's traditional journals; practical articles discussing new experiments or measurement techniques, and interesting solutions to engineering problems; development of new or improved fabrication or manufacturing techniques; and reviews or survey articles of new or evolving fields oriented to assist others in understanding the new area.
Latest articles from this journal
Corrections to "Prioritizing DevOps Implementation Guidelines for Sustainable Software Projects"
Corrections to "Information Security and Artificial Intelligence-Assisted Diagnosis in an Internet of Medical Thing System (IoMTS)"
Corrections to "A Semi-Supervised Learning Approach to Quality-Based Web Service Classification"
Corrections to "The Attention Mechanism Performance Analysis for Football Players Using the Internet of Things and Deep Learning"
Corrections to "Sandpiper Optimization Algorithm With Region Growing Based Robust Retinal Blood Vessel Segmentation Approach"