A Practical Gated Recurrent Transformer Network Incorporating Multiple Fusions for Video Denoising

Kai Guo, Seungwon Choi, Jongseong Choi, Lae-Hoon Kim
{"title":"A Practical Gated Recurrent Transformer Network Incorporating Multiple Fusions for Video Denoising","authors":"Kai Guo, Seungwon Choi, Jongseong Choi, Lae-Hoon Kim","doi":"arxiv-2409.06603","DOIUrl":null,"url":null,"abstract":"State-of-the-art (SOTA) video denoising methods employ multi-frame\nsimultaneous denoising mechanisms, resulting in significant delays (e.g., 16\nframes), making them impractical for real-time cameras. To overcome this\nlimitation, we propose a multi-fusion gated recurrent Transformer network\n(GRTN) that achieves SOTA denoising performance with only a single-frame delay.\nSpecifically, the spatial denoising module extracts features from the current\nframe, while the reset gate selects relevant information from the previous\nframe and fuses it with current frame features via the temporal denoising\nmodule. The update gate then further blends this result with the previous frame\nfeatures, and the reconstruction module integrates it with the current frame.\nTo robustly compute attention for noisy features, we propose a residual\nsimplified Swin Transformer with Euclidean distance (RSSTE) in the spatial and\ntemporal denoising modules. 
Comparative objective and subjective results show\nthat our GRTN achieves denoising performance comparable to SOTA multi-frame\ndelay networks, with only a single-frame delay.","PeriodicalId":501289,"journal":{"name":"arXiv - EE - Image and Video Processing","volume":null,"pages":null},"PeriodicalIF":0.0000,"publicationDate":"2024-09-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"arXiv - EE - Image and Video Processing","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/arxiv-2409.06603","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 0

Abstract

State-of-the-art (SOTA) video denoising methods employ multi-frame simultaneous denoising mechanisms, resulting in significant delays (e.g., 16 frames), making them impractical for real-time cameras. To overcome this limitation, we propose a multi-fusion gated recurrent Transformer network (GRTN) that achieves SOTA denoising performance with only a single-frame delay. Specifically, the spatial denoising module extracts features from the current frame, while the reset gate selects relevant information from the previous frame and fuses it with current frame features via the temporal denoising module. The update gate then further blends this result with the previous frame features, and the reconstruction module integrates it with the current frame. To robustly compute attention for noisy features, we propose a residual simplified Swin Transformer with Euclidean distance (RSSTE) in the spatial and temporal denoising modules. Comparative objective and subjective results show that our GRTN achieves denoising performance comparable to SOTA multi-frame delay networks, with only a single-frame delay.
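The gated recurrence described above (reset gate selecting previous-frame features, attention-based temporal fusion, update gate blending the result with the previous state) can be sketched roughly as follows. This is a minimal NumPy illustration, not the authors' implementation: the spatial/temporal denoising modules are replaced by stand-ins, the gate parameterizations (`Wr`, `Wz`) are hypothetical, and the Euclidean-distance attention is a simplified single-head version of the RSSTE idea.

```python
import numpy as np

def euclid_attention(q, k, v):
    """Attention weighted by negative squared Euclidean distance
    instead of the usual dot product (simplified, single head)."""
    # d2[i, j] = ||q_i - k_j||^2, expanded to avoid an explicit loop
    d2 = (np.sum(q**2, axis=1)[:, None]
          + np.sum(k**2, axis=1)[None, :]
          - 2.0 * q @ k.T)
    logits = -d2 / np.sqrt(q.shape[1])          # scale as in standard attention
    w = np.exp(logits - logits.max(axis=1, keepdims=True))
    w /= w.sum(axis=1, keepdims=True)           # softmax over keys
    return w @ v

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def grtn_step(x_t, h_prev, Wr, Wz):
    """One recurrent step: spatial features -> reset-gated temporal
    fusion -> update-gated blend with the previous frame's features.
    Wr, Wz are hypothetical gate weights of shape (2d, d)."""
    f_t = x_t                                   # stand-in for spatial denoising features
    r = sigmoid(np.concatenate([f_t, h_prev], axis=1) @ Wr)   # reset gate
    selected = r * h_prev                       # relevant previous-frame information
    fused = euclid_attention(f_t, selected, selected)          # temporal fusion
    z = sigmoid(np.concatenate([fused, h_prev], axis=1) @ Wz)  # update gate
    return z * fused + (1.0 - z) * h_prev       # blended recurrent state
```

Because only the previous frame's state `h_prev` is carried forward, each output frame depends on at most one future frame of buffering, which is the single-frame-delay property the abstract emphasizes.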