A Unified Framework for Event-Based Frame Interpolation With Ad-Hoc Deblurring in the Wild

Lei Sun;Daniel Gehrig;Christos Sakaridis;Mathias Gehrig;Jingyun Liang;Peng Sun;Zhijie Xu;Kaiwei Wang;Luc Van Gool;Davide Scaramuzza
{"title":"A Unified Framework for Event-Based Frame Interpolation With Ad-Hoc Deblurring in the Wild","authors":"Lei Sun;Daniel Gehrig;Christos Sakaridis;Mathias Gehrig;Jingyun Liang;Peng Sun;Zhijie Xu;Kaiwei Wang;Luc Van Gool;Davide Scaramuzza","doi":"10.1109/TPAMI.2024.3510690","DOIUrl":null,"url":null,"abstract":"Effective video frame interpolation hinges on the adept handling of motion in the input scene. Prior work acknowledges asynchronous event information for this, but often overlooks whether motion induces blur in the video, limiting its scope to sharp frame interpolation. We instead propose a unified framework for event-based frame interpolation that performs deblurring ad-hoc and thus works both on sharp and blurry input videos. Our model consists in a bidirectional recurrent network that incorporates the temporal dimension of interpolation and fuses information from the input frames and the events adaptively based on their temporal proximity. To enhance the generalization from synthetic data to real event cameras, we integrate self-supervised framework with the proposed model to enhance the generalization on real-world datasets in the wild. At the dataset level, we introduce a novel real-world high-resolution dataset with events and color videos named HighREV, which provides a challenging evaluation setting for the examined task. Extensive experiments show that our network consistently outperforms previous state-of-the-art methods on frame interpolation, single image deblurring, and the joint task of both. Experiments on domain transfer reveal that self-supervised training effectively mitigates the performance degradation observed when transitioning from synthetic data to real-world data. Code and datasets are available at <uri>https://github.com/AHupuJR/REFID</uri>.","PeriodicalId":94034,"journal":{"name":"IEEE transactions on pattern analysis and machine intelligence","volume":"47 4","pages":"2265-2279"},"PeriodicalIF":18.6000,"publicationDate":"2024-12-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE transactions on pattern analysis and machine intelligence","FirstCategoryId":"1085","ListUrlMain":"https://ieeexplore.ieee.org/document/10794600/","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

Abstract

Effective video frame interpolation hinges on the adept handling of motion in the input scene. Prior work leverages asynchronous event information for this, but often overlooks whether motion induces blur in the video, limiting its scope to sharp frame interpolation. We instead propose a unified framework for event-based frame interpolation that performs deblurring ad-hoc and thus works on both sharp and blurry input videos. Our model consists of a bidirectional recurrent network that incorporates the temporal dimension of interpolation and fuses information from the input frames and the events adaptively based on their temporal proximity. To enhance generalization from synthetic data to real event cameras, we integrate a self-supervised framework with the proposed model, improving performance on real-world datasets in the wild. At the dataset level, we introduce a novel real-world high-resolution dataset with events and color videos, named HighREV, which provides a challenging evaluation setting for the examined task. Extensive experiments show that our network consistently outperforms previous state-of-the-art methods on frame interpolation, single-image deblurring, and the joint task of both. Experiments on domain transfer reveal that self-supervised training effectively mitigates the performance degradation observed when transitioning from synthetic to real-world data. Code and datasets are available at https://github.com/AHupuJR/REFID.
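The adaptive event-frame fusion described in the abstract can be illustrated with a short sketch. The following is a minimal, hypothetical PyTorch module, not the authors' actual REFID code: the module name, shapes, and gating design are all assumptions, showing only one plausible way a cell could weight event features against frame features according to the target timestamp's temporal proximity to the input keyframes.

```python
# Minimal illustrative sketch, NOT the REFID implementation: a fusion cell
# that gates event features vs. frame features by temporal proximity.
import torch
import torch.nn as nn


class ProximityGatedFusion(nn.Module):
    """Hypothetical cell: near an input keyframe the learned gate can lean
    on frame features; far from both keyframes, event features (which cover
    the whole inter-frame interval) can dominate."""

    def __init__(self, channels: int):
        super().__init__()
        # The gate sees frame features, event features, and a proximity map.
        self.gate = nn.Sequential(
            nn.Conv2d(2 * channels + 1, channels, kernel_size=3, padding=1),
            nn.Sigmoid(),
        )
        self.merge = nn.Conv2d(2 * channels, channels, kernel_size=3, padding=1)

    def forward(self, frame_feat: torch.Tensor, event_feat: torch.Tensor,
                t: float) -> torch.Tensor:
        # t in [0, 1]: normalized position of the target frame between the
        # two input keyframes; proximity to the nearer keyframe is in [0, 0.5].
        b, _, h, w = frame_feat.shape
        proximity = min(t, 1.0 - t)
        prox_map = frame_feat.new_full((b, 1, h, w), proximity)
        g = self.gate(torch.cat([frame_feat, event_feat, prox_map], dim=1))
        fused = g * event_feat + (1.0 - g) * frame_feat
        return self.merge(torch.cat([fused, frame_feat], dim=1))


if __name__ == "__main__":
    cell = ProximityGatedFusion(channels=64)
    frames = torch.randn(2, 64, 32, 32)   # features from the input frames
    events = torch.randn(2, 64, 32, 32)   # features from an event representation
    out = cell(frames, events, t=0.3)     # interpolate at 30% of the interval
    print(out.shape)                      # torch.Size([2, 64, 32, 32])
```

In the actual model, such fusion would run inside a bidirectional recurrent loop over intermediate timestamps; this sketch only isolates the proximity-conditioned gating idea.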