Learning Frame-Event Fusion for Motion Deblurring
Wen Yang; Jinjian Wu; Jupo Ma; Leida Li; Weisheng Dong; Guangming Shi
IEEE Transactions on Image Processing, vol. 33, pp. 6836-6849, published 2024-12-11. DOI: 10.1109/TIP.2024.3512362. https://ieeexplore.ieee.org/document/10794620/
Abstract
Motion deblurring is a highly ill-posed problem because substantial motion information is lost during the blurring process. Complementary features from auxiliary sensors such as event cameras can be exploited to guide motion deblurring: an event camera captures rich motion information asynchronously with microsecond accuracy. In this paper, a novel frame-event fusion framework is proposed for event-driven motion deblurring (FEF-Deblur), which fully exploits long-range cross-modal interactions. First, the frame and event modalities are complementary yet partially redundant. Cross-modal fusion is therefore modeled as the separation and aggregation of unique and complementary features, which avoids modality redundancy. Unique and complementary features are inferred in parallel with intra-modal self-attention and inter-modal cross-attention, respectively. A correlation-based constraint is then imposed between the unique and complementary features to encourage their differentiation, further suppressing cross-modal redundancy. Additionally, spatio-temporal dependencies among neighboring inputs are crucial for motion deblurring. A recurrent cross-attention is introduced to preserve inter-input attention information: the current spatial features and the aggregated temporal features attend to each other, establishing long-range interactions between them. Extensive experiments on both synthetic and real-world motion deblurring datasets demonstrate that our method outperforms state-of-the-art event-based and image/video-based methods.
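To make the separation-and-aggregation idea concrete, below is a minimal PyTorch sketch of how unique and complementary features could be inferred with parallel intra-modal self-attention and inter-modal cross-attention, together with a correlation-based constraint. All names (FrameEventSeparation, decorrelation_loss), the token layout, and the squared-cosine form of the constraint are illustrative assumptions; the paper's actual architecture and loss may differ.

```python
# Hypothetical sketch of complementary-unique feature separation for frame-event
# fusion, loosely following the abstract. Module and variable names are
# illustrative assumptions, not the authors' implementation.
import torch
import torch.nn as nn
import torch.nn.functional as F


class FrameEventSeparation(nn.Module):
    """Separate each modality's tokens into unique and complementary parts,
    then aggregate them into a fused representation."""

    def __init__(self, dim: int = 64, heads: int = 4):
        super().__init__()
        self.self_attn_frame = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.self_attn_event = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.cross_attn_frame = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.cross_attn_event = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.fuse = nn.Linear(4 * dim, dim)  # aggregate unique + complementary

    def forward(self, frame_tokens, event_tokens):
        # Unique (modality-specific) features via intra-modal self-attention.
        uniq_f, _ = self.self_attn_frame(frame_tokens, frame_tokens, frame_tokens)
        uniq_e, _ = self.self_attn_event(event_tokens, event_tokens, event_tokens)
        # Complementary features via inter-modal cross-attention
        # (frame queries attend to event keys/values, and vice versa).
        comp_f, _ = self.cross_attn_frame(frame_tokens, event_tokens, event_tokens)
        comp_e, _ = self.cross_attn_event(event_tokens, frame_tokens, frame_tokens)
        fused = self.fuse(torch.cat([uniq_f, uniq_e, comp_f, comp_e], dim=-1))
        return fused, (uniq_f, comp_f), (uniq_e, comp_e)


def decorrelation_loss(unique, complementary):
    """Correlation-based constraint: push unique and complementary features
    apart, here realised as the mean squared cosine similarity (an assumed
    stand-in for the paper's exact formulation)."""
    u = F.normalize(unique, dim=-1)
    c = F.normalize(complementary, dim=-1)
    return ((u * c).sum(dim=-1) ** 2).mean()


if __name__ == "__main__":
    frames = torch.randn(2, 196, 64)   # (batch, tokens, channels) from blurry frames
    events = torch.randn(2, 196, 64)   # tokens from an event representation
    model = FrameEventSeparation()
    fused, (uf, cf), (ue, ce) = model(frames, events)
    loss = decorrelation_loss(uf, cf) + decorrelation_loss(ue, ce)
    print(fused.shape, loss.item())
```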
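The recurrent cross-attention can be sketched in a similar spirit: the current input's spatial features and a running temporal state attend to each other, and the state is updated so that attention information carries across neighboring inputs. The use of a GRU cell for the state update and all names below (RecurrentCrossAttention, state) are assumptions for illustration only.

```python
# Hypothetical sketch of a recurrent cross-attention step between current
# spatial features and an aggregated temporal state. Not the authors' code.
import torch
import torch.nn as nn


class RecurrentCrossAttention(nn.Module):
    def __init__(self, dim: int = 64, heads: int = 4):
        super().__init__()
        self.spatial_to_temporal = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.temporal_to_spatial = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.update = nn.GRUCell(dim, dim)  # simple assumed choice for carrying the state

    def forward(self, spatial_tokens, state_tokens):
        # Current spatial features query the aggregated temporal state ...
        s2t, _ = self.spatial_to_temporal(spatial_tokens, state_tokens, state_tokens)
        # ... and the temporal state queries the current spatial features.
        t2s, _ = self.temporal_to_spatial(state_tokens, spatial_tokens, spatial_tokens)
        # Update the temporal state token-wise so inter-input information persists.
        b, n, d = state_tokens.shape
        new_state = self.update(t2s.reshape(b * n, d), state_tokens.reshape(b * n, d))
        return spatial_tokens + s2t, new_state.reshape(b, n, d)


if __name__ == "__main__":
    dim, tokens = 64, 196
    rca = RecurrentCrossAttention(dim)
    state = torch.zeros(1, tokens, dim)
    for _ in range(3):  # iterate over neighboring blurry inputs
        spatial = torch.randn(1, tokens, dim)
        out, state = rca(spatial, state)
    print(out.shape, state.shape)
```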