MSI-DTrans: A multi-focus image fusion using multilayer semantic interaction and dynamic transformer

Journal: Displays, vol. 85, Article 102837 (Impact Factor 3.7; Zone 2, Engineering Technology; JCR Q1, Computer Science, Hardware & Architecture). Pub Date: 2024-09-17. DOI: 10.1016/j.displa.2024.102837

Hao Zhai, Yuncan Ouyang, Nannan Luo, Lianhua Chen, Zhi Zeng


Citations: 0

Abstract

Multi-focus image fusion (MFIF) fuses multiple images captured with different focal lengths into a single full-focus image, enhancing the realism and clarity of the result. This paper proposes an MFIF method called MSI-DTrans. First, to make full use of the effective information carried by the source images, the method adopts a multilayer semantic interaction strategy that strengthens the interaction between high-frequency and low-frequency information. This strategy gradually mines more abstract semantic information and guides the generation of feature maps from coarse to fine. Second, a parallel multi-scale joint self-attention computation model is designed. The model uses a dynamic receptive field and dynamic token embedding to overcome the performance degradation that occurs when handling multi-scale objects. This allows self-attention to integrate long-range dependencies between objects of different scales while reducing computational overhead. Extensive experiments show that the proposed method effectively avoids image distortion, achieves better visual results, and is competitive with many state-of-the-art methods in qualitative analysis, quantitative analysis, and efficiency. The source code is available at https://github.com/ouyangbaicai/MSI-DTrans.
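To make the fusion task itself concrete, the sketch below implements a classical focus-measure baseline, not MSI-DTrans: for each pixel it keeps the source image whose local Laplacian energy is highest. The function names (box_mean, focus_measure, fuse_multifocus), the window radius, and the synthetic test images are all assumptions made for illustration; none of them come from the paper or its repository.

```python
import numpy as np


def box_mean(img, radius):
    """Mean over a (2*radius+1)^2 window, replicating edge pixels at the borders."""
    h, w = img.shape
    k = 2 * radius + 1
    padded = np.pad(img, radius, mode="edge")
    total = np.zeros((h, w), dtype=np.float64)
    for dy in range(k):
        for dx in range(k):
            total += padded[dy:dy + h, dx:dx + w]
    return total / (k * k)


def focus_measure(img, radius=2):
    """Local sharpness: squared 4-neighbour Laplacian, averaged over a window."""
    img = np.asarray(img, dtype=np.float64)
    p = np.pad(img, 1, mode="edge")
    lap = p[:-2, 1:-1] + p[2:, 1:-1] + p[1:-1, :-2] + p[1:-1, 2:] - 4.0 * img
    return box_mean(lap ** 2, radius)


def fuse_multifocus(images, radius=2):
    """Pick, per pixel, the source with the highest focus measure.

    Returns the fused image and the decision map (index of the chosen source).
    """
    stack = np.stack([np.asarray(im, dtype=np.float64) for im in images])
    focus = np.stack([focus_measure(im, radius) for im in stack])
    decision = np.argmax(focus, axis=0)
    fused = np.take_along_axis(stack, decision[None, ...], axis=0)[0]
    return fused, decision


# Synthetic example (hypothetical data): one textured scene, two sources,
# each with a different half defocused by a box blur.
rng = np.random.default_rng(0)
sharp = rng.random((64, 64))
blurred = box_mean(sharp, 2)

src_a = sharp.copy()
src_a[:, 32:] = blurred[:, 32:]   # source A: right half out of focus
src_b = sharp.copy()
src_b[:, :32] = blurred[:, :32]   # source B: left half out of focus

fused, decision = fuse_multifocus([src_a, src_b])
```

Hard per-pixel selection of this kind tends to produce artifacts near the boundary between focused and defocused regions, which is one reason learned methods such as the one described in the abstract model context over larger and multiple scales.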




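Source journal

Displays (Engineering Technology; Engineering: Electrical & Electronic)

CiteScore: 4.60

Self-citation rate: 25.60%

Articles published: 138

Review time: 92 days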

Journal description: Displays is the international journal covering the research and development of display technology, its effective presentation and perception of information, and applications and systems including display-human interface. Technical papers on practical developments in Displays technology provide an effective channel to promote greater understanding and cross-fertilization across the diverse disciplines of the Displays community. Original research papers solving ergonomics issues at the display-human interface advance effective presentation of information. Tutorial papers covering fundamentals, intended for display technologists and human factors engineers new to the field, will also occasionally be featured.

Latest articles from the journal

Editorial Board

Adversarially Regularized Tri-Transformer Fusion for continual multimodal egocentric activity recognition

A general and flexible point cloud simplification method based on feature fusion

Bayesian generation based foveated JND estimation in the DCT domain

Feature enhanced spherical transformer for spherical image compression