CMFormer: Non-line-of-sight imaging with a memory-efficient MetaFormer network

IF 3.7 | JCR Q2 (Optics) | CAS Zone 2 (Engineering & Technology) | Optics and Lasers in Engineering | Pub Date: 2025-04-01 | Epub Date: 2025-02-12 | DOI: 10.1016/j.optlaseng.2025.108875
Shihao Zhang, Shaohui Jin, Hao Liu, Yue Li, Xiaoheng Jiang, Mingliang Xu
{"title":"CMFormer: Non-line-of-sight imaging with a memory-efficient MetaFormer network","authors":"Shihao Zhang ,&nbsp;Shaohui Jin ,&nbsp;Hao Liu ,&nbsp;Yue Li ,&nbsp;Xiaoheng Jiang ,&nbsp;Mingliang Xu","doi":"10.1016/j.optlaseng.2025.108875","DOIUrl":null,"url":null,"abstract":"<div><div>Non-line-of-sight (NLOS) imaging aims to overcome the limitation of traditional sensors that can only detect targets within the line of sight. While existing NLOS imaging algorithms have achieved notable imaging quality, they are constrained by significant memory requirements due to the 3D nature of transient measurements. In this paper, we propose a new memory-efficient MetaFormer-based NLOS imaging method, named CMFormer, which enables NLOS imaging with lower memory usage and faster imaging speed, facilitating deployment on consumer-grade GPUs. Specifically, we design a lightweight module based on MetaFormer, which employs multi-dimensional global convolution and multi-scale dilated convolution as token mixers. This approach leverages the strong temporal-spatial correlation more effectively without separating the transient data into distinct temporal and spatial components for feature extraction. With the unique characteristics of this token mixer, we propose aggregate feature transmission to replace conventional skip connections, achieving better performance without needing to increase network width at the decoder stage. Additionally, to mitigate the loss of important detail features during downsampling, we design a cross-layer integration attention module to enhance the interaction between the adjacent hierarchical features. Leveraging gradient checkpointing technology, the proposed method can be easily trained and inferred on consumer-grade GPUs, significantly less than the current best imaging algorithm NLOST, and achieves an imaging speed of 8 FPS. We employ the UNet hierarchical structure to build our pipeline, ensuring that our network can better denoise and enhance generalization to real-world scenarios even when trained on synthetic datasets. Extensive experimental results demonstrate that our method achieves the best performance on both synthetic and real-world data with low memory cost and higher imaging speed. The code will be released soon.</div></div>","PeriodicalId":49719,"journal":{"name":"Optics and Lasers in Engineering","volume":"187 ","pages":"Article 108875"},"PeriodicalIF":3.7000,"publicationDate":"2025-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Optics and Lasers in Engineering","FirstCategoryId":"5","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S0143816625000624","RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"2025/2/12 0:00:00","PubModel":"Epub","JCR":"Q2","JCRName":"OPTICS","Score":null,"Total":0}
Citations: 0

Abstract

Non-line-of-sight (NLOS) imaging aims to overcome the limitation of traditional sensors, which can only detect targets within the line of sight. While existing NLOS imaging algorithms have achieved notable imaging quality, they are constrained by significant memory requirements due to the 3D nature of transient measurements. In this paper, we propose a new memory-efficient MetaFormer-based NLOS imaging method, named CMFormer, which enables NLOS imaging with lower memory usage and faster imaging speed, facilitating deployment on consumer-grade GPUs. Specifically, we design a lightweight module based on MetaFormer, which employs multi-dimensional global convolution and multi-scale dilated convolution as token mixers. This approach exploits the strong temporal-spatial correlation of the transient data more effectively, without separating it into distinct temporal and spatial components for feature extraction. Building on the unique characteristics of this token mixer, we propose aggregate feature transmission to replace conventional skip connections, achieving better performance without increasing network width at the decoder stage. Additionally, to mitigate the loss of important detail features during downsampling, we design a cross-layer integration attention module to enhance the interaction between adjacent hierarchical features. Leveraging gradient checkpointing, the proposed method can be easily trained and run for inference on consumer-grade GPUs, with a memory footprint significantly lower than that of the current best imaging algorithm, NLOST, while achieving an imaging speed of 8 FPS. We adopt the UNet hierarchical structure to build our pipeline, ensuring that the network denoises effectively and generalizes to real-world scenarios even when trained on synthetic datasets. Extensive experimental results demonstrate that our method achieves the best performance on both synthetic and real-world data with low memory cost and higher imaging speed. The code will be released soon.
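To make the memory-saving ideas in the abstract concrete, below is a minimal, hypothetical PyTorch sketch (not the authors' released code) of a MetaFormer-style block whose token mixer is a multi-scale dilated 3D convolution over a transient volume (time x height x width), wrapped with gradient checkpointing so that activations are recomputed during backpropagation instead of stored. All names (e.g., MultiScaleDilatedMixer), channel counts, kernel sizes, and dilation rates are illustrative assumptions, not the paper's actual architecture.

```python
# Hypothetical sketch of a MetaFormer block with a multi-scale dilated 3D
# convolution token mixer and gradient checkpointing. Dimensions and module
# names are assumptions for illustration only.
import torch
import torch.nn as nn
from torch.utils.checkpoint import checkpoint


class MultiScaleDilatedMixer(nn.Module):
    """Token mixer: parallel depthwise dilated 3D convs fused by a 1x1x1 conv."""

    def __init__(self, channels: int, dilations=(1, 2, 3)):
        super().__init__()
        self.branches = nn.ModuleList([
            nn.Conv3d(channels, channels, kernel_size=3,
                      padding=d, dilation=d, groups=channels)
            for d in dilations
        ])
        self.fuse = nn.Conv3d(channels * len(dilations), channels, kernel_size=1)

    def forward(self, x):  # x: (B, C, T, H, W)
        return self.fuse(torch.cat([branch(x) for branch in self.branches], dim=1))


class MetaFormerBlock(nn.Module):
    """MetaFormer layout: norm -> token mixer -> residual, norm -> channel MLP -> residual."""

    def __init__(self, channels: int, mlp_ratio: int = 2):
        super().__init__()
        self.norm1 = nn.GroupNorm(1, channels)
        self.mixer = MultiScaleDilatedMixer(channels)
        self.norm2 = nn.GroupNorm(1, channels)
        self.mlp = nn.Sequential(
            nn.Conv3d(channels, channels * mlp_ratio, kernel_size=1),
            nn.GELU(),
            nn.Conv3d(channels * mlp_ratio, channels, kernel_size=1),
        )

    def _inner(self, x):
        x = x + self.mixer(self.norm1(x))
        return x + self.mlp(self.norm2(x))

    def forward(self, x):
        # Gradient checkpointing: intermediate activations are recomputed in
        # the backward pass rather than kept in memory, lowering peak usage.
        if self.training and x.requires_grad:
            return checkpoint(self._inner, x, use_reentrant=False)
        return self._inner(x)


if __name__ == "__main__":
    block = MetaFormerBlock(channels=8)
    transient = torch.randn(1, 8, 32, 32, 32, requires_grad=True)  # toy-sized volume
    print(block(transient).shape)  # torch.Size([1, 8, 32, 32, 32])
```

The design choice mirrors the abstract's claim: the 3D token mixer operates on the full transient volume without splitting it into temporal and spatial streams, while checkpointing trades extra forward computation for a reduced activation-memory footprint during training.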
Source journal
Optics and Lasers in Engineering (Engineering & Technology – Optics)
CiteScore: 8.90
Self-citation rate: 8.70%
Articles per year: 384
Review time: 42 days
Journal description: Optics and Lasers in Engineering aims at providing an international forum for the interchange of information on the development of optical techniques and laser technology in engineering. Emphasis is placed on contributions targeted at the practical use of methods and devices, the development and enhancement of solutions, and new theoretical concepts for experimental methods. Optics and Lasers in Engineering reflects the main areas in which optical methods are being used and developed for an engineering environment. Manuscripts should offer clear evidence of novelty and significance. Papers focusing on parameter optimization or computational issues are not suitable. Similarly, papers focused on an application rather than the optical method fall outside the journal's scope. The scope of the journal is defined to include the following:
- Optical Metrology
- Optical Methods for 3D visualization and virtual engineering
- Optical Techniques for Microsystems
- Imaging, Microscopy and Adaptive Optics
- Computational Imaging
- Laser methods in manufacturing
- Integrated optical and photonic sensors
- Optics and Photonics in Life Science
- Hyperspectral and spectroscopic methods
- Infrared and Terahertz techniques
Latest articles in this journal
- Generative reconstruction of photoelastic fringe patterns for transparent components using pressure-derived latent features
- Enhanced gray code pattern for high dynamic range three-dimensional measurement
- High stability and accuracy 543.5 nm laser referenced to optical frequency comb by PPLN crystal frequency doubling
- Accurate distributed fiber-optic disturbance sensing in phase-sensitive OTDR system with an improved PGC-based demodulation scheme
- Precise internal transmittance measurements of highly transparent optical materials at 355 nm with pulsed cavity ring-down technique