CMFormer: Non-line-of-sight imaging with a memory-efficient MetaFormer network
Shihao Zhang, Shaohui Jin, Hao Liu, Yue Li, Xiaoheng Jiang, Mingliang Xu
Optics and Lasers in Engineering, Volume 187, Article 108875, published 2025-02-12
DOI: 10.1016/j.optlaseng.2025.108875
Citations: 0
Abstract
Non-line-of-sight (NLOS) imaging aims to overcome the limitation of traditional sensors that can only detect targets within the line of sight. While existing NLOS imaging algorithms have achieved notable imaging quality, they are constrained by significant memory requirements due to the 3D nature of transient measurements. In this paper, we propose a new memory-efficient MetaFormer-based NLOS imaging method, named CMFormer, which enables NLOS imaging with lower memory usage and faster imaging speed, facilitating deployment on consumer-grade GPUs. Specifically, we design a lightweight module based on MetaFormer, which employs multi-dimensional global convolution and multi-scale dilated convolution as token mixers. This approach leverages the strong temporal-spatial correlation more effectively without separating the transient data into distinct temporal and spatial components for feature extraction. Building on the characteristics of this token mixer, we propose aggregate feature transmission to replace conventional skip connections, achieving better performance without increasing network width at the decoder stage. Additionally, to mitigate the loss of important detail features during downsampling, we design a cross-layer integration attention module to enhance the interaction between adjacent hierarchical features. Leveraging gradient checkpointing, the proposed method can be trained and used for inference on consumer-grade GPUs with memory usage significantly lower than that of the current best imaging algorithm, NLOST, and achieves an imaging speed of 8 FPS. We employ a UNet hierarchical structure to build our pipeline, ensuring that the network denoises effectively and generalizes to real-world scenarios even when trained on synthetic datasets. Extensive experimental results demonstrate that our method achieves the best performance on both synthetic and real-world data with lower memory cost and higher imaging speed. The code will be released soon.
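As a rough illustration of the two memory-saving ideas the abstract highlights, the sketch below shows a MetaFormer-style block whose token mixer is a multi-scale dilated 3D convolution, with the block stack wrapped in gradient checkpointing. This is a hedged PyTorch sketch, not the authors' released CMFormer code: the module names (MultiScaleDilatedMixer, MetaFormerBlock, CheckpointedStage), the channel count, and the dilation rates (1, 2, 3) are illustrative assumptions, and the paper's global-convolution mixer, aggregate feature transmission, and cross-layer integration attention are not reproduced here.

```python
# Minimal sketch (assumed details, not the paper's implementation) of a
# MetaFormer-style block operating on 3D transient features (B, C, T, H, W),
# with gradient checkpointing to reduce peak training memory.
import torch
import torch.nn as nn
from torch.utils.checkpoint import checkpoint


class MultiScaleDilatedMixer(nn.Module):
    """Token mixer: parallel depthwise 3D convolutions with different dilation rates."""

    def __init__(self, channels: int, dilations=(1, 2, 3)):
        super().__init__()
        self.branches = nn.ModuleList(
            nn.Conv3d(channels, channels, kernel_size=3,
                      padding=d, dilation=d, groups=channels)  # depthwise keeps it lightweight
            for d in dilations
        )
        self.fuse = nn.Conv3d(channels * len(dilations), channels, kernel_size=1)

    def forward(self, x):
        # Concatenate multi-scale responses along channels, then fuse back.
        return self.fuse(torch.cat([branch(x) for branch in self.branches], dim=1))


class MetaFormerBlock(nn.Module):
    """MetaFormer layout: x + Mixer(Norm(x)) followed by x + ChannelMLP(Norm(x))."""

    def __init__(self, channels: int, mlp_ratio: int = 2):
        super().__init__()
        self.norm1 = nn.GroupNorm(1, channels)
        self.mixer = MultiScaleDilatedMixer(channels)
        self.norm2 = nn.GroupNorm(1, channels)
        self.mlp = nn.Sequential(
            nn.Conv3d(channels, channels * mlp_ratio, kernel_size=1),
            nn.GELU(),
            nn.Conv3d(channels * mlp_ratio, channels, kernel_size=1),
        )

    def forward(self, x):
        x = x + self.mixer(self.norm1(x))
        x = x + self.mlp(self.norm2(x))
        return x


class CheckpointedStage(nn.Module):
    """Wrap each block in gradient checkpointing: trade recomputation for memory."""

    def __init__(self, channels: int, depth: int = 2):
        super().__init__()
        self.blocks = nn.ModuleList(MetaFormerBlock(channels) for _ in range(depth))

    def forward(self, x):
        for blk in self.blocks:
            if self.training:
                x = checkpoint(blk, x, use_reentrant=False)
            else:
                x = blk(x)
        return x


if __name__ == "__main__":
    feats = torch.randn(1, 8, 32, 32, 32, requires_grad=True)  # (B, C, T, H, W)
    stage = CheckpointedStage(channels=8).train()
    stage(feats).mean().backward()  # activations are recomputed in the backward pass
```

With use_reentrant=False, the checkpoint call discards intermediate activations in the forward pass and recomputes them during backpropagation, which is the standard way to trade extra compute for a lower peak-memory footprint when the 3D transient volumes make activations the dominant cost.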
About the Journal
Optics and Lasers in Engineering aims at providing an international forum for the interchange of information on the development of optical techniques and laser technology in engineering. Emphasis is placed on contributions targeted at the practical use of methods and devices, the development and enhancement of solutions and new theoretical concepts for experimental methods.
Optics and Lasers in Engineering reflects the main areas in which optical methods are being used and developed for an engineering environment. Manuscripts should offer clear evidence of novelty and significance. Papers focusing on parameter optimization or computational issues are not suitable. Similarly, papers focused on an application rather than the optical method fall outside the journal's scope. The scope of the journal is defined to include the following:
- Optical Metrology
- Optical Methods for 3D visualization and virtual engineering
- Optical Techniques for Microsystems
- Imaging, Microscopy and Adaptive Optics
- Computational Imaging
- Laser methods in manufacturing
- Integrated optical and photonic sensors
- Optics and Photonics in Life Science
- Hyperspectral and spectroscopic methods
- Infrared and Terahertz techniques