CMFormer: Non-line-of-sight imaging with a memory-efficient MetaFormer network

Shihao Zhang, Shaohui Jin, Hao Liu, Yue Li, Xiaoheng Jiang, Mingliang Xu
Optics and Lasers in Engineering, vol. 187, Article 108875. Pub Date: 2025-02-12. DOI: 10.1016/j.optlaseng.2025.108875
Impact factor: 3.5. Zone 2 (Engineering Technology). JCR Q2 (Optics). Citations: 0
https://www.sciencedirect.com/science/article/pii/S0143816625000624

Abstract

Non-line-of-sight (NLOS) imaging aims to overcome the limitation that traditional sensors can only detect targets within their line of sight. Existing NLOS imaging algorithms achieve notable imaging quality, but they are constrained by large memory requirements that stem from the 3D nature of transient measurements. In this paper, we propose CMFormer, a memory-efficient MetaFormer-based NLOS imaging method that uses less memory and images faster, which facilitates deployment on consumer-grade GPUs. Specifically, we design a lightweight MetaFormer-based module that employs multi-dimensional global convolution and multi-scale dilated convolution as token mixers. This design exploits the strong temporal-spatial correlation more effectively because it does not separate the transient data into distinct temporal and spatial components for feature extraction. Building on the characteristics of this token mixer, we propose aggregate feature transmission to replace conventional skip connections, which achieves better performance without increasing network width at the decoder stage. Additionally, to mitigate the loss of important detail features during downsampling, we design a cross-layer integration attention module that enhances the interaction between adjacent hierarchical features. By leveraging gradient checkpointing, the proposed method can be trained and run for inference on consumer-grade GPUs with significantly less memory than the current best imaging algorithm, NLOST, and it achieves an imaging speed of 8 FPS. We build our pipeline on the UNet hierarchical structure so that the network denoises better and generalizes better to real-world scenarios even when trained on synthetic datasets. Extensive experimental results demonstrate that our method achieves the best performance on both synthetic and real-world data with low memory cost and higher imaging speed. The code will be released soon.
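
The memory argument in the abstract can be made concrete with some back-of-the-envelope arithmetic. The sketch below is not the authors' code. It only illustrates three generic facts: how much memory a dense 3D feature volume takes, how far a dilated kernel reaches along one axis, and roughly how many activations stay resident when gradient checkpointing is used. The volume size (512 x 128 x 128), channel count, layer count, and all function names are hypothetical values chosen for illustration and are not taken from the paper.

```python
import math


def volume_bytes(t, h, w, channels=1, bytes_per_value=4):
    """Bytes needed to store one dense T x H x W feature volume (float32 by default)."""
    return t * h * w * channels * bytes_per_value


def dilated_extent(kernel_size, dilation):
    """Number of samples spanned along one axis by a single dilated kernel."""
    return dilation * (kernel_size - 1) + 1


def stored_activations(num_layers, segment_size):
    """Rough count of activations kept at once with gradient checkpointing:
    one checkpoint per segment, plus the activations recomputed inside the
    segment currently being backpropagated. Without checkpointing, all
    num_layers activations are kept."""
    return math.ceil(num_layers / segment_size) + segment_size


# Hypothetical transient volume: 512 time bins, 128 x 128 scan points, 16 channels.
size = volume_bytes(512, 128, 128, channels=16)
print(f"one feature volume: {size / 2**20:.0f} MiB")

# A 3-tap kernel at dilations 1, 2, 4 covers progressively wider spans.
print([dilated_extent(3, d) for d in (1, 2, 4)])

# Hypothetical 16-layer network, checkpointed in segments of 4 layers.
print(stored_activations(16, 4), "activations kept instead of", 16)
```

Under these assumed sizes, a single 16-channel feature volume already occupies 512 MiB, which is why holding every intermediate activation for backpropagation becomes expensive and why trading recomputation for memory is attractive for 3D transient data.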
Source journal
Optics and Lasers in Engineering (Engineering Technology: Optics)
CiteScore: 8.90
Self-citation rate: 8.70%
Articles published: 384
Review time: 42 days
Journal description: Optics and Lasers in Engineering aims at providing an international forum for the interchange of information on the development of optical techniques and laser technology in engineering. Emphasis is placed on contributions targeted at the practical use of methods and devices, the development and enhancement of solutions and new theoretical concepts for experimental methods. Optics and Lasers in Engineering reflects the main areas in which optical methods are being used and developed for an engineering environment. Manuscripts should offer clear evidence of novelty and significance. Papers focusing on parameter optimization or computational issues are not suitable. Similarly, papers focused on an application rather than the optical method fall outside the journal's scope. The scope of the journal is defined to include the following:
- Optical Metrology
- Optical Methods for 3D visualization and virtual engineering
- Optical Techniques for Microsystems
- Imaging, Microscopy and Adaptive Optics
- Computational Imaging
- Laser methods in manufacturing
- Integrated optical and photonic sensors
- Optics and Photonics in Life Science
- Hyperspectral and spectroscopic methods
- Infrared and Terahertz techniques