NLOST: Non-Line-of-Sight Imaging with Transformer

2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Published: 2023-06-01. DOI: 10.1109/CVPR52729.2023.01279

Yue Li, Jiayong Peng, Juntian Ye, Yueyi Zhang, Feihu Xu, Zhiwei Xiong

Abstract

Time-resolved non-line-of-sight (NLOS) imaging relies on multi-bounce indirect reflections from hidden objects to perform 3D sensing. Reconstruction from NLOS measurements remains challenging, especially for complicated scenes. To improve reconstruction quality, we present NLOST, the first transformer-based neural network for NLOS reconstruction. Specifically, after extracting shallow features with the assistance of physics-based priors, we design two spatial-temporal self-attention encoders that explore local and global correlations within the 3D NLOS data by splitting or downsampling the features into different scales, respectively. We then design a spatial-temporal cross-attention decoder that integrates the local and global features in the token space of the transformer, yielding deep features with strong representational capacity. Finally, the deep and shallow features are fused to reconstruct the 3D volume of the hidden scene. Extensive experiments demonstrate that the proposed method outperforms existing solutions on both synthetic data and real-world data captured by different NLOS imaging systems.
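The encoder-decoder flow described above can be illustrated with a minimal NumPy sketch. This is not the authors' implementation and makes several loud assumptions: the input is taken to be an already-extracted shallow feature volume of shape (T, H, W, C) (the physics-based prior step is omitted); queries, keys, and values are the features themselves (no learned projections, multiple heads, normalization, or MLPs); the local encoder attends within non-overlapping 3D windows; the global encoder attends over an average-pooled volume; the decoder uses local tokens as queries and global tokens as keys/values; and the final deep/shallow fusion is a plain sum. The function names (`split_windows`, `merge_windows`, `nlost_like_block`), window size, and pooling factor are hypothetical.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def attention(q, k, v):
    # Scaled dot-product attention: q (..., Nq, C), k and v (..., Nk, C).
    scores = q @ np.swapaxes(k, -1, -2) / np.sqrt(q.shape[-1])
    return softmax(scores, axis=-1) @ v


def split_windows(x, w):
    # (T, H, W, C) -> (num_windows, w*w*w, C); assumes T, H, W divisible by w.
    T, H, W, C = x.shape
    x = x.reshape(T // w, w, H // w, w, W // w, w, C)
    x = x.transpose(0, 2, 4, 1, 3, 5, 6)
    return x.reshape(-1, w * w * w, C)


def merge_windows(win, shape, w):
    # Exact inverse of split_windows.
    T, H, W, C = shape
    x = win.reshape(T // w, H // w, W // w, w, w, w, C)
    x = x.transpose(0, 3, 1, 4, 2, 5, 6)
    return x.reshape(T, H, W, C)


def nlost_like_block(shallow, window=2, down=2):
    # Hypothetical single pass: local encoder, global encoder, cross-attention
    # decoder, then fusion with the shallow features.
    T, H, W, C = shallow.shape

    # Local encoder: self-attention inside each spatial-temporal window.
    win = split_windows(shallow, window)
    local = merge_windows(attention(win, win, win), shallow.shape, window)

    # Global encoder: self-attention over all tokens of a downsampled volume.
    pooled = shallow.reshape(
        T // down, down, H // down, down, W // down, down, C
    ).mean(axis=(1, 3, 5))
    g = pooled.reshape(-1, C)
    glob = attention(g, g, g)

    # Cross-attention decoder: local tokens query the global tokens.
    q = local.reshape(-1, C)
    deep = (q + attention(q, glob, glob)).reshape(shallow.shape)

    # Fuse deep and shallow features (a plain sum, by assumption).
    return deep + shallow


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.standard_normal((4, 8, 8, 16))  # hypothetical (T, H, W, C) volume
    y = nlost_like_block(x)
    print(y.shape)  # (4, 8, 8, 16)
```

The split into two encoders reflects a standard cost trade-off: full self-attention is quadratic in the number of tokens, so windowing bounds the cost of the fine-scale branch, while pooling shrinks the token count enough for the coarse branch to attend over the whole volume.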
