PCTNet: 3D Point Cloud and Transformer Network for Monocular Depth Estimation

Yu Hong, Xiaolong Liu, H. Dai, Wenqi Tao
2022 10th International Conference on Information and Education Technology (ICIET), published 2022-04-09
DOI: 10.1109/ICIET55102.2022.9779008
Citations: 0

Abstract

Estimating a dense depth map from a single image is a challenging computer vision task, because the same image can correspond to infinitely many 3D scenes. With the continuing development of deep learning, neural networks have gradually achieved reasonable results on this task. However, depth estimation methods based on monocular cameras still lag behind multi-view and sensor-based methods in accuracy. This paper therefore proposes supplementing the input with a limited number of sparse 3D points and adding transformer processing to increase the accuracy of the monocular depth estimation model. The sparse 3D point cloud serves as supplementary geometric information and is fed into the network together with the RGB image. After five rounds of fusion, multi-scale features are extracted, and a Swin Transformer block then processes the output feature map of the main network, further improving accuracy. Experiments on NYU Depth V2, currently the most commonly used dataset for monocular depth estimation, demonstrate that our network achieves better results than the best existing method. Its qualitative results are also better than those of the best method.
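The abstract does not say how the sparse 3D points are encoded before they enter the network alongside the RGB image. One common option, shown here only as a minimal sketch under stated assumptions, is to project the points through a pinhole camera model into a sparse depth channel (0 = no measurement) and stack it with the image as a fourth channel. The intrinsics `fx, fy, cx, cy`, the zero-as-missing convention, and the helper names `sparse_depth_map` and `rgbd_input` are all hypothetical illustrations, not details taken from the paper.

```python
import numpy as np


def sparse_depth_map(points_cam, fx, fy, cx, cy, height, width):
    """Project N x 3 camera-frame points into an H x W sparse depth map.

    Hypothetical helper: pixels hit by no point stay 0 (assumed "missing"
    marker); when several points land on the same pixel, the nearest is kept.
    """
    pts = np.asarray(points_cam, dtype=np.float64).reshape(-1, 3)
    pts = pts[pts[:, 2] > 0]                        # drop points behind the camera
    u = np.round(fx * pts[:, 0] / pts[:, 2] + cx).astype(np.int64)
    v = np.round(fy * pts[:, 1] / pts[:, 2] + cy).astype(np.int64)
    inside = (u >= 0) & (u < width) & (v >= 0) & (v < height)
    depth = np.full((height, width), np.inf)
    # ufunc.at applies the minimum for every index, so repeated pixels are
    # handled correctly (plain fancy-index assignment gives no such guarantee).
    np.minimum.at(depth, (v[inside], u[inside]), pts[inside, 2])
    depth[np.isinf(depth)] = 0.0
    return depth.astype(np.float32)


def rgbd_input(rgb, points_cam, fx, fy, cx, cy):
    """Stack an H x W x 3 image with the sparse depth channel -> H x W x 4."""
    h, w, _ = rgb.shape
    depth = sparse_depth_map(points_cam, fx, fy, cx, cy, h, w)
    return np.concatenate([rgb.astype(np.float32), depth[..., None]], axis=-1)
```

For example, two points projecting to the same pixel keep the nearer depth, and a point that projects outside the image is discarded. Early concatenation like this is the simplest choice; the paper's five fusion stages suggest the two modalities are combined repeatedly inside the network, which this sketch does not attempt to reproduce.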
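The abstract states only that a Swin Transformer block processes the main network's output feature map. The core idea of a Swin block is to restrict self-attention to non-overlapping M x M windows, and to cyclically shift the map by floor(M/2) in alternate blocks so information can cross window boundaries. The sketch below illustrates just that mechanism on an H x W x C array; the function names are hypothetical, and it deliberately omits the learned Q/K/V projections, multiple heads, relative position bias, MLP, and the attention mask a real Swin block applies in the shifted case.

```python
import numpy as np


def window_partition(x, m):
    """Split an H x W x C map into non-overlapping m x m windows.

    Returns shape (num_windows, m*m, C). Attention over n tokens costs
    O(n^2), so per-window attention makes the total cost grow linearly
    with H*W for a fixed m, instead of quadratically as in global attention.
    """
    h, w, c = x.shape
    assert h % m == 0 and w % m == 0, "H and W must be multiples of m"
    x = x.reshape(h // m, m, w // m, m, c).transpose(0, 2, 1, 3, 4)
    return x.reshape(-1, m * m, c)


def window_reverse(windows, m, h, w):
    """Inverse of window_partition: (num_windows, m*m, C) -> H x W x C."""
    c = windows.shape[-1]
    x = windows.reshape(h // m, w // m, m, m, c).transpose(0, 2, 1, 3, 4)
    return x.reshape(h, w, c)


def window_self_attention(x, m, shift=0):
    """Single-head self-attention inside (optionally shifted) windows.

    Simplified sketch: tokens act directly as queries, keys and values.
    No mask is applied, so with shift > 0 tokens wrapped around the border
    can attend to each other, which a real Swin block masks out.
    """
    h, w, c = x.shape
    if shift:
        x = np.roll(x, shift=(-shift, -shift), axis=(0, 1))   # cyclic shift
    win = window_partition(x, m)                               # (nW, m*m, C)
    scores = win @ win.transpose(0, 2, 1) / np.sqrt(c)         # (nW, m*m, m*m)
    scores -= scores.max(axis=-1, keepdims=True)               # stable softmax
    attn = np.exp(scores)
    attn /= attn.sum(axis=-1, keepdims=True)
    out = window_reverse(attn @ win, m, h, w)
    if shift:
        out = np.roll(out, shift=(shift, shift), axis=(0, 1))  # undo the shift
    return out
```

With m = 2 on a 4 x 4 map, perturbing pixel (1, 1) leaves pixel (2, 2) untouched in the unshifted pass (they sit in different windows) but changes it when shift = 1, which is exactly the cross-window connection that alternating shifted blocks provide.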