A hierarchical occupancy network with multi-height attention for vision-centric 3D occupancy prediction

Can Li, Zhi Gao, Zhipeng Lin, Tonghui Ye, Ziyao Li

The Photogrammetric Record, DOI: 10.1111/phor.12500, published 18 May 2024.

Abstract

The precise geometric representation and the ability to handle long-tail targets have drawn increasing attention to vision-centric 3D occupancy prediction, which models the real world as a voxel-wise representation solely from visual inputs. Despite some notable achievements in this field, many prior or concurrent approaches simply adopt existing spatial cross-attention (SCA) as their 2D-3D transformation module, which may couple height information or compromise the global receptive field along the height dimension. To overcome these limitations, we propose a hierarchical occupancy (HierOcc) network featuring our height-aware cross-attention (HACA) and hierarchical self-attention (HSA) as its core modules, achieving enhanced precision and completeness in 3D occupancy prediction. The former module performs the 2D-3D transformation, while the latter promotes intercommunication among voxels. The key insight behind both modules is our multi-height attention mechanism, which ensures that each attention head corresponds explicitly to a specific height, thereby decoupling height information while maintaining global attention across the height dimension. Extensive experiments show that our method brings significant improvements over the baseline and surpasses all concurrent methods, demonstrating its superiority.
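To make the multi-height idea concrete, here is a minimal NumPy sketch (not the paper's actual implementation; the function name, weight shapes, and toy dimensions are illustrative assumptions). Each attention head builds its queries from a single height level, decoupling height information, while its keys and values pool voxels from all heights, preserving a global receptive field along the height dimension:

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def multi_height_attention(feats, Wq, Wk, Wv):
    """Toy multi-height self-attention: one attention head per height level.

    feats: (H, N, C) voxel features -- H height levels, N BEV cells, C channels.
    Wq, Wk, Wv: (H, C, C) per-head projection weights (one head per height).

    Head h builds its queries from height h only (height decoupling), but its
    keys/values gather all H*N voxels, keeping a global receptive field
    along the height dimension.
    """
    H, N, C = feats.shape
    out = np.empty_like(feats)
    kv = feats.reshape(H * N, C)                 # global key/value pool
    for h in range(H):                           # one head per height level
        q = feats[h] @ Wq[h]                     # (N, C): queries from height h
        k = kv @ Wk[h]                           # (H*N, C): keys from all heights
        v = kv @ Wv[h]                           # (H*N, C): values from all heights
        attn = softmax(q @ k.T / np.sqrt(C), axis=-1)
        out[h] = attn @ v                        # (N, C) updated features
    return out
```

This is only a sketch of the attention pattern the abstract describes; the actual HACA/HSA modules operate on deformable, camera-lifted features rather than a dense dot-product over all voxels.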