通过正交平面解缠和多视角几何一致性感知进行 360 布局估计。

IF 18.6 IEEE transactions on pattern analysis and machine intelligence Pub Date : 2024-08-13 DOI:10.1109/TPAMI.2024.3442481

Zhijie Shen;Chunyu Lin;Junsong Zhang;Lang Nie;Kang Liao;Yao Zhao

{"title":"通过正交平面解缠和多视角几何一致性感知进行 360 布局估计。","authors":"Zhijie Shen;Chunyu Lin;Junsong Zhang;Lang Nie;Kang Liao;Yao Zhao","doi":"10.1109/TPAMI.2024.3442481","DOIUrl":null,"url":null,"abstract":"Existing panoramic layout estimation solutions tend to recover room boundaries from a vertically compressed sequence, yielding imprecise results as the compression process often muddles the semantics between various planes. Besides, these data-driven approaches impose an urgent demand for massive data annotations, which are laborious and time-consuming. For the first problem, we propose an orthogonal plane disentanglement network (termed DOPNet) to distinguish ambiguous semantics. DOPNet consists of three modules that are integrated to deliver distortion-free, semantics-clean, and detail-sharp disentangled representations, which benefit the subsequent layout recovery. For the second problem, we present an unsupervised adaptation technique tailored for horizon-depth and ratio representations. Concretely, we introduce an optimization strategy for decision-level layout analysis and a 1D cost volume construction method for feature-level multi-view aggregation, both of which are designed to fully exploit the geometric consistency across multiple perspectives. The optimizer provides a reliable set of pseudo-labels for network training, while the 1D cost volume enriches each view with comprehensive scene information derived from other perspectives. Extensive experiments demonstrate that our solution outperforms other SoTA models on both monocular layout estimation and multi-view layout estimation tasks.","PeriodicalId":94034,"journal":{"name":"IEEE transactions on pattern analysis and machine intelligence","volume":"46 12","pages":"10486-10498"},"PeriodicalIF":18.6000,"publicationDate":"2024-08-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"360 Layout Estimation via Orthogonal Planes Disentanglement and Multi-View Geometric Consistency Perception\",\"authors\":\"Zhijie Shen;Chunyu Lin;Junsong Zhang;Lang Nie;Kang Liao;Yao Zhao\",\"doi\":\"10.1109/TPAMI.2024.3442481\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Existing panoramic layout estimation solutions tend to recover room boundaries from a vertically compressed sequence, yielding imprecise results as the compression process often muddles the semantics between various planes. Besides, these data-driven approaches impose an urgent demand for massive data annotations, which are laborious and time-consuming. For the first problem, we propose an orthogonal plane disentanglement network (termed DOPNet) to distinguish ambiguous semantics. DOPNet consists of three modules that are integrated to deliver distortion-free, semantics-clean, and detail-sharp disentangled representations, which benefit the subsequent layout recovery. For the second problem, we present an unsupervised adaptation technique tailored for horizon-depth and ratio representations. Concretely, we introduce an optimization strategy for decision-level layout analysis and a 1D cost volume construction method for feature-level multi-view aggregation, both of which are designed to fully exploit the geometric consistency across multiple perspectives. The optimizer provides a reliable set of pseudo-labels for network training, while the 1D cost volume enriches each view with comprehensive scene information derived from other perspectives. Extensive experiments demonstrate that our solution outperforms other SoTA models on both monocular layout estimation and multi-view layout estimation tasks.\",\"PeriodicalId\":94034,\"journal\":{\"name\":\"IEEE transactions on pattern analysis and machine intelligence\",\"volume\":\"46 12\",\"pages\":\"10486-10498\"},\"PeriodicalIF\":18.6000,\"publicationDate\":\"2024-08-13\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"IEEE transactions on pattern analysis and machine intelligence\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://ieeexplore.ieee.org/document/10634822/\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE transactions on pattern analysis and machine intelligence","FirstCategoryId":"1085","ListUrlMain":"https://ieeexplore.ieee.org/document/10634822/","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 0

摘要

现有的全景布局估算解决方案倾向于从垂直压缩序列中恢复房间边界，但由于压缩过程通常会混淆不同平面之间的语义，因此结果并不精确。此外，这些数据驱动型方法对海量数据注释提出了迫切要求，既费力又费时。针对第一个问题，我们提出了一种正交平面解缠网络（简称 DOPNet）来区分模棱两可的语义。DOPNet 由三个模块组成，它们集成在一起，提供无失真、语义清晰和细节锐利的解缠表示，有利于后续的布局恢复。针对第二个问题，我们提出了一种针对水平深度和比例表示的无监督适应技术。具体来说，我们介绍了一种用于决策级布局分析的优化策略和一种用于特征级多视角聚合的一维成本体积构建方法，这两种方法都是为了充分利用多视角的几何一致性而设计的。优化器为网络训练提供了一组可靠的伪标签，而一维代价体积则利用从其他视角获得的综合场景信息丰富了每个视角。大量实验证明，在单目布局估计和多视角布局估计任务中，我们的解决方案都优于其他 SoTA 模型。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文

微信好友朋友圈 QQ好友复制链接

本刊更多论文

360 Layout Estimation via Orthogonal Planes Disentanglement and Multi-View Geometric Consistency Perception

Existing panoramic layout estimation solutions tend to recover room boundaries from a vertically compressed sequence, yielding imprecise results as the compression process often muddles the semantics between various planes. Besides, these data-driven approaches impose an urgent demand for massive data annotations, which are laborious and time-consuming. For the first problem, we propose an orthogonal plane disentanglement network (termed DOPNet) to distinguish ambiguous semantics. DOPNet consists of three modules that are integrated to deliver distortion-free, semantics-clean, and detail-sharp disentangled representations, which benefit the subsequent layout recovery. For the second problem, we present an unsupervised adaptation technique tailored for horizon-depth and ratio representations. Concretely, we introduce an optimization strategy for decision-level layout analysis and a 1D cost volume construction method for feature-level multi-view aggregation, both of which are designed to fully exploit the geometric consistency across multiple perspectives. The optimizer provides a reliable set of pseudo-labels for network training, while the 1D cost volume enriches each view with comprehensive scene information derived from other perspectives. Extensive experiments demonstrate that our solution outperforms other SoTA models on both monocular layout estimation and multi-view layout estimation tasks.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

IEEE transactions on pattern analysis and machine intelligence

自引率

0.00%

发文量

期刊最新文献

Calibrating Biased Distribution in VFM-Derived Latent Space via Cross-Domain Geometric Consistency. Penny-Wise and Pound-Foolish in AI-Generated Image Detection. 50 Years of Automated Face Recognition. Soft Label Pruning and Quantization for Large-Scale Dataset Distillation. On the Adversarial Transferability of Generalized "Skip Connections".