从多视图图像中重建三维形状作为盒子的联合

Zihan Yang, Minglun Gong
{"title":"从多视图图像中重建三维形状作为盒子的联合","authors":"Zihan Yang, Minglun Gong","doi":"10.1145/3609703.3609705","DOIUrl":null,"url":null,"abstract":"The task of reconstructing object shapes from input images has become increasingly important in various fields, such as computer vision, robotics, augmented reality, video games, and autonomous vehicles. While approaches for reconstructing shapes with varying levels of detail have been proposed, balancing representation accuracy and model complexity remains a challenge. To address this challenge, we propose an end-to-end approach for reconstructing object shapes from multiple images using a union of box primitives. Our approach offers a simpler and more efficient 3D representation of objects without the need for intermediate products such as voxels, resulting in faster inference times. Additionally, we introduce an auxiliary task to aid in learning how to extract and transform spatial features from images without requiring camera calibrations. Extensive experiments demonstrate that our method can produce comparable results to approaches that require 3D voxelized input while utilizing only 2D RGB images as input. Furthermore, our method significantly outperforms the aforementioned approaches in terms of inference time.","PeriodicalId":101485,"journal":{"name":"Proceedings of the 2023 5th International Conference on Pattern Recognition and Intelligent Systems","volume":"96 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2023-07-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Reconstructing 3D Shapes as an Union of Boxes from Multi-View Images\",\"authors\":\"Zihan Yang, Minglun Gong\",\"doi\":\"10.1145/3609703.3609705\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"The task of reconstructing object shapes from input images has become increasingly important in various fields, such as computer vision, robotics, augmented reality, video games, and autonomous vehicles. While approaches for reconstructing shapes with varying levels of detail have been proposed, balancing representation accuracy and model complexity remains a challenge. To address this challenge, we propose an end-to-end approach for reconstructing object shapes from multiple images using a union of box primitives. Our approach offers a simpler and more efficient 3D representation of objects without the need for intermediate products such as voxels, resulting in faster inference times. Additionally, we introduce an auxiliary task to aid in learning how to extract and transform spatial features from images without requiring camera calibrations. Extensive experiments demonstrate that our method can produce comparable results to approaches that require 3D voxelized input while utilizing only 2D RGB images as input. Furthermore, our method significantly outperforms the aforementioned approaches in terms of inference time.\",\"PeriodicalId\":101485,\"journal\":{\"name\":\"Proceedings of the 2023 5th International Conference on Pattern Recognition and Intelligent Systems\",\"volume\":\"96 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2023-07-28\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Proceedings of the 2023 5th International Conference on Pattern Recognition and Intelligent Systems\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1145/3609703.3609705\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 2023 5th International Conference on Pattern Recognition and Intelligent Systems","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3609703.3609705","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 0

摘要

从输入图像中重建物体形状的任务在计算机视觉、机器人、增强现实、视频游戏和自动驾驶汽车等各个领域变得越来越重要。虽然已经提出了重建具有不同细节级别的形状的方法,但平衡表示精度和模型复杂性仍然是一个挑战。为了解决这一挑战,我们提出了一种端到端方法,用于使用盒原语的联合从多个图像中重建物体形状。我们的方法提供了一种更简单、更有效的物体3D表示,而不需要体素等中间产品,从而加快了推理时间。此外,我们还引入了一个辅助任务来帮助学习如何在不需要相机校准的情况下从图像中提取和转换空间特征。大量的实验表明,我们的方法可以产生与只使用2D RGB图像作为输入而需要3D体素化输入的方法相当的结果。此外,我们的方法在推理时间方面明显优于上述方法。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
查看原文
分享 分享
微信好友 朋友圈 QQ好友 复制链接
本刊更多论文
Reconstructing 3D Shapes as an Union of Boxes from Multi-View Images
The task of reconstructing object shapes from input images has become increasingly important in various fields, such as computer vision, robotics, augmented reality, video games, and autonomous vehicles. While approaches for reconstructing shapes with varying levels of detail have been proposed, balancing representation accuracy and model complexity remains a challenge. To address this challenge, we propose an end-to-end approach for reconstructing object shapes from multiple images using a union of box primitives. Our approach offers a simpler and more efficient 3D representation of objects without the need for intermediate products such as voxels, resulting in faster inference times. Additionally, we introduce an auxiliary task to aid in learning how to extract and transform spatial features from images without requiring camera calibrations. Extensive experiments demonstrate that our method can produce comparable results to approaches that require 3D voxelized input while utilizing only 2D RGB images as input. Furthermore, our method significantly outperforms the aforementioned approaches in terms of inference time.
求助全文
通过发布文献求助,成功后即可免费获取论文全文。 去求助
来源期刊
自引率
0.00%
发文量
0
期刊最新文献
Identification-Dissemination-Warning: Algorithm and Prediction of Early Warning of Network Public Opinion Exploration of transfer learning capability of multilingual models for text classification Reconstructing 3D Shapes as an Union of Boxes from Multi-View Images LLFormer: An Efficient and Real-time LiDAR Lane Detection Method based on Transformer Survey of the Formal Verification of Operating Systems in Power Monitoring System
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
现在去查看 取消
×
提示
确定
0
微信
客服QQ
Book学术公众号 扫码关注我们
反馈
×
意见反馈
请填写您的意见或建议
请填写您的手机或邮箱
已复制链接
已复制链接
快去分享给好友吧!
我知道了
×
扫码分享
扫码分享
Book学术官方微信
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术
文献互助 智能选刊 最新文献 互助须知 联系我们:info@booksci.cn
Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。
Copyright © 2023 Book学术 All rights reserved.
ghs 京公网安备 11010802042870号 京ICP备2023020795号-1