MonoAMNet: Three-Stage Real-Time Monocular 3D Object Detection With Adaptive Methods

IEEE Transactions on Intelligent Transportation Systems, vol. 26, no. 3, pp. 3574-3587; Pub Date: 2025-01-16; DOI: 10.1109/TITS.2025.3525772; IF 8.4; JCR Q1 (Engineering, Civil); Zone 1 (Engineering Technology)

Huihui Pan; Yisong Jia; Jue Wang; Weichao Sun

Citations: 0

Abstract




Monocular 3D object detection finds applications in various fields, notably intelligent driving, due to its cost-effectiveness and ease of deployment. However, its accuracy lags significantly behind LiDAR-based methods, primarily because monocular depth estimation is inherently challenging. While some methods leverage additional information to aid network training and enhance performance, they are hindered by their reliance on specific datasets. We contend that many components of monocular 3D object detection lack the necessary adaptability, impeding the performance of the detector. In this paper, we propose six adaptive methods addressing issues related to network structure, loss function, and optimizer. These methods specifically target the rigid components within the detector that hinder adaptability. We also provide theoretical insights into the network output and propose two novel regression methods that make learning more straightforward for the network. Importantly, our approach does not depend on supplementary information, allowing for end-to-end training. In comparison with existing methods, our proposed approach demonstrates competitive speed and accuracy. On the KITTI dataset, our method achieves 17.72% AP3D (IoU = 0.7, Car, Moderate), outperforming all previous monocular methods. Additionally, our approach prioritizes speed, running at up to 52 FPS on an RTX 2080Ti GPU, surpassing all previous monocular methods. The source code is available at https://github.com/jiayisong/AMNet.


Source journal

IEEE Transactions on Intelligent Transportation Systems (Engineering Technology: Engineering, Electrical & Electronic)

CiteScore: 14.80

Self-citation rate: 12.90%

Articles published: 1872

Review time: 7.5 months

Journal description: The theoretical, experimental and operational aspects of electrical and electronics engineering and information technologies as applied to Intelligent Transportation Systems (ITS). Intelligent Transportation Systems are defined as those systems utilizing synergistic technologies and systems engineering concepts to develop and improve transportation systems of all kinds. The scope of this interdisciplinary activity includes the promotion, consolidation and coordination of ITS technical activities among IEEE entities, and providing a focus for cooperative activities, both internally and externally.

Latest articles in this journal

IEEE Intelligent Transportation Systems Society Information
An Adaptive Forwarding With Path Optimization Method for Vehicular Named Data Networking
Vehicle Localization in GPS-Denied Scenarios Using Arc-Length-Based Map Matching
IEEE Intelligent Transportation Systems Society Information
Controllable Multimodal Motion Behavior Generation for Autonomous Driving