Deep End-to-end 3D Person Detection from Camera and Lidar

2019 IEEE Intelligent Transportation Systems Conference (ITSC). Pub Date: 2019-10-01. DOI: 10.1109/ITSC.2019.8917366

M. Roth, Dominik Jargot, D. Gavrila

Citations: 11

Abstract

We present a method for 3D person detection from camera images and lidar point clouds in automotive scenes. The method comprises a deep neural network that estimates the 3D location and extent of persons present in the scene. 3D anchor proposals are refined in two stages: a region proposal network and a subsequent detection network. For both input modalities, high-level feature representations are learned from raw sensor data instead of being manually designed. To that end, we use Voxel Feature Encoders [1] to obtain point cloud features instead of the widely used projection-based point cloud representations, thus allowing the network to learn to predict the location and extent of persons in an end-to-end manner. Experiments on the validation set of the KITTI 3D object detection benchmark [2] show that the proposed method outperforms state-of-the-art methods with an average precision (AP) of 47.06% on moderate difficulty.
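To make the voxel-based encoding mentioned in the abstract more concrete, the following is a minimal NumPy sketch of the general idea behind a Voxel Feature Encoder layer as described in [1]: points are grouped into voxels, each point is augmented with its offset from the voxel centroid, a shared pointwise layer is applied, and the result is max-pooled and concatenated back onto each point. This is not the authors' implementation. The voxel size, the cap of 35 points per voxel, the 8-channel width, the random weights, and the function names `group_points_into_voxels` and `vfe_layer` are all assumptions chosen for illustration.

```python
import numpy as np


def group_points_into_voxels(points, voxel_size, max_points_per_voxel=35):
    """Group lidar points (x, y, z, reflectance) by integer voxel index.

    Assumption: points beyond the per-voxel cap are simply dropped in
    arrival order; a real pipeline would typically sample randomly.
    """
    coords = np.floor(points[:, :3] / voxel_size).astype(np.int64).tolist()
    voxels = {}
    for point, coord in zip(points, coords):
        bucket = voxels.setdefault(tuple(coord), [])
        if len(bucket) < max_points_per_voxel:
            bucket.append(point)
    return {key: np.stack(bucket) for key, bucket in voxels.items()}


def vfe_layer(voxel_points, weight, bias):
    """One VFE-style layer for the points of a single voxel.

    voxel_points: (n, 4) array; weight: (7, c); bias: (c,).
    Returns an (n, 2c) array: pointwise features concatenated with the
    voxel-wide max-pooled feature.
    """
    centroid = voxel_points[:, :3].mean(axis=0)
    # Augment each point with its offset from the voxel centroid -> (n, 7).
    augmented = np.hstack([voxel_points, voxel_points[:, :3] - centroid])
    # Shared linear layer followed by ReLU (batch norm omitted in this sketch).
    pointwise = np.maximum(augmented @ weight + bias, 0.0)
    # Element-wise max over the points of the voxel -> (1, c).
    pooled = pointwise.max(axis=0, keepdims=True)
    return np.hstack([pointwise, np.repeat(pooled, len(pointwise), axis=0)])


# Hypothetical toy input: three points, two of which share a voxel.
points = np.array([
    [0.05, 0.05, 0.10, 0.5],
    [0.10, 0.10, 0.20, 0.3],
    [1.10, 1.10, 1.10, 0.9],
])
voxel_size = np.array([0.2, 0.2, 0.4])  # assumed (x, y, z) voxel extent in metres

voxels = group_points_into_voxels(points, voxel_size)

rng = np.random.default_rng(0)
weight = rng.normal(size=(7, 8))  # untrained placeholder weights
bias = np.zeros(8)

features = vfe_layer(voxels[(0, 0, 0)], weight, bias)
print(sorted(voxels.keys()), features.shape)
```

Because the features are computed directly from raw points inside each voxel, no hand-designed projection (such as a bird's-eye-view height map) is needed before the network, which is the contrast the abstract draws with projection-based representations.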
