Regression with Ensemble of RANSAC in Camera-LiDAR Fusion for Road Boundary Detection and Modeling
Pub Date: 2022-09-20 | DOI: 10.1109/MFI55806.2022.9913856
Mukhlas A. Rasyidy, Y. Y. Nazaruddin, A. Widyotriatmo
This paper describes a technique for post-detection fusion of camera and LiDAR data in road boundary estimation tasks. Specifically, the technique takes road boundary detection results generated separately from the camera and LiDAR and combines them to enhance the accuracy of the estimated road boundaries. The proposed approach achieves more accurate estimation in the near range than LiDAR-only detection and in the long range than camera-only detection. Random sample consensus (RANSAC) over linear regressions is used to create the road boundary model, which reduces errors and outliers while remaining simple, explainable, and adaptive to the road curvature. The generated linear models are then combined into a single road boundary that can be interpolated and extrapolated using a Boosting-like algorithm with a non-parametric strategy; this technique is called RANSAC-Ensemble. Experiments show that the technique achieves better accuracy, with comparable processing time, than other common methods of road boundary model estimation.
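As a rough illustration of the robust-fitting building block, a minimal sketch is given below; it is not the authors' implementation, and the longitudinal windowing, thresholds, and the Boosting-like combination step are simplified assumptions for illustration only.

```python
# Minimal sketch (assumed details, not the paper's code): fitting piecewise-linear
# road-boundary segments with RANSAC over linear regressions.
import numpy as np
from sklearn.linear_model import RANSACRegressor

def fit_boundary_segments(points, window=5.0):
    """Fit one robust linear model per longitudinal window of boundary points.

    points: (N, 2) array of (x, y) boundary detections (e.g. fused camera/LiDAR).
    Returns a list of ((x_start, x_end), fitted RANSACRegressor) pairs.
    """
    segments = []
    x_min, x_max = points[:, 0].min(), points[:, 0].max()
    for x0 in np.arange(x_min, x_max, window):
        mask = (points[:, 0] >= x0) & (points[:, 0] < x0 + window)
        if mask.sum() < 5:              # not enough support in this window
            continue
        x = points[mask, 0:1]           # regress lateral offset y on longitudinal x
        y = points[mask, 1]
        model = RANSACRegressor(residual_threshold=0.2)  # outlier-tolerant linear fit
        model.fit(x, y)
        segments.append(((x0, x0 + window), model))
    return segments
```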
{"title":"Regression with Ensemble of RANSAC in Camera-LiDAR Fusion for Road Boundary Detection and Modeling","authors":"Mukhlas A. Rasyidy, Y. Y. Nazaruddin, A. Widyotriatmo","doi":"10.1109/MFI55806.2022.9913856","DOIUrl":"https://doi.org/10.1109/MFI55806.2022.9913856","url":null,"abstract":"This paper describes a technique to perform a post-detection fusion of camera and LiDAR data for road boundary estimation tasks. To be specific, the technique takes the road boundary detection results that are generated separately from the camera and LiDAR to enhance the accuracy of the estimated road boundaries. The proposed approach can achieve a more accurate estimation in the near range than just LiDAR-based detection and in the long range than just camera-based detection. Random sample consensus (RANSAC) of linear regressions is used to create the road boundary model that is capable of reducing errors and outliers while keeping it simple, explainable, and adaptive to the road curvature. The generated linear models are then combined into a single road boundary that can be interpolated and extrapolated using a Boosting-like algorithm with a non-parametric strategy. This technique is called as RANSAC-Ensemble. The experiments show that this technique has better accuracy with comparable processing time than certain other common methods of road boundary model estimation.","PeriodicalId":344737,"journal":{"name":"2022 IEEE International Conference on Multisensor Fusion and Integration for Intelligent Systems (MFI)","volume":"12 10","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-09-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"113979383","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Harmonic Functions for Three-Dimensional Shape Estimation in Cylindrical Coordinates
Pub Date: 2022-09-20 | DOI: 10.1109/MFI55806.2022.9913858
Tim Baur, J. Reuter, Antonio Zea, U. Hanebeck
With the high resolution of modern sensors such as multilayer LiDARs, estimating the 3D shape in an extended object tracking procedure is possible. In recent years, 3D shapes have been estimated in spherical coordinates using Gaussian processes, spherical double Fourier series, or spherical harmonics. However, observations have shown that in many scenarios only a few measurements are obtained from top or bottom surfaces, leading to error-prone estimates in spherical coordinates. Therefore, in this paper we propose to estimate the shape in cylindrical coordinates instead, applying harmonic functions. Specifically, we derive an expansion for 3D shapes in cylindrical coordinates by solving a boundary value problem for the Laplace equation. This shape representation is then integrated into a plain greedy association model and compared to shape estimation procedures in spherical coordinates. Since the shape representation is only integrated into a basic estimator, the results are preliminary, and a detailed discussion of future work is presented at the end of the paper.
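For context, the boundary value problem referred to above is based on Laplace's equation, whose standard textbook form in cylindrical coordinates is shown below; the specific expansion and boundary conditions derived in the paper are not reproduced here.

```latex
% Laplace's equation in cylindrical coordinates (rho, phi, z); 3D harmonic shape
% functions are solutions of this PDE. Separation of variables f = R(rho)Phi(phi)Z(z)
% yields a Fourier expansion in phi with Bessel-type radial factors.
\nabla^{2} f
  = \frac{1}{\rho}\frac{\partial}{\partial \rho}\!\left(\rho\,\frac{\partial f}{\partial \rho}\right)
  + \frac{1}{\rho^{2}}\frac{\partial^{2} f}{\partial \phi^{2}}
  + \frac{\partial^{2} f}{\partial z^{2}}
  = 0
```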
{"title":"Harmonic Functions for Three-Dimensional Shape Estimation in Cylindrical Coordinates","authors":"Tim Baur, J. Reuter, Antonio Zea, U. Hanebeck","doi":"10.1109/MFI55806.2022.9913858","DOIUrl":"https://doi.org/10.1109/MFI55806.2022.9913858","url":null,"abstract":"With the high resolution of modern sensors such as multilayer LiDARs, estimating the 3D shape in an extended object tracking procedure is possible. In recent years, 3D shapes have been estimated in spherical coordinates using Gaussian processes, spherical double Fourier series or spherical harmonics. However, observations have shown that in many scenarios only a few measurements are obtained from top or bottom surfaces, leading to error-prone estimates in spherical coordinates. Therefore, in this paper we propose to estimate the shape in cylindrical coordinates instead, applying harmonic functions. Specifically, we derive an expansion for 3D shapes in cylindrical coordinates by solving a boundary value problem for the Laplace equation. This shape representation is then integrated in a plain greedy association model and compared to shape estimation procedures in spherical coordinates. Since the shape representation is only integrated in a basic estimator, the results are preliminary and a detailed discussion for future work is presented at the end of the paper.","PeriodicalId":344737,"journal":{"name":"2022 IEEE International Conference on Multisensor Fusion and Integration for Intelligent Systems (MFI)","volume":"25 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-09-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133995905","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Vision-enhanced GNSS-based environmental context detection for autonomous vehicle navigation
Pub Date: 2022-09-20 | DOI: 10.1109/MFI55806.2022.9913867
Florent Feriol, Yoko Watanabe, Damien Vivet
Context-adaptive navigation is currently considered one of the potential solutions for achieving more precise and robust positioning. The goal is to adapt the sensor parameters and the navigation filter structure so that they take into account the context-dependent sensor performance, notably GNSS signal degradations. For that, reliable context detection is essential. This paper proposes a GNSS-based environmental context detector that classifies the environment surrounding a vehicle into four classes: canyon, open-sky, trees, and urban. A support-vector machine classifier is trained on our database collected around Toulouse. We first show the classification results of a model based on GNSS data only, revealing its limitation in distinguishing the trees and urban classes. To address this issue, this paper proposes a vision-enhanced model that adds satellite visibility information from sky segmentation on fisheye camera images. Compared to the GNSS-only model, the proposed vision-enhanced model significantly improves the classification performance, raising the average F1-score from 78% to 86%.
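A minimal sketch of the kind of classifier described follows; the feature set, file names, and hyperparameters are placeholders and assumptions, not the authors' configuration.

```python
# Illustrative sketch only (not the authors' pipeline): an SVM classifying
# GNSS-derived feature vectors into the four context classes named above.
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score

CLASSES = ["canyon", "open-sky", "trees", "urban"]

# X: per-epoch GNSS features (e.g. number of visible satellites, mean C/N0,
# dilution of precision, ...); y: integer context labels. File names are hypothetical.
X = np.load("gnss_features.npy")
y = np.load("context_labels.npy")

clf = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=10.0))
scores = cross_val_score(clf, X, y, cv=5, scoring="f1_macro")
print("macro F1 per fold:", scores)
```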
{"title":"Vision-enhanced GNSS-based environmental context detection for autonomous vehicle navigation","authors":"Florent Feriol, Yoko Watanabe, Damien Vivet","doi":"10.1109/MFI55806.2022.9913867","DOIUrl":"https://doi.org/10.1109/MFI55806.2022.9913867","url":null,"abstract":"Context-adaptive navigation is currently considered as one of the potential solutions to achieve a more precise and robust positioning. The goal would be to adapt the sensor parameters and the navigation filter structure so that it takes into account the context-dependant sensor performance, notably GNSS signal degradations. For that, a reliable context detection is essential. This paper proposes a GNSS-based environmental context detector which classifies the environment surrounding a vehicle into four classes: canyon, open-sky, trees and urban. A support-vector machine classifier is trained on our database collected around Toulouse. We first show the classification results of a model based on GNSS data only, revealing its limitation to distinguish trees and urban contexts. For addressing this issue, this paper proposes the vision-enhanced model by adding satellite visibility information from sky segmentation on fisheye camera images. Compared to the GNSS-only model, the proposed vision-enhanced model significantly improved the classification performance and raised an average F1-score from 78% to 86%.","PeriodicalId":344737,"journal":{"name":"2022 IEEE International Conference on Multisensor Fusion and Integration for Intelligent Systems (MFI)","volume":"217 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-09-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122849670","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Predicting Autonomous Vehicle Navigation Parameters via Image and Image-and-Point Cloud Fusion-based End-to-End Methods
Pub Date: 2022-09-20 | DOI: 10.1109/MFI55806.2022.9913844
Semih Beycimen, Dmitry I. Ignatyev, A. Zolotas
This paper presents a study of end-to-end methods for predicting autonomous vehicle navigation parameters. Image-based and image-and-LiDAR-point-based end-to-end models have been trained using the Nvidia learning architecture as well as Densenet-169, Resnet-152, and Inception-v4. Various learning parameters for autonomous vehicle navigation, input models, and data pre-processing algorithms (i.e., image cropping, noise removal, and semantic segmentation for image data) have been investigated and tested. The best ones from this investigation are selected for the main framework of the study. Results reveal that the image-and-LiDAR-point-based method trained on the Nvidia architecture offers the best accuracy for steering angle and speed.
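For reference, an Nvidia PilotNet-style regression network for steering angle and speed might look like the sketch below. This is an assumption for illustration; the paper's exact layer configuration and the LiDAR-point fusion branch are not reproduced.

```python
# Rough sketch (assumption, not the paper's network): a PilotNet-style CNN
# regressing steering angle and speed from a single camera image.
import torch
import torch.nn as nn

class PilotNetLike(nn.Module):
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 24, 5, stride=2), nn.ReLU(),
            nn.Conv2d(24, 36, 5, stride=2), nn.ReLU(),
            nn.Conv2d(36, 48, 5, stride=2), nn.ReLU(),
            nn.Conv2d(48, 64, 3), nn.ReLU(),
            nn.Conv2d(64, 64, 3), nn.ReLU(),
        )
        self.head = nn.Sequential(
            nn.Flatten(),
            nn.LazyLinear(100), nn.ReLU(),
            nn.Linear(100, 50), nn.ReLU(),
            nn.Linear(50, 2),            # outputs: [steering angle, speed]
        )

    def forward(self, img):
        return self.head(self.features(img))

# Usage: out = PilotNetLike()(torch.randn(1, 3, 66, 200))  # PilotNet input size
```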
{"title":"Predicting Autonomous Vehicle Navigation Parameters via Image and Image-and-Point Cloud Fusion-based End-to-End Methods","authors":"Semih Beycimen, Dmitry I. Ignatyev, A. Zolotas","doi":"10.1109/MFI55806.2022.9913844","DOIUrl":"https://doi.org/10.1109/MFI55806.2022.9913844","url":null,"abstract":"This paper presents a study of end-to-end methods for predicting autonomous vehicle navigation parameters. Image-based and Image & Lidar points-based end-to-end models have been trained under Nvidia learning architectures as well as Densenet-169, Resnet-152 and Inception-v4. Various learning parameters for autonomous vehicle navigation, input models and pre-processing data algorithms i.e. image cropping, noise removing, semantic segmentation for image data have been investigated and tested. The best ones, from the rigorous investigation, are selected for the main framework of the study. Results reveal that the Nvidia architecture trained Image & Lidar points-based method offers the better results accuracy rate-wise for steering angle and speed.","PeriodicalId":344737,"journal":{"name":"2022 IEEE International Conference on Multisensor Fusion and Integration for Intelligent Systems (MFI)","volume":"3 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-09-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114716226","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A Statistically Motivated Likelihood for Track-Before-Detect
Pub Date: 2022-09-20 | DOI: 10.1109/MFI55806.2022.9913853
John Daniel Bossér, Gustaf Hendeby, M. L. Nordenvaad, I. Skog
A theoretically sound likelihood function for passive sonar surveillance using a hydrophone array is presented. The likelihood is derived from first principles, together with the assumption that the source signal can be approximated as white Gaussian noise within the considered frequency band. The resulting likelihood is a nonlinear function of the delay-and-sum beamformer response and the signal-to-noise ratio (SNR). The proposed likelihood function is evaluated by using it in a Bernoulli filter based track-before-detect (TkBD) framework. As a reference, the same TkBD framework, but with another beamforming-response-based likelihood, is used. Results from Monte-Carlo simulations of two bearings-only tracking scenarios are presented. The results show that the TkBD framework with the proposed likelihood yields approximately 10 seconds faster target detection for a target at an SNR of -27 dB, as well as a lower bearing tracking error. Compared to a classical detect-and-track target tracker, the TkBD framework with the proposed likelihood yields a 4 dB to 5 dB detection gain.
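For reference, the conventional delay-and-sum (Bartlett) beamformer power response that the likelihood depends on has the standard textbook form below; normalization conventions vary between references, and the paper's actual likelihood expression is not reproduced here.

```latex
% Delay-and-sum beamformer power response for an M-element array with steering
% vector a(theta) and sample covariance R-hat built from K snapshots x_k.
P(\theta) = \frac{\mathbf{a}(\theta)^{\mathsf{H}}\,\hat{\mathbf{R}}\,\mathbf{a}(\theta)}{M^{2}},
\qquad
\hat{\mathbf{R}} = \frac{1}{K}\sum_{k=1}^{K} \mathbf{x}_{k}\mathbf{x}_{k}^{\mathsf{H}} .
```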
{"title":"A Statistically Motivated Likelihood for Track-Before-Detect","authors":"John Daniel Bossér, Gustaf Hendeby, M. L. Nordenvaad, I. Skog","doi":"10.1109/MFI55806.2022.9913853","DOIUrl":"https://doi.org/10.1109/MFI55806.2022.9913853","url":null,"abstract":"A theoretically sound likelihood function for passive sonar surveillance using a hydrophone array is presented. The likelihood is derived from first order principles along with the assumption that the source signal can be approximated as white Gaussian noise within the considered frequency band. The resulting likelihood is a nonlinear function of the delay-and-sum beamformer response and signal-to-noise ratio (SNR).Evaluation of the proposed likelihood function is done by using it in a Bernoulli filter based track-before-detect (TkBD) framework. As a reference, the same TkBD framework, but with another beamforming response based likelihood, is used. Results from Monte-Carlo simulations of two bearings-only tracking scenarios are presented. The results show that the TkBD framework with the proposed likelihood yields an approx. 10 seconds faster target detection for a target at an SNR of -27 dB, and a lower bearing tracking error. Compared to a classical detect-and-track target tracker, the TkBD framework with the proposed likelihood yields 4 dB to 5 dB detection gain.","PeriodicalId":344737,"journal":{"name":"2022 IEEE International Conference on Multisensor Fusion and Integration for Intelligent Systems (MFI)","volume":"64 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-09-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122586815","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Global-local Feature Aggregation for Event-based Object Detection on EventKITTI
Pub Date: 2022-09-20 | DOI: 10.1109/MFI55806.2022.9913852
Zichen Liang, Hu Cao, Chu Yang, Zikai Zhang, G. Chen
Event sequences convey asynchronous pixel-wise visual information with low power consumption and high temporal resolution, which enables more robust perception under challenging conditions, e.g., fast motion. Two main factors limit the development of event-based object detection in traffic scenes: a lack of high-quality datasets and a lack of effective event-based algorithms. To solve the first problem, we propose a simulated event-based detection dataset named EventKITTI, which incorporates the novel event modality into a mixed two-level (i.e., object-level and video-level) detection dataset for traffic scenarios. EventKITTI provides high-quality event streams and the largest number of categories, at microsecond temporal resolution and 1242×375 spatial resolution, exceeding existing datasets. As for the second problem, existing algorithms rely on CNN-based, spiking, or graph architectures to capture local features of moving objects, leading to poor performance on objects with incomplete contours. Hence, we propose event-based object detectors named GFA-Net and CGFA-Net. To enhance global-local learning in the spatial dimension, GFA-Net introduces a transformer with edge-based position encoding and multi-scale feature fusion to detect objects on static frames. Furthermore, CGFA-Net optimizes the edge-based position encoding with closed-loop learning based on the previously detected heatmap, which aggregates temporal global features across event frames. The proposed event-based object detectors achieve the best speed-accuracy trade-off on EventKITTI, reaching 81.3% mAP at 33.0 FPS on the object-level detection dataset and 64.5% mAP at 30.3 FPS on the video-level detection dataset.
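A small illustrative helper for the static event-frame representation that frame-based detectors operate on is sketched below; it is an assumption for illustration, not part of the EventKITTI tooling or the GFA-Net/CGFA-Net code.

```python
# Illustrative only: accumulate an asynchronous event stream into a fixed-size
# 2D event frame with signed polarity counts (KITTI-like 1242x375 resolution).
import numpy as np

def events_to_frame(events, height=375, width=1242):
    """events: iterable of (x, y, polarity) tuples with polarity in {+1, -1}."""
    frame = np.zeros((height, width), dtype=np.float32)
    for x, y, p in events:
        frame[int(y), int(x)] += 1.0 if p > 0 else -1.0
    return frame
```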
{"title":"Global-local Feature Aggregation for Event-based Object Detection on EventKITTI","authors":"Zichen Liang, Hu Cao, Chu Yang, Zikai Zhang, G. Chen","doi":"10.1109/MFI55806.2022.9913852","DOIUrl":"https://doi.org/10.1109/MFI55806.2022.9913852","url":null,"abstract":"Event sequence conveys asynchronous pixel-wise visual information in a low power and high temporal resolution manner, which enables more robust perception under challenging conditions, e.g., fast motion. Two main factors limit the development of event-based object detection in traffic scenes: lack of high-quality datasets and effective event-based algorithms. To solve the first problem, we propose a simulated event-based detection dataset named EventKITTI, which incorporates the novel event modality information into a mixed two-level (i.e. object-level and video-level) detection dataset under traffic scenarios. EventKITTI possesses the high-quality event stream and the largest number of categories at microsecond temporal resolution and 1242×375 spatial resolution, exceeding existing datasets. As for the second problem, existing algorithms rely on CNN-based, spiking or graph architectures to capture local features of moving objects, leading to poor performance in objects with incomplete contours. Hence, we propose event-based object detectors named GFA-Net and CGFA-Net. To enhance the global-local learning ability in the spatial dimension, GFA-Net introduces transformer with edge-based position encoding and multi-scale feature fusion to detect objects on static frame. Furthermore, CGFA-Net optimizes edge-based position encoding with close-loop learning based on previous detected heatmap, which aggregates temporal global features across event frames. The proposed event-based object detectors achieve the best speed-accuracy trade-off on EventKITTI, approaching an 81.3% MAP at 33.0 FPS on object-level detection dataset and a 64.5% MAP at 30.3 FPS on video-level detection dataset.","PeriodicalId":344737,"journal":{"name":"2022 IEEE International Conference on Multisensor Fusion and Integration for Intelligent Systems (MFI)","volume":"50 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-09-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114185703","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Noninformative Prior Weights for Dirichlet PDFs
Pub Date: 2022-09-20 | DOI: 10.1109/MFI55806.2022.9913864
A. Jøsang, Jinny Cho, Feng Chen
The noninformative prior weight W of a Dirichlet PDF (Probability Density Function) determines the balance between the prior probability and the influence of new observations on the posterior probability distribution. In this work, we propose a method for dynamically converging the weight W in a way that satisfies two constraints. The first constraint is that the prior Dirichlet PDF (i.e. in the absence of evidence) must always be uniform, which dictates that W = k where k is the cardinality of the domain. The second constraint is that the prior weight of large domains must not be so heavy that it prevents new observation evidence from having the expected influence over the shape of the Dirichlet PDF, which dictates that W quickly converges to a low constant CW in the presence of observation evidence, where typically CW = 2. In the case of a binary domain, the noninformative prior weight is normally set to W = 2, irrespective of the amount of evidence. In the case of a multidimensional domain with arbitrarily large cardinality k, the noninformative prior weight is initially equal to the domain cardinality k, but rapidly decreases to the constant convergence factor CW as the amount of evidence increases.
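The abstract states the two constraints but not the convergence function itself; one simple schedule that satisfies both constraints is sketched below, purely as an illustration and not as the paper's formula.

```python
# Purely illustrative (not the paper's formula): a prior-weight schedule with
# W(k, 0) = k (uniform Dirichlet with no evidence) and W(k, n) -> C_W as the
# total amount of observation evidence n grows.
def prior_weight(k, n, c_w=2.0):
    """k: domain cardinality, n: total observation evidence, c_w: convergence constant."""
    return c_w + (k - c_w) / (1.0 + n)

# Expected probability of outcome i under a Dirichlet with evidence counts r_i,
# total evidence n, and uniform base rate 1/k:  E[p_i] = (r_i + W/k) / (n + W).
def expected_probability(r_i, n, k, c_w=2.0):
    w = prior_weight(k, n, c_w)
    return (r_i + w / k) / (n + w)
```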
{"title":"Noninformative Prior Weights for Dirichlet PDFs*","authors":"A. Jøsang, Jinny Cho, Feng Chen","doi":"10.1109/MFI55806.2022.9913864","DOIUrl":"https://doi.org/10.1109/MFI55806.2022.9913864","url":null,"abstract":"The noninformative prior weight W of a Dirichlet PDF (Probability Density Function) determines the balance between the prior probability and the influence of new observations on the posterior probability distribution. In this work, we propose a method for dynamically converging the weight W in a way that satisfies two constraints. The first constraint is that the prior Dirichlet PDF (i.e. in the absence of evidence) must always be uniform, which dictates that W = k where k is the cardinality of the domain. The second constraint is that the prior weight of large domains must not be so heavy that it prevents new observation evidence from having the expected influence over the shape of the Dirichlet PDF, which dictates that W quickly converges to a low constant CW in the presence of observation evidence, where typically CW = 2. In the case of a binary domain, the noninformative prior weight is normally set to W = 2, irrespective of the amount of evidence. In the case of a multidimensional domain with arbitrarily large cardinality k, the noninformative prior weight is initially equal to the domain cardinality k, but rapidly decreases to the constant convergence factor CW as the amount of evidence increases.","PeriodicalId":344737,"journal":{"name":"2022 IEEE International Conference on Multisensor Fusion and Integration for Intelligent Systems (MFI)","volume":"14 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-09-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130835556","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Context Extraction from GIS Data Using LiDAR and Camera Features
Pub Date: 2022-09-20 | DOI: 10.1109/MFI55806.2022.9913849
Juan D. González, Hans-Joachim Wünsche
We propose a method to extract the spatial context of unknown objects in a driving scenario by classifying the surfaces on which traffic participants transit. In order to classify these surfaces without the need for a large amount of labeled data, we resort to an unsupervised learning method that clusters patches of terrain using features extracted from LiDAR and image data. Using an iterative method, we are able to model the characteristics of map features from a geographical information system (GIS), such as streets and sidewalks, and extend their contextual meaning to the area around our test vehicle. We evaluate our results using a partially labeled 3D scan of our campus and find that our method correctly extracts and extends the spatial context of the map features from the GIS to the labeled surfaces on the campus.
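A minimal sketch of the unsupervised clustering step follows; the feature contents, file name, cluster count, and the choice of k-means are illustrative assumptions, since the abstract does not name the specific clustering algorithm.

```python
# Illustrative sketch (assumed details, not the authors' pipeline): unsupervised
# clustering of terrain patches using per-patch LiDAR and camera features.
import numpy as np
from sklearn.cluster import KMeans

# patch_features: (N, D) matrix, e.g. [mean height, height variance,
#                  LiDAR intensity, mean RGB, ...] per terrain patch (hypothetical file).
patch_features = np.load("patch_features.npy")
labels = KMeans(n_clusters=4, n_init=10).fit_predict(patch_features)
# Clusters can then be associated with GIS map features (street, sidewalk, ...)
# so that their contextual meaning extends to unlabeled surfaces around the vehicle.
```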
{"title":"Context Extraction from GIS Data Using LiDAR and Camera Features","authors":"Juan D. González, Hans-Joachim Wünsche","doi":"10.1109/MFI55806.2022.9913849","DOIUrl":"https://doi.org/10.1109/MFI55806.2022.9913849","url":null,"abstract":"We propose a method to extract spatial context of unknown objects in a driving scenario by classifying the surfaces in which the traffic participants transit. In order to classify these surfaces without the need for a big amount of labeled data, we resort to an unsupervised learning method that clusters patches of terrain using features extracted from LiDAR and image data. Using an iterative method, we are able to model the characteristics of map features from a geographical information system (GIS), such as streets and sidewalks, and extend their contextual meaning to the area around our test vehicle. We evaluate our results using a partially labeled 3D scan of our campus and find that our method is able to correctly extract and extend the spatial context of the map features from the GIS to the labeled surfaces on the campus.","PeriodicalId":344737,"journal":{"name":"2022 IEEE International Conference on Multisensor Fusion and Integration for Intelligent Systems (MFI)","volume":"14 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-09-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128077425","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A Benchmark for Vision-based Multi-UAV Multi-object Tracking
Pub Date: 2022-09-20 | DOI: 10.1109/MFI55806.2022.9913874
Hao Shen, Xiwen Yang, D. Lin, Jianduo Chai, Jiakai Huo, Xiaofeng Xing, Shaoming He
Vision-based multi-sensor multi-object tracking is a fundamental task in applications of swarms of Unmanned Aerial Vehicles (UAVs). Benchmark datasets are critical to the development of computer vision research since they provide a fair and principled way to evaluate various approaches and promote the improvement of the corresponding algorithms. In recent years, many benchmarks have been created for single-camera single-object tracking, single-camera multi-object detection, and single-camera multi-object tracking scenarios. However, to the best of our knowledge, few benchmarks for multi-camera multi-object tracking have been provided. In this paper, we build a dataset for multi-UAV multi-object tracking tasks to fill this gap. Several cameras are placed in a VICON motion capture system to simulate the UAV team, and several toy cars are employed to represent ground targets. The first-person-view videos from the cameras, the motion states of the cameras, and the ground truth of the objects are recorded. We also propose a metric to evaluate the performance of the multi-UAV multi-object tracking task. The dataset and the code for algorithm evaluation are available at our GitHub (https://github.com/bitshenwenxiao/MUMO).
{"title":"A Benchmark for Vision-based Multi-UAV Multi-object Tracking","authors":"Hao Shen, Xiwen Yang, D. Lin, Jianduo Chai, Jiakai Huo, Xiaofeng Xing, Shaoming He","doi":"10.1109/MFI55806.2022.9913874","DOIUrl":"https://doi.org/10.1109/MFI55806.2022.9913874","url":null,"abstract":"Vision-based multi-sensor multi-object tracking is a fundamental task in the applications of a swarm of Unmanned Aerial Vehicles (UAVs). The benchmark datasets are critical to the development of computer vision research since they can provide a fair and principled way to evaluate various approaches and promote the improvement of corresponding algorithms. In recent years, many benchmarks have been created for single-camera single-object tracking, single-camera multi-object detection, and single-camera multi-object tracking scenarios. However, up to the best of our knowledge, few benchmarks of multi-camera multi-object tracking have been provided. In this paper, we build a dataset for multi-UAV multi-object tracking tasks to fill the gap. Several cameras are placed in the VICON motion capture system to simulate the UAV team, and several toy cars are employed to represent ground targets. The first-perspective videos from the cameras, the motion states of the cameras, and the ground truth of the objects are recorded. We also propose a metric to evaluate the performance of the multi-UAV multi-object tracking task. The dataset and the code for algorithm evaluation are available at our GitHub (https://github.com/bitshenwenxiao/MUMO).","PeriodicalId":344737,"journal":{"name":"2022 IEEE International Conference on Multisensor Fusion and Integration for Intelligent Systems (MFI)","volume":"26 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-09-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131445921","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Arithmetic Average Based Multi-sensor TPHD Filter for Distributed Multi-target Tracking
Pub Date: 2022-09-20 | DOI: 10.1109/MFI55806.2022.9913841
Jiazheng Fu, Lei Chai, Boxiang Zhang, Wei Yi
Compared with the probability hypothesis density (PHD) filter for sets of targets, the trajectory probability hypothesis density (TPHD) filter can estimate sets of trajectories in a principled way and has better target tracking performance. This paper aims to extend the TPHD filter to distributed multi-target tracking (MTT) for multi-sensor systems. However, in a trajectory-set-based distributed fusion implementation, the trajectory state difference phenomenon makes clustering and merging techniques infeasible in the trajectory state space. To address this problem, this paper studies the space decomposition of the TPHD and proposes a distributed MTT method based on the TPHD filter with the weighted arithmetic average (WAA) fusion rule. First, we prove the validity of the space decomposition of the posterior density of the TPHD filter. Then, based on this property, we derive the WAA fusion formulation of the TPHD filter by minimizing the weighted sum of Kullback-Leibler divergences (KLD) from the local posterior densities, and develop an analytical Gaussian mixture (GM) implementation with the L-scan approximation. Numerical results demonstrate the efficacy of the proposed fusion method.
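In its generic form, the weighted arithmetic average fusion referenced above is a mixture of the local densities, which is the standard minimizer of the weighted KLD sum; the trajectory-space and Gaussian-mixture specifics are derived in the paper and not reproduced here.

```latex
% Generic WAA fusion of N_s local posterior densities f_i with weights omega_i
% summing to one; the mixture minimizes the weighted sum of D_KL(f_i || g) over g.
\bar{f}(X) = \sum_{i=1}^{N_{s}} \omega_{i}\, f_{i}(X),
\qquad
\bar{f} = \operatorname*{arg\,min}_{g} \sum_{i=1}^{N_{s}} \omega_{i}\, D_{\mathrm{KL}}\!\left(f_{i}\,\middle\|\,g\right),
\qquad
\sum_{i=1}^{N_{s}} \omega_{i} = 1 .
```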
{"title":"Arithmetic Average Based Multi-sensor TPHD Filter for Distributed Multi-target Tracking","authors":"Jiazheng Fu, Lei Chai, Boxiang Zhang, Wei Yi","doi":"10.1109/MFI55806.2022.9913841","DOIUrl":"https://doi.org/10.1109/MFI55806.2022.9913841","url":null,"abstract":"Compared with the probability hypothesis density (PHD) filter for sets of targets, the trajectory probability hypothesis density (TPHD) filter can estimate the sets of trajectories in a principle way and has better target tracking performance. This paper aims at extending the TPHD filter to distributed multitarget tracking (MTT) for the multi-sensor system. However, in the trajectory set based distributed fusion implementation, the trajectory state difference phenomenon makes the clustering and merging techniques unfeasible in trajectory state space. To address this problem, this paper studies the space decomposition of the TPHD and proposes a distributed MTT method based on the TPHD filter with the weighted arithmetic average (WAA) fusion rule. First, we prove the rationality of the space decomposition in the posterior density of the TPHD filter. Then, based on the proposed property, we derive the WAA fusion formulation of the TPHD filter by minimizing the weighted sum of Kullback-Leibler divergences (KLD) from local posterior densities, and develop the analytical Gaussian mixture (GM) implementation with the L-scan approximation. Numerical results demonstrate the efficacy of the proposed fusion method.","PeriodicalId":344737,"journal":{"name":"2022 IEEE International Conference on Multisensor Fusion and Integration for Intelligent Systems (MFI)","volume":"5 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-09-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124502277","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}