Orientation-aware People Detection and Counting Method based on Overhead Fisheye Camera
Hu Cao, Boyang Peng, Linxuan Jia, Bin Li, Alois Knoll, Guang Chen
Pub Date: 2022-09-20 | DOI: 10.1109/MFI55806.2022.9913868
The rise of intelligent vision-based people detection and counting methods will have a significant impact on the future security and space management of intelligent buildings. Current deep learning-based people detection algorithms achieve state-of-the-art performance on images collected by standard cameras. Nevertheless, standard vision approaches do not perform well on fisheye cameras, because they are not designed for fisheye images with radial geometry and barrel distortion. For people detection and counting tasks, overhead fisheye cameras provide a larger field of view than standard cameras. In this paper, we propose an orientation-aware people detection and counting method based on an overhead fisheye camera. Specifically, an orientation-aware deep convolutional neural network with a simultaneous attention refinement module (SARM) is introduced for people detection in arbitrary orientations. Based on the attention mechanism, SARM suppresses noise features and highlights object features to improve the network's ability to focus on people with different poses and orientations. Once detection results are collected, an Internet of Things (IoT) system based on the Real Time Streaming Protocol (RTSP) outputs the results to different devices. Experiments on three common fisheye image datasets show that, even under low-light conditions, our method generalizes well and outperforms state-of-the-art methods.
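The abstract does not detail SARM's internals; as a rough, hypothetical sketch of the attention-refinement idea (channel re-weighting to suppress noise features and highlight object features) in PyTorch, with the name `SARMBlock` invented for illustration:

```python
# Minimal sketch of an attention refinement block in the spirit of SARM:
# channel attention re-weights feature maps to suppress noise features
# and highlight object features. The exact SARM architecture is not
# given in the abstract; this block (and the name SARMBlock) is a
# hypothetical stand-in based on common attention-refinement designs.
import torch
import torch.nn as nn

class SARMBlock(nn.Module):
    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)          # global context per channel
        self.fc = nn.Sequential(
            nn.Conv2d(channels, channels // reduction, kernel_size=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, kernel_size=1),
            nn.Sigmoid(),                            # per-channel attention weights
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        w = self.fc(self.pool(x))                    # (N, C, 1, 1) attention map
        return x * w                                 # re-weight the features

x = torch.randn(1, 256, 32, 32)
print(SARMBlock(256)(x).shape)                       # torch.Size([1, 256, 32, 32])
```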
{"title":"Orientation-aware People Detection and Counting Method based on Overhead Fisheye Camera","authors":"Hu Cao, Boyang Peng, Linxuan Jia, Bin Li, Alois Knoll, Guang Chen","doi":"10.1109/MFI55806.2022.9913868","DOIUrl":"https://doi.org/10.1109/MFI55806.2022.9913868","url":null,"abstract":"The rise of intelligent vision-based people detection and counting methods will have a significant impact on the future security and space management of intelligent buildings. The current deep learning-based people detection algorithm achieves state-of-the-art performance in images collected by standard cameras. Nevertheless, standard vision approaches do not perform well on fisheye cameras because they are not suitable for fisheye images with radial geometry and barrel distortion. Overhead fisheye cameras can provide a larger field of view compared to standard cameras in people detection and counting tasks. In this paper, we propose an orientation-aware people detection and counting method based on an overhead fisheye camera. Specifically, an orientation-aware deep convolutional neural network with simultaneous attention refinement module (SARM) is introduced for people detection in arbitrary directions. Based on the attention mechanism, SARM can suppress the noise feature and highlight the object feature to improve the context focusing ability of the network on the people with different poses and orientations. Following the collection of detection results, an Internet of Things (IoT) system based on Real Time Streaming Protocol (RTSP) is constructed to output results to different devices. Experiments on three common fisheye image datasets show that under low light conditions, our method has high generalization ability and outperforms the state-of-the-art methods.","PeriodicalId":344737,"journal":{"name":"2022 IEEE International Conference on Multisensor Fusion and Integration for Intelligent Systems (MFI)","volume":"2 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-09-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131171364","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Direct Position Determination Using Direct Signals and First-Order Reflections by Exploiting the Multipath Environment
Devanand Palur Palanivelu, M. Oispuu, W. Koch
Pub Date: 2022-09-20 | DOI: 10.1109/MFI55806.2022.9913869
This paper introduces a novel subspace-based Direct Position Determination (DPD) approach, named Multipath-DPD, that passively localizes a source from raw array data containing both the direct signals and the first-order reflections in a multipath environment. The multipath propagation is modeled with the Image-Source Method (ISM). The method exploits knowledge of the urban environment and avoids the ambiguity, modeling, and measurement-assignment problems inherent in bearing-based localization approaches. Simulation results show that the proposed Multipath-DPD outperforms classical DPD in the considered scenarios and asymptotically approaches the derived Cramér-Rao Bound (CRB).
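For illustration, the core ISM trick (treating a first-order reflection as a direct path from an image source mirrored across the reflecting wall) can be sketched in a few lines of NumPy; the 2D geometry below is invented for the example, not taken from the paper:

```python
# Sketch of the Image-Source Method (ISM) idea used for first-order
# reflections: a reflection off a wall is modeled as a direct path from
# the source mirrored across the wall plane. All geometry values here
# are illustrative assumptions.
import numpy as np

def mirror_across_line(p, a, n):
    """Mirror point p across the line through a with unit normal n (2D)."""
    n = n / np.linalg.norm(n)
    return p - 2.0 * np.dot(p - a, n) * n

source = np.array([4.0, 3.0])
sensor = np.array([0.0, 0.0])
wall_point = np.array([6.0, 0.0])     # a point on the wall
wall_normal = np.array([1.0, 0.0])    # wall is the vertical line x = 6

image_source = mirror_across_line(source, wall_point, wall_normal)
# Path length of the first-order reflection equals the distance from
# the sensor to the image source.
direct_path = np.linalg.norm(sensor - source)          # 5.0
reflect_path = np.linalg.norm(sensor - image_source)   # sqrt(73) ~ 8.54
print(direct_path, reflect_path)
```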
{"title":"Direct Position Determination Using Direct Signals and First-Order Reflections by Exploiting the Multipath Environment","authors":"Devanand Palur Palanivelu, M. Oispuu, W. Koch","doi":"10.1109/MFI55806.2022.9913869","DOIUrl":"https://doi.org/10.1109/MFI55806.2022.9913869","url":null,"abstract":"This paper introduces a novel subspace-based Direct Position Determination (DPD) approach named Multipath-DPD to localize the source passively from the raw array data representing the direct signals and the first-order reflections in a multipath environment. The multipath propagation is modeled based on the Image-Source Method (ISM). This method takes advantage of the known urban environment and overcomes ambiguity issues, appropriate modeling and assignment of bearing measurements inherent in bearing-based localization approaches. Simulation results show that the proposed Multipath-DPD outperforms the classical DPD in the considered scenarios and demonstrates an asymptotic behavior to the derived Cramér-Rao Bound (CRB).","PeriodicalId":344737,"journal":{"name":"2022 IEEE International Conference on Multisensor Fusion and Integration for Intelligent Systems (MFI)","volume":"8 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-09-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131363696","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Self-supervised 3D Object Detection from Monocular Pseudo-LiDAR
Curie Kim, Ue-Hwan Kim, Jong-Hwan Kim
Pub Date: 2022-09-20 | DOI: 10.1109/MFI55806.2022.9913846
There have been attempts to detect 3D objects by fusing stereo camera images with LiDAR sensor data, or by using LiDAR for pre-training and only monocular images at test time, but far fewer attempts use monocular image sequences alone, owing to their low accuracy. Moreover, depth predicted from monocular images alone is scale-inconsistent, which is another reason researchers are reluctant to rely on monocular input. We therefore propose a method that predicts absolute depth and detects 3D objects from monocular image sequences only, by training the detection network and the depth prediction network end to end. The proposed method surpasses existing methods on the KITTI 3D dataset. Even compared against methods that use monocular images together with 3D LiDAR during training to boost performance, ours achieves the best performance among methods using the same input. In addition, end-to-end learning not only improves depth prediction performance but also enables absolute depth prediction, because the network exploits the fact that 3D objects such as cars have known approximate sizes.
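The title implies the standard pseudo-LiDAR lifting step; a minimal sketch of back-projecting a predicted depth map into a 3D point cloud follows, assuming illustrative pinhole intrinsics (not the actual KITTI calibration):

```python
# Sketch of the usual pseudo-LiDAR step: lift a predicted (H, W) depth
# map into a 3D point cloud with the camera intrinsics. The intrinsics
# and the constant depth map below are illustrative assumptions.
import numpy as np

def depth_to_pseudo_lidar(depth, fx, fy, cx, cy):
    """Back-project a (H, W) depth map to an (H*W, 3) point cloud."""
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    z = depth
    x = (u - cx) * z / fx   # pinhole model: u = fx * x / z + cx
    y = (v - cy) * z / fy
    return np.stack([x, y, z], axis=-1).reshape(-1, 3)

depth = np.full((370, 1224), 10.0)   # dummy prediction: 10 m everywhere
cloud = depth_to_pseudo_lidar(depth, fx=720.0, fy=720.0, cx=612.0, cy=185.0)
print(cloud.shape)                   # (452880, 3)
```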
{"title":"Self-supervised 3D Object Detection from Monocular Pseudo-LiDAR","authors":"Curie Kim, Ue-Hwan Kim, Jong-Hwan Kim","doi":"10.1109/MFI55806.2022.9913846","DOIUrl":"https://doi.org/10.1109/MFI55806.2022.9913846","url":null,"abstract":"There have been attempts to detect 3D objects by fusion of stereo camera images and LiDAR sensor data or using LiDAR for pre-training and only monocular images for testing, but there have been less attempts to use only monocular image sequences due to low accuracy. In addition, when depth prediction using only monocular images, only scale-inconsistent depth can be predicted, which is the reason why researchers are reluctant to use monocular images alone.Therefore, we propose a method for predicting absolute depth and detecting 3D objects using only monocular image sequences by enabling end-to-end learning of detection networks and depth prediction networks. As a result, the proposed method surpasses other existing methods in performance on the KITTI 3D dataset. Even when monocular image and 3D LiDAR are used together during training in an attempt to improve performance, ours exhibit is the best performance compared to other methods using the same input. In addition, end-to-end learning not only improves depth prediction performance, but also enables absolute depth prediction, because our network utilizes the fact that the size of a 3D object such as a car is determined by the approximate size.","PeriodicalId":344737,"journal":{"name":"2022 IEEE International Conference on Multisensor Fusion and Integration for Intelligent Systems (MFI)","volume":"6 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-09-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126467143","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A Spatio-Temporal-Semantic Environment Representation for Autonomous Mobile Robots equipped with various Sensor Systems
Mark Niemeyer, Sebastian Pütz, J. Hertzberg
Pub Date: 2022-09-20 | DOI: 10.1109/MFI55806.2022.9913873
The large amounts of high-resolution sensor data, both temporal and spatial, that autonomous mobile robots collect in today's systems require structured and efficient management and storage during the robot's mission. In response, we present SEEREP: a Spatio-Temporal-Semantic Environment Representation for Autonomous Mobile Robots. SEEREP handles various types of data at once and provides an efficient query interface over all three modalities, which can be combined for high-level analyses. It supports common robotic sensor data types such as images and point clouds, as well as sensor and robot coordinate frames that change over time. Furthermore, SEEREP provides an efficient HDF5-based storage system running on the robot during operation, compatible with ROS and the corresponding sensor message definitions. The compressed HDF5 data backend can be transferred efficiently to an application server running a SEEREP query server, which provides gRPC interfaces with Protobuf and FlatBuffers message types. The query server can support high-level planning and reasoning systems in, e.g., agricultural environments or other partially unstructured environments that change over time. In this paper we show that SEEREP is much better suited for these tasks than a traditional GIS, which cannot handle the different types of robotic sensor data.
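Neither SEEREP's HDF5 schema nor its gRPC API is given in the abstract; the following is a purely hypothetical sketch of what a combined spatio-temporal-semantic query over an HDF5 store could look like, with the group layout and attribute names invented for the example:

```python
# Hypothetical illustration of a combined spatio-temporal-semantic query
# over an HDF5 store. This is NOT SEEREP's actual schema or API; the
# group name "scans" and the attrs "stamp", "x", "y", "labels" are
# assumptions made purely to show the three-modality filtering idea.
import h5py

def query(path, t_min, t_max, label, bbox):
    """Yield dataset names whose attributes match all three filters."""
    x_min, y_min, x_max, y_max = bbox
    with h5py.File(path, "r") as f:
        for name, ds in f["scans"].items():
            t = ds.attrs["stamp"]                     # temporal: seconds
            x, y = ds.attrs["x"], ds.attrs["y"]       # spatial: map frame
            labels = set(ds.attrs["labels"])          # semantic: tags
            if (t_min <= t <= t_max and label in labels
                    and x_min <= x <= x_max and y_min <= y <= y_max):
                yield name

# e.g. all clouds tagged 'maize' from last week inside a field rectangle:
# list(query("mission.h5", 1663632000, 1664236800, "maize", (0, 0, 50, 80)))
```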
{"title":"A Spatio-Temporal-Semantic Environment Representation for Autonomous Mobile Robots equipped with various Sensor Systems","authors":"Mark Niemeyer, Sebastian Pütz, J. Hertzberg","doi":"10.1109/MFI55806.2022.9913873","DOIUrl":"https://doi.org/10.1109/MFI55806.2022.9913873","url":null,"abstract":"The large amount of high resolution sensor data, both temporal and spatial, that autonomous mobile robots collect in today’s systems requires structured and efficient management and storage during the robot mission. In response, we present SEEREP: A Spatio-Temporal-Semantic Environment Representation for Autonomous Mobile Robots. SEEREP handles various types of data at once and provides an efficient query interface for all three modalities that can be combined for high-level analyses. It supports common robotic sensor data types such as images and point clouds, as well as sensor and robot coordinate frames changing over time. Furthermore, SEEREP provides an efficient HDF5-based storage system running on the robot during operation, compatible with ROS and the corresponding sensor message definitions. The compressed HDF5 data backend can be transferred efficiently to an application server with a running SEEREP query server providing gRPC interfaces with Protobuf and Flattbuffer message types. The query server can support high-level planning and reasoning systems in e.g. agricultural environments, or other partially unstructured environments that change over time. In this paper we show that SEEREP is much better suited for these tasks than a traditional GIS, which cannot handle the different types of robotic sensor data.","PeriodicalId":344737,"journal":{"name":"2022 IEEE International Conference on Multisensor Fusion and Integration for Intelligent Systems (MFI)","volume":"27 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-09-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128819491","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Optimal Sensor Placement for Multilateration Using Alternating Greedy Removal and Placement
Daniel Frisch, Kailai Li, U. Hanebeck
Pub Date: 2022-09-20 | DOI: 10.1109/MFI55806.2022.9913847
We present a novel algorithm for optimal sensor placement in multilateration problems. Our goal is to design a sensor network that achieves optimal localization accuracy anywhere in the covered region. We consider the discrete placement problem, where the sensor locations are chosen from a discrete set of candidates. Thus, we obtain a combinatorial optimization problem instead of a continuous one. While combinatorial optimization sounds at first like more effort, we present an algorithm that finds a globally optimal solution surprisingly quickly.
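The title names an alternating greedy removal-and-placement scheme; below is a minimal local-search sketch of that pattern with a placeholder objective. The paper's actual accuracy criterion is not stated in the abstract, and this simple swap loop does not by itself guarantee the global optimality the paper claims:

```python
# Sketch of alternating greedy removal and placement: repeatedly remove
# one chosen sensor and place one unused candidate whenever the swap
# improves the objective; stop when no swap helps. The objective here
# (spread sensors apart) is a toy placeholder, not the paper's
# localization-accuracy criterion.
import itertools

def greedy_swap(candidates, k, score):
    """Pick k candidate locations (approximately) maximizing score(set)."""
    chosen = set(candidates[:k])                  # arbitrary initial selection
    improved = True
    while improved:
        improved = False
        for out_s, in_s in itertools.product(list(chosen),
                                             set(candidates) - chosen):
            trial = (chosen - {out_s}) | {in_s}   # greedy removal + placement
            if score(trial) > score(chosen):      # strict improvement only
                chosen, improved = trial, True
                break
    return chosen

# Toy objective: maximize the minimum pairwise Manhattan distance.
pts = [(x, y) for x in range(5) for y in range(5)]
def spread(s):
    s = list(s)
    return min(abs(a[0] - b[0]) + abs(a[1] - b[1])
               for i, a in enumerate(s) for b in s[i + 1:])
print(greedy_swap(pts, 3, spread))
```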
{"title":"Optimal Sensor Placement for Multilateration Using Alternating Greedy Removal and Placement","authors":"Daniel Frisch, Kailai Li, U. Hanebeck","doi":"10.1109/MFI55806.2022.9913847","DOIUrl":"https://doi.org/10.1109/MFI55806.2022.9913847","url":null,"abstract":"We present a novel algorithm for optimal sensor placement in multilateration problems. Our goal is to design a sensor network that achieves optimal localization accuracy anywhere in the covered region. We consider the discrete placement problem, where the possible locations of the sensors are selected from a discrete set. Thus, we obtain a combinatorial optimization problem instead of a continuous one. While at first, combinatorial optimization sounds like more effort, we present an algorithm that finds a globally optimal solution surprisingly quickly.","PeriodicalId":344737,"journal":{"name":"2022 IEEE International Conference on Multisensor Fusion and Integration for Intelligent Systems (MFI)","volume":"21 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-09-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133583671","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Automated Road Asset Data Collection and Classification using Consumer Dashcams
Michael Sieverts, Yoshihiro Obata, Mohammad Farhadmanesh, D. Sacharny, T. Henderson
Pub Date: 2022-09-20 | DOI: 10.1109/MFI55806.2022.9913859
A growing remote sensing network composed of consumer dashcams presents Departments of Transportation (DOTs) worldwide with opportunities to dramatically reduce the costs and effort associated with monitoring and maintaining hundreds of thousands of sign assets on public roadways. However, many technical challenges confront the applications and technologies that will enable this transformation of roadway maintenance. This paper highlights an efficient approach to the problem of detecting and classifying more than 600 classes of traffic signs in the United States, as defined in the Manual on Uniform Traffic Control Devices (MUTCD). Given the variability of specifications and the quality of images and metadata collected from consumer dashcams, a deep learning approach offers an efficient development tool to small organizations that want to leverage this data type for detection and classification. This paper presents a two-step process: a detection network locates signs in dashcam images, and a classification network then crops each detected bounding box and assigns it one of the more than 600 sign classes. The detection network is trained using labeled data from dashcams in Nashville, Tennessee, and a combination of real and synthetic data is used to train the classification network. The architecture presented here was applied to real-world image data provided by the Utah Department of Transportation and Blyncsy, Inc., and achieved modest results (test accuracy of 0.47) with a relatively low development time.
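A minimal sketch of the two-step pipeline described above, with the detector and classifier left as hypothetical placeholder callables (the paper's network architectures are not specified here):

```python
# Sketch of the two-step process: step 1, a detector proposes sign
# bounding boxes in a dashcam frame; step 2, each crop is classified
# into one of the 600+ MUTCD classes. `detector` and `classifier` are
# placeholder callables standing in for trained models.
from PIL import Image

def detect_and_classify(image: Image.Image, detector, classifier):
    """Return a list of (box, mutcd_class) for one dashcam frame."""
    results = []
    for box in detector(image):                  # box = (left, top, right, bottom)
        crop = image.crop(box)                   # step 1: extract the detection
        results.append((box, classifier(crop)))  # step 2: assign a sign class
    return results

# usage (once trained models are available):
# signs = detect_and_classify(Image.open("frame.jpg"), detector, classifier)
```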
{"title":"Automated Road Asset Data Collection and Classification using Consumer Dashcams","authors":"Michael Sieverts, Yoshihiro Obata, Mohammad Farhadmanesh, D. Sacharny, T. Henderson","doi":"10.1109/MFI55806.2022.9913859","DOIUrl":"https://doi.org/10.1109/MFI55806.2022.9913859","url":null,"abstract":"A growing remote sensing network comprised of consumer dashcams presents Departments of Transportation (DOTs) worldwide with opportunities to dramatically reduce the costs and effort associated with monitoring and maintaining hundreds of thousands of sign assets on public roadways. However, many technical challenges confront the applications and technologies that will enable this transformation of roadway maintenance. This paper highlights an efficient approach to the problem of detection and classification of more than 600 classes of traffic signs in the United States, as defined in the Manual on Uniform Traffic Control Devices (MUTCD). Given the variability of specifications and the quality of images and metadata collected from consumer dashcams, a deep learning approach offers an efficient development tool to small organizations that want to leverage this data type for detection and classification. This paper presents a two-step process, a detection network that locates signs in dashcam images and a classification network that first extracts the bounding box from the previous detection to assign a specific sign class from over 600 classes of signs. The detection network is trained using labeled data from dashcams in Nashville, Tennessee, and a combination of real and synthetic data is used to train the classification network. The architecture presented here was applied to real-world image data provided by the Utah Department of Transportation and Blyncsy, Inc., and achieved modest results (test accuracy of 0.47) with a relatively low development time.","PeriodicalId":344737,"journal":{"name":"2022 IEEE International Conference on Multisensor Fusion and Integration for Intelligent Systems (MFI)","volume":"36 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-09-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134204110","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Patch-NetVLAD+: Learned patch descriptor and weighted matching strategy for place recognition
Yingfeng Cai, Junqiao Zhao, Jiafeng Cui, Fenglin Zhang, Chen Ye, T. Feng
Pub Date: 2022-02-11 | DOI: 10.1109/MFI55806.2022.9913860
Visual Place Recognition (VPR) in areas with similar scenes, such as urban or indoor scenarios, is a major challenge. Existing VPR methods that use global descriptors have difficulty capturing local specific regions (LSRs) in the scene and are therefore prone to localization confusion in such scenarios. As a result, finding the LSRs that are critical for place recognition becomes key. To address this challenge, we introduce Patch-NetVLAD+, inspired by patch-based VPR research. Our method uses a fine-tuning strategy with a triplet loss to make NetVLAD suitable for extracting patch-level descriptors. Moreover, unlike existing methods that treat all patches in an image equally, our method identifies LSR patches, which appear less frequently throughout the dataset, and assigns them higher weights so that they play a more important role in VPR. Experiments on the Pittsburgh30k and Tokyo247 datasets show that our approach achieves up to 9.3% performance improvement over existing patch-based methods.
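A sketch of the weighted patch-matching idea, using an inverse-frequency weighting as an assumed stand-in for the paper's actual weighting scheme:

```python
# Sketch of weighted patch matching: rare patches (local specific
# regions) get higher weight when scoring a query-reference image pair.
# The inverse-frequency rule below is a placeholder assumption, not the
# weighting defined in the paper.
import numpy as np

def weighted_match_score(q_desc, r_desc, q_freq):
    """q_desc, r_desc: (P, D) L2-normalized patch descriptors.
    q_freq: (P,) how frequently each query patch occurs in the dataset."""
    sim = q_desc @ r_desc.T                  # (P, P) cosine similarities
    best = sim.max(axis=1)                   # best reference match per patch
    w = 1.0 / (1.0 + q_freq)                 # rarer (LSR) patches weigh more
    return float((w * best).sum() / w.sum()) # weighted image-pair score

q = np.random.randn(16, 128); q /= np.linalg.norm(q, axis=1, keepdims=True)
r = np.random.randn(16, 128); r /= np.linalg.norm(r, axis=1, keepdims=True)
print(weighted_match_score(q, r, np.random.rand(16)))
```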
{"title":"Patch-NetVLAD+: Learned patch descriptor and weighted matching strategy for place recognition","authors":"Yingfeng Cai, Junqiao Zhao, Jiafeng Cui, Fenglin Zhang, Chen Ye, T. Feng","doi":"10.1109/MFI55806.2022.9913860","DOIUrl":"https://doi.org/10.1109/MFI55806.2022.9913860","url":null,"abstract":"Visual Place Recognition (VPR) in areas with similar scenes such as urban or indoor scenarios is a major challenge. Existing VPR methods using global descriptors have difficulty capturing local specific region (LSR) in the scene and are therefore prone to localization confusion in such scenarios. As a result, finding the LSRs that are critical for location recognition becomes key. To address this challenge, we introduced Patch-NetVLAD+, which was inspired by patch-based VPR researches. Our method proposed a fine-tuning strategy with triplet loss to make NetVLAD suitable for extracting patch-level descriptors. Moreover, unlike existing methods that treat all patches in an image equally, our method extracts patches of LSR, which present less frequently throughout the dataset, and makes them play an important role in VPR by assigning proper weights to them. Experiments on Pittsburgh30k and Tokyo247 datasets show that our approach achieved up to 9.3% performance improvement than existing patch-based methods.","PeriodicalId":344737,"journal":{"name":"2022 IEEE International Conference on Multisensor Fusion and Integration for Intelligent Systems (MFI)","volume":"55 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-02-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121461956","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
DSC: Deep Scan Context Descriptor for Large-Scale Place Recognition
Jiafeng Cui, Teng Huang, Yingfeng Cai, Junqiao Zhao, Lu Xiong, Zhuoping Yu
Pub Date: 2021-11-27 | DOI: 10.1109/MFI55806.2022.9913850
LiDAR-based place recognition is an essential and challenging task in both loop closure detection and global relocalization. We propose Deep Scan Context (DSC), a general and discriminative global descriptor that captures the relationships among segments of a point cloud. Unlike previous methods that utilize either semantics or a sequence of adjacent point clouds for better place recognition, we use only the raw point clouds and still obtain competitive results. Concretely, we first segment the point cloud egocentrically into several segments and extract features from each segment based on both spatial distribution and shape. Then, we introduce a graph neural network to aggregate these features into an embedding representation. Extensive experiments conducted on the KITTI dataset show that DSC is robust to scene variations and outperforms existing methods.
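A sketch of the egocentric segmentation step with simple per-segment features; the ring/sector bin counts and the feature choices are illustrative assumptions, not DSC's actual design:

```python
# Sketch of egocentric segmentation: partition a point cloud into
# ring-and-sector bins around the sensor and compute simple per-segment
# features (mean height, point count). DSC's real partitioning, feature
# set, and the downstream graph neural network are not shown here.
import numpy as np

def egocentric_segments(points, n_rings=8, n_sectors=16, r_max=80.0):
    """points: (N, 3). Returns (n_rings*n_sectors, 2): [mean z, count]."""
    r = np.linalg.norm(points[:, :2], axis=1)              # planar range
    theta = np.arctan2(points[:, 1], points[:, 0])         # azimuth
    ring = np.clip((r / r_max * n_rings).astype(int), 0, n_rings - 1)
    sector = ((theta + np.pi) / (2 * np.pi) * n_sectors).astype(int) % n_sectors
    feats = np.zeros((n_rings * n_sectors, 2))
    for b in range(n_rings * n_sectors):
        mask = (ring * n_sectors + sector) == b
        if mask.any():
            feats[b] = [points[mask, 2].mean(), mask.sum()]
    return feats

cloud = np.random.uniform(-50, 50, size=(10000, 3))        # dummy scan
print(egocentric_segments(cloud).shape)                    # (128, 2)
```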
{"title":"DSC: Deep Scan Context Descriptor for Large-Scale Place Recognition","authors":"Jiafeng Cui, Teng Huang, Yingfeng Cai, Junqiao Zhao, Lu Xiong, Zhuoping Yu","doi":"10.1109/MFI55806.2022.9913850","DOIUrl":"https://doi.org/10.1109/MFI55806.2022.9913850","url":null,"abstract":"LiDAR-based place recognition is an essential and challenging task both in loop closure detection and global relocalization. We propose Deep Scan Context (DSC), a general and discriminative global descriptor that captures the relationship among segments of a point cloud. Unlike previous methods that utilize either semantics or a sequence of adjacent point clouds for better place recognition, we only use the raw point clouds to get competitive results. Concretely, we first segment the point cloud egocentrically to divide the point cloud into several segments and extract the features of the segments from both spatial distribution and shape differences. Then, we introduce a graph neural network to aggregate these features into an embedding representation. Extensive experiments conducted on the KITTI dataset show that DSC is robust to scene variants and outperforms existing methods.","PeriodicalId":344737,"journal":{"name":"2022 IEEE International Conference on Multisensor Fusion and Integration for Intelligent Systems (MFI)","volume":"34 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-11-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131380827","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}