Vision-based Fall Detection in Aircraft Maintenance Environment with Pose Estimation
Pub Date : 2022-09-20, DOI: 10.1109/MFI55806.2022.9913877
Adeyemi Osigbesan, Solene Barrat, Harkeerat Singh, Dongzi Xia, Siddharth Singh, Yang Xing, Weisi Guo, A. Tsourdos
Fall-related injuries in the workplace account for a substantial share of global accident-at-work claims, according to the Health and Safety Executive (HSE), and a significant percentage of them are fatal. Industrial and maintenance workshops, with their characteristically fast-paced workspaces, carry a high risk of injuries associated with slips, trips, and other types of falls. The short turnaround time expected for aircraft undergoing maintenance further increases the risk of workers falling, and thus makes a strong case for studying more contemporary methods for detecting work-related falls in the aircraft maintenance environment. Advances in human pose estimation using computer vision have made it possible to automate real-time detection and classification of human actions by analyzing the motion and position of body parts over time. This paper combines the analysis of the body silhouette bounding box with body joint position estimation to detect and categorize, in real time, human motion captured in continuous video feeds as either a fall or a non-fall event. We propose a standard wide-angle camera, installed at a diagonal ceiling position in an aircraft hangar, as our visual data input, and a three-dimensional convolutional neural network with Long Short-Term Memory (LSTM) layers using a technique we refer to as Region Keypoint (Reg-Key) repartitioning for visual pose estimation and fall detection.
{"title":"Vision-based Fall Detection in Aircraft Maintenance Environment with Pose Estimation","authors":"Adeyemi Osigbesan, Solene Barrat, Harkeerat Singh, Dongzi Xia, Siddharth Singh, Yang Xing, Weisi Guo, A. Tsourdos","doi":"10.1109/MFI55806.2022.9913877","DOIUrl":"https://doi.org/10.1109/MFI55806.2022.9913877","url":null,"abstract":"Fall-related injuries at the workplace account for a fair percentage of the global accident at work claims according to Health and Safety Executive (HSE). With a significant percentage of these being fatal, industrial and maintenance workshops have great potential for injuries that can be associated with slips, trips, and other types of falls, owing to their characteristic fast-paced workspaces. Typically, the short turnaround time expected for aircraft undergoing maintenance increases the risk of workers falling, and thus makes a good case for the study of more contemporary methods for the detection of work-related falls in the aircraft maintenance environment. Advanced development in human pose estimation using computer vision technology has made it possible to automate real-time detection and classification of human actions by analyzing body part motion and position relative to time. This paper attempts to combine the analysis of body silhouette bounding box with body joint position estimation to detect and categorize in real-time, human motion captured in continuous video feeds into a fall or a non-fall event. We proposed a standard wide-angle camera, installed at a diagonal ceiling position in an aircraft hangar for our visual data input, and a three-dimensional convolutional neural network with Long Short-Term Memory (LSTM) layers using a technique we referred to as Region Key point (Reg-Key) repartitioning for visual pose estimation and fall detection.","PeriodicalId":344737,"journal":{"name":"2022 IEEE International Conference on Multisensor Fusion and Integration for Intelligent Systems (MFI)","volume":"178 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-09-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124423667","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
PIPO: Policy Optimization with Permutation-Invariant Constraint for Distributed Multi-Robot Navigation
Ruiqi Zhang, Guang Chen, Jing Hou, Zhijun Li, Alois Knoll
Pub Date : 2022-09-20, DOI: 10.1109/MFI55806.2022.9913862
For large-scale multi-agent systems (MAS), ensuring safe and effective navigation in complicated scenarios is a challenging task. As the number of agents grows, most existing centralized methods lose their effectiveness for lack of scalability, while the popular decentralized approaches are hampered by high latency and computing requirements. In this research, we propose PIPO, a novel policy optimization algorithm for decentralized MAS navigation with permutation-invariant constraints. To guide navigation and avoid unnecessary exploration in the early episodes, we first define a guide-policy. We then introduce the permutation-invariant property of decentralized multi-agent systems and leverage a graph convolution network to produce the same output under shuffled observations. Owing to its decentralized training and execution, our approach can easily scale to an arbitrary number of agents and be used in large-scale systems. We also provide extensive experiments demonstrating that PIPO significantly outperforms multi-agent reinforcement learning baselines and other leading methods in a variety of scenarios.
Distributed Extended Object Tracking Filter Through Embedded ADMM Technique
Pub Date : 2022-09-20, DOI: 10.1109/MFI55806.2022.9913876
Zhifei Li, Hongyan Wang, Shi Yan, Hongxia Zou, Mingyang Du
This work is concerned with distributed extended object tracking over a realistic network, where all nodes are required to achieve consensus on both the extent and the kinematics. To this end, we first exploit an aligned velocity model to establish a tight relation between the orientation and the velocity vector. We then use the moment-matching method to derive two separate models that fit the information filter (IF) framework. Building on these two models, we propose a centralized IF and extend it to the distributed scenario using the embedded alternating direction method of multipliers (ADMM) technique. To keep nodes in agreement, an optimization function is given, followed by a consensus-based constraint. Numerical simulation together with theoretical analysis verifies the convergence and consensus of the proposed filter.
{"title":"Distributed Extended Object Tracking Filter Through Embedded ADMM Technique","authors":"Zhifei Li, Hongyan Wang, Shi Yan, Hongxia Zou, Mingyang Du","doi":"10.1109/MFI55806.2022.9913876","DOIUrl":"https://doi.org/10.1109/MFI55806.2022.9913876","url":null,"abstract":"This work is concerned with the distributed extended object tracking system over a realistic network, where all nodes are required to achieve consensus on both the extent and kinematics. To this end, we first exploit an aligned velocity model to establish a tight relation between the orientation and velocity vector. Then, we use the moment-matching method to give two separate models to match the information filter (IF) framework. Later, we resort to the two models to propose a centralized IF and extend it to the distributed scenario based on the embedded alternating direction method of multipliers (ADMM) technique. To keep an agreement between nodes, an optimization function is given, followed by a consensus-based constraint. Numerical simulation together with theoretical analysis verifies the convergence and consensus of the proposed filter.","PeriodicalId":344737,"journal":{"name":"2022 IEEE International Conference on Multisensor Fusion and Integration for Intelligent Systems (MFI)","volume":"26 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-09-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122676957","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Encrypted Fast Covariance Intersection Without Leaking Fusion Weights
Pub Date : 2022-09-20, DOI: 10.1109/MFI55806.2022.9913840
Marko Ristic, B. Noack
State estimate fusion is a common requirement in distributed sensor networks and can be complicated by untrusted participants or network eavesdroppers. We present a method for computing the common Fast Covariance Intersection fusion algorithm on an untrusted cloud without disclosing individual estimates or the fused result. In an existing solution to this problem, fusion weights corresponding to estimate errors are leaked to the cloud to perform the fusion. In this work, we present a method that guarantees no data identifying estimators or their estimated values is leaked to the cloud by requiring an additional computation step by the party querying the cloud for the fused result. The Paillier encryption scheme is used to homomorphically compute separate parts of the computation that can be combined after decryption. This encrypted Fast Covariance Intersection algorithm can be used in scenarios where the fusing cloud is not trusted and any information on estimator performances must remain confidential.
{"title":"Encrypted Fast Covariance Intersection Without Leaking Fusion Weights","authors":"Marko Ristic, B. Noack","doi":"10.1109/MFI55806.2022.9913840","DOIUrl":"https://doi.org/10.1109/MFI55806.2022.9913840","url":null,"abstract":"State estimate fusion is a common requirement in distributed sensor networks and can be complicated by untrusted participants or network eavesdroppers. We present a method for computing the common Fast Covariance Intersection fusion algorithm on an untrusted cloud without disclosing individual estimates or the fused result. In an existing solution to this problem, fusion weights corresponding to estimate errors are leaked to the cloud to perform the fusion. In this work, we present a method that guarantees no data identifying estimators or their estimated values is leaked to the cloud by requiring an additional computation step by the party querying the cloud for the fused result. The Paillier encryption scheme is used to homomorphically compute separate parts of the computation that can be combined after decryption. This encrypted Fast Covariance Intersection algorithm can be used in scenarios where the fusing cloud is not trusted and any information on estimator performances must remain confidential.","PeriodicalId":344737,"journal":{"name":"2022 IEEE International Conference on Multisensor Fusion and Integration for Intelligent Systems (MFI)","volume":"20 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-09-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128416574","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Event-based Driver Distraction Detection and Action Recognition
Chu Yang, Peigen Liu, Guang Chen, Zhengfa Liu, Ya Wu, Alois Knoll
Pub Date : 2022-09-20, DOI: 10.1109/MFI55806.2022.9913871
Driver distraction is one of the major factors leading to traffic accidents. With the development of mobile infotainment and the overestimation of immature autonomous driving technology, this phenomenon has become increasingly serious. However, most existing distraction detection algorithms cannot achieve satisfactory performance owing to complex in-cabin lighting conditions and the limited computing resources of edge devices. To this end, we introduce a lightweight and flexible event-based system to monitor driver state. Compared with a frame-based camera, an event camera responds to pixel-wise light intensity changes asynchronously and has several promising advantages, including high dynamic range, high temporal resolution, low latency, and low data redundancy, which make it suitable for mobile terminal applications. The system first denoises the event stream and encodes it into a sequence of 3D tensors. Voxel features at different time steps are then extracted with an EfficientNet and fed into an LSTM to establish a temporal model, on which driver distraction detection is based. In addition, we extend the proposed architecture to recognize driver actions and adopt a transfer learning strategy to improve detection performance. Extensive experiments are conducted on both a simulated dataset (transformed from Drive&Act) and a real event dataset (collected by ourselves). The experimental results show the advantages of the system in accuracy and efficiency for driver state monitoring.
Enhancing Event-based Structured Light Imaging with a Single Frame
Huijiao Wang, Tangbo Liu, Chu He, Cheng Li, Jian-zhuo Liu, Lei Yu
Pub Date : 2022-09-20, DOI: 10.1109/MFI55806.2022.9913845
Benefiting from their extremely low latency, events have been used in Structured Light Imaging (SLI) to predict the depth surface. However, existing methods focus only on improving scanning speed and neglect the perturbations that event noise and timestamp jittering introduce into depth estimation. In this paper, we build a hybrid SLI system equipped with an event camera, a high-resolution frame camera, and a digital light projector, where a single intensity frame is adopted as guidance to enhance event-based SLI quality. To this end, we propose a Multi-Modal Feature Fusion Network (MFFN), consisting of a feature fusion module and an upscale module, to simultaneously fuse events with a single intensity frame, suppress event perturbations, and reconstruct a high-quality depth surface. Further, for training MFFN, we build a new Structured Light Imaging based on Event and Frame cameras (EF-SLI) dataset collected from the hybrid SLI system, containing paired inputs, each composed of a set of synchronized events and one corresponding frame, and ground-truth references obtained by a high-quality SLI approach. Experiments demonstrate that our proposed MFFN outperforms state-of-the-art event-based SLI approaches in accuracy at different scanning speeds.
UPC-Faster-RCNN: A Dynamic Self-Labeling Algorithm for Open-Set Object Detection Based on Unknown Proposal Clustering
Pub Date : 2022-09-20, DOI: 10.1109/MFI55806.2022.9913863
Yujun Liao, Y. Wu, Y. Mo, Feilin Liu, Yufei He, Junqiao Zhao
To promote the development of object detection in a more realistic world, efforts have been devoted to a new task named open-set object detection, which aims to increase a model's ability to recognize unknown classes. In this work, we propose a novel dynamic self-labeling algorithm named UPC-Faster-RCNN. DBSCAN is applied to build our unknown-proposal clustering algorithm, which filters and clusters proposals for unknown objects. An effective dynamic self-labeling algorithm then generates high-quality pseudo labels from the clustered proposals. We evaluate UPC-Faster-RCNN on a composite dataset of PASCAL VOC and COCO. Extensive experiments show that UPC-Faster-RCNN effectively improves the ability of the Faster-RCNN baseline to detect unknown targets while keeping its ability to detect known targets. Specifically, UPC-Faster-RCNN decreases the WI by 23.8%, decreases the A-OSE by 6542, and slightly increases the mAP on known classes by 0.3%.
{"title":"UPC-Faster-RCNN: A Dynamic Self-Labeling Algorithm for Open-Set Object Detection Based on Unknown Proposal Clustering","authors":"Yujun Liao, Y. Wu, Y. Mo, Feilin Liu, Yufei He, Junqiao Zhao","doi":"10.1109/MFI55806.2022.9913863","DOIUrl":"https://doi.org/10.1109/MFI55806.2022.9913863","url":null,"abstract":"To promote the development of object detection in a more realistic world, efforts have been made to a new task named open-set object detection. This task aims to increase the model’s ability to recognize unknown classes. In this work, we propose a novel dynamic self-labeling algorithm, named UPC-Faster-RCNN. The wisdom of DBSCAN is applied to build our unknown proposal clustering algorithm, which aims to filter and cluster the unknown objects proposals. An effective dynamic self-labeling algorithm is proposed to generate high-quality pseudo labels from clustered proposals. We evaluate UPC-Faster-RCNN on a composite dataset of PASCAL VOC and COCO. The extensive experiments show that UPC-Faster-RCNN effectively increases the ability upon Faster-RCNN baseline to detect unknown target, while keeping the ability to detect known targets. Specifically, UPC-Faster-RCNN decreases the WI by 23.8%, decreases the A-OSE by 6542, and slightly increase the mAP in known classes by 0.3%.","PeriodicalId":344737,"journal":{"name":"2022 IEEE International Conference on Multisensor Fusion and Integration for Intelligent Systems (MFI)","volume":"77 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-09-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126254691","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Probabilistic Information Matrix Fusion in a Multi-Object Environment
Pub Date : 2022-09-20, DOI: 10.1109/MFI55806.2022.9913872
Kaipei Yang, Y. Bar-Shalom
In distributed sensor fusion systems, each local sensor has its own tracker that processes local measurements for measurement-to-track association and state estimation. Only the processed data, namely local tracks (LT) comprising state vector estimates and their covariance matrices, are transmitted to the fusion center (FC). In this work, a multi-object hybrid probabilistic information matrix fusion (MO-HPIMF) is derived that takes into account all association hypotheses. In MO-HPIMF, association is carried out between the FC track states (predictions) and the LT state estimates from the local sensors. When many objects and sensors are involved in the fusion, only the m-best FC-track-to-LT association hypotheses should be incorporated in MO-HPIMF to reduce computational complexity; a sequential m-best 2-D method is used here to solve the resulting multidimensional assignment problem. Simulations show that MO-HPIMF successfully tracks all targets of interest and is superior to track-to-track fusion (T2TF), a commonly used approach in distributed sensor fusion that relies on hard association decisions.
{"title":"Probabilistic Information Matrix Fusion in a Multi-Object Environment","authors":"Kaipei Yang, Y. Bar-Shalom","doi":"10.1109/MFI55806.2022.9913872","DOIUrl":"https://doi.org/10.1109/MFI55806.2022.9913872","url":null,"abstract":"In distributed sensor fusion systems, each of the local sensors has its own tracker processing local measurements for measurement-to-track association and state estimation. Only the processed data, local tracks (LT) comprising state vector estimates and their covariance matrices are transmitted to the fusion center (FC). In this work, a multi-object hybrid probabilistic information matrix fusion (MO-HPIMF) is derived taking into account all association hypotheses. In MO-HPIMF, the association carried out is between the FC track states (prediction) and the LT state estimates from local sensors. When having a large number of objects and sensors in fusion, only the m-best FC-track-to-LT association hypotheses should be incorporated in MO-HPIMF to reduce the computational complexity. A Sequential m-best 2-D method is used for solving the multidimensional assignment problem in this work. It is shown in the simulations that MO-HPIMF can successfully track all targets of interest and is superior to track-to-track fusion (T2TF, a commonly used approach in distributed sensor fusion system) which relies on hard association decisions.","PeriodicalId":344737,"journal":{"name":"2022 IEEE International Conference on Multisensor Fusion and Integration for Intelligent Systems (MFI)","volume":"116 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-09-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129456474","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Self-Supervised Learning and Multi-Task Pre-Training Based Single-Channel Acoustic Denoising
Pub Date : 2022-09-20, DOI: 10.1109/MFI55806.2022.9913855
Yi Li, Yang Sun, S. M. Naqvi
In self-supervised single-channel speech denoising, it is challenging to close the gap between denoising performance on the estimated and target speech signals with existing pre-tasks. In this paper, we propose a multi-task pre-training method to improve speech denoising performance within self-supervised learning. The proposed pre-training autoencoder (PAE) requires only a very limited set of unpaired and unseen clean speech signals to learn latent speech representations. Meanwhile, to overcome the limitation of a single pre-task, the proposed masking module exploits the dereverberated mask and the estimated ratio mask to denoise the mixture as the new pre-task. The downstream task autoencoder (DAE) utilizes unlabeled and unseen reverberant mixtures to generate the estimated mixtures, and is trained to share a latent representation with the clean examples from the representation learned in the PAE. Experimental results on a benchmark dataset demonstrate that the proposed method outperforms state-of-the-art approaches.
{"title":"Self-Supervised Learning and Multi-Task Pre-Training Based Single-Channel Acoustic Denoising","authors":"Yi Li, Yang Sun, S. M. Naqvi","doi":"10.1109/MFI55806.2022.9913855","DOIUrl":"https://doi.org/10.1109/MFI55806.2022.9913855","url":null,"abstract":"In self-supervised learning-based single-channel speech denoising problem, it is challenging to reduce the gap between the denoising performance on the estimated and target speech signals with existed pre-tasks. In this paper, we propose a multi-task pre-training method to improve the speech denoising performance within self-supervised learning. In the proposed pre-training autoencoder (PAE), only a very limited set of unpaired and unseen clean speech signals are required to learn speech latent representations. Meanwhile, to solve the limitation of existing single pre-task, the proposed masking module exploits the dereverberated mask and estimated ratio mask to denoise the mixture as the new pre-task. The downstream task autoencoder (DAE) utilizes unlabeled and unseen reverberant mixtures to generate the estimated mixtures. The DAE is trained to share a latent representation with the clean examples from the learned representation in the PAE. Experimental results on a benchmark dataset demonstrate that the proposed method outperforms the state-of-the-art approaches.","PeriodicalId":344737,"journal":{"name":"2022 IEEE International Conference on Multisensor Fusion and Integration for Intelligent Systems (MFI)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-09-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130250025","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A Spatio-Temporal-Semantic Environment Representation for Autonomous Mobile Robots equipped with various Sensor Systems
Pub Date : 2022-09-20, DOI: 10.1109/MFI55806.2022.9913873
Mark Niemeyer, Sebastian Pütz, J. Hertzberg
The large amount of high-resolution sensor data, both temporal and spatial, that autonomous mobile robots collect in today's systems requires structured and efficient management and storage during the robot mission. In response, we present SEEREP: a Spatio-Temporal-Semantic Environment Representation for Autonomous Mobile Robots. SEEREP handles various types of data at once and provides an efficient query interface over all three modalities, which can be combined for high-level analyses. It supports common robotic sensor data types such as images and point clouds, as well as sensor and robot coordinate frames that change over time. Furthermore, SEEREP provides an efficient HDF5-based storage system running on the robot during operation, compatible with ROS and the corresponding sensor message definitions. The compressed HDF5 data backend can be transferred efficiently to an application server running a SEEREP query server that provides gRPC interfaces with Protobuf and FlatBuffers message types. The query server can support high-level planning and reasoning systems in, e.g., agricultural environments or other partially unstructured environments that change over time. In this paper we show that SEEREP is much better suited for these tasks than a traditional GIS, which cannot handle the different types of robotic sensor data.
{"title":"A Spatio-Temporal-Semantic Environment Representation for Autonomous Mobile Robots equipped with various Sensor Systems","authors":"Mark Niemeyer, Sebastian Pütz, J. Hertzberg","doi":"10.1109/MFI55806.2022.9913873","DOIUrl":"https://doi.org/10.1109/MFI55806.2022.9913873","url":null,"abstract":"The large amount of high resolution sensor data, both temporal and spatial, that autonomous mobile robots collect in today’s systems requires structured and efficient management and storage during the robot mission. In response, we present SEEREP: A Spatio-Temporal-Semantic Environment Representation for Autonomous Mobile Robots. SEEREP handles various types of data at once and provides an efficient query interface for all three modalities that can be combined for high-level analyses. It supports common robotic sensor data types such as images and point clouds, as well as sensor and robot coordinate frames changing over time. Furthermore, SEEREP provides an efficient HDF5-based storage system running on the robot during operation, compatible with ROS and the corresponding sensor message definitions. The compressed HDF5 data backend can be transferred efficiently to an application server with a running SEEREP query server providing gRPC interfaces with Protobuf and Flattbuffer message types. The query server can support high-level planning and reasoning systems in e.g. agricultural environments, or other partially unstructured environments that change over time. In this paper we show that SEEREP is much better suited for these tasks than a traditional GIS, which cannot handle the different types of robotic sensor data.","PeriodicalId":344737,"journal":{"name":"2022 IEEE International Conference on Multisensor Fusion and Integration for Intelligent Systems (MFI)","volume":"27 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-09-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128819491","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}