Disentangling Uncertainty for Safe Social Navigation using Deep Reinforcement Learning
Daniel Flögel, Marcos Gómez Villafañe, Joshua Ransiek, Sören Hohmann
arXiv:2409.10655 · arXiv - EE - Systems and Control · Published 2024-09-16
Citations: 0
Abstract
Autonomous mobile robots are increasingly employed in pedestrian-rich environments where safe navigation and appropriate human interaction are crucial. While Deep Reinforcement Learning (DRL) enables socially integrated robot behavior, it remains challenging in novel or perturbed scenarios to indicate when and why the policy is uncertain. Unquantified uncertainty in decision-making can lead to collisions or human discomfort and is one reason why safe, risk-aware navigation is still an open problem. This work introduces a novel approach that integrates aleatoric, epistemic, and predictive uncertainty estimation into a DRL-based navigation framework to provide uncertainty estimates for decision-making. To this end, we incorporate Observation-Dependent Variance (ODV) and dropout into the Proximal Policy Optimization (PPO) algorithm. For different types of perturbations, we compare the ability of Deep Ensembles and Monte-Carlo Dropout (MC-Dropout) to estimate the policy's uncertainty. In uncertain decision-making situations, we propose switching the robot's social behavior to conservative collision avoidance. The results show that the ODV-PPO algorithm converges faster, generalizes better, and disentangles aleatoric from epistemic uncertainty. In addition, the MC-Dropout approach is more sensitive to perturbations and better correlates the uncertainty type with the perturbation type. With the proposed safe action selection scheme, the robot navigates perturbed environments with fewer collisions.
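The abstract's core idea — an observation-dependent variance head for aleatoric uncertainty combined with MC-Dropout for epistemic uncertainty — can be illustrated with a minimal sketch. This is not the authors' implementation: the toy network, layer sizes, dropout rate, and the helper names `forward` and `disentangled_uncertainty` are all illustrative assumptions, using NumPy in place of a real DRL framework.

```python
import numpy as np

rng = np.random.default_rng(0)
OBS, HID, ACT = 8, 16, 2  # illustrative dimensions, not from the paper

# Toy random weights: a shared trunk plus mean- and log-std heads.
# The log-std head depending on the observation is the ODV idea.
W_h = rng.normal(scale=0.5, size=(OBS, HID))
W_mu = rng.normal(scale=0.5, size=(HID, ACT))
W_logstd = rng.normal(scale=0.5, size=(HID, ACT))

def forward(obs, p_drop=0.5):
    """One stochastic pass: dropout stays active at inference (MC-Dropout)."""
    h = np.tanh(obs @ W_h)
    mask = rng.random(h.shape) > p_drop
    h = h * mask / (1.0 - p_drop)                 # inverted dropout scaling
    mu = h @ W_mu                                 # action mean
    std = np.exp(np.clip(h @ W_logstd, -5.0, 2.0))  # observation-dependent std
    return mu, std

def disentangled_uncertainty(obs, n=100):
    """Separate the two uncertainty types over n stochastic passes."""
    mus, stds = zip(*(forward(obs) for _ in range(n)))
    mus, stds = np.stack(mus), np.stack(stds)
    epistemic = mus.var(axis=0)          # spread of means across dropout masks
    aleatoric = (stds ** 2).mean(axis=0)  # average predicted action variance
    return epistemic, aleatoric

epi, ale = disentangled_uncertainty(rng.normal(size=OBS))
```

In the spirit of the proposed safe action selection scheme, a controller could compare `epi` against a threshold and, when exceeded, fall back to conservative collision avoidance instead of sampling from the learned policy.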