VIEW: visual imitation learning with waypoints
Pub Date: 2025-01-18 | DOI: 10.1007/s10514-024-10188-y
Ananth Jonnavittula, Sagar Parekh, Dylan P. Losey
Robots can use visual imitation learning (VIL) to learn manipulation tasks from video demonstrations. However, translating visual observations into actionable robot policies is challenging due to the high-dimensional nature of video data. This challenge is further exacerbated by the morphological differences between humans and robots, especially when the video demonstrations feature humans performing tasks. To address these problems, we introduce Visual Imitation lEarning with Waypoints (VIEW), an algorithm that significantly enhances the sample efficiency of human-to-robot VIL. VIEW achieves this efficiency using a multi-pronged approach: extracting a condensed prior trajectory that captures the demonstrator's intent, employing an agent-agnostic reward function for feedback on the robot's actions, and utilizing an exploration algorithm that efficiently samples around waypoints in the extracted trajectory. VIEW also segments the human trajectory into grasp and task phases to further accelerate learning. Through comprehensive simulations and real-world experiments, VIEW demonstrates improved performance compared to current state-of-the-art VIL methods. VIEW enables robots to learn manipulation tasks involving multiple objects from arbitrarily long video demonstrations. Additionally, it can learn standard manipulation tasks such as pushing or moving objects from a single video demonstration in under 30 minutes, with fewer than 20 real-world rollouts. Code and videos: https://collab.me.vt.edu/view/
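The waypoint-extraction step is the paper's own contribution; as a rough illustration of what condensing a dense demonstration into a few waypoints can look like, here is a minimal sketch using Ramer-Douglas-Peucker simplification (the use of RDP and the tolerance `tol` are assumptions for illustration, not VIEW's actual method):

```python
import numpy as np

def extract_waypoints(traj, tol=0.02):
    """Condense a dense (N, 3) end-effector trajectory into waypoints via
    Ramer-Douglas-Peucker simplification (a stand-in for VIEW's own
    prior-extraction step; tol is in the same units as traj)."""
    traj = np.asarray(traj, dtype=float)
    if len(traj) < 3:
        return traj
    start, end = traj[0], traj[-1]
    chord = end - start
    # Distance of every point to the line through the segment's endpoints.
    dists = (np.linalg.norm(np.cross(traj - start, chord), axis=1)
             / (np.linalg.norm(chord) + 1e-12))
    k = int(np.argmax(dists))
    if dists[k] < tol:                      # segment already nearly straight
        return np.vstack([start, end])
    left = extract_waypoints(traj[:k + 1], tol)
    right = extract_waypoints(traj[k:], tol)
    return np.vstack([left[:-1], right])    # drop the duplicated split point
```

An exploration algorithm like the one the abstract describes would then sample rollouts in a shrinking neighborhood of each returned waypoint.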
{"title":"View: visual imitation learning with waypoints","authors":"Ananth Jonnavittula, Sagar Parekh, Dylan P. Losey","doi":"10.1007/s10514-024-10188-y","DOIUrl":"10.1007/s10514-024-10188-y","url":null,"abstract":"<div><p>Robots can use visual imitation learning (VIL) to learn manipulation tasks from video demonstrations. However, translating visual observations into actionable robot policies is challenging due to the high-dimensional nature of video data. This challenge is further exacerbated by the morphological differences between humans and robots, especially when the video demonstrations feature humans performing tasks. To address these problems we introduce <b>V</b>isual <b>I</b>mitation l<b>E</b>arning with <b>W</b>aypoints (VIEW), an algorithm that significantly enhances the sample efficiency of human-to-robot VIL. VIEW achieves this efficiency using a multi-pronged approach: extracting a condensed prior trajectory that captures the demonstrator’s intent, employing an agent-agnostic reward function for feedback on the robot’s actions, and utilizing an exploration algorithm that efficiently samples around waypoints in the extracted trajectory. VIEW also segments the human trajectory into grasp and task phases to further accelerate learning efficiency. Through comprehensive simulations and real-world experiments, VIEW demonstrates improved performance compared to current state-of-the-art VIL methods. VIEW enables robots to learn manipulation tasks involving multiple objects from arbitrarily long video demonstrations. Additionally, it can learn standard manipulation tasks such as pushing or moving objects from a single video demonstration in under 30 min, with fewer than 20 real-world rollouts. Code and videos here: https://collab.me.vt.edu/view/</p></div>","PeriodicalId":55409,"journal":{"name":"Autonomous Robots","volume":"49 1","pages":""},"PeriodicalIF":3.7,"publicationDate":"2025-01-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://link.springer.com/content/pdf/10.1007/s10514-024-10188-y.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142995247","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Safe and stable teleoperation of quadrotor UAVs under haptic shared autonomy
Pub Date: 2025-01-17 | DOI: 10.1007/s10514-024-10186-0
Dawei Zhang, Roberto Tron
We present a novel approach that addresses both the safety and the stability of a haptic teleoperation system within a framework of Haptic Shared Autonomy (HSA). We use Control Barrier Functions (CBFs) to generate a control input that follows the user's input as closely as possible while guaranteeing safety. For stability of the human-in-the-loop system, we limit the force feedback perceived by the user via a small $\mathcal{L}_2$-gain, which is achieved by limiting the control and the force feedback via a differential constraint. Specifically, exploiting the structure of HSA, we propose two pathways to design the control and the force feedback: Sequential Control Force (SCF) and Joint Control Force (JCF). Both designs achieve safety and stability but respond differently to the user's commands. We conducted simulation experiments to evaluate and investigate the properties of the designed methods, and we also tested the proposed method on a physical quadrotor UAV with a haptic interface.
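The "follow the user's input as closely as possible while guaranteeing safety" step is the standard CBF quadratic program, which has a closed form when there is a single affine constraint. A minimal sketch follows (the terms `Lf_h`, `Lg_h`, and gain `alpha` are assumed inputs from the chosen dynamics and barrier; this omits the paper's $\mathcal{L}_2$-gain limiting and the SCF/JCF force-feedback designs):

```python
import numpy as np

def cbf_safety_filter(u_user, h, Lf_h, Lg_h, alpha=1.0):
    """Minimally alter the operator's command so the CBF condition
    Lf_h + Lg_h @ u + alpha * h >= 0 holds (closed-form solution of the
    one-constraint QP min ||u - u_user||^2; Lg_h must be nonzero)."""
    a = np.asarray(Lg_h, dtype=float)
    slack = Lf_h + a @ u_user + alpha * h
    if slack >= 0.0:                          # user's command is already safe
        return np.asarray(u_user, dtype=float)
    return u_user + (-slack / (a @ a)) * a    # smallest correction onto the boundary

# Toy use: h could be ||p - p_obs||^2 - d_min^2 for obstacle avoidance, with
# Lf_h and Lg_h computed from the quadrotor dynamics at the current state.
safe_u = cbf_safety_filter(np.array([1.0, 0.0, 0.2]), h=0.05,
                           Lf_h=-1.0, Lg_h=np.array([0.4, 0.0, 0.9]))
print(safe_u)
```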
{"title":"Safe and stable teleoperation of quadrotor UAVs under haptic shared autonomy","authors":"Dawei Zhang, Roberto Tron","doi":"10.1007/s10514-024-10186-0","DOIUrl":"10.1007/s10514-024-10186-0","url":null,"abstract":"<div><p>We present a novel approach that aims to address both safety and stability of a haptic teleoperation system within a framework of Haptic Shared Autonomy (HSA). We use Control Barrier Functions (CBFs) to generate the control input that follows the user’s input as closely as possible while guaranteeing safety. In the context of stability of the human-in-the-loop system, we limit the force feedback perceived by the user via a small <span>(mathcal {L}_2)</span>-gain, which is achieved by limiting the control and the force feedback via a differential constraint. Specifically, with the property of HSA, we propose two pathways to design the control and the force feedback: Sequential Control Force (SCF) and Joint Control Force (JCF). Both designs can achieve safety and stability but with different responses to the user’s commands. We conducted experimental simulations to evaluate and investigate the properties of the designed methods. We also tested the proposed method on a physical quadrotor UAV and a haptic interface.</p></div>","PeriodicalId":55409,"journal":{"name":"Autonomous Robots","volume":"49 1","pages":""},"PeriodicalIF":3.7,"publicationDate":"2025-01-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142995138","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Synthesizing compact behavior trees for probabilistic robotics domains
Pub Date: 2025-01-14 | DOI: 10.1007/s10514-024-10187-z
Emily Scheide, Graeme Best, Geoffrey A. Hollinger
Complex robotics domains (e.g., remote exploration applications and scenarios involving interactions with humans) require encoding high-level mission specifications that consider uncertainty. Most currently fielded systems require humans to manually encode mission specifications, demanding amounts of time and expertise that can become infeasible and limit mission scope. Therefore, we propose a method for automating the process of encoding mission specifications as behavior trees. In particular, we present an algorithm for synthesizing behavior trees that represent the optimal policy for a user-defined specification of a domain and problem in the Probabilistic Planning Domain Definition Language (PPDDL). Our algorithm provides the advantages of behavior trees, including compactness and modularity, while alleviating the need for their time-intensive manual design, which requires substantial expert knowledge. Our method converts the PPDDL specification into solvable MDP matrices, simplifies the solution (i.e., the policy) using Boolean algebra simplification, and converts this simplified policy into a compact behavior tree that can be executed by a robot. We present simulated experiments for a marine target search-and-response scenario and an infant-robot interaction for mobility domain. Our results demonstrate that the synthesized, simplified behavior trees have approximately 15× to 26× fewer nodes and, on average, 8× to 13× fewer active conditions for selecting the active action than they would without simplification. These compactness and activity results suggest an increase in the interpretability and execution efficiency of the behavior trees synthesized by the proposed method. Additionally, our results demonstrate that this synthesis method is robust to a variety of user input mistakes, and we empirically confirm that the synthesized behavior trees perform equivalently to the optimal policy that they are constructed to logically represent.
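As a sketch of the "solve the MDP, then Boolean-simplify the policy" step: for each action, collect the predicate assignments under which it is optimal and minimize the resulting guard expression, each of which then becomes one condition-action subtree of the behavior tree. The domain predicates and policy table below are hypothetical, not from the paper:

```python
from sympy import symbols
from sympy.logic import SOPform

# Predicates that discretize the state (hypothetical marine-search domain).
target_found, battery_low, at_base = symbols("target_found battery_low at_base")
preds = [target_found, battery_low, at_base]

# Optimal MDP policy as predicate-assignment -> action (e.g., value iteration).
policy = {
    (0, 0, 0): "search",  (0, 0, 1): "search",
    (1, 0, 0): "respond", (1, 0, 1): "respond",
    (0, 1, 0): "return",  (1, 1, 0): "return",
    (0, 1, 1): "dock",    (1, 1, 1): "dock",
}

# One minimized guard per action; a Fallback over Sequence(guard_i, action_i)
# nodes then reproduces the policy as a compact behavior tree.
for action in sorted(set(policy.values())):
    minterms = [list(s) for s, a in policy.items() if a == action]
    print(action, "<-", SOPform(preds, minterms))
```

Running this prints, e.g., `search <- ~battery_low & ~target_found`, showing how eight policy entries collapse into four two-literal guards.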
{"title":"Synthesizing compact behavior trees for probabilistic robotics domains","authors":"Emily Scheide, Graeme Best, Geoffrey A. Hollinger","doi":"10.1007/s10514-024-10187-z","DOIUrl":"10.1007/s10514-024-10187-z","url":null,"abstract":"<div><p>Complex robotics domains (e.g., remote exploration applications and scenarios involving interactions with humans) require encoding high-level mission specifications that consider uncertainty. Most current fielded systems in practice require humans to manually encode mission specifications in ways that require amounts of time and expertise that can become infeasible and limit mission scope. Therefore, we propose a method of automating the process of encoding mission specifications as behavior trees. In particular, we present an algorithm for synthesizing behavior trees that represent the optimal policy for a user-defined specification of a domain and problem in the Probabilistic Planning Domain Definition Language (PPDDL). Our algorithm provides access to behavior tree advantages including compactness and modularity, while alleviating the need for the time-intensive manual design of behavior trees, which requires substantial expert knowledge. Our method converts the PPDDL specification into solvable MDP matrices, simplifies the solution, i.e. policy, using Boolean algebra simplification, and converts this simplified policy to a compact behavior tree that can be executed by a robot. We present simulated experiments for a marine target search and response scenario and an infant-robot interaction for mobility domain. Our results demonstrate that the synthesized, simplified behavior trees have approximately between 15 <span>x</span> and 26 <span>x</span> fewer nodes and an average of between 8 <span>x</span> and 13 <span>x</span> fewer active conditions for selecting the active action than they would without simplification. These compactness and activity results suggest an increase in the interpretability and execution efficiency of the behavior trees synthesized by the proposed method. Additionally, our results demonstrate that this synthesis method is robust to a variety of user input mistakes, and we empirically confirm that the synthesized behavior trees perform equivalently to the optimal policy that they are constructed to logically represent.\u0000</p></div>","PeriodicalId":55409,"journal":{"name":"Autonomous Robots","volume":"49 1","pages":""},"PeriodicalIF":3.7,"publicationDate":"2025-01-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142976351","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Integrative biomechanics of a human–robot carrying task: implications for future collaborative work
Pub Date: 2025-01-09 | DOI: 10.1007/s10514-024-10184-2
Verena Schuengel, Bjoern Braunstein, Fabian Goell, Daniel Braun, Nadine Reißner, Kirill Safronov, Christian Weiser, Jule Heieis, Kirsten Albracht
Patients with sarcopenia, who face difficulties in carrying heavy loads, may benefit from collaborative robotic assistance that is modeled after human–human interaction. The objective of this study is to describe the kinematics and spatio-temporal parameters during a collaborative carrying task involving both human and robotic partners. Fourteen subjects carried a table while walking forward, once with a human and once with a robotic partner. The movements were recorded using a three-dimensional motion capture system. The subjects successfully completed the task of carrying the table with the robot, and no significant differences were found in the shoulder and elbow flexion/extension angles. In human–human dyads, the center of mass naturally oscillated vertically with an amplitude of approximately 2 cm. The results presented here for human–human interaction serve as a model for the development of future robotic systems designed for collaborative manipulation.
{"title":"Integrative biomechanics of a human–robot carrying task: implications for future collaborative work","authors":"Verena Schuengel, Bjoern Braunstein, Fabian Goell, Daniel Braun, Nadine Reißner, Kirill Safronov, Christian Weiser, Jule Heieis, Kirsten Albracht","doi":"10.1007/s10514-024-10184-2","DOIUrl":"10.1007/s10514-024-10184-2","url":null,"abstract":"<div><p>Patients with sarcopenia, who face difficulties in carrying heavy loads, may benefit from collaborative robotic assistance that is modeled after human–human interaction. The objective of this study is to describe the kinematics and spatio-temporal parameters during a collaborative carrying task involving both human and robotic partners. Fourteen subjects carried a table while moving forward with a human and a robotic partner. The movements were recorded using a three-dimensional motion capture system. The subjects successfully completed the task of carrying the table with the robot. No significant differences were found in the shoulder and elbow flexion/extension angles. In human–human dyads, the center of mass naturally oscillated vertically with an amplitude of approximately 2 cm. The here presented results of the human–human interaction serve as a model for the development of future robotic systems, designed for collaborative manipulation.\u0000</p></div>","PeriodicalId":55409,"journal":{"name":"Autonomous Robots","volume":"49 1","pages":""},"PeriodicalIF":3.7,"publicationDate":"2025-01-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://link.springer.com/content/pdf/10.1007/s10514-024-10184-2.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142938921","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Mori-Zwanzig approach for belief abstraction with application to belief space planning
Pub Date: 2024-12-24 | DOI: 10.1007/s10514-024-10185-1
Mengxue Hou, Tony X. Lin, Enlu Zhou, Fumin Zhang
We propose a learning-based method to extract symbolic representations of the belief state and its dynamics in order to solve planning problems in continuous-state partially observable Markov decision processes (POMDPs). While existing approaches typically parameterize the continuous-state POMDP as a finite-dimensional Markovian model, they are unable to preserve the fidelity of the abstracted model. To improve the accuracy of the abstracted representation, we introduce a memory-dependent abstraction approach that mitigates the modeling error. The first major contribution of this paper is a neural-network-based method for learning the non-Markovian transition model based on the Mori-Zwanzig (M-Z) formalism. Unlike existing work applying the M-Z formalism to autonomous, time-invariant systems, our approach is the first to generalize the formalism to robotics by addressing the non-Markovian modeling of belief dynamics that depend on historical observations and actions. The second major contribution is a theoretical result showing that modeling the non-Markovian memory effect in the abstracted belief dynamics improves the modeling accuracy, which is the key benefit of the proposed algorithm. A simulation experiment on a belief-space planning problem validates the performance of the proposed belief abstraction algorithms.
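Schematically, the discrete Mori-Zwanzig decomposition underlying such a memory-dependent abstraction splits the abstracted belief update into a Markovian term plus history-dependent memory terms (the notation here is assumed for illustration; per the abstract, the memory model is learned with a neural network):

$$
b_{t+1} \;=\; \underbrace{M_0\!\left(b_t, a_t\right)}_{\text{Markovian part}} \;+\; \underbrace{\sum_{s=1}^{t} M_s\!\left(b_{t-s}, a_{t-s}\right)}_{\text{memory (non-Markovian) part}} \;+\; W_t .
$$

Truncating the sum at a finite history length yields a practical, trainable model, and retaining more memory terms is what reduces the abstraction error relative to a purely Markovian parameterization.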
{"title":"Mori-zwanzig approach for belief abstraction with application to belief space planning","authors":"Mengxue Hou, Tony X. Lin, Enlu Zhou, Fumin Zhang","doi":"10.1007/s10514-024-10185-1","DOIUrl":"10.1007/s10514-024-10185-1","url":null,"abstract":"<div><p>We propose a learning-based method to extract symbolic representations of the belief state and its dynamics in order to solve planning problems in a continuous-state partially observable Markov decision processes (POMDP) problem. While existing approaches typically parameterize the continuous-state POMDP into a finite-dimensional Markovian model, they are unable to preserve fidelity of the abstracted model. To improve accuracy of the abstracted representation, we introduce a memory-dependent abstraction approach to mitigate the modeling error. The first major contribution of this paper is we propose a Neural Network based method to learn the non-Markovian transition model based on the Mori-Zwanzig (M-Z) formalism. Different from existing work in applying M-Z formalism to autonomous time-invariant systems, our approach is the first work generalizing the M-Z formalism to robotics, by addressing the non-Markovian modeling of the belief dynamics that is dependent on historical observations and actions. The second major contribution is we theoretically show that modeling the non-Markovian memory effect in the abstracted belief dynamics improves the modeling accuracy, which is the key benefit of the proposed algorithm. Simulation experiment of a belief space planning problem is provided to validate the performance of the proposed belief abstraction algorithms.\u0000</p></div>","PeriodicalId":55409,"journal":{"name":"Autonomous Robots","volume":"49 1","pages":""},"PeriodicalIF":3.7,"publicationDate":"2024-12-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://link.springer.com/content/pdf/10.1007/s10514-024-10185-1.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142880465","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Multirotor nonlinear model predictive control based on visual servoing of evolving features
Pub Date: 2024-11-28 | DOI: 10.1007/s10514-024-10183-3
Sotirios N. Aspragkathos, Panagiotis Rousseas, George C. Karras, Kostas J. Kyriakopoulos
This article presents a Visual Servoing Nonlinear Model Predictive Control (NMPC) scheme for autonomously tracking a moving target using multirotor Unmanned Aerial Vehicles (UAVs). The scheme is developed for surveillance and tracking of contour-based areas with evolving features. NMPC is used to manage input and state constraints, while additional barrier functions are incorporated to ensure system safety and optimal performance. The proposed controller is designed around the full dynamic model of the features describing the target, together with the vehicle state variables. Real-time simulations and experiments using a quadrotor UAV equipped with a camera demonstrate the effectiveness of the proposed strategy.
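For reference, a visual-servoing NMPC of this general shape solves, at each control step, a finite-horizon problem of the following form (symbols, weights, and the barrier placement are illustrative, not the paper's exact formulation):

$$
\min_{u_0,\dots,u_{N-1}} \; \sum_{k=0}^{N-1} \left( \lVert s_k - s_k^{*} \rVert_Q^2 + \lVert u_k \rVert_R^2 + B(x_k) \right)
\quad \text{s.t.} \quad x_{k+1} = f(x_k, u_k),\;\; s_k = h(x_k),\;\; u_k \in \mathcal{U},
$$

where $s_k$ are the evolving image features with reference $s_k^{*}$, $f$ is the multirotor dynamics, and $B$ is a barrier function that grows near state-constraint boundaries, keeping the predicted trajectory safe over the horizon.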
{"title":"Multirotor nonlinear model predictive control based on visual servoing of evolving features","authors":"Sotirios N. Aspragkathos, Panagiotis Rousseas, George C. Karras, Kostas J. Kyriakopoulos","doi":"10.1007/s10514-024-10183-3","DOIUrl":"10.1007/s10514-024-10183-3","url":null,"abstract":"<div><p>This article presents a Visual Servoing Nonlinear Model Predictive Control (NMPC) scheme for autonomously tracking a moving target using multirotor Unmanned Aerial Vehicles (UAVs). The scheme is developed for surveillance and tracking of contour-based areas with evolving features. NMPC is used to manage input and state constraints, while additional barrier functions are incorporated in order to ensure system safety and optimal performance. The proposed control scheme is designed based on the extraction and implementation of the full dynamic model of the features describing the target and the state variables. Real-time simulations and experiments using a quadrotor UAV equipped with a camera demonstrate the effectiveness of the proposed strategy.</p></div>","PeriodicalId":55409,"journal":{"name":"Autonomous Robots","volume":"48 8","pages":""},"PeriodicalIF":3.7,"publicationDate":"2024-11-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142737228","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
BFAR: improving radar odometry estimation using a bounded false alarm rate detector
Pub Date: 2024-11-19 | DOI: 10.1007/s10514-024-10176-2
Anas Alhashimi, Daniel Adolfsson, Henrik Andreasson, Achim Lilienthal, Martin Magnusson
This work introduces a novel detector, the bounded false-alarm rate (BFAR) detector, for distinguishing true detections from noise in radar data, leading to improved accuracy in radar odometry estimation. Scanning frequency-modulated continuous wave (FMCW) radars can serve as valuable tools for localization and mapping under low-visibility conditions. However, they tend to yield a higher level of noise than the more commonly employed lidars, thereby introducing additional challenges to the detection process. Compared to the classical constant false-alarm rate (CFAR) detector, BFAR applies an affine transformation to the estimated noise level, with learned parameters that minimize the error in odometry estimation. Conceptually, BFAR can be viewed as an optimized blend of CFAR and fixed-level thresholding designed to minimize odometry estimation error. The strength of this approach lies in its simplicity: when the affine scale parameter is held fixed, only a single parameter needs to be learned from a training dataset. Compared to ad-hoc detectors, BFAR has the advantage of a specified upper bound on the false-alarm probability, and it handles noise better than CFAR. Repeatability tests show that BFAR yields highly repeatable detections with minimal redundancy. We conducted simulations comparing the detection and false-alarm probabilities of BFAR with those of three baselines under non-homogeneous noise and varying target sizes; the results show that BFAR outperforms the other detectors. Moreover, we apply BFAR to the use case of radar odometry, adapting a recent odometry pipeline by replacing its original conservative filtering with BFAR. In this way, we reduce the translation/rotation odometry errors per 100 m from 1.3%/0.4° to 1.12%/0.38° and from 1.62%/0.57° to 1.21%/0.32°, improving translation error by 14.2% and 25% on the Oxford and MulRan public datasets, respectively.
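Concretely, where cell-averaging CFAR thresholds each cell at a multiple of the locally averaged noise, BFAR thresholds at an affine transform of it. A toy one-dimensional sketch (window sizes and the `a`, `b` values are placeholders; the paper learns them to minimize odometry error):

```python
import numpy as np

def bfar_detect(power, train=20, guard=4, a=1.2, b=0.5):
    """BFAR on one azimuth of FMCW radar power readings: declare a detection
    where power exceeds a * noise_hat + b, with noise_hat the cell-averaged
    noise estimate. CFAR is recovered with b = 0, fixed-level thresholding
    with a = 0 (the a, b values here are toy placeholders)."""
    power = np.asarray(power, dtype=float)
    hits = []
    for i in range(len(power)):
        lo = max(0, i - guard - train)
        hi = min(len(power), i + guard + train + 1)
        # Training cells: the window around i minus the guard cells and i itself.
        window = np.r_[power[lo:max(0, i - guard)], power[i + guard + 1:hi]]
        noise_hat = window.mean() if window.size else np.inf
        if power[i] > a * noise_hat + b:
            hits.append(i)
    return hits

# Toy usage: a noise floor with two injected targets.
sig = np.abs(np.random.randn(400)); sig[[100, 250]] += 8.0
print(bfar_detect(sig))
```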
{"title":"BFAR: improving radar odometry estimation using a bounded false alarm rate detector","authors":"Anas Alhashimi, Daniel Adolfsson, Henrik Andreasson, Achim Lilienthal, Martin Magnusson","doi":"10.1007/s10514-024-10176-2","DOIUrl":"10.1007/s10514-024-10176-2","url":null,"abstract":"<div><p>This work introduces a novel detector, bounded false-alarm rate (BFAR), for distinguishing true detections from noise in radar data, leading to improved accuracy in radar odometry estimation. Scanning frequency-modulated continuous wave (FMCW) radars can serve as valuable tools for localization and mapping under low visibility conditions. However, they tend to yield a higher level of noise in comparison to the more commonly employed lidars, thereby introducing additional challenges to the detection process. We propose a new radar target detector called BFAR which uses an affine transformation of the estimated noise level compared to the classical constant false-alarm rate (CFAR) detector. This transformation employs learned parameters that minimize the error in odometry estimation. Conceptually, BFAR can be viewed as an optimized blend of CFAR and fixed-level thresholding designed to minimize odometry estimation error. The strength of this approach lies in its simplicity. Only a single parameter needs to be learned from a training dataset when the affine transformation scale parameter is maintained. Compared to ad-hoc detectors, BFAR has the advantage of a specified upper-bound for the false-alarm probability, and better noise handling than CFAR. Repeatability tests show that BFAR yields highly repeatable detections with minimal redundancy. We have conducted simulations to compare the detection and false-alarm probabilities of BFAR with those of three baselines in non-homogeneous noise and varying target sizes. The results show that BFAR outperforms the other detectors. Moreover, We apply BFAR to the use case of radar odometry, and adapt a recent odometry pipeline, replacing its original conservative filtering with BFAR. In this way, we reduce the translation/rotation odometry errors/100 m from 1.3%/0.4<span>(^circ )</span> to 1.12%/0.38<span>(^circ )</span>, and from 1.62%/0.57<span>(^circ )</span> to 1.21%/0.32<span>(^circ )</span>, improving translation error by 14.2% and 25% on Oxford and Mulran public data sets, respectively.</p></div>","PeriodicalId":55409,"journal":{"name":"Autonomous Robots","volume":"48 8","pages":""},"PeriodicalIF":3.7,"publicationDate":"2024-11-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://link.springer.com/content/pdf/10.1007/s10514-024-10176-2.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142672386","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
SAR: generalization of physiological agility and dexterity via synergistic action representation
Pub Date: 2024-11-14 | DOI: 10.1007/s10514-024-10182-4
Cameron Berg, Vittorio Caggiano, Vikash Kumar
Learning effective continuous control policies in high-dimensional systems, including musculoskeletal agents, remains a significant challenge. Over the course of biological evolution, organisms have developed robust mechanisms for overcoming this complexity to learn highly sophisticated strategies for motor control. What accounts for this robust behavioral flexibility? Modular control via muscle synergies, i.e., coordinated muscle co-contractions, is considered to be one putative mechanism that enables organisms to learn muscle control in a simplified and generalizable action space. Drawing inspiration from this evolved motor control strategy, we use physiologically accurate human hand and leg models as a testbed for determining the extent to which a Synergistic Action Representation (SAR) acquired from simpler tasks facilitates learning and generalization on more complex tasks. We find in both cases that SAR-exploiting policies significantly outperform end-to-end reinforcement learning. Policies trained with SAR were able to achieve robust locomotion on a diverse set of terrains (e.g., stairs, hills) with state-of-the-art sample efficiency (4M total steps), while baseline approaches failed to learn any meaningful behaviors under the same training regime. Additionally, policies trained with SAR on an in-hand 100-object manipulation task significantly outperformed (>70% success) baseline approaches (<20% success). Both SAR-exploiting policies were also found to generalize zero-shot to out-of-domain environmental conditions, while policies that did not adopt SAR failed to generalize. Finally, using a simulated robotic hand and humanoid agent, we establish the generality of SAR on broader high-dimensional control problems, solving tasks with greatly improved sample efficiency. To the best of our knowledge, this investigation is the first of its kind to present an end-to-end pipeline for discovering synergies and using this representation to learn high-dimensional continuous control across a wide diversity of tasks. Project website: https://sites.google.com/view/sar-rl
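To illustrate the idea of acting in a synergy space, the sketch below factorizes logged muscle activations and lets a policy output low-dimensional synergy weights that expand back to muscle commands. NMF is one common synergy extractor; the paper's actual pipeline, synergy count, muscle count, and data are not reproduced here:

```python
import numpy as np
from sklearn.decomposition import NMF

rng = np.random.default_rng(0)
# Stand-in for muscle activations logged on the simpler pretraining tasks:
# rows are timesteps, columns are muscle actuators (39 is a placeholder).
A = rng.random((5000, 39))

nmf = NMF(n_components=8, max_iter=500)   # 8 synergies (assumed count)
C = nmf.fit_transform(A)                  # per-timestep synergy activations
W = nmf.components_                       # (8, 39) synergy -> muscle mapping

def to_muscle_space(synergy_action):
    """Downstream RL acts in the 8-D synergy space; expand to muscle commands."""
    return np.clip(synergy_action @ W, 0.0, 1.0)

print(to_muscle_space(rng.random(8)).shape)  # -> (39,)
```

The design payoff is that exploration happens in 8 dimensions instead of 39, while the learned mapping keeps the expanded commands on the manifold of physiologically plausible co-contractions.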
Optimal policies for autonomous navigation in strong currents using fast marching trees
Pub Date: 2024-10-22 | DOI: 10.1007/s10514-024-10179-z
Bernardo Martinez Rocamora Jr., Guilherme A. S. Pereira
Several applications require that unmanned vehicles, such as UAVs and AUVs, navigate environmental flows. While the flow can improve the vehicle's efficiency when directed towards the goal, it may also cause feasibility problems when it opposes the desired motion and is too strong for the vehicle to counteract. This paper proposes the flow-aware fast marching tree algorithm (FlowFMT*) to solve the optimal motion planning problem in generic three-dimensional flows. Our method creates either an optimal path from start to goal or, with a few modifications, a vector-field-based policy that guides the vehicle from anywhere in its workspace to the goal. The basic idea of the proposed method is to replace the original neighborhood set used by FMT* with two sets that consider the reachability from/to each sampled position in the space. The new neighborhood sets are computed considering the flow and the maximum speed of the vehicle. Numerical results comparing our method with a state-of-the-art optimal control solver illustrate its simplicity and correctness.
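The primitive behind such asymmetric neighborhood sets is a reachability test: the minimum travel time between two samples under the local flow, infinite when the flow overpowers the vehicle. A sketch assuming straight-line motion at top speed `u_max` through a locally constant flow (the actual FlowFMT* cost evaluation may differ):

```python
import numpy as np

def travel_time(p1, p2, flow, u_max):
    """Minimum time to fly from p1 to p2 at top speed u_max through a
    locally constant flow; returns inf when the motion is infeasible
    (such pairs are excluded from the reachability neighborhood sets).
    Solves ||d - flow*t|| = u_max*t, i.e., a*t^2 + b*t + c = 0."""
    d = np.asarray(p2, float) - np.asarray(p1, float)
    a = flow @ flow - u_max ** 2
    b = -2.0 * (d @ flow)
    c = d @ d
    if abs(a) < 1e-12:                      # |flow| == u_max: degenerate, linear
        return c / -b if b < 0 else np.inf
    disc = b * b - 4.0 * a * c
    if disc < 0:
        return np.inf
    roots = [(-b - np.sqrt(disc)) / (2 * a), (-b + np.sqrt(disc)) / (2 * a)]
    return min((t for t in roots if t > 1e-12), default=np.inf)

# A downstream current helps; fighting a stronger current upstream is infeasible.
f = np.array([0.8, 0.0, 0.0])
print(travel_time([0, 0, 0], [10, 0, 0], f, u_max=1.0))   # ~5.56
print(travel_time([10, 0, 0], [0, 0, 0], f, u_max=0.5))   # inf (0.5 < |f|)
```

A forward neighborhood of sample q would then be the set of samples q' with travel_time(q, q', flow(q), u_max) below the FMT* radius, and the backward set the converse, which is exactly where the asymmetry between the two sets comes from.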
{"title":"Optimal policies for autonomous navigation in strong currents using fast marching trees","authors":"Bernardo Martinez Rocamora Jr., Guilherme A. S. Pereira","doi":"10.1007/s10514-024-10179-z","DOIUrl":"10.1007/s10514-024-10179-z","url":null,"abstract":"<div><p>Several applications require that unmanned vehicles, such as UAVs and AUVs, navigate environmental flows. While the flow can improve the vehicle’s efficiency when directed towards the goal, it may also cause feasibility problems when it is against the desired motion and is too strong to be counteracted by the vehicle. This paper proposes the flow-aware fast marching tree algorithm (FlowFMT*) to solve the optimal motion planning problem in generic three-dimensional flows. Our method creates either an optimal path from start to goal or, with a few modifications, a vector field-based policy that guides the vehicle from anywhere in its workspace to the goal. The basic idea of the proposed method is to replace the original neighborhood set used by FMT* with two sets that consider the reachability from/to each sampled position in the space. The new neighborhood sets are computed considering the flow and the maximum speed of the vehicle. Numerical results that compare our methods with the state-of-the-art optimal control solver illustrate the simplicity and correctness of the method.</p></div>","PeriodicalId":55409,"journal":{"name":"Autonomous Robots","volume":"48 8","pages":""},"PeriodicalIF":3.7,"publicationDate":"2024-10-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142453025","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}