Active velocity estimation using light curtains via self-supervised multi-armed bandits
Pub Date: 2024-08-10 | DOI: 10.1007/s10514-024-10168-2
Siddharth Ancha, Gaurav Pathak, Ji Zhang, Srinivasa Narasimhan, David Held
To navigate in an environment safely and autonomously, robots must accurately estimate where obstacles are and how they move. Instead of using expensive traditional 3D sensors, we explore the use of a much cheaper, faster, and higher resolution alternative: programmable light curtains. Light curtains are a controllable depth sensor that senses only along a surface that the user selects. We adapt a probabilistic method based on particle filters and occupancy grids to explicitly estimate the position and velocity of 3D points in the scene using partial measurements made by light curtains. The central challenge is to decide where to place the light curtain to accurately perform this task. We propose multiple curtain placement strategies guided by maximizing information gain and verifying predicted object locations. Then, we combine these strategies using an online learning framework. We propose a novel self-supervised reward function that evaluates the accuracy of current velocity estimates using future light curtain placements. We use a multi-armed bandit framework to intelligently switch between placement policies in real time, outperforming fixed policies. We develop a full-stack navigation system that uses position and velocity estimates from light curtains for downstream tasks such as localization, mapping, path-planning, and obstacle avoidance. This work paves the way for controllable light curtains to accurately, efficiently, and purposefully perceive and navigate complex and dynamic environments.
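The bandit-based switching between placement policies described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the UCB1 selection rule and all names are assumptions, and the reward here is an abstract scalar standing in for the paper's self-supervised verification score.

```python
import math

class PolicyBandit:
    """UCB1 bandit that picks one curtain-placement policy per frame."""

    def __init__(self, n_policies):
        self.counts = [0] * n_policies    # pulls per policy
        self.totals = [0.0] * n_policies  # cumulative reward per policy

    def select(self):
        # Play each arm once before applying the UCB1 rule.
        for i, c in enumerate(self.counts):
            if c == 0:
                return i
        t = sum(self.counts)
        # Mean reward plus an exploration bonus that shrinks with pulls.
        return max(range(len(self.counts)),
                   key=lambda i: self.totals[i] / self.counts[i]
                   + math.sqrt(2 * math.log(t) / self.counts[i]))

    def update(self, arm, reward):
        self.counts[arm] += 1
        self.totals[arm] += reward
```

In use, `select()` chooses which placement strategy drives the next curtain, and `update()` feeds back the self-supervised reward once the future curtain has verified the current velocity estimates.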
Formal design, verification and implementation of robotic controller software via RoboChart and RoboTool
Pub Date: 2024-07-05 | DOI: 10.1007/s10514-024-10163-7
Wei Li, Pedro Ribeiro, Alvaro Miyazawa, Richard Redpath, Ana Cavalcanti, Kieran Alden, Jim Woodcock, Jon Timmis
Current practice in the simulation and implementation of robot controllers is usually guided by high-level design diagrams and pseudocode, so no rigorous connection between the design and the development of a robot controller is established. This paper presents a framework for designing robotic controllers with support for automatic generation of executable code and automatic property checking. We present RoboChart, a state-machine-based notation, and RoboTool, a tool that automatically generates code and mathematical models from the designed controllers. We demonstrate the application of RoboChart and its related tool through a case study of a robot performing an exploration task. The automatically generated code is platform independent and is used in both simulation and two different physical robotic platforms. Properties are formally checked against the mathematical models generated by RoboTool, and further validated in the actual simulations and physical experiments. The tool not only provides engineers with a way of designing robotic controllers formally but also paves the way for correct implementation of robotic systems.
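A state-machine notation like the one the abstract describes boils down to states, events, and a transition relation. The sketch below is a generic, hand-rolled state machine for a hypothetical exploration behavior; it only illustrates the concept and is not RoboChart syntax or RoboTool output.

```python
from enum import Enum, auto

class Mode(Enum):
    EXPLORE = auto()
    AVOID = auto()
    DONE = auto()

# Transition relation: (current state, event) -> next state.
TRANSITIONS = {
    (Mode.EXPLORE, "obstacle"): Mode.AVOID,
    (Mode.AVOID, "clear"): Mode.EXPLORE,
    (Mode.EXPLORE, "finished"): Mode.DONE,
}

def step(state, event):
    """Advance the controller; events with no matching transition
    leave the state unchanged."""
    return TRANSITIONS.get((state, event), state)
```

Because the whole behavior lives in one explicit table, the same artifact can drive both execution and exhaustive property checks (e.g. "AVOID is always exited on `clear`"), which is the connection between design and verification that RoboChart/RoboTool formalize.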
Reinforcement learning based autonomous multi-rotor landing on moving platforms
Pub Date: 2024-06-06 | DOI: 10.1007/s10514-024-10162-8
Pascal Goldschmid, Aamir Ahmad
Multi-rotor UAVs suffer from restricted range and flight duration due to limited battery capacity. Autonomous landing on a 2D moving platform offers the possibility to replenish batteries and offload data, thus increasing the utility of the vehicle. Classical approaches rely on accurate, complex and difficult-to-derive models of the vehicle and the environment. Reinforcement learning (RL) provides an attractive alternative due to its ability to learn a suitable control policy exclusively from data during a training procedure. However, current methods require several hours to train, have limited success rates and depend on hyperparameters that need to be tuned by trial and error. We address all these issues in this work. First, we decompose the landing procedure into a sequence of simpler, but similar learning tasks. This is enabled by applying two instances of the same RL-based controller trained for 1D motion to control the multi-rotor's movement in both the longitudinal and the lateral directions. Second, we introduce a powerful state space discretization technique that is based on (i) kinematic modeling of the moving platform to derive information about the state space topology and (ii) structuring the training as a sequential curriculum using transfer learning. Third, we leverage the kinematics model of the moving platform to also derive interpretable hyperparameters for the training process that ensure sufficient maneuverability of the multi-rotor vehicle. The training is performed using the tabular RL method Double Q-Learning. Through extensive simulations we show that the presented method significantly increases the rate of successful landings, while requiring less training time compared to other deep RL approaches. Furthermore, for two comparison scenarios it achieves performance comparable to a cascaded PI controller. Finally, we deploy and demonstrate our algorithm on real hardware. For all evaluation scenarios we provide statistics on the agent's performance. Source code is openly available at https://github.com/robot-perception-group/rl_multi_rotor_landing.
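The tabular Double Q-Learning method named in the abstract maintains two Q-tables; a coin flip decides which table selects the greedy next action and which one evaluates it, removing the overestimation bias of plain Q-learning. A minimal sketch of one update step (a generic textbook form, not the authors' code):

```python
import random
from collections import defaultdict

def double_q_update(qa, qb, s, a, r, s_next, actions, alpha=0.1, gamma=0.99):
    """One tabular Double Q-Learning step (van Hasselt, 2010):
    one table picks the argmax next action, the other supplies its value."""
    q1, q2 = (qa, qb) if random.random() < 0.5 else (qb, qa)
    a_star = max(actions, key=lambda nxt: q1[(s_next, nxt)])
    td_target = r + gamma * q2[(s_next, a_star)]
    q1[(s, a)] += alpha * (td_target - q1[(s, a)])
```

The tables can simply be `defaultdict(float)` keyed by (state, action) pairs over the discretized state space; acting greedily with respect to `qa + qb` gives the learned policy.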
Guiding real-world reinforcement learning for in-contact manipulation tasks with Shared Control Templates
Pub Date: 2024-06-04 | DOI: 10.1007/s10514-024-10164-6
Abhishek Padalkar, Gabriel Quere, Antonin Raffin, João Silvério, Freek Stulp
The requirement for a high number of training episodes has been a major limiting factor for the application of Reinforcement Learning (RL) in robotics. Learning skills directly on real robots requires time, causes wear and tear, and can lead to damage to the robot and environment due to unsafe exploratory actions. The success of learning skills in simulation and transferring them to real robots has also been limited by the gap between reality and simulation. This is particularly problematic for tasks involving contact with the environment, as contact dynamics are hard to model and simulate. In this paper we propose an approach that leverages a shared control framework to model known constraints defined by object interactions and task geometry, reducing the state and action spaces and hence the overall dimensionality of the reinforcement learning problem. The unknown task knowledge and actions are learned by a reinforcement learning agent by conducting exploration in the constrained environment. Using a pouring task and a grid-clamp placement task (similar to peg-in-hole) as use cases and a 7-DoF arm, we show that our approach can be used to learn directly on the real robot. The pouring task is learned in only 65 episodes (16 min) and the grid-clamp placement task in 75 episodes (17 min), with strong safety guarantees and simple reward functions, greatly alleviating the need for simulation.
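The core idea of constraining exploration by task geometry can be illustrated with a simple projection: the agent's commanded motion is stripped of any component that would violate a known constraint before it reaches the robot. This is an illustrative stand-in for the paper's Shared Control Templates, under the assumption of a single planar constraint; the real templates encode richer object-interaction geometry.

```python
import numpy as np

def constrained_action(u_learned, constraint_normal):
    """Project the agent's commanded velocity onto the constraint surface,
    so exploration can never push the end-effector along the forbidden
    normal direction (e.g. into a contact surface)."""
    n = np.asarray(constraint_normal, dtype=float)
    n = n / np.linalg.norm(n)
    u = np.asarray(u_learned, dtype=float)
    return u - np.dot(u, n) * n  # remove the normal component
```

Because the projection removes one degree of freedom, the RL agent effectively learns in a lower-dimensional action space, which is one mechanism behind the small episode counts reported above.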
Simultaneously learning intentions and preferences during physical human-robot cooperation
Pub Date: 2024-06-04 | DOI: 10.1007/s10514-024-10167-3
Linda van der Spaa, Jens Kober, Michael Gienger
The advent of collaborative robots allows humans and robots to cooperate in a direct and physical way. While this opens up exciting opportunities for novel robotics applications, it is challenging to make the collaboration intuitive for the human. From a system's perspective, understanding the human's intentions seems to be one promising way to get there. However, human behavior exhibits large variations between individuals, such as preferences or physical abilities. This paper presents a novel concept for simultaneously learning a model of the human's intentions and preferences incrementally during collaboration with a robot. Starting from a nominal model, the system acquires collaborative skills step by step within only very few trials. The concept is based on a combination of model-based reinforcement learning and inverse reinforcement learning, adapted to fit collaborations in which human and robot think and act independently. We test the method and compare it to two baselines: one that imitates the human and one that uses plain maximum entropy inverse reinforcement learning, both in simulation and in a user study with a Franka Emika Panda robot arm.
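The maximum entropy IRL baseline mentioned above fits a linear reward by moving its weights toward the feature counts of the human's demonstration and away from the soft-max expectation over candidate trajectories. A minimal single-step sketch, using generic names (the paper's feature design and trajectory set are not specified here):

```python
import numpy as np

def maxent_irl_step(w, demo_features, traj_features, lr=0.1):
    """One MaxEnt IRL gradient step on linear reward weights w:
    gradient = demonstrated feature counts - expected feature counts
    under the current soft-max trajectory distribution."""
    traj_features = np.asarray(traj_features, dtype=float)
    scores = traj_features @ w
    p = np.exp(scores - scores.max())   # soft-max over candidate trajectories
    p /= p.sum()
    expected = p @ traj_features
    return w + lr * (np.asarray(demo_features, dtype=float) - expected)
```

Iterating this step raises the weight on features the human's demonstrations actually exhibit, which is exactly what the learned-preference comparison in the user study probes.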
Laplacian regularized motion tomography for underwater vehicle flow mapping with sporadic localization measurements
Pub Date: 2024-05-24 | DOI: 10.1007/s10514-024-10165-5
Ouerghi Meriam, Hou Mengxue, Zhang Fumin
Localization measurements for an autonomous underwater vehicle (AUV) are often difficult to obtain. In many cases, localization measurements are only available sporadically, after the AUV comes to the sea surface. Since the motion of AUVs is often affected by unknown underwater flow fields, the sporadic localization measurements carry information about the underwater flow field. Motion tomography (MT) algorithms have been developed to compute an underwater flow map based on the sporadic localization measurements. This paper extends MT by introducing Laplacian regularization into the problem formulation and the MT algorithm. Laplacian regularization enforces smoothness in the spatial distribution of the underwater flow field. The resulting Laplacian regularized motion tomography (RMT) algorithm converges with a finite error bound. The performance of RMT and other variants of MT are compared through the method of data resolution analysis. The improved performance of RMT is confirmed by experimental data collected from underwater glider ocean sensing experiments.
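The role of the Laplacian term can be seen in a generic linear inverse problem: penalizing x'Lx, where L is the graph Laplacian of the flow-field grid, smooths the recovered map across neighboring cells. This sketch shows only that regularization mechanism, not the MT algorithm itself, which iteratively distributes trajectory error along each glider's path.

```python
import numpy as np

def grid_laplacian(nx, ny):
    """Graph Laplacian of a 4-connected nx-by-ny grid; x' L x sums
    squared differences between neighboring cells (smoothness penalty)."""
    n = nx * ny
    L = np.zeros((n, n))
    for i in range(nx):
        for j in range(ny):
            k = i * ny + j
            for di, dj in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                ii, jj = i + di, j + dj
                if 0 <= ii < nx and 0 <= jj < ny:
                    L[k, k] += 1
                    L[k, ii * ny + jj] -= 1
    return L

def solve_regularized_flow(A, d, L, lam=1.0):
    """Laplacian regularized least squares for a flow map x:
    minimize ||A x - d||^2 + lam * x' L x  =>  (A'A + lam L) x = A' d."""
    n = L.shape[0]
    return np.linalg.solve(A.T @ A + lam * L + 1e-9 * np.eye(n), A.T @ d)
```

Here `A` would encode how each grid cell's flow contributes to the observed surfacing-position error `d`; larger `lam` trades data fit for spatial smoothness.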
The human in the loop: Perspectives and challenges for RoboCup 2050
Pub Date: 2024-05-16 | DOI: 10.1007/s10514-024-10159-3
Alessandra Rossi, Maike Paetzel-Prüsmann, Merel Keijsers, Michael Anderson, Susan Leigh Anderson, Daniel Barry, Jan Gutsche, Justin Hart, Luca Iocchi, Ainse Kokkelmans, Wouter Kuijpers, Yun Liu, Daniel Polani, Caleb Roscon, Marcus Scheunemann, Peter Stone, Florian Vahl, René van de Molengraft, Oskar von Stryk
Robotics researchers have been focusing on developing autonomous and human-like intelligent robots that are able to plan, navigate, manipulate objects, and interact with humans in both static and dynamic environments. These capabilities, however, are usually developed for direct interactions with people in controlled environments, and evaluated primarily in terms of human safety. Consequently, human-robot interaction (HRI) in scenarios with no intervention of technical personnel is under-explored. In the future, however, robots will be deployed in unstructured and unsupervised environments where they will be expected to work unsupervised on tasks which require direct interaction with humans and may not necessarily be collaborative. Developing such robots requires comparing the effectiveness and efficiency of similar design approaches and techniques. Yet, issues regarding the reproducibility of results, comparing different approaches between research groups, and creating challenging milestones to measure performance and development over time make this difficult. Here we discuss the international robotics competition RoboCup as a benchmark for the progress and open challenges in AI and robotics development. The long-term goal of RoboCup is developing a robot soccer team that can win against the world's best human soccer team by 2050. We selected RoboCup because it requires robots to be able to play with and against humans in unstructured environments, such as uneven fields and natural lighting conditions, and it challenges the accepted dynamics in HRI. Considering the current state of robotics technology, RoboCup's goal opens up several research questions to be addressed by roboticists. In this paper, we (a) summarise the current challenges in robotics by using RoboCup development as an evaluation metric, (b) discuss the state-of-the-art approaches to these challenges and how they currently apply to RoboCup, and (c) present a path for future development in the given areas to meet RoboCup's goal of having robots play soccer against and with humans by 2050.
Editorial - Robotics: Science and Systems 2022
Pub Date: 2024-05-03 | DOI: 10.1007/s10514-024-10161-9
Adaptive hybrid local–global sampling for fast informed sampling-based optimal path planning
Pub Date: 2024-04-20 | DOI: 10.1007/s10514-024-10157-5
Marco Faroni, Nicola Pedrocchi, Manuel Beschi
This paper improves the performance of RRT*-like sampling-based path planners by combining admissible informed sampling and local sampling (i.e., sampling the neighborhood of the current solution). An adaptive strategy regulates the trade-off between exploration (admissible informed sampling) and exploitation (local sampling) based on online rewards from previous samples. The paper demonstrates that the algorithm is asymptotically optimal and has a better convergence rate than state-of-the-art path planners (e.g., Informed-RRT*) in several simulated and real-world scenarios. An open-source, ROS-compatible implementation of the algorithm is publicly available.
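The two sampling modes being traded off can be sketched in 2D as follows: informed sampling draws only from the ellipse of points that could shorten the current solution, while local sampling perturbs the current path. This is a textbook-style illustration under simplifying assumptions (2D, rejection sampling, Gaussian perturbation); the paper's adaptive reward-based scheduler that blends the two is omitted.

```python
import math
import random

def informed_sample(start, goal, c_best):
    """Uniform sample from the 2D ellipse with foci start/goal and major
    axis c_best: exactly the points whose path via them could beat the
    current solution cost c_best (the admissible informed set)."""
    c_min = math.dist(start, goal)
    centre = ((start[0] + goal[0]) / 2, (start[1] + goal[1]) / 2)
    a = c_best / 2
    b = math.sqrt(c_best ** 2 - c_min ** 2) / 2
    while True:
        # Rejection-sample the unit disc, then stretch it onto the ellipse.
        x, y = random.uniform(-1, 1), random.uniform(-1, 1)
        if x * x + y * y <= 1:
            break
    theta = math.atan2(goal[1] - start[1], goal[0] - start[0])
    ex, ey = a * x, b * y
    return (centre[0] + ex * math.cos(theta) - ey * math.sin(theta),
            centre[1] + ex * math.sin(theta) + ey * math.cos(theta))

def local_sample(path, radius=0.5):
    """Perturb a random waypoint of the current solution path
    (exploitation: refine the solution we already have)."""
    px, py = random.choice(path)
    return (px + random.gauss(0, radius), py + random.gauss(0, radius))
```

A planner in the spirit of the paper would call one of the two functions per iteration, choosing which based on how often each has recently produced a cost-improving sample.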