With the rapid advancement of machine learning, reinforcement learning (RL) has been used to automate human tasks in many areas. However, training such agents is difficult and largely restricted to expert users. Moreover, it is mostly limited to simulation environments due to the high cost and safety concerns of real-world interaction. Demonstration learning is a paradigm in which an agent learns to perform a task by imitating the behavior of an expert shown in demonstrations. Learning from demonstrations accelerates the learning process by improving sample efficiency while also reducing programmer effort. Because the task can be learned without interacting with the environment, demonstration learning enables the automation of a wide range of real-world applications such as robotics and healthcare. This paper provides a survey of demonstration learning: we formally introduce the demonstration problem along with its main challenges, and we give a comprehensive overview of the process of learning from demonstrations, from the creation of the demonstration data set, to learning methods, to optimization by combining demonstration learning with other machine learning methods. We also review the existing benchmarks and identify their strengths and limitations. Additionally, we discuss the advantages and disadvantages of the paradigm as well as its main applications. Lastly, we discuss the open problems and future research directions of the field.
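As a minimal sketch of the paradigm's simplest instance, behavioral cloning, the policy below is fit by supervised regression on expert (state, action) pairs; the network size, dimensions, and synthetic demonstration data are illustrative assumptions, not a method from the survey.

```python
# Behavioral cloning: fit a policy to expert demonstrations by regression.
# All shapes and hyperparameters here are illustrative placeholders.
import torch
import torch.nn as nn

class Policy(nn.Module):
    def __init__(self, state_dim, action_dim):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, 64), nn.ReLU(),
            nn.Linear(64, action_dim),
        )

    def forward(self, s):
        return self.net(s)

def behavioral_cloning(states, actions, epochs=100, lr=1e-3):
    """Learn a policy purely from demonstrations, with no environment interaction."""
    policy = Policy(states.shape[1], actions.shape[1])
    opt = torch.optim.Adam(policy.parameters(), lr=lr)
    for _ in range(epochs):
        opt.zero_grad()
        loss = nn.functional.mse_loss(policy(states), actions)
        loss.backward()
        opt.step()
    return policy

# Usage with a synthetic demonstration set of 256 (state, action) pairs.
demo_states = torch.randn(256, 8)
demo_actions = torch.randn(256, 2)
policy = behavioral_cloning(demo_states, demo_actions)
```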
Tendon-driven continuum robots (TDCRs) have infinite degrees of freedom and high flexibility, which poses challenges for accurate modeling and autonomous control, especially in confined environments. This paper presents a model-less optimal visual control (MLOVC) method that uses neurodynamic optimization to enable autonomous target tracking of TDCRs in confined environments. The TDCR’s kinematics are estimated online from sensory data, establishing a mapping between actuator inputs and visual features. An optimal visual servoing method based on quadratic programming (QP) is developed to ensure precise target tracking without violating the robot’s physical constraints, and an inverse-free recurrent neural network (RNN)-based neurodynamic optimization method is designed to solve the resulting QP problem. Comparative simulations and experiments demonstrate that the proposed method outperforms existing methods in target-tracking accuracy and computational efficiency, and the RNN-based controller successfully achieves target tracking within constraints in confined environments.
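A hedged sketch of the general idea, not the authors' exact formulation: the actuator-to-feature Jacobian is estimated online (here with a Broyden update), and a box-constrained QP, min 0.5*||J dq - de||^2 s.t. lb <= dq <= ub, is solved by a discretized projection neural network, a common neurodynamic scheme. Gains, bounds, and dimensions are assumptions.

```python
# Model-less visual servoing sketch: online Jacobian estimation plus a
# neurodynamic (projection-network) QP solver for the actuator command.
import numpy as np

def broyden_update(J, dq, de, beta=0.1):
    """Rank-1 correction of the estimated Jacobian from observed motion."""
    denom = dq @ dq
    if denom > 1e-9:
        J = J + beta * np.outer(de - J @ dq, dq) / denom
    return J

def neurodynamic_qp(J, de, lb, ub, alpha=0.05, steps=200):
    """Iterate x <- P_box(x - alpha*(Qx + c)), a discretized projection network."""
    Q, c = J.T @ J, -J.T @ de
    x = np.zeros(J.shape[1])
    for _ in range(steps):
        x = np.clip(x - alpha * (Q @ x + c), lb, ub)
    return x

# One servoing step: drive the visual feature error toward zero within limits.
J = np.eye(2) * 0.5             # initial Jacobian guess (assumed)
e = np.array([0.3, -0.2])       # current visual feature error
dq = neurodynamic_qp(J, -e, lb=-0.1 * np.ones(2), ub=0.1 * np.ones(2))
J = broyden_update(J, dq, de=J @ dq)   # on hardware, de would be measured
```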
Humans excel at performing a wide range of sophisticated tasks by leveraging skills acquired from prior experience. This capability is especially important in robotics empowered by deep reinforcement learning, as learning every skill from scratch is time-consuming and not always feasible. Given a set of prior skills, skill composition aims to accelerate learning on new robotic tasks. Previous works have explored combining pre-trained, task-agnostic skills, but they transform skills into fixed-order representations, which poorly capture potentially complex skill relations. In this paper, we propose a novel Graph-based framework for Skill Composition (GSC). To learn rich structural information, we construct a carefully designed skill graph in which skill representations are nodes and skill relations are edges. Furthermore, to enable efficient training on large-scale skill sets, we employ a transformer-style graph updating method that achieves comprehensive information aggregation. Our simulation experiments indicate that GSC outperforms state-of-the-art methods on a variety of challenging tasks. Additionally, we successfully apply the technique to a navigation task on a real quadruped robot. The project homepage can be found at Graph Skill Composition.
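A hedged sketch of what transformer-style updating over a skill graph could look like: skill embeddings are the nodes, and an adjacency mask restricts attention to related skills. The dimensions, masking scheme, and layer layout are illustrative assumptions, not the authors' exact architecture.

```python
# Transformer-style message passing over a skill graph: attention between
# skill nodes is masked by the graph's adjacency (edges = skill relations).
import torch
import torch.nn as nn

class SkillGraphLayer(nn.Module):
    def __init__(self, dim, heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.ff = nn.Sequential(nn.Linear(dim, 4 * dim), nn.ReLU(),
                                nn.Linear(4 * dim, dim))
        self.n1, self.n2 = nn.LayerNorm(dim), nn.LayerNorm(dim)

    def forward(self, x, adj):
        # Block attention between unrelated skills (adj[i, j] == 0).
        mask = (adj == 0)
        h, _ = self.attn(x, x, x, attn_mask=mask)
        x = self.n1(x + h)
        return self.n2(x + self.ff(x))

# Five skills with 32-dim embeddings; relations given by an adjacency matrix.
x = torch.randn(1, 5, 32)
adj = torch.ones(5, 5)          # fully connected here, for illustration
out = SkillGraphLayer(32)(x, adj)
```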
With the shift from Cloud to Fog and Dew Robotics, the research community has devoted considerable attention to task offloading. Effective and efficient resource monitoring, however, is necessary for such offloading, and it is also fundamental for other important safety and security tasks. Despite this, robot monitoring has received little attention in general, and in particular for the Robot Operating System (ROS), the most widely used framework in robotics. In this paper we present DewROS2, a platform for Dew Robotics that comprises entities to monitor the system status and to share it with interested applications. We present the design and implementation of the platform together with the monitoring entities we created. DewROS2 has been deployed on different real devices, including an unmanned aerial vehicle and an industrial router, to move from theory to practice and to analyze the impact of monitoring on robot resources. DewROS2 has also been tested in a search-and-rescue use case in which robots collect and transmit videos to spot signs of humans in trouble. Results in controlled and uncontrolled conditions show that the monitoring nodes do not significantly impact performance while providing important, measurable benefits to the applications. Accurate monitoring of robot resources, for example, allows the search-and-rescue application to almost double its utilization of the network, therefore collecting video at a much higher resolution.
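To make the idea of a monitoring entity concrete, here is a minimal ROS 2 node in the spirit of DewROS2's monitors: it periodically samples CPU and memory usage with psutil and publishes them for interested applications. The topic name and JSON message format are assumptions, not DewROS2's actual interface.

```python
# Minimal ROS 2 resource-monitoring node: samples host resources once per
# second and publishes them as a JSON string on a monitoring topic.
import json
import psutil
import rclpy
from rclpy.node import Node
from std_msgs.msg import String

class ResourceMonitor(Node):
    def __init__(self):
        super().__init__('resource_monitor')
        self.pub = self.create_publisher(String, '/monitor/resources', 10)
        self.create_timer(1.0, self.sample)   # fire once per second

    def sample(self):
        msg = String()
        msg.data = json.dumps({
            'cpu_percent': psutil.cpu_percent(),
            'mem_percent': psutil.virtual_memory().percent,
        })
        self.pub.publish(msg)

def main():
    rclpy.init()
    rclpy.spin(ResourceMonitor())

if __name__ == '__main__':
    main()
```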
In this paper, we consider the general task of jumping varying distances and heights with a quadrupedal robot in noisy environments, such as off of uneven terrain and with variable robot dynamics parameters. To jump accurately in such conditions, we propose a deep reinforcement learning framework that leverages and augments the complex solution of nonlinear trajectory optimization for quadrupedal jumping. While the standalone optimization limits jumping to take-off from flat ground and requires accurate assumptions about the robot dynamics, our approach improves robustness, enabling jumps off of significantly uneven terrain under variable robot dynamics parameters and environmental conditions. Compared with walking and running, realizing aggressive jumping on hardware necessitates accounting for the motors’ torque-speed relationship as well as the robot’s total power limits. By incorporating these constraints into our learning framework, we successfully deploy our policy sim-to-real without further tuning, fully exploiting the available onboard power supply and motors. We demonstrate robustness to environment noise of foot disturbances of up to 6 cm in height, or 33% of the robot’s nominal standing height, while jumping distances of twice the body length.
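The actuator constraints the abstract mentions can be sketched as follows: commanded torques are clipped to a linear torque-speed envelope and rescaled so total mechanical power stays within the robot's budget. The numeric limits below are placeholders, not the paper's hardware values.

```python
# Enforce a linear torque-speed envelope and a total power budget on the
# commanded joint torques before they reach the motors.
import numpy as np

TAU_MAX = 33.5        # stall torque per motor [Nm] (assumed)
OMEGA_MAX = 21.0      # no-load speed [rad/s] (assumed)
P_TOTAL_MAX = 250.0   # total mechanical power budget [W] (assumed)

def enforce_actuator_limits(tau_cmd, omega):
    # Available torque shrinks linearly with joint speed (torque-speed curve).
    tau_avail = TAU_MAX * np.clip(1.0 - np.abs(omega) / OMEGA_MAX, 0.0, 1.0)
    tau = np.clip(tau_cmd, -tau_avail, tau_avail)
    # Scale all torques down uniformly if total power exceeds the budget.
    power = np.sum(np.abs(tau * omega))
    if power > P_TOTAL_MAX:
        tau *= P_TOTAL_MAX / power
    return tau

tau = enforce_actuator_limits(np.array([30.0, -25.0]), np.array([10.0, 18.0]))
```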
The “Internet of Robotic Things” (IoRT) is a concept that connects sensors and robotic objects. One practical application of IoRT is swarm robotics, where multiple robots collaborate in a shared workspace to accomplish tasks that would be challenging or impossible for a single robot. Swarm robots are particularly useful in critical situations, such as post-earthquake scenarios, where they can locate survivors and provide assistance in areas inaccessible to humans. In such life-saving situations, reliable and prompt communication among swarm robots is of utmost importance. To address the need for highly dependable, low-latency communication in swarm robotics, this research introduces a novel hybrid approach called Multi-objective QoS optimization based on Support vector regression and Genetic algorithm (MQSG). The MQSG method consists of two main phases: parameter relationship identification and parameter optimization. In the parameter relationship identification phase, the relationship between network inputs (packet inter-arrival time, packet size, transmission power, and distance between sender and receiver) and outputs (quality-of-service (QoS) parameters) is established using support vector regression. In the parameter optimization phase, a multi-objective function is built from the relationships obtained in the first phase; solving it yields optimal values for each parameter, leading to enhanced network performance. Simulation results demonstrate that MQSG outperforms similar algorithms in terms of transmission latency, packet delivery rate, and the number of retransmitted packets.
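A hedged sketch of MQSG's two phases under stated assumptions: (1) fit SVR models mapping the four network inputs to each QoS metric, and (2) search for input settings that minimize a weighted multi-objective cost with a simple genetic algorithm. The synthetic data, objective weights, and GA settings are illustrative, not the paper's configuration.

```python
# Phase 1: SVR models for each QoS output; Phase 2: a minimal GA that
# searches the input space for settings minimizing a weighted QoS cost.
import numpy as np
from sklearn.svm import SVR

rng = np.random.default_rng(0)
X = rng.uniform(0, 1, (200, 4))   # inter-arrival, size, power, distance
latency = X @ [0.2, 0.5, -0.3, 0.6] + rng.normal(0, 0.01, 200)
loss = X @ [-0.1, 0.4, -0.5, 0.7] + rng.normal(0, 0.01, 200)

models = [SVR().fit(X, y) for y in (latency, loss)]   # phase 1

def cost(x, w=(0.5, 0.5)):
    """Weighted multi-objective cost over the predicted QoS metrics."""
    x = x.reshape(1, -1)
    return sum(wi * m.predict(x)[0] for wi, m in zip(w, models))

pop = rng.uniform(0, 1, (30, 4))                      # phase 2: GA loop
for _ in range(50):
    fitness = np.array([cost(ind) for ind in pop])
    parents = pop[np.argsort(fitness)[:10]]           # selection
    a = parents[rng.integers(0, 10, 30)]
    b = parents[rng.integers(0, 10, 30)]
    pop = np.clip((a + b) / 2                         # crossover
                  + rng.normal(0, 0.05, (30, 4)), 0, 1)  # mutation
best = pop[np.argmin([cost(ind) for ind in pop])]
```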