Presents corrections to the article “Design Models and Performance Analysis for a Novel Shape Memory Alloy-Actuated Wearable Hand Exoskeleton for Rehabilitation”.
{"title":"Correction To: “Design Models and Performance Analysis for a Novel Shape Memory Alloy-Actuated Wearable Hand Exoskeleton for Rehabilitation”","authors":"Elio Matteo Curcio;Francesco Lago;Giuseppe Carbone","doi":"10.1109/LRA.2024.3495353","DOIUrl":"https://doi.org/10.1109/LRA.2024.3495353","url":null,"abstract":"Presents corrections to the article “Design Models and Performance Analysis for a Novel Shape Memory Alloy-Actuated Wearable Hand Exoskeleton for Rehabilitation”.","PeriodicalId":13241,"journal":{"name":"IEEE Robotics and Automation Letters","volume":"9 12","pages":"11657-11657"},"PeriodicalIF":4.6,"publicationDate":"2024-11-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10758911","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142679299","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2024-11-14. DOI: 10.1109/LRA.2024.3498774
Eran Bamani;Eden Nissinman;Lisa Koenigsberg;Inbar Meir;Avishai Sintov
Object recognition, commonly performed by a camera, is a fundamental requirement for robots to complete complex tasks. Some tasks require recognizing objects far from the robot's camera. A challenging example is Ultra-Range Gesture Recognition (URGR) in human-robot interaction, where the user exhibits directive gestures at a distance of up to 25 m from the robot. However, training a model to recognize barely visible objects located at ultra-range requires an exhaustive collection of labeled samples. Generating synthetic training datasets is a recent solution to the lack of real-world data, yet such datasets typically fail to replicate the realistic visual characteristics of distant objects in images. In this letter, we propose the Diffusion in Ultra-Range (DUR) framework, based on a diffusion model, to generate labeled images of distant objects in various scenes. The DUR generator receives a desired distance and class (e.g., gesture) and outputs a corresponding synthetic image. We apply DUR to train a URGR model with directive gestures in which fine details of the gesturing hand are challenging to distinguish. DUR is compared to other types of generative models and shows superiority both in fidelity and in the recognition success rate of the trained URGR model. More importantly, training a DUR model on a limited amount of real data and then using it to generate synthetic data for training a URGR model outperforms directly training the URGR model on real data. The synthetic-data-based URGR model is also demonstrated in gesture-based direction of a ground robot.
{"title":"A Diffusion-Based Data Generator for Training Object Recognition Models in Ultra-Range Distance","authors":"Eran Bamani;Eden Nissinman;Lisa Koenigsberg;Inbar Meir;Avishai Sintov","doi":"10.1109/LRA.2024.3498774","DOIUrl":"https://doi.org/10.1109/LRA.2024.3498774","url":null,"abstract":"Object recognition, commonly performed by a camera, is a fundamental requirement for robots to complete complex tasks. Some tasks require recognizing objects far from the robot's camera. A challenging example is Ultra-Range Gesture Recognition (URGR) in human-robot interaction where the user exhibits directive gestures at a distance of up to 25 m from the robot. However, training a model to recognize hardly visible objects located in ultra-range requires an exhaustive collection of a significant amount of labeled samples. The generation of synthetic training datasets is a recent solution to the lack of real-world data, while unable to properly replicate the realistic visual characteristics of distant objects in images. In this letter, we propose the Diffusion in Ultra-Range (DUR) framework based on a Diffusion model to generate labeled images of distant objects in various scenes. The DUR generator receives a desired distance and class (e.g., gesture) and outputs a corresponding synthetic image. We apply DUR to train a URGR model with directive gestures in which fine details of the gesturing hand are challenging to distinguish. DUR is compared to other types of generative models showcasing superiority both in fidelity and in recognition success rate when training a URGR model. More importantly, training a DUR model on a limited amount of real data and then using it to generate synthetic data for training a URGR model outperforms directly training the URGR model on real data. The synthetic-based URGR model is also demonstrated in gesture-based direction of a ground robot.","PeriodicalId":13241,"journal":{"name":"IEEE Robotics and Automation Letters","volume":"9 12","pages":"11722-11729"},"PeriodicalIF":4.6,"publicationDate":"2024-11-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142679289","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2024-11-14. DOI: 10.1109/LRA.2024.3498700
Keli Wu;Haifei Chen;Lijun Li;Zhengxiong Liu;Haitao Chang
Robot position is a crucial information flow in space teleoperation, and time delay makes its transmission and reception effectively asynchronous, greatly degrading telepresence. To address this issue, this letter investigates position prediction for space teleoperation and proposes a Snow Ablation Optimization (SAO)-CNN-BiGRU-Attention prediction algorithm. Through prediction, spatiotemporal synchronization of the position information is achieved, thereby improving telepresence. First, within the bilateral active estimation delay control framework, the CNN-BiGRU-Attention model is introduced into position prediction for space teleoperation: the CNN captures the spatial feature relationships of past position information, while the BiGRU tracks its dynamic changes and an Attention mechanism focuses on key features, ensuring the accuracy of the prediction model. However, hyperparameter selection for the CNN-BiGRU-Attention model directly affects its prediction performance, and manual hyperparameter selection cannot guarantee optimality. To solve this problem, the SAO algorithm is introduced for hyperparameter selection, exploiting its dual-population mechanism and flexible position-update equation to autonomously identify the optimal model hyperparameters and ensure optimal prediction performance. Finally, the effectiveness of the SAO-CNN-BiGRU-Attention algorithm is verified through comparative simulation experiments.
{"title":"Position Prediction for Space Teleoperation With SAO-CNN-BiGRU-Attention Algorithm","authors":"Keli Wu;Haifei Chen;Lijun Li;Zhengxiong Liu;Haitao Chang","doi":"10.1109/LRA.2024.3498700","DOIUrl":"https://doi.org/10.1109/LRA.2024.3498700","url":null,"abstract":"Robot position is a crucial information flow for space teleoperation, and the existence of time delay makes it actual asynchronous in sending and reception, greatly affecting the telepresence. To address this issue, this letter investigates the position prediction for space teleoperation and proposes an Snow Ablation Optimization (SAO)-CNN-BiGRU-Attention based prediction algorithm. Through prediction, the spatiotemporal synchronization of position information is achieved, thereby improving the telepresence. Firstly, based on the bilateral active estimation delay control framework, the CNN-BiGRU-Attention model is first introduced into position prediction for space teleoperation, where CNN serves for capturing the spatial feature relationship of the past position information, while BiGRU perceives its dynamic changes and combines Attention mechanism to focus on key feature, ultimately ensuring the accuracy of the prediction model. However, hyperparameter selection for the CNN-BiGRU-Attention model directly affects its prediction efficiency, and the custom selection way of hyperparameter obviously cannot guarantee optimality. To solve this problem, the SAO algorithm is introduced into the hyperparameter selection, utilizing its unique dual population mechanism and flexible position update equation to autonomously identify the optimal model hyperparameter and ensure optimal prediction efficiency. Finally, the effectiveness of the SAO-CNN-BiGRU-Attention algorithm was verified through comparative simulation experiments.","PeriodicalId":13241,"journal":{"name":"IEEE Robotics and Automation Letters","volume":"9 12","pages":"11674-11681"},"PeriodicalIF":4.6,"publicationDate":"2024-11-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142679295","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2024-11-13. DOI: 10.1109/LRA.2024.3497721
Rui Chen;Haixiang Zhang;Shamiao Zhou;Xiangjian Xu;Zean Yuan;Huijiang Wang;Jun Luo
The human hand is capable of executing a wide range of complex movements due to its biomechanical structure and skin sensing system. Designing an anthropomorphic hand that mimics the biomechanical structure of the human hand and incorporates skin sensing presents a long-term challenge in robotics. In this letter, we propose a structural design concept that combines rigid, flexible, and soft components, and we design a rigid-flexible-soft coupled dexterous hand based on this concept. To enhance the dexterous hand's adaptability to the environment, we also develop a soft piezoresistive tactile module inspired by human skin and mount it on the fingertips to detect sliding states. We further design an integrated system for dexterous manipulation, including sensing, actuation, and control, based on the concept of embodied intelligence, aiming to achieve closed-loop control of the dexterous hand. This letter provides a reliable structure and control strategy that enrich the perceptual abilities of dexterous hands and enable their application in unstructured environments.
{"title":"A Rigid-Flexible-Soft Coupled Dexterous Hand With Sliding Tactile Perception and Feedback","authors":"Rui Chen;Haixiang Zhang;Shamiao Zhou;Xiangjian Xu;Zean Yuan;Huijiang Wang;Jun Luo","doi":"10.1109/LRA.2024.3497721","DOIUrl":"https://doi.org/10.1109/LRA.2024.3497721","url":null,"abstract":"The human hand is capable of executing a wide range of complex movements due to its biomechanical structure and skin sensing system. Designing an anthropomorphic hand that mimics the biomechanical structure of the human and incorporates skin sensing, presents a long-term challenge in the field of robotics. In this paper, we proposed a concept for structure design, which is to combine rigid, flexible and soft components, and designed a rigid-flexible-soft coupled dexterous hand based on this concept. For enhancing dexterous hand's adaptivity to environment, we also developed a soft piezoresistive tactile module inspired by human skin, and mounted it on fingertips to detect sliding states. Meanwhile, we have also designed an integrated system for dexterous manipulation including sensing, actuation and control, based on the concept of embodied intelligence, aiming to achieve a closed-loop control to dexterous hand. This letter provides a reliable structure and control strategy to enrich the perceptual abilities of the dexterous hand and enable their applications in unstructured environments.","PeriodicalId":13241,"journal":{"name":"IEEE Robotics and Automation Letters","volume":"9 12","pages":"11682-11689"},"PeriodicalIF":4.6,"publicationDate":"2024-11-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142679297","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2024-11-13. DOI: 10.1109/LRA.2024.3497726
Shunsuke Tokiwa;Hikaru Arita;Yosuke Suzuki;Kenji Tahara
Grasping an unknown object is difficult for robot hands. When the characteristics of the object are unknown, it is difficult to plan the speed and width with which the fingers should close. In this letter, we propose a method that realizes three functions, simultaneous finger contact, impact reduction, and contact force control, which together enable effective grasping of an unknown object. We accomplish this by using a control framework called multiple virtual dynamics-based control, proposed in a previous study. The advantage of this framework is that multiple functions can be realized without switching control laws. The previous study achieved two of the functions, impact reduction and contact force control, with two layers of impedance control applied independently to individual fingers. In this letter, a new formulation of virtual dynamics that treats multiple fingers comprehensively is introduced, enabling simultaneous contact without compromising the other two functions. This research provides a method to achieve delicate grasping by using proximity sensors.
{"title":"Integrated Grasping Controller Leveraging Optical Proximity Sensors for Simultaneous Contact, Impact Reduction, and Force Control","authors":"Shunsuke Tokiwa;Hikaru Arita;Yosuke Suzuki;Kenji Tahara","doi":"10.1109/LRA.2024.3497726","DOIUrl":"https://doi.org/10.1109/LRA.2024.3497726","url":null,"abstract":"Grasping an unknown object is difficult for robot hands. When the characteristics of the object are unknown, knowing how to plan the speed at and width to which the fingers are narrowed is difficult. In this letter, we propose a method to realize the three functions of simultaneous finger contact, impact reduction, and contact force control, which enable effective grasping of an unknown object. We accomplish this by using a control framework called multiple virtual dynamics-based control, which was proposed in a previous study. The advantage of this control is that multiple functions can be realized without switching control laws. The previous study achieved two functions, impact reduction and contact force control, with a two layers of impedance control which was applied independently to individual fingers. In this letter, a new idea of virtual dynamics that treats multiple fingers comprehensively is introduced, which enables the function of simultaneous contact without compromising the other two functions. This research provides a method to achieve delicate grasping by using proximity sensors.","PeriodicalId":13241,"journal":{"name":"IEEE Robotics and Automation Letters","volume":"9 12","pages":"11633-11640"},"PeriodicalIF":4.6,"publicationDate":"2024-11-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10752343","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142672130","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2024-11-13. DOI: 10.1109/LRA.2024.3497753
Toshihiro Nishimura;Keisuke Akasaka;Subaru Ishikawa;Tetsuyou Watanabe
This letter proposes a novel robotic gripper that can expand workable spaces in a target environment to pick up objects from confined spaces. The proposed gripper is most effective for retrieving objects from deformable environments, such as taking an object out of a drawstring bag, or for extracting target objects located behind surrounding objects. The proposed gripper achieves both workspace expansion and grasping motion using only a single motor. The gripper is equipped with four outer fingers for expanding the environment and two inner fingers for grasping an object. The inner and outer fingers move in different directions for their respective functions of grasping and spatial expansion. To realize these two different finger movements, a novel self-motion switching mechanism is developed that switches between feed-screw and rack-and-pinion functions according to the magnitude of the force applied to the inner fingers. This letter presents the mechanism design of the developed gripper, including the self-motion switching mechanism and the actuation strategy for expanding the workable space. The mechanical analysis is also presented, and the analysis result is validated experimentally. Moreover, an automatic object-picking system using the developed gripper is constructed to evaluate the gripper.
{"title":"Single-Motor-Driven (4 + 2)-Fingered Robotic Gripper Capable of Expanding the Workable Space in the Extremely Confined Environment","authors":"Toshihiro Nishimura;Keisuke Akasaka;Subaru Ishikawa;Tetsuyou Watanabe","doi":"10.1109/LRA.2024.3497753","DOIUrl":"https://doi.org/10.1109/LRA.2024.3497753","url":null,"abstract":"This letter proposes a novel robotic gripper that can expand workable spaces in a target environment to pick up objects from confined spaces. The proposed gripper is most effective for retrieving objects from deformable environments, such as taking an object out of a drawstring bag, or for extracting target objects located behind surrounding objects. The proposed gripper achieves both work-space expansion and grasping motion by using only a single motor. The gripper is equipped with four outer fingers for expanding the environment and two inner fingers for grasping an object. The inner and outer fingers move in different directions for their respective functions of grasping and spatial expansion. To realize two different movements of the fingers, a novel self-motion switching mechanism that switches between the functions as feed-screw and rack-and-pinion mechanisms is developed. The mechanism switches the motions according to the magnitude of the force applied to the inner fingers. This letter presents the mechanism design of the developed gripper, including the self-motion switching mechanism and the actuation strategy for expanding the workable space. The mechanical analysis is also presented, and the analysis result is validated experimentally. Moreover, an automatic object-picking system using the developed gripper is constructed to evaluate the gripper.","PeriodicalId":13241,"journal":{"name":"IEEE Robotics and Automation Letters","volume":"9 12","pages":"11585-11592"},"PeriodicalIF":4.6,"publicationDate":"2024-11-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142672186","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2024-11-13. DOI: 10.1109/LRA.2024.3497717
Houman Masnavi;Arun Kumar Singh;Farrokh Janabi-Sharifi
We propose a learned probabilistic neural policy for safe, occlusion-free target tracking. The core novelty of our work stems from the structure of our policy network, which combines generative modeling based on a Conditional Variational Autoencoder (CVAE) with differentiable optimization layers. The weights of the CVAE network and the parameters of the differentiable optimization can be learned in an end-to-end fashion from demonstration trajectories. We improve the state-of-the-art (SOTA) in the following respects. First, we show that our learned policy outperforms existing SOTA approaches in occlusion/collision avoidance capabilities and computation time. Second, we present an extensive ablation showing how different components of our learning pipeline contribute to the overall tracking task. We also demonstrate the real-time performance of our approach on resource-constrained hardware such as the NVIDIA Jetson TX2. Finally, our learned policy can also be viewed as a reactive planner for navigation in highly cluttered environments.
{"title":"Differentiable-Optimization Based Neural Policy for Occlusion-Aware Target Tracking","authors":"Houman Masnavi;Arun Kumar Singh;Farrokh Janabi-Sharifi","doi":"10.1109/LRA.2024.3497717","DOIUrl":"https://doi.org/10.1109/LRA.2024.3497717","url":null,"abstract":"We propose a learned probabilistic neural policy for safe, occlusion-free target tracking. The core novelty of our work stems from the structure of our policy network that combines generative modeling based on Conditional Variational Autoencoder (CVAE) with differentiable optimization layers. The weights of the CVAE network and the parameters of the differentiable optimization can be learned in an end-to-end fashion through demonstration trajectories. We improve the state-of-the-art (SOTA) in the following respects. We show that our learned policy outperforms existing SOTA in terms of occlusion/collision avoidance capabilities and computation time. Second, we present an extensive ablation showing how different components of our learning pipeline contribute to the overall tracking task. We also demonstrate the real-time performance of our approach on resource-constrained hardware such as NVIDIA Jetson TX2. Finally, our learned policy can also be viewed as a reactive planner for navigation in highly cluttered environments.","PeriodicalId":13241,"journal":{"name":"IEEE Robotics and Automation Letters","volume":"9 12","pages":"11714-11721"},"PeriodicalIF":4.6,"publicationDate":"2024-11-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142679321","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2024-11-11. DOI: 10.1109/LRA.2024.3495375
Zhenhua Wan;Peng Fu;Kunfeng Wang;Kaichun Zhao
In this letter, we develop a tightly coupled polarization-visual-inertial localization system that exploits naturally occurring polarized skylight to provide a global heading. We introduce a focal-plane polarization camera with negligible instantaneous field-of-view error to collect polarized skylight. We then design a robust heading determination method from polarized skylight and construct a stable global heading constraint. In particular, this constraint compensates for the heading unobservability present in standard VINS. In addition to the standard sparse visual feature measurements used in VINS, polarization heading residuals are constructed and co-optimized in a tightly coupled VINS update. An adaptive fusion strategy is designed to correct the cumulative drift. Outdoor real-world experiments show that the proposed method outperforms the state-of-the-art VINS-Fusion in localization accuracy, improving by 22% over VINS-Fusion in a wooded campus environment.
{"title":"Visual-Inertial Localization Leveraging Skylight Polarization Pattern Constraints","authors":"Zhenhua Wan;Peng Fu;Kunfeng Wang;Kaichun Zhao","doi":"10.1109/LRA.2024.3495375","DOIUrl":"https://doi.org/10.1109/LRA.2024.3495375","url":null,"abstract":"In this letter, we develop a tightly coupled polarization-visual-inertial localization system that utilizes naturally-attributed polarized skylight to provide a global heading. We introduce a focal plane polarization camera with negligible instantaneous field-of-view error to collect polarized skylight. Then, we design a robust heading determination method from polarized skylight and construct a global stable heading constraint. In particular, this constraint compensates for the heading unobservability present in standard VINS. In addition to the standard sparse visual feature measurements used in VINS, polarization heading residuals are constructed and co-optimized in a tightly-coupled VINS update. An adaptive fusion strategy is designed to correct the cumulative drift. Outdoor real-world experiments show that the proposed method outperforms state-of-the-art VINS-Fusion in terms of localization accuracy, and improves 22% over VINS-Fusion in a wooded campus environment.","PeriodicalId":13241,"journal":{"name":"IEEE Robotics and Automation Letters","volume":"9 12","pages":"11481-11488"},"PeriodicalIF":4.6,"publicationDate":"2024-11-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142636539","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2024-11-11. DOI: 10.1109/LRA.2024.3495577
Carlos Cardenas-Perez;Giulio Romualdi;Mohamed Elobaid;Stefano Dafarra;Giuseppe L'Erario;Silvio Traversaro;Pietro Morerio;Alessio Del Bue;Daniele Pucci
This letter presents XBG (eXteroceptive Behaviour Generation), a multimodal end-to-end Imitation Learning (IL) system for whole-body autonomous humanoid robots used in real-world Human-Robot Interaction (HRI) scenarios. The main contribution is an architecture for learning HRI behaviours using a data-driven approach. A diverse dataset is collected via teleoperation, covering multiple HRI scenarios such as handshaking, handwaving, payload reception, walking, and walking with a payload. After synchronizing, filtering, and transforming the data, we show how to train the presented Deep Neural Networks (DNNs), integrating exteroceptive and proprioceptive information to help the robot understand both its environment and its actions. The robot takes in sequences of images (RGB and depth) and joint-state information and reacts accordingly. By fusing multimodal signals over time, the model enables autonomous capabilities in a robotic platform. The models are evaluated based on the success rates in the mentioned HRI scenarios and are deployed on the ergoCub humanoid robot. XBG achieves success rates between 60% and 100% even when tested in unseen environments.
{"title":"XBG: End-to-End Imitation Learning for Autonomous Behaviour in Human-Robot Interaction and Collaboration","authors":"Carlos Cardenas-Perez;Giulio Romualdi;Mohamed Elobaid;Stefano Dafarra;Giuseppe L'Erario;Silvio Traversaro;Pietro Morerio;Alessio Del Bue;Daniele Pucci","doi":"10.1109/LRA.2024.3495577","DOIUrl":"https://doi.org/10.1109/LRA.2024.3495577","url":null,"abstract":"This letter presents XBG (eXteroceptive Behaviour Generation), a multimodal end-to-end Imitation Learning (IL) system for whole-body autonomous humanoid robots used in real-world Human-Robot Interaction (HRI) scenarios. The main contribution is an architecture for learning HRI behaviours using a data-driven approach. A diverse dataset is collected via teleoperation, covering multiple HRI scenarios, such as handshaking, handwaving, payload reception, walking, and walking with a payload. After synchronizing, filtering, and transforming the data, we show how to train the presented Deep Neural Networks (DNN), integrating exteroceptive and proprioceptive information to help the robot understand both its environment and its actions. The robot takes in sequences of images (RGB and depth) and joints state information to react accordingly. By fusing multimodal signals over time, the model enables autonomous capabilities in a robotic platform. The models are evaluated based on the success rates in the mentioned HRI scenarios and they are deployed on the ergoCub humanoid robot. XBG achieves success rates between 60% and 100% even when tested in unseen environments.","PeriodicalId":13241,"journal":{"name":"IEEE Robotics and Automation Letters","volume":"9 12","pages":"11617-11624"},"PeriodicalIF":4.6,"publicationDate":"2024-11-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142672099","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2024-11-11. DOI: 10.1109/LRA.2024.3495452
Zhimin Shao;Jialang Xu;Danail Stoyanov;Evangelos B. Mazomenos;Yueming Jin
Despite advancements in robotic systems and surgical data science, ensuring safe execution in robot-assisted minimally invasive surgery (RMIS) remains challenging. Current methods for surgical error detection typically involve two parts: identifying gestures and then detecting errors within each gesture clip. These methods often overlook the rich contextual and semantic information inherent in surgical videos, and their performance is limited by the reliance on accurate gesture identification. Inspired by chain-of-thought prompting in natural language processing, this letter presents a novel, real-time, end-to-end error detection framework, Chain-of-Gesture (COG) prompting, which integrates contextual information from surgical videos step by step. It comprises two reasoning modules that simulate expert surgeons' decision-making: a Gestural-Visual Reasoning module using transformer and attention architectures for gesture prompting, and a Multi-Scale Temporal Reasoning module employing a multi-stage temporal convolutional network with slow and fast paths for temporal information extraction. We validate our method on the JIGSAWS dataset and show improvements over the state-of-the-art, achieving a 4.6% higher F1 score, 4.6% higher accuracy, and 5.9% higher Jaccard index, with an average frame processing time of 6.69 milliseconds. This demonstrates our approach's potential to enhance RMIS safety and surgical education efficacy.
{"title":"Think Step by Step: Chain-of-Gesture Prompting for Error Detection in Robotic Surgical Videos","authors":"Zhimin Shao;Jialang Xu;Danail Stoyanov;Evangelos B. Mazomenos;Yueming Jin","doi":"10.1109/LRA.2024.3495452","DOIUrl":"https://doi.org/10.1109/LRA.2024.3495452","url":null,"abstract":"Despite advancements in robotic systems and surgical data science, ensuring safe execution in robot-assisted minimally invasive surgery (RMIS) remains challenging. Current methods for surgical error detection typically involve two parts: identifying gestures and then detecting errors within each gesture clip. These methods often overlook the rich contextual and semantic information inherent in surgical videos, with limited performance due to reliance on accurate gesture identification. Inspired by the chain-of-thought prompting in natural language processing, this letter presents a novel and real-time end-to-end error detection framework, Chain-of-Gesture (COG) prompting, integrating contextual information from surgical videos step by step. This encompasses two reasoning modules that simulate expert surgeons' decision-making: a Gestural-Visual Reasoning module using transformer and attention architectures for gesture prompting and a Multi-Scale Temporal Reasoning module employing a multi-stage temporal convolutional network with slow and fast paths for temporal information extraction. We validate our method on the JIGSAWS dataset and show improvements over the state-of-the-art, achieving 4.6% higher F1 score, 4.6% higher Accuracy, and 5.9% higher Jaccard index, with an average frame processing time of 6.69 milliseconds. This demonstrates our approach's potential to enhance RMIS safety and surgical education efficacy.","PeriodicalId":13241,"journal":{"name":"IEEE Robotics and Automation Letters","volume":"9 12","pages":"11513-11520"},"PeriodicalIF":4.6,"publicationDate":"2024-11-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142672086","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}