Pub Date: 2026-02-09 | DOI: 10.1109/LRA.2026.3662532
Nikolas Thelenberg;Christian Ott
In kinesthetic teaching, a robot is manually guided by a human operator to demonstrate a task. Most methods focus on replaying the recorded motion but are agnostic to contact transitions, which can be critical when interacting with rigid environments. To overcome this limitation, we propose a framework that allows motions to be taught in free space as well as in contact while preventing fast, unintended contact transitions. This is accomplished by exploiting a projection-based unilateral damping force that increases close to contact. We derive an explicit analytical expression for the damping characteristic that ensures a safe stop before contact when no further forces act on the robot. Furthermore, after teaching, the recorded motion data are used to generate a time-optimized trajectory via convex optimization, in which the contact transitions are explicitly considered. We validated our framework in experiments with a torque-controlled manipulator.
{"title":"A Kinesthetic Teaching Framework for Tasks With Contact Transitions and Time-Optimized Execution","authors":"Nikolas Thelenberg;Christian Ott","doi":"10.1109/LRA.2026.3662532","DOIUrl":"https://doi.org/10.1109/LRA.2026.3662532","url":null,"abstract":"In kinesthetic teaching, a robot is manually guided by a human operator to demonstrate a task. Most methods focus on replaying the recorded motion, but are agnostic to contact transitions, which can be critical when interacting with rigid environments. To overcome this limitation, we propose a framework that allows to teach motions in free space as well as in contact while preventing fast unintended contact transitions. This is accomplished by exploiting a projection-based unilateral damping force that increases close to contact. We derive an explicit analytical expression for the damping characteristics to ensure a safe stop before the contact when no further forces act on the robot. Furthermore, after the teaching, the recorded motion data is utilized to generate a time-optimized trajectory based on convex optimization, in which the contact transitions are explicitly considered. We validated our framework in experiments with a torque-controlled manipulator.","PeriodicalId":13241,"journal":{"name":"IEEE Robotics and Automation Letters","volume":"11 4","pages":"3971-3978"},"PeriodicalIF":5.3,"publicationDate":"2026-02-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=11373869","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146216643","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2026-02-09 | DOI: 10.1109/LRA.2026.3662653
Yifan Zhai;Rudolf Reiter;Davide Scaramuzza
Quadrotor navigation in unknown environments is critical for practical missions such as search-and-rescue. Solving it requires addressing three key challenges: path planning in the non-convex free space induced by obstacles, satisfying quadrotor-specific dynamics and objectives, and exploring unknown regions to expand the map. Recently, the Model Predictive Path Integral (MPPI) method has emerged as a promising solution to the first two challenges. By leveraging sampling-based optimization, it can effectively handle non-convex free space while directly optimizing over the full quadrotor dynamics, enabling the inclusion of quadrotor-specific costs such as energy consumption. However, MPPI has been limited to tracking control that only optimizes trajectories in a small neighbourhood around a reference trajectory, as it lacks the ability to explore unknown regions and plan alternative paths when blocked by large obstacles. To solve this issue, we introduce Perception-Aware MPPI (PA-MPPI). Here, perception-awareness means planning and adapting the trajectory online based on perception objectives. Specifically, when the goal is occluded, PA-MPPI's perception cost biases trajectories toward those that can perceive unknown regions. This expands the mapped traversable space and increases the likelihood of finding alternative paths to the goal. Through hardware experiments, we demonstrate that PA-MPPI, running at 50 Hz, performs on par with a state-of-the-art quadrotor navigation planner for unknown environments in our challenging test scenarios. In addition, we demonstrate that PA-MPPI can be used as a safe and robust action policy for navigation foundation models, which often provide goal poses that are not directly reachable.
{"title":"PA-MPPI: Perception-Aware Model Predictive Path Integral Control for Quadrotor Navigation in Unknown Environments","authors":"Yifan Zhai;Rudolf Reiter;Davide Scaramuzza","doi":"10.1109/LRA.2026.3662653","DOIUrl":"https://doi.org/10.1109/LRA.2026.3662653","url":null,"abstract":"Quadrotor navigation in unknown environments is critical for practical missions such as search-and-rescue. Solving it requires addressing three key challenges: path-planning in non-convex free space due to obstacles, satisfying quadrotor-specific dynamics and objectives, and exploring unknown regions to expand the map. Recently, the Model Predictive Path Integral (MPPI) method has emerged as a promising solution that solves the first two challenges. By leveraging sampling-based optimization, it can effectively handle non-convex free space while directly optimizing over the full quadrotor dynamics, enabling the inclusion of quadrotor-specific costs such as energy consumption. However, MPPI has been limited to tracking control that only optimizes trajectories in a small neighbourhood around a reference trajectory, as it lacks the ability to explore unknown regions and plan alternative paths when blocked by large obstacles. To solve this issue, we introduce Perception-Aware MPPI (PA-MPPI). Here, perception-awareness is characterized by planning and adapting the trajectory online based on perception objectives. Specifically, when the goal is occluded, PA-MPPI's perception cost biases trajectories that can perceive unknown regions. This expands the mapped traversable space and increases the likelihood of finding alternative paths to the goal. Through hardware experiments, we demonstrate that PA-MPPI, running at 50 Hz, performs on par with the SOTA quadrotor navigation planner for unknown environments in our challenging test scenarios. In addition, we demonstrate that PA-MPPI can be used as a safe and robust action policy for navigation foundation models, which often provide goal poses that are not directly reachable.","PeriodicalId":13241,"journal":{"name":"IEEE Robotics and Automation Letters","volume":"11 3","pages":"3804-3811"},"PeriodicalIF":5.3,"publicationDate":"2026-02-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146223611","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2026-02-09 | DOI: 10.1109/LRA.2026.3662562
Biyu Ye;Na Fan;Zhengping Fan;Weiliang Deng;Hongming Chen;Qifeng Chen;Ximin Lyu
Aerial manipulators (AMs) are gaining increasing attention in automated transportation and emergency services due to their superior dexterity compared to conventional multirotor drones. However, their practical deployment is challenged by the complexity of time-varying inertial parameters, which are highly sensitive to payload variations and manipulator configurations. Inspired by human strategies for interacting with unknown objects, this letter presents a novel onboard framework for robust aerial manipulation. The proposed system integrates a vision-based pre-grasp inertia estimation module with a post-grasp adaptation mechanism, enabling real-time estimation and adaptation of inertial dynamics. For control, we develop an inertia-aware adaptive control strategy based on gain scheduling, and assess its robustness via frequency-domain system identification. Our study provides new insights into post-grasp control for AMs, and real-world experiments validate the effectiveness and feasibility of the proposed framework.
{"title":"FlyAware: Inertia-Aware Aerial Manipulation via Vision-Based Estimation and Post-Grasp Adaptation","authors":"Biyu Ye;Na Fan;Zhengping Fan;Weiliang Deng;Hongming Chen;Qifeng Chen;Ximin Lyu","doi":"10.1109/LRA.2026.3662562","DOIUrl":"https://doi.org/10.1109/LRA.2026.3662562","url":null,"abstract":"Aerial manipulators (AMs) are gaining increasing attention in automated transportation and emergency services due to their superior dexterity compared to conventional multirotor drones. However, their practical deployment is challenged by the complexity of time-varying inertial parameters, which are highly sensitive to payload variations and manipulator configurations. Inspired by human strategies for interacting with unknown objects, this letter presents a novel onboard framework for robust aerial manipulation. The proposed system integrates a vision-based pre-grasp inertia estimation module with a post-grasp adaptation mechanism, enabling real-time estimation and adaptation of inertial dynamics. For control, we develop an inertia-aware adaptive control strategy based on gain scheduling, and assess its robustness via frequency-domain system identification. Our study provides new insights into post-grasp control for AMs, and real-world experiments validate the effectiveness and feasibility of the proposed framework.","PeriodicalId":13241,"journal":{"name":"IEEE Robotics and Automation Letters","volume":"11 3","pages":"3780-3787"},"PeriodicalIF":5.3,"publicationDate":"2026-02-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146223616","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2026-02-09 | DOI: 10.1109/LRA.2026.3662577
Kim Tien Ly;Kai Lu;Ioannis Havoutis
We introduce an interactive LLM-based framework designed to enhance the autonomy and robustness of domestic robots, targeting embodied intelligence. Our approach reduces reliance on large-scale data and incorporates a robot-agnostic pipeline that embodies an LLM. Our framework, InteLiPlan, ensures that the LLM’s decision-making capabilities are effectively aligned with robotic functions, enhancing operational robustness and adaptability, while our human-in-the-loop mechanism allows for real-time human intervention when user instruction is required. We evaluate our method both in simulation and on real robot platforms, including a Toyota Human Support Robot and an ANYmal D robot with a Unitree Z1 arm. Our method achieves a 95% success rate in ‘fetch me’ task completion with failure recovery, highlighting its capability in both failure reasoning and task planning. InteLiPlan achieves performance comparable to state-of-the-art LLM-based robotics planners while using only real-time onboard computing.
{"title":"InteLiPlan: An Interactive Lightweight LLM-Based Planner for Domestic Robot Autonomy","authors":"Kim Tien Ly;Kai Lu;Ioannis Havoutis","doi":"10.1109/LRA.2026.3662577","DOIUrl":"https://doi.org/10.1109/LRA.2026.3662577","url":null,"abstract":"We introduce an interactive LLM-based framework designed to enhance the autonomy and robustness of domestic robots, targeting embodied intelligence. Our approach reduces reliance on large-scale data and incorporates a robot-agnostic pipeline that embodies an LLM. Our framework, <italic>InteLiPlan</i>, ensures that the LLM’s decision-making capabilities are effectively aligned with robotic functions, enhancing operational robustness and adaptability, while our human-in-the-loop mechanism allows for real-time human intervention when user instruction is required. We evaluate our method in both simulation and on the real robot platforms, including a Toyota Human Support Robot and an ANYmal D robot with a Unitree Z1 arm. Our method achieves a 95% success rate in the ‘fetch me’ task completion with failure recovery, highlighting its capability in both failure reasoning and task planning. <italic>InteLiPlan</i> achieves comparable performance to state-of-the-art LLM-based robotics planners, while using only real-time onboard computing.","PeriodicalId":13241,"journal":{"name":"IEEE Robotics and Automation Letters","volume":"11 3","pages":"3875-3882"},"PeriodicalIF":5.3,"publicationDate":"2026-02-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146223663","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2026-02-09 | DOI: 10.1109/LRA.2026.3662591
Jiaqiao Liang;Zefeng Xu;Peiyu Liu;Qiaosong Fan;Linjun Liu;Bin Xie;Ye Chen;Yitong Zhou
Pectoral-fin-based (labriform) swimming combines high-speed propulsion with agile maneuverability through rigid–flexible fin partitioning. Inspired by this principle, we present a multi-stable soft robotic swimmer composed of two wedge-shaped bistable actuators integrated with fin-like rigid–flexible morphologies. The bistable actuators generate large, rapid deformations for thrust production, while the compliant fin membranes enable drag-reducing feathering during recovery. An analytical model is developed to predict the shape of the bistable actuators and is validated experimentally with a minimum prediction accuracy of 97.74%. Computational fluid dynamics (CFD) analysis reveals that bistable switching induces vortex dipole ejection, contributing to thrust generation. The proposed robot attains a maximum speed of 17.53 $\text{cm}\cdot\text{s}^{-1}$ (1.10 $\text{BL}\cdot\text{s}^{-1}$), a turning radius of 0.58 body lengths, and a turning speed of 31.51$^\circ$/s, highlighting the effectiveness of our design in shaping both swimming speed and maneuverability. By integrating bistable actuation with bio-inspired fin morphologies, this work offers a principled design strategy for achieving fast and maneuverable swimming in robotic systems.
{"title":"A Labriform-Inspired Multi-Stable Soft Robotic Swimmer","authors":"Jiaqiao Liang;Zefeng Xu;Peiyu Liu;Qiaosong Fan;Linjun Liu;Bin Xie;Ye Chen;Yitong Zhou","doi":"10.1109/LRA.2026.3662591","DOIUrl":"https://doi.org/10.1109/LRA.2026.3662591","url":null,"abstract":"Pectoral-fin-based (labriform) swimming combines high-speed propulsion with agile maneuverability through rigid–flexible fin partitioning. Inspired by this principle, we present a multi-stable soft robotic swimmer composed of two wedge-shaped bistable actuators integrated with fin-like rigid–flexible morphologies. The bistable actuators generate large, rapid deformations for thrust production, while the compliant fin membranes enable drag-reducing feathering during recovery. An analytical model is developed to predict the shape of bistable actuators and is validated experimentally with a minimum prediction accuracy of 97.74%. Computational fluid dynamics (CFD) analysis reveals that bistable switching induces vortex dipole ejection, contributing to thrust generation. The proposed robot attains a maximum speed of 17.53 <inline-formula><tex-math>$text{cm}cdot text{s}^{-1}$</tex-math></inline-formula> (1.10 <inline-formula><tex-math>$text{BL}cdot text{s}^{-1}$</tex-math></inline-formula>), a turning radius of 0.58 per body length, and a turning speed of 31.51<inline-formula><tex-math>$^circ$</tex-math></inline-formula>/s, highlighting our design in shaping both swimming speed and maneuverability. By integrating bistable actuation with bio-inspired fin morphologies, this work offers a principled design strategy for achieving fast and maneuverable swimming in robotic systems.","PeriodicalId":13241,"journal":{"name":"IEEE Robotics and Automation Letters","volume":"11 4","pages":"4026-4033"},"PeriodicalIF":5.3,"publicationDate":"2026-02-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146216621","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2026-02-09 | DOI: 10.1109/LRA.2026.3662582
Zhuoyi Zhang;Yixin Han;Renjun Li;Xiao Li
Data-driven methods offer promising solutions for robotic manipulation in human-centric environments, but enabling robots to operate complex appliances from natural language remains a significant challenge. The ambiguity of human instructions and the visual diversity of real-world objects make it difficult to generate precise and reliable action sequences. In this letter, we propose a hierarchical multimodal Retrieval-Augmented Generation (RAG) framework that fuses visual perception with language understanding. Our framework uses a vision-based module to identify an appliance and its documentation from a snapshot, then leverages a task-oriented RAG pipeline to process user instructions, retrieve relevant manual sections, and generate executable action sequences. We train and validate this framework on a custom dataset of microwave oven operation tasks and demonstrate its effectiveness, robustness, and practical viability through extensive virtual and physical experiments on a robotic platform.
{"title":"HiMRAG: Hierarchical Multimodal Retrieval-Augmented Generation for Robot Task Planning","authors":"Zhuoyi Zhang;Yixin Han;Renjun Li;Xiao Li","doi":"10.1109/LRA.2026.3662582","DOIUrl":"https://doi.org/10.1109/LRA.2026.3662582","url":null,"abstract":"Data-driven methods offer promising solutions for robotic manipulation in human-centric environments, but enabling robots to operate complex appliances from natural language remains a significant challenge. The ambiguity of human instructions and the visual diversity of real-world objects make it difficult to generate precise and reliable action sequences. In this letter, we propose a hierarchical multimodal Retrieval-Augmented Generation (RAG) framework that fuses visual perception with language understanding. Our framework uses a vision-based module to identify an appliance and its documentation from a snapshot, then leverages a task-oriented RAG pipeline to process user instructions, retrieve relevant manual sections, and generate executable action sequences. We train and validate this framework on a custom dataset of microwave oven operation tasks and demonstrate its effectiveness, robustness, and practical viability through extensive virtual and physical experiments on a robotic platform.","PeriodicalId":13241,"journal":{"name":"IEEE Robotics and Automation Letters","volume":"11 3","pages":"3883-3890"},"PeriodicalIF":5.3,"publicationDate":"2026-02-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146223536","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2026-02-09 | DOI: 10.1109/LRA.2026.3662600
Mingyi Wang;Shuzhen Luo
Understanding human walking in reduced gravity could enhance astronaut mobility and improve the efficiency of space exploration. However, existing studies of natural locomotion often demand high costs and significant commitments of time and resources. Here, we present a deep reinforcement learning (DRL)-based simulation framework that predicts locomotion patterns across reduced-gravity environments by learning control policies tailored to each gravity condition. This approach identifies optimal gait behaviors without extensive experimental data and can be extended to include assistive devices such as exoskeletons, enabling systematic studies of human–exoskeleton interaction and walking adaptation in reduced-gravity settings. To validate the simulation, we used a mechanical body-weight suspension system to replicate reduced gravity and conducted walking experiments under three reduced-gravity levels. The stance phase (ST) decreased from 72.59% to 61.03% and the swing phase (SW) increased from 27.41% to 38.97%, with stride duration nearly constant. Under exoskeleton assistance, ST decreased from 63.52% to 62.02% and SW increased from 36.48% to 37.98%. Hip joint range of motion decreased consistently with gravity in both conditions. These trends closely matched experimental results, demonstrating the potential of DRL-based simulations for studying locomotion and assistive strategies in reduced gravity.
{"title":"Predicting Human Locomotion in Reduced Gravity via Deep Learning-Driven Musculoskeletal Simulation","authors":"Mingyi Wang;Shuzhen Luo","doi":"10.1109/LRA.2026.3662600","DOIUrl":"https://doi.org/10.1109/LRA.2026.3662600","url":null,"abstract":"Understanding human walking locomotion in reduced gravity could enhance astronaut mobility and improve the space exploration efficiency. However, existing studies often require high costs and significant time and resource commitments for natural locomotion studies. Here, we present a deep reinforcement learning (DRL)-based simulation framework that predicts locomotion patterns across reduced-gravity environments by learning control policies tailored to each gravity condition. This approach identifies optimal gait behaviors without extensive experimental data and can be extended to include assistive devices such as exoskeletons, enabling systematic studies of human–exoskeleton interaction and walking adaptation in reduced-gravity settings. To validate the simulation, we utilized a mechanical body-weight suspension system to replicate reduced gravity and conducted walking experiments under three reduced gravity levels. The stance phase (ST) decreased from 72.59% to 61.03% and the swing phase (SW) increased from 27.41% to 38.97%, with stride duration nearly constant. Under the exoskeleton assistance, ST decreased from 63.52% to 62.02%, and SW increased from 36.48% to 37.98%. Hip joint range of motion decreased consistently with gravity in both conditions. These trends closely matched experimental results, demonstrating the potential of DRL-based simulations for studying locomotion and assistive strategies in reduced gravity.","PeriodicalId":13241,"journal":{"name":"IEEE Robotics and Automation Letters","volume":"11 3","pages":"3899-3906"},"PeriodicalIF":5.3,"publicationDate":"2026-02-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146223608","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2026-02-09 | DOI: 10.1109/LRA.2026.3662977
Mikihisa Yuasa;Ramavarapu S. Sreenivas;Huy T. Tran
Learning-based policies have demonstrated success in many robotic applications, but often lack explainability. We propose a neuro-symbolic explanation framework that generates a weighted signal temporal logic (wSTL) specification which describes a robot policy in a human-interpretable form. Existing methods typically produce explanations that are verbose and inconsistent, which hinders explainability, and are loose, which limits meaningful insights. We address these issues by introducing a simplification process consisting of predicate filtering, regularization, and iterative pruning. We also introduce three explainability metrics—conciseness, consistency, and strictness—to assess explanation quality beyond conventional classification accuracy. Our method—TLNet—is validated in three simulated robotic environments, where it outperforms baselines in generating concise, consistent, and strict wSTL explanations without sacrificing accuracy. This work bridges policy learning and explainability through formal methods, contributing to more transparent decision-making in robotics.
{"title":"Neuro-Symbolic Generation of Explanations for Robot Policies With Weighted Signal Temporal Logic","authors":"Mikihisa Yuasa;Ramavarapu S. Sreenivas;Huy T. Tran","doi":"10.1109/LRA.2026.3662977","DOIUrl":"https://doi.org/10.1109/LRA.2026.3662977","url":null,"abstract":"Learning-based policies have demonstrated success in many robotic applications, but often lack explainability. We propose a neuro-symbolic explanation framework that generates a weighted signal temporal logic (wSTL) specification which describes a robot policy in a human-interpretable form. Existing methods typically produce explanations that are verbose and inconsistent, which hinders explainability, and are loose, which limits meaningful insights. We address these issues by introducing a simplification process consisting of predicate filtering, regularization, and iterative pruning. We also introduce three explainability metrics—conciseness, consistency, and strictness—to assess explanation quality beyond conventional classification accuracy. Our method—<sc>TLNet</small>—is validated in three simulated robotic environments, where it outperforms baselines in generating concise, consistent, and strict wSTL explanations without sacrificing accuracy. This work bridges policy learning and explainability through formal methods, contributing to more transparent decision-making in robotics.","PeriodicalId":13241,"journal":{"name":"IEEE Robotics and Automation Letters","volume":"11 4","pages":"3963-3970"},"PeriodicalIF":5.3,"publicationDate":"2026-02-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=11386893","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146216611","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2026-02-09 | DOI: 10.1109/LRA.2026.3662533
Ashay Athalye;Nishanth Kumar;Tom Silver;Yichao Liang;Jiuguang Wang;Tomás Lozano-Pérez;Leslie Pack Kaelbling
Our aim is to learn to solve long-horizon decision-making problems in complex robotics domains given low-level skills and a handful of demonstrations containing sequences of images. To this end, we focus on learning abstract symbolic world models that facilitate zero-shot generalization to novel goals via planning. A critical component of such models is the set of symbolic predicates that define properties of and relationships between objects. In this work, we leverage pretrained vision-language models (VLMs) to propose a large set of visual predicates potentially relevant for decision-making, and to evaluate those predicates directly from camera images. At training time, we pass the proposed predicates and demonstrations into an optimization-based model-learning algorithm to obtain an abstract symbolic world model that is defined in terms of a compact subset of the proposed predicates. At test time, given a novel goal in a novel setting, we use the VLM to construct a symbolic description of the current world state, and then use a search-based planning algorithm to find a sequence of low-level skills that achieves the goal. We demonstrate empirically across experiments in both simulation and the real world that our method can generalize aggressively, applying its learned world model to solve problems with varying visual backgrounds, types, numbers, and arrangements of objects, as well as novel goals and much longer horizons than those seen at training time.
{"title":"From Pixels to Predicates: Learning Symbolic World Models via Pretrained VLMs","authors":"Ashay Athalye;Nishanth Kumar;Tom Silver;Yichao Liang;Jiuguang Wang;Tomás Lozano-Pérez;Leslie Pack Kaelbling","doi":"10.1109/LRA.2026.3662533","DOIUrl":"https://doi.org/10.1109/LRA.2026.3662533","url":null,"abstract":"Our aim is to learn to solve long-horizon decision-making problems in complex robotics domains given low-level skills and a handful of demonstrations containing sequences of images. To this end, we focus on learning abstract symbolic world models that facilitate zero-shot generalization to novel goals via planning. A critical component of such models is the set of symbolic <italic>predicates</i> that define properties of and relationships between objects. In this work, we leverage pretrained vision-language models (VLMs) to propose a large set of visual predicates potentially relevant for decision-making, and to evaluate those predicates directly from camera images. At training time, we pass the proposed predicates and demonstrations into an optimization-based model-learning algorithm to obtain an abstract symbolic world model that is defined in terms of a compact subset of the proposed predicates. At test time, given a novel goal in a novel setting, we use the VLM to construct a symbolic description of the current world state, and then use a search-based planning algorithm to find a sequence of low-level skills that achieves the goal. We demonstrate empirically across experiments in both simulation and the real world that our method can generalize aggressively, applying its learned world model to solve problems with varying visual backgrounds, types, numbers, and arrangements of objects, as well as novel goals and much longer horizons than those seen at training time.","PeriodicalId":13241,"journal":{"name":"IEEE Robotics and Automation Letters","volume":"11 4","pages":"4002-4009"},"PeriodicalIF":5.3,"publicationDate":"2026-02-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146216528","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2026-02-09 | DOI: 10.1109/LRA.2026.3662646
Yao Huang;Li Liu;Jian Sun;Bo Song
Disrupted hand motor function may be restored through exoskeleton-assisted rehabilitation training. However, the variability of soft tissue within human joints and across individuals, together with the difficulty of developing an exoskeleton that combines human-machine motion compatibility with dynamic compliance, poses persistent challenges. We introduce a hybrid single-motor-driven rigid–soft exoskeleton for the index finger to assist in rehabilitation training. A rigid parallel mechanism directly drives the soft component at the metacarpophalangeal (MCP) joint. In addition, we adopt an interlocking mechanism to induce deformation in leaf springs, enabling coordinated flexion and extension of multiple joints. A motion analysis based on the modified Denavit–Hartenberg convention confirms that the proposed parallel mechanism can compensate for the misalignment displacement of the MCP joint. Based on the displacement and force applied to the soft component by the rigid parallel mechanism, kinematic and static analyses, along with dimensional optimization, are performed on a dual-segment parallel leaf spring. In tests, a prototype exoskeleton demonstrated Pearson correlation coefficients of 0.998, 0.991, and 0.986 for the MCP, proximal interphalangeal (PIP), and distal interphalangeal (DIP) joints, respectively, with corresponding joint flexion angles of 68.19°, 81.91°, and 41.64°. The exoskeleton self-aligns with the index finger joints, properly assisting the natural bending motion of the finger to meet the rehabilitation training needs of patients. It can assist with a fingertip force of 6.2 N, satisfying grip requirements, while the reduced force on the dorsal surface of the index finger enhances comfort during use. The proposed solution is promising for the development of hand exoskeletons.
{"title":"Design and Analysis of Hybrid Rigid-Soft Self-Aligning Index Finger Exoskeleton","authors":"Yao Huang;Li Liu;Jian Sun;Bo Song","doi":"10.1109/LRA.2026.3662646","DOIUrl":"https://doi.org/10.1109/LRA.2026.3662646","url":null,"abstract":"Disrupted hand motor functions may be restored through exoskeleton-assisted rehabilitation training. However, the variability of soft tissue in human joints or across individuals and development of an exoskeleton that combines human-machine motion compatibility and dynamic compliance pose persistent challenges. We introduce a hybrid single-motor-driven rigid–soft exoskeleton for the index finger to assist in rehabilitation training. A rigid parallel mechanism directly drives the soft component of the metacarpophalangeal (MCP) joint. In addition, we adopt an interlocking mechanism to induce deformation in leaf springs, enabling the coordinated flexion and extension of multiple joints. A motion analysis based on the modified Denavit–Hartenberg convention confirms that the proposed parallel mechanism can compensate for the misalignment displacement of the MCP joint. Based on the displacement and force applied to the soft component by the designed rigid parallel mechanism, kinematic and static analyses along with dimensional optimization are performed on a dual-segment parallel leaf spring. A prototype exoskeleton undergoing tests demonstrated Pearson correlation coefficients of 0.998, 0.991, 0.986, for the MCP, proximal and distal interphalangeal (PIP/DIP) joints, respectively. The corresponding joint flexion angles were 68.19°, 81.91°, and 41.64°. The exoskeleton self-aligns with the index finger joints, properly assisting the natural bending motion of the finger to meet rehabilitation training needs of patients. The proposed exoskeleton can assist with a fingertip force of 6.2 N, thereby satisfying grip requirements, while the reduced force on the dorsal surface of the index finger enhances comfort during use. The proposed solution is promising for developing hand exoskeletons.","PeriodicalId":13241,"journal":{"name":"IEEE Robotics and Automation Letters","volume":"11 4","pages":"4010-4017"},"PeriodicalIF":5.3,"publicationDate":"2026-02-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146216616","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}