Pub Date: 2025-10-28; eCollection Date: 2025-01-01; DOI: 10.3389/frobt.2025.1684845
Rui Wang, Ruiqi Wang, Hao Hu, Huai Yu
Introduction: Animal-involved scenarios pose significant challenges for autonomous driving systems due to their rarity, unpredictability, and safety-critical nature. Despite their importance, existing vision-language datasets for autonomous driving largely overlook these long-tail situations.
Methods: To address this gap, we introduce AniDriveQA, a novel visual question answering (VQA) dataset specifically designed to evaluate vision-language models (VLMs) in driving scenarios involving animals. The dataset is constructed through a scalable pipeline that collects diverse animal-related traffic scenes from internet videos, filters and annotates them using object detection and scene classification models, and generates multi-task VQA labels with a large vision-language model. AniDriveQA includes three key task types: scene description, animal description, and driving suggestion.
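The multi-task labels described above can be pictured as one record per annotated frame. The following is a minimal sketch under assumed field names (the abstract does not publish a schema), with one question-answer pair per task type:

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class AniDriveQARecord:
    """Hypothetical per-image record combining the three task types."""
    image_path: str                     # frame extracted from an internet driving video
    scene_label: str                    # output of the scene classification model
    detected_animals: List[str]         # object-detector classes kept after filtering
    qa_scene_description: dict = field(default_factory=dict)    # {"question": ..., "answer": ...}
    qa_animal_description: dict = field(default_factory=dict)
    qa_driving_suggestion: dict = field(default_factory=dict)

# How a single annotated sample might look (illustrative values only):
sample = AniDriveQARecord(
    image_path="frames/clip_0001/000123.jpg",
    scene_label="rural two-lane road",
    detected_animals=["deer"],
    qa_scene_description={"question": "Describe the scene.", "answer": "A rural road at dusk."},
    qa_animal_description={"question": "What animal is present?", "answer": "A deer near the right shoulder."},
    qa_driving_suggestion={"question": "What should the driver do?", "answer": "Slow down and prepare to stop."},
)
```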
Results: For evaluation, a hybrid scheme was employed that combined classification accuracy for structured tasks with LLM-based scoring for open-ended responses. Extensive experiments on various open-source VLMs revealed large performance disparities across models and task types.
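A hybrid scheme of this kind can be outlined as follows; this is an illustrative sketch, not the authors' evaluation code, and `llm_judge_score` stands in for whatever LLM-based scorer is used:

```python
def evaluate(samples, structured_tasks, llm_judge_score):
    """samples: iterable of (task_name, prediction, reference) triples.
    Structured tasks are scored by exact-match accuracy; open-ended ones by an LLM judge in [0, 1]."""
    hits, n_structured, open_scores = 0, 0, []
    for task, pred, ref in samples:
        if task in structured_tasks:
            n_structured += 1
            hits += int(pred.strip().lower() == ref.strip().lower())
        else:
            open_scores.append(llm_judge_score(pred, ref))
    return {
        "structured_accuracy": hits / max(n_structured, 1),
        "open_ended_mean_score": sum(open_scores) / max(len(open_scores), 1),
    }
```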
Discussion: The experimental results demonstrate that AniDriveQA effectively exposes the limitations of current VLMs in rare yet safety-critical autonomous driving scenarios. The dataset provides a valuable diagnostic benchmark for advancing reasoning, perception, and decision-making capabilities in future vision-language models.
{"title":"AniDriveQA: a VQA dataset for driving scenes with animal presence.","authors":"Rui Wang, Ruiqi Wang, Hao Hu, Huai Yu","doi":"10.3389/frobt.2025.1684845","DOIUrl":"10.3389/frobt.2025.1684845","url":null,"abstract":"<p><strong>Introduction: </strong>Animal-involved scenarios pose significant challenges for autonomous driving systems due to their rarity, unpredictability, and safety-critical nature. Despite their importance, existing vision-language datasets for autonomous driving largely overlook these long-tail situations.</p><p><strong>Methods: </strong>To address this gap, we introduce AniDriveQA, a novel visual question answering (VQA) dataset specifically designed to evaluate vision-language models (VLMs) in driving scenarios involving animals. The dataset is constructed through a scalable pipeline that collects diverse animal-related traffic scenes from internet videos, filters and annotates them using object detection and scene classification models, and generates multi-task VQA labels with a large vision-language model. AniDriveQA includes three key task types: scene description, animal description, and driving suggestion.</p><p><strong>Results: </strong>For evaluation, a hybrid scheme was employed that combined classification accuracy for structured tasks with LLM-based scoring for open-ended responses. Extensive experiments on various open-source VLMs revealed large performance disparities across models and task types.</p><p><strong>Discussion: </strong>The experimental results demonstrate that AniDriveQA effectively exposes the limitations of current VLMs in rare yet safety-critical autonomous driving scenarios. The dataset provides a valuable diagnostic benchmark for advancing reasoning, perception, and decision-making capabilities in future vision-language models.</p>","PeriodicalId":47597,"journal":{"name":"Frontiers in Robotics and AI","volume":"12 ","pages":"1684845"},"PeriodicalIF":3.0,"publicationDate":"2025-10-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12604350/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145507531","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2025-10-28; eCollection Date: 2025-01-01; DOI: 10.3389/frobt.2025.1695262
Georgiy N Kuplinov
Limited battery capacity poses a challenge for autonomous robots. We argue that, instead of relying solely on electric motors and batteries as in Conventional Autonomous Robots (CAR), one way to address this challenge may be to develop Biohybrid Autonomous Robots (BAR), building on current achievements in the field of biohybrid robotics. The BAR approach rests on the facts that fat stores a large amount of energy, that biological muscles generate substantial force per unit of cross-sectional area, and that biological muscles are capable of regeneration and adaptation, unlike electric motors. To reach conclusions about the feasibility of BAR, this study draws on data from muscle energetics, robotics, engineering, physiology, biomechanics, and other fields to perform interdisciplinary calculations. Our calculations show that, in an ideal scenario, the BAR approach is up to 5.1 times more efficient than CAR with mass-produced batteries in terms of useful energy delivered per unit mass of transported energy substrate. The study also presents a model for determining when the use of BAR becomes rational, taking into account the basal metabolism of living systems. The results provide a preliminary basis for further research on BAR, placing it in the context of other possible solutions to the energy autonomy problem: Generator-Powered Autonomous Robots (GPAR) and Fuel-Cell Autonomous Robots (FCAR).
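The headline comparison reduces to a ratio of useful mechanical energy per kilogram of carried energy substrate. A back-of-the-envelope sketch with assumed round numbers (not the paper's inputs, which additionally account for basal metabolism and yield the reported 5.1x factor) looks like this:

```python
# Illustrative values only; the paper's own analysis yields a factor of up to 5.1.
FAT_SPECIFIC_ENERGY_MJ_PER_KG = 37.0      # chemical energy stored in fat (approximate)
MUSCLE_EFFICIENCY = 0.20                  # assumed fraction converted to mechanical work
BATTERY_SPECIFIC_ENERGY_MJ_PER_KG = 0.90  # assumed ~250 Wh/kg mass-produced cell
MOTOR_DRIVE_EFFICIENCY = 0.85             # assumed battery-to-shaft efficiency

useful_per_kg_bar = FAT_SPECIFIC_ENERGY_MJ_PER_KG * MUSCLE_EFFICIENCY          # MJ of work per kg of fat
useful_per_kg_car = BATTERY_SPECIFIC_ENERGY_MJ_PER_KG * MOTOR_DRIVE_EFFICIENCY # MJ of work per kg of battery

print(f"BAR: {useful_per_kg_bar:.2f} MJ/kg, CAR: {useful_per_kg_car:.2f} MJ/kg, "
      f"ratio = {useful_per_kg_bar / useful_per_kg_car:.1f}x")
```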
{"title":"The biohybrid autonomous robots (BAR): a feasibility of implementation.","authors":"Georgiy N Kuplinov","doi":"10.3389/frobt.2025.1695262","DOIUrl":"10.3389/frobt.2025.1695262","url":null,"abstract":"<p><p>Limited battery capacity poses a challenge for autonomous robots. We believe that instead of relying solely on electric motors and batteries, basically Conventional Autonomous Robots (CAR), one way to address this challenge may be to develop Biohybrid Autonomous Robots (BAR), based on current achievements of the field of biohybrid robotics. The BAR approach is based on the facts that fat store high amount of energy, that biological muscles generate decent force per unit of cross-sectional area and that biological muscles have capability for regeneration and adaptation compared to electric motors. To reach conclusions about the feasibility of BAR, this study uses data from the fields of muscle energetics, robotics, engineering, physiology, biomechanics and others to perform analysis and interdisciplinary calculations. Our calculations show that the BAR approach is up to 5.1 times more efficient in terms of the mass of energy substrate to useful energy transported than the Conventional Autonomous Robots (CAR) with mass-produced batteries in an ideal scenario. The study also presents the model for determining the point of the rational use of the BAR, taking into the account basal metabolism of living systems. The results of this study provide a preliminary basis for further research of the BAR, putting it into the context of the other possible solutions for energy autonomy problem: Generator-Powered Autonomous Robots (GPAR) and Fuell-Cell Autonomous Robots (FCAR).</p>","PeriodicalId":47597,"journal":{"name":"Frontiers in Robotics and AI","volume":"12 ","pages":"1695262"},"PeriodicalIF":3.0,"publicationDate":"2025-10-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12603390/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145507478","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2025-10-28; eCollection Date: 2025-01-01; DOI: 10.3389/frobt.2025.1604506
S K Surya Prakash, Darshankumar Prajapati, Bhuvan Narula, Amit Shukla
This paper presents a robust vision-based motion planning framework for dual-arm manipulators that introduces a novel three-way force equilibrium with velocity-dependent stabilization. The framework combines an improved Artificial Potential Field (iAPF) for linear velocity control with a Proportional-Derivative (PD) controller for angular velocity, creating a hybrid twist command for precise manipulation. A priority-based state machine enables human-like asymmetric dual-arm manipulation. Lyapunov stability analysis proves the asymptotic convergence to desired configurations. The method introduces a computationally efficient continuous distance calculation between links based on line segment configurations, enabling real-time collision monitoring. Experimental validation integrates a real-time vision system using YOLOv8 OBB that achieves 20 frames per second with 0.99/0.97 detection accuracy for bolts/nuts. Comparative tests against traditional APF methods demonstrate that the proposed approach provides stabilized motion planning with smoother trajectories and optimized spatial separation, effectively preventing inter-arm collisions during industrial component sorting.
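The continuous link-to-link distance mentioned above amounts to the minimum distance between two 3D line segments, each link being approximated by a segment. A minimal sketch of that standard geometric computation (independent of the authors' implementation) is:

```python
import numpy as np

def segment_distance(p1, q1, p2, q2, eps=1e-9):
    """Minimum distance between segments [p1, q1] and [p2, q2] in 3D."""
    p1, q1, p2, q2 = map(np.asarray, (p1, q1, p2, q2))
    d1, d2, r = q1 - p1, q2 - p2, p1 - p2
    a, e, f = d1 @ d1, d2 @ d2, d2 @ r
    if a <= eps and e <= eps:                      # both segments degenerate to points
        return float(np.linalg.norm(p1 - p2))
    if a <= eps:                                   # first segment is a point
        s, t = 0.0, np.clip(f / e, 0.0, 1.0)
    else:
        c = d1 @ r
        if e <= eps:                               # second segment is a point
            s, t = np.clip(-c / a, 0.0, 1.0), 0.0
        else:                                      # general case
            b = d1 @ d2
            denom = a * e - b * b
            s = np.clip((b * f - c * e) / denom, 0.0, 1.0) if denom > eps else 0.0
            t = (b * s + f) / e
            if t < 0.0:
                t, s = 0.0, np.clip(-c / a, 0.0, 1.0)
            elif t > 1.0:
                t, s = 1.0, np.clip((b - c) / a, 0.0, 1.0)
    return float(np.linalg.norm((p1 + s * d1) - (p2 + t * d2)))

# Example: closest approach between two arm links modeled as segments.
print(segment_distance([0, 0, 0], [1, 0, 0], [0.5, 0.3, 0.2], [0.5, 1.0, 0.2]))
```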
{"title":"iAPF: an improved artificial potential field framework for asymmetric dual-arm manipulation with real-time inter-arm collision avoidance.","authors":"S K Surya Prakash, Darshankumar Prajapati, Bhuvan Narula, Amit Shukla","doi":"10.3389/frobt.2025.1604506","DOIUrl":"10.3389/frobt.2025.1604506","url":null,"abstract":"<p><p>This paper presents a robust vision-based motion planning framework for dual-arm manipulators that introduces a novel three-way force equilibrium with velocity-dependent stabilization. The framework combines an improved Artificial Potential Field (iAPF) for linear velocity control with a Proportional-Derivative (PD) controller for angular velocity, creating a hybrid twist command for precise manipulation. A priority-based state machine enables human-like asymmetric dual-arm manipulation. Lyapunov stability analysis proves the asymptotic convergence to desired configurations. The method introduces a computationally efficient continuous distance calculation between links based on line segment configurations, enabling real-time collision monitoring. Experimental validation integrates a real-time vision system using YOLOv8 OBB that achieves 20 frames per second with 0.99/0.97 detection accuracy for bolts/nuts. Comparative tests against traditional APF methods demonstrate that the proposed approach provides stabilized motion planning with smoother trajectories and optimized spatial separation, effectively preventing inter-arm collisions during industrial component sorting.</p>","PeriodicalId":47597,"journal":{"name":"Frontiers in Robotics and AI","volume":"12 ","pages":"1604506"},"PeriodicalIF":3.0,"publicationDate":"2025-10-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12602476/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145507522","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2025-10-27; eCollection Date: 2025-01-01; DOI: 10.3389/frobt.2025.1659302
Stephanie Tulk Jesso, William George Kennedy, Nele Russwinkel, Levern Currie
{"title":"Editorial: The translation and implementation of robotics and embodied AI in healthcare.","authors":"Stephanie Tulk Jesso, William George Kennedy, Nele Russwinkel, Levern Currie","doi":"10.3389/frobt.2025.1659302","DOIUrl":"10.3389/frobt.2025.1659302","url":null,"abstract":"","PeriodicalId":47597,"journal":{"name":"Frontiers in Robotics and AI","volume":"12 ","pages":"1659302"},"PeriodicalIF":3.0,"publicationDate":"2025-10-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12598029/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145496678","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2025-10-23; eCollection Date: 2025-01-01; DOI: 10.3389/frobt.2025.1687825
Federico Allione, Maria Lazzaroni, Antonios E Gkikakis, Christian Di Natali, Luigi Monica, Darwin G Caldwell, Jesús Ortiz
Musculoskeletal disorders, particularly low back pain, are some of the most common occupational health issues globally, causing significant personal suffering and economic burdens. Workers performing repetitive manual material handling tasks are especially at risk. FleXo, a lightweight (1.35 kg), flexible, ergonomic, and passive back-support exoskeleton, is intended to reduce lower back strain during lifting tasks while allowing full freedom of movement for activities such as walking, sitting, or side bending. FleXo's design results from an advanced multi-objective design optimization approach that balances functionality and user comfort. In this work, validated through user feedback on a series of relevant repetitive tasks, FleXo is shown to reduce the perceived physical effort during lifting, enhance user satisfaction, improve employee wellbeing, promote workplace safety, decrease injuries, and lower the costs (to both society and companies) associated with lower back pain and injury.
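The multi-objective design optimization is described only at a high level. As a generic illustration of the underlying idea (not the authors' objectives or algorithm), a Pareto-front filter over candidate designs, with hypothetical objective names all to be minimized, might look like:

```python
def pareto_front(designs):
    """Keep non-dominated designs; each design maps objective names to values to be minimized."""
    def dominates(a, b):
        return all(a[k] <= b[k] for k in a) and any(a[k] < b[k] for k in a)
    return [d for d in designs
            if not any(dominates(other, d) for other in designs if other is not d)]

# Hypothetical candidates: negated strain reduction, a discomfort score, and mass.
candidates = [
    {"neg_strain_reduction": -0.30, "discomfort": 0.2, "mass_kg": 1.35},
    {"neg_strain_reduction": -0.25, "discomfort": 0.1, "mass_kg": 1.50},
    {"neg_strain_reduction": -0.20, "discomfort": 0.3, "mass_kg": 1.80},  # dominated by the first
]
print(pareto_front(candidates))  # keeps the two non-dominated trade-offs
```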
{"title":"FleXo: a flexible passive exoskeleton optimized for reducing lower back strain in manual handling tasks.","authors":"Federico Allione, Maria Lazzaroni, Antonios E Gkikakis, Christian Di Natali, Luigi Monica, Darwin G Caldwell, Jesús Ortiz","doi":"10.3389/frobt.2025.1687825","DOIUrl":"10.3389/frobt.2025.1687825","url":null,"abstract":"<p><p>Musculoskeletal disorders, particularly low back pain, are some of the most common occupational health issues globally, causing significant personal suffering and economic burdens. Workers performing repetitive manual material handling tasks are especially at risk. FleXo, a lightweight (1.35 kg), flexible, ergonomic, and passive back-support exoskeleton is intended to reduce lower back strain during lifting tasks while allowing full freedom of movement for activities like walking, sitting, or side bending. FleXo's design results from an advanced multi-objective design optimization approach that balances functionality and user comfort. In this work, validated through user feedback in a series of relevant repetitive tasks, it is demonstrated that FleXo can reduce the perceived physical effort during lifting tasks, enhance user satisfaction, improve employee wellbeing, promote workplace safety, decrease injuries, and lower the costs (both to society and companies) associated with lower back pain and injury.</p>","PeriodicalId":47597,"journal":{"name":"Frontiers in Robotics and AI","volume":"12 ","pages":"1687825"},"PeriodicalIF":3.0,"publicationDate":"2025-10-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12588867/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145483487","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2025-10-22; eCollection Date: 2025-01-01; DOI: 10.3389/frobt.2025.1660244
Yuchao Li, Ziqi Jin, Jin Liu, Daolin Ma
Industrial terminal assembly tasks are often repetitive and involve handling components with tight tolerances that are susceptible to damage. Learning an effective terminal assembly policy in the real world is challenging, as collisions between parts and the environment can lead to slippage or part breakage. In this paper, we propose a safe reinforcement learning approach to develop a visuo-tactile assembly policy that is robust to variations in grasp poses. Our method minimizes collisions between the terminal head and terminal base by decomposing the assembly task into three distinct phases. In the first grasp phase, a vision-guided model is trained to pick the terminal head from an initial bin. In the second align phase, a tactile-based grasp pose estimation model is employed to align the terminal head with the terminal base. In the final assembly phase, a visuo-tactile policy is learned to precisely insert the terminal head into the terminal base. To ensure safe training, the robot leverages human demonstrations and interventions. Experimental results on PLC terminal assembly demonstrate that the proposed method achieves 100% successful insertions across 100 different initial end-effector and grasp poses, whereas imitation learning and an online-RL policy achieve only 9% and 0%, respectively.
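The three-phase decomposition can be pictured as a small state machine that dispatches observations to the policy for the current phase. This is a minimal sketch with hypothetical robot and policy interfaces, not the authors' code:

```python
def run_assembly_episode(robot, grasp_policy, align_policy, assembly_policy, max_steps=500):
    """Dispatch to phase-specific policies in order: grasp -> align -> assembly."""
    phase = "grasp"
    for _ in range(max_steps):
        obs = robot.get_observation()   # hypothetical: RGB image + tactile array + proprioception
        if phase == "grasp":
            action, done = grasp_policy(obs["image"])            # vision-guided pick from the bin
            if done:
                phase = "align"
        elif phase == "align":
            action, done = align_policy(obs["tactile"])          # tactile grasp-pose estimate -> re-align
            if done:
                phase = "assembly"
        else:
            action, done = assembly_policy(obs["image"], obs["tactile"])  # visuo-tactile insertion
            if done:
                return True                                       # terminal head inserted
        robot.apply(action)
    return False
```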
{"title":"Visuo-tactile feedback policies for terminal assembly facilitated by reinforcement learning.","authors":"Yuchao Li, Ziqi Jin, Jin Liu, Daolin Ma","doi":"10.3389/frobt.2025.1660244","DOIUrl":"10.3389/frobt.2025.1660244","url":null,"abstract":"<p><p>Industrial terminal assembly tasks are often repetitive and involve handling components with tight tolerances that are susceptible to damage. Learning an effective terminal assembly policy in real-world is challenging, as collisions between parts and the environment can lead to slippage or part breakage. In this paper, we propose a safe reinforcement learning approach to develop a visuo-tactile assembly policy that is robust to variations in grasp poses. Our method minimizes collisions between the terminal head and terminal base by decomposing the assembly task into three distinct phases. In the first <i>grasp</i> phase,a vision-guided model is trained to pick the terminal head from an initial bin. In the second <i>align</i> phase, a tactile-based grasp pose estimation model is employed to align the terminal head with the terminal base. In the final <i>assembly</i> phase, a visuo-tactile policy is learned to precisely insert the terminal head into the terminal base. To ensure safe training, the robot leverages human demonstrations and interventions. Experimental results on PLC terminal assembly demonstrate that the proposed method achieves 100% successful insertions across 100 different initial end-effector and grasp poses, while imitation learning and online-RL policy yield only 9% and 0%.</p>","PeriodicalId":47597,"journal":{"name":"Frontiers in Robotics and AI","volume":"12 ","pages":"1660244"},"PeriodicalIF":3.0,"publicationDate":"2025-10-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12586048/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145460310","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2025-10-21; eCollection Date: 2025-01-01; DOI: 10.3389/frobt.2025.1693988
Jongyoon Park, Pileun Kim, Daeil Ko
The integration of Vision-Language Models (VLMs) into autonomous systems is of growing importance for improving Human-Robot Interaction (HRI), enabling robots to operate within complex and unstructured environments and collaborate with non-expert users. For mobile robots to be effectively deployed in dynamic settings such as domestic or industrial areas, the ability to interpret and execute natural language commands is crucial. However, while VLMs offer powerful zero-shot, open-vocabulary recognition capabilities, their high computational cost presents a significant challenge for real-time performance on resource-constrained edge devices. This study provides a systematic analysis of the trade-offs involved in optimizing a real-time robotic perception pipeline on the NVIDIA Jetson AGX Orin 64GB platform. We investigate the relationship between accuracy and latency by evaluating combinations of two open-vocabulary detection models and two prompt-based segmentation models. Each pipeline is optimized using various precision levels (FP32, FP16, and Best) via NVIDIA TensorRT. We present a quantitative comparison of the mean Intersection over Union (mIoU) and latency for each configuration, offering practical insights and benchmarks for researchers and developers deploying these advanced models on embedded systems.
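The accuracy-latency sweep described above amounts to timing each (detector, segmenter, precision) combination and scoring its masks. A schematic benchmark loop, with a hypothetical `build_pipeline` factory standing in for engine construction (e.g., a TensorRT-optimized pipeline), could look like this:

```python
import itertools
import time
import numpy as np

def mean_iou(pred_masks, gt_masks, eps=1e-9):
    """Mean Intersection over Union across paired boolean masks."""
    ious = []
    for pred, gt in zip(pred_masks, gt_masks):
        inter = np.logical_and(pred, gt).sum()
        union = np.logical_or(pred, gt).sum()
        ious.append(inter / (union + eps))
    return float(np.mean(ious))

def benchmark(detectors, segmenters, precisions, images, gt_masks, build_pipeline):
    """build_pipeline(det, seg, prec) is assumed to return a callable image -> boolean mask."""
    results = []
    for det, seg, prec in itertools.product(detectors, segmenters, precisions):
        pipeline = build_pipeline(det, seg, prec)
        start = time.perf_counter()
        preds = [pipeline(img) for img in images]
        latency_ms = 1000 * (time.perf_counter() - start) / len(images)
        results.append({"detector": det, "segmenter": seg, "precision": prec,
                        "mIoU": mean_iou(preds, gt_masks), "latency_ms": latency_ms})
    return results
```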
{"title":"Real-time open-vocabulary perception for mobile robots on edge devices: a systematic analysis of the accuracy-latency trade-off.","authors":"Jongyoon Park, Pileun Kim, Daeil Ko","doi":"10.3389/frobt.2025.1693988","DOIUrl":"https://doi.org/10.3389/frobt.2025.1693988","url":null,"abstract":"<p><p>The integration of Vision-Language Models (VLMs) into autonomous systems is of growing importance for improving Human-Robot Interaction (HRI), enabling robots to operate within complex and unstructured environments and collaborate with non-expert users. For mobile robots to be effectively deployed in dynamic settings such as domestic or industrial areas, the ability to interpret and execute natural language commands is crucial. However, while VLMs offer powerful zero-shot, open-vocabulary recognition capabilities, their high computational cost presents a significant challenge for real-time performance on resource-constrained edge devices. This study provides a systematic analysis of the trade-offs involved in optimizing a real-time robotic perception pipeline on the NVIDIA Jetson AGX Orin 64GB platform. We investigate the relationship between accuracy and latency by evaluating combinations of two open-vocabulary detection models and two prompt-based segmentation models. Each pipeline is optimized using various precision levels (FP32, FP16, and Best) via NVIDIA TensorRT. We present a quantitative comparison of the mean Intersection over Union (mIoU) and latency for each configuration, offering practical insights and benchmarks for researchers and developers deploying these advanced models on embedded systems.</p>","PeriodicalId":47597,"journal":{"name":"Frontiers in Robotics and AI","volume":"12 ","pages":"1693988"},"PeriodicalIF":3.0,"publicationDate":"2025-10-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12583037/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145453636","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2025-10-21; eCollection Date: 2025-01-01; DOI: 10.3389/frobt.2025.1650228
Jonas Smits, Pierre Schegg, Loic Wauters, Luc Perard, Corentin Languepin, Davide Recchia, Vera Damerjian Pieters, Stéphane Lopez, Didier Tchetche, Kendra Grubb, Jorgen Hansen, Eric Sejor, Pierre Berthet-Rayne
Transcatheter Aortic Valve Implantation (TAVI) is a minimally invasive procedure in which a transcatheter heart valve (THV) is implanted within the patient's diseased native aortic valve. The procedure is increasingly chosen even for intermediate-risk and younger patients, as it combines complication rates comparable to open-heart surgery with the advantage of being far less invasive. Despite its benefits, challenges remain in achieving accurate and repeatable valve positioning, with inaccuracies potentially leading to complications such as THV migration, coronary obstruction, and conduction disturbances (CD). The latter often requires a permanent pacemaker implantation as a costly and life-changing mitigation. Robotic assistance may offer solutions, enhancing precision and standardization and reducing radiation exposure for clinicians. This article introduces a novel solution for robot-assisted TAVI, addressing the growing need for skilled clinicians and improving procedural outcomes. We present an in-vivo animal demonstration of robot-assisted TAVI, showing the feasibility of teleoperated instrument control and THV deployment performed by a single operator at a safer distance from radiation sources. Furthermore, THV positioning and deployment under supervised autonomy is demonstrated on a phantom and shown to be feasible using both camera- and fluoroscopy-based imaging feedback and AI. Finally, an initial operator study probes the performance and potential added value of various technology augmentations relative to a manual expert operator, indicating equivalent or superior accuracy and repeatability with robotic assistance. It is concluded that robot-assisted TAVI is technically feasible in vivo and presents a strong case for a clinically meaningful application of level-3 autonomy. These findings support the potential of surgical robotic technology to enhance TAVI accuracy and repeatability, ultimately improving patient outcomes and expanding procedural accessibility.
{"title":"Towards autonomous robot-assisted transcatheter heart valve implantation: in vivo teleoperation and phantom validation of AI-guided positioning.","authors":"Jonas Smits, Pierre Schegg, Loic Wauters, Luc Perard, Corentin Languepin, Davide Recchia, Vera Damerjian Pieters, Stéphane Lopez, Didier Tchetche, Kendra Grubb, Jorgen Hansen, Eric Sejor, Pierre Berthet-Rayne","doi":"10.3389/frobt.2025.1650228","DOIUrl":"10.3389/frobt.2025.1650228","url":null,"abstract":"<p><p>Transcatheter Aortic Valve Implantation (TAVI) is a minimally invasive procedure in which a transcatheter heart valve (THV) is implanted within the patient's diseased native aortic valve. The procedure is increasingly chosen even for intermediate-risk and younger patients, as it combines complication rates comparable to open-heart surgery with the advantage of being far less invasive. Despite its benefits, challenges remain in achieving accurate and repeatable valve positioning, with inaccuracies potentially leading to complications such as THV migration, coronary obstruction, and conduction disturbances (CD). The latter often requires a permanent pacemaker implantation as a costly and life-changing mitigation. Robotic assistance may offer solutions, enhancing precision, standardization, and reducing radiation exposure for clinicians. This article introduces a novel solution for robot-assisted TAVI, addressing the growing need for skilled clinicians and improving procedural outcomes. We present an <i>in-vivo</i> animal demonstration of robotic-assisted TAVI, showing feasibility of tele-operative instrument control and THV deployment. This, done at safer distances from radiation sources by a single operator. Furthermore, THV positioning and deployment under supervised autonomy is demonstrated on phantom, and shown to be feasible using both camera- and fluoroscopy-based imaging feedback and AI. Finally, an initial operator study probes performance and potential added value of various technology augmentations with respect to a manual expert operator, indicating equivalent to superior accuracy and repeatability using robotic assistance. It is concluded that robot-assisted TAVI is technically feasible <i>in-vivo</i>, and presents a strong case for a clinically meaningful application of level-3 autonomy. These findings support the potential of surgical robotic technology to enhance TAVI accuracy and repeatability, ultimately improving patient outcomes and expanding procedural accessibility.</p>","PeriodicalId":47597,"journal":{"name":"Frontiers in Robotics and AI","volume":"12 ","pages":"1650228"},"PeriodicalIF":3.0,"publicationDate":"2025-10-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12583050/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145453642","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2025-10-20; eCollection Date: 2025-01-01; DOI: 10.3389/frobt.2025.1671673
Fernando Amodeo, Noé Pérez-Higueras, Luis Merino, Fernando Caballero
Mobile robots require knowledge of their environment, especially of humans located in their vicinity. While the most common approaches for detecting humans involve computer vision, an often overlooked hardware feature of robots for people detection is their 2D range finder. These sensors were originally intended for obstacle avoidance and mapping/SLAM tasks. In most robots, they are conveniently located at a height approximately between the ankle and the knee, so they can be used for detecting people too, with a larger field of view and better depth resolution than cameras. In this paper, we present FROG, a new dataset for people detection using knee-high 2D range finders. This dataset has greater laser resolution, a higher scanning frequency, and more complete annotation data than existing datasets such as DROW (Beyer et al., 2018). In particular, the FROG dataset contains annotations for 100% of its laser scans (unlike DROW, which annotates only 5%), 17x more annotated scans, 100x more people annotations, and over twice the distance traveled by the robot. We propose a benchmark based on the FROG dataset and analyze a collection of state-of-the-art people detectors based on 2D range finder data. We also propose and evaluate a new end-to-end deep learning approach for people detection. Our solution works directly with the raw sensor data (requiring no hand-crafted input features), thus avoiding CPU preprocessing and freeing the developer from encoding domain-specific heuristics. Experimental results show that the proposed people detector attains results comparable to the state of the art, while an optimized implementation for ROS can operate at more than 500 Hz.
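An end-to-end detector over raw scans can be approached as a small 1D convolutional network applied directly to the range readings. The sketch below is a schematic stand-in (architecture and sizes are assumptions, not the paper's network) that outputs a per-beam person probability:

```python
import torch
import torch.nn as nn

class ScanPersonDetector(nn.Module):
    """Toy 1D CNN: input is a batch of raw laser scans (B, 1, num_beams);
    output is a per-beam probability that the beam hits a person."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv1d(1, 32, kernel_size=7, padding=3), nn.ReLU(),
            nn.Conv1d(32, 64, kernel_size=7, padding=3), nn.ReLU(),
            nn.Conv1d(64, 1, kernel_size=1),   # per-beam logit
        )

    def forward(self, scan):
        return torch.sigmoid(self.net(scan))

scans = torch.rand(8, 1, 720)          # e.g., 720-beam scans with ranges normalized to [0, 1]
probs = ScanPersonDetector()(scans)    # (8, 1, 720) per-beam person probabilities
```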
移动机器人需要了解环境,尤其是其附近的人类。虽然检测人类的最常见方法涉及计算机视觉,但机器人用于检测人类的一个经常被忽视的硬件功能是它们的2D测距仪。这些最初用于避障和映射/SLAM任务。在大多数机器人中,它们的高度大约在脚踝和膝盖之间,所以它们也可以用来探测人,而且与相机相比,它们具有更大的视野和深度分辨率。在本文中,我们提出了一种新的数据集,用于使用膝盖高的2D测距仪进行人员检测,称为FROG。与现有数据集(如DROW)相比,该数据集具有更高的激光分辨率、扫描频率和更完整的注释数据(Beyer et al., 2018)。特别是,FROG数据集包含100%激光扫描的注释(不像DROW只注释5%),17倍的注释扫描,100倍的人注释,以及超过两倍的机器人行进距离。我们提出了一个基于FROG数据集的基准,并基于2D测距仪数据分析了一组最先进的人体探测器。我们还提出并评估了一种新的端到端深度学习方法,用于人员检测。我们的解决方案直接使用原始传感器数据(不需要手工制作的输入数据特征),从而避免了CPU预处理,并释放了开发人员理解特定领域的启发式。实验结果表明,所提出的人检测器如何获得与当前技术水平相当的结果,而ROS的优化实现可以在500 Hz以上工作。
{"title":"FROG: a new people detection dataset for knee-high 2D range finders.","authors":"Fernando Amodeo, Noé Pérez-Higueras, Luis Merino, Fernando Caballero","doi":"10.3389/frobt.2025.1671673","DOIUrl":"10.3389/frobt.2025.1671673","url":null,"abstract":"<p><p>Mobile robots require knowledge of the environment, especially of humans located in its vicinity. While the most common approaches for detecting humans involve computer vision, an often overlooked hardware feature of robots for people detection are their 2D range finders. These were originally intended for obstacle avoidance and mapping/SLAM tasks. In most robots, they are conveniently located at a height approximately between the ankle and the knee, so they can be used for detecting people too, and with a larger field of view and depth resolution compared to cameras. In this paper, we present a new dataset for people detection using knee-high 2D range finders called FROG. This dataset has greater laser resolution, scanning frequency, and more complete annotation data compared to existing datasets such as DROW (Beyer et al., 2018). Particularly, the FROG dataset contains annotations for 100% of its laser scans (unlike DROW which only annotates 5%), 17x more annotated scans, 100x more people annotations, and over twice the distance traveled by the robot. We propose a benchmark based on the FROG dataset, and analyze a collection of state-of-the-art people detectors based on 2D range finder data. We also propose and evaluate a new end-to-end deep learning approach for people detection. Our solution works with the raw sensor data directly (not needing hand-crafted input data features), thus avoiding CPU preprocessing and releasing the developer of understanding specific domain heuristics. Experimental results show how the proposed people detector attains results comparable to the state of the art, while an optimized implementation for ROS can operate at more than 500 Hz.</p>","PeriodicalId":47597,"journal":{"name":"Frontiers in Robotics and AI","volume":"12 ","pages":"1671673"},"PeriodicalIF":3.0,"publicationDate":"2025-10-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12580528/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145446191","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2025-10-17; eCollection Date: 2025-01-01; DOI: 10.3389/frobt.2025.1628213
Martin Føre, Emilia May O'Brien, Eleni Kelasidi
Due to their utility in replacing workers in tasks unsuitable for humans, unmanned underwater vehicles (UUVs) have become increasingly common tools in the fish farming industry. However, earlier studies and anecdotal evidence from farmers imply that farmed fish tend to move away from and avoid intrusive objects such as vehicles deployed and operated inside net pens. Such responses could indicate discomfort associated with the intrusive objects, which, in turn, can lead to stress and impaired welfare in the fish. To prevent this, vehicles and their control systems should be designed to automatically adjust operations when they perceive that they are repelling the fish. A necessary first step in this direction is to develop on-vehicle observation systems that assess object/vehicle-fish distances in real time and can provide inputs to the control algorithms. Due to their small size and low weight, modern cameras are ideal for this purpose. Moreover, the ongoing rapid developments in deep learning are enabling increasingly sophisticated methods for analyzing camera footage. To explore this potential, we developed three new pipelines for the automated assessment of fish-camera distances in video and images. These were complemented by a recently published method, yielding four pipelines in total: SegmentDepth, BBoxDepth, and SuperGlue, which are stereo-vision based, and DepthAnything, which is monocular. The overall performance was investigated using field data by comparing the fish-object distances obtained from the methods with those measured using a sonar. The four methods were then benchmarked by comparing the number of objects detected and the quality and overall accuracy of the stereo matches (stereo-based methods only). SegmentDepth, DepthAnything, and SuperGlue performed well against the sonar data, yielding mean absolute errors (MAE) of 0.205 m (95% CI: 0.050-0.360), 0.412 m (95% CI: 0.148-0.676), and 0.187 m (95% CI: 0.073-0.300), respectively, and were integrated into the Robot Operating System (ROS2) framework to enable real-time application in fish behavior identification and the control of robotic vehicles such as UUVs.
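The stereo-based pipelines ultimately recover distance by triangulating matched image points: Z = f·B/d for focal length f (pixels), baseline B (meters), and disparity d (pixels). A minimal sketch with hypothetical calibration values:

```python
def depth_from_disparity(focal_px, baseline_m, disparity_px):
    """Pinhole stereo triangulation: distance along the optical axis in meters."""
    if disparity_px <= 0:
        raise ValueError("disparity must be positive for a valid match")
    return focal_px * baseline_m / disparity_px

# Hypothetical rig: 1400 px focal length, 12 cm baseline, a fish matched with 60 px disparity.
print(f"{depth_from_disparity(1400.0, 0.12, 60.0):.2f} m")  # -> 2.80 m
```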
{"title":"Deep learning methods for 3D tracking of fish in challenging underwater conditions for future perception in autonomous underwater vehicles.","authors":"Martin Føre, Emilia May O'Brien, Eleni Kelasidi","doi":"10.3389/frobt.2025.1628213","DOIUrl":"10.3389/frobt.2025.1628213","url":null,"abstract":"<p><p>Due to their utility in replacing workers in tasks unsuitable for humans, unmanned underwater vehicles (UUVs) have become increasingly common tools in the fish farming industry. However, earlier studies and anecdotal evidence from farmers imply that farmed fish tend to move away from and avoid intrusive objects such as vehicles that are deployed and operated inside net pens. Such responses could imply a discomfort associated with the intrusive objects, which, in turn, can lead to stress and impaired welfare in the fish. To prevent this, vehicles and their control systems should be designed to automatically adjust operations when they perceive that they are repelling the fish. A necessary first step in this direction is to develop on-vehicle observation systems for assessing object/vehicle-fish distances in real-time settings that can provide inputs to the control algorithms. Due to their small size and low weight, modern cameras are ideal for this purpose. Moreover, the ongoing rapid developments within deep learning methods are enabling the use of increasingly sophisticated methods for analyzing footage from cameras. To explore this potential, we developed three new pipelines for the automated assessment of fish-camera distances in video and images. These methods were complemented using a recently published method, yielding four pipelines in total, namely, <i>SegmentDepth</i>, <i>BBoxDepth</i>, and <i>SuperGlue</i> that were based on stereo-vision and <i>DepthAnything</i> that was monocular. The overall performance was investigated using field data by comparing the fish-object distances obtained from the methods with those measured using a sonar. The four methods were then benchmarked by comparing the number of objects detected and the quality and overall accuracy of the stereo matches (only stereo-based methods). <i>SegmentDepth</i>, <i>DepthAnything</i>, and <i>SuperGlue</i> performed well in comparison with the sonar data, yielding mean absolute errors (MAE) of 0.205 m (95% CI: 0.050-0.360), 0.412 m (95% CI: 0.148-0.676), and 0.187 m (95% CI: 0.073-0.300), respectively, and were integrated into the Robot Operating System (ROS2) framework to enable real-time application in fish behavior identification and the control of robotic vehicles such as UUVs.</p>","PeriodicalId":47597,"journal":{"name":"Frontiers in Robotics and AI","volume":"12 ","pages":"1628213"},"PeriodicalIF":3.0,"publicationDate":"2025-10-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12575977/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145432746","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}