
arXiv - CS - Robotics: Latest Publications

LMMCoDrive: Cooperative Driving with Large Multimodal Model
Pub Date : 2024-09-18 DOI: arxiv-2409.11981
Haichao Liu, Ruoyu Yao, Zhenmin Huang, Shaojie Shen, Jun Ma
To address the intricate challenges of decentralized cooperative scheduling and motion planning in Autonomous Mobility-on-Demand (AMoD) systems, this paper introduces LMMCoDrive, a novel cooperative driving framework that leverages a Large Multimodal Model (LMM) to enhance traffic efficiency in dynamic urban environments. This framework seamlessly integrates scheduling and motion planning processes to ensure the effective operation of Cooperative Autonomous Vehicles (CAVs). The spatial relationship between CAVs and passenger requests is abstracted into a Bird's-Eye View (BEV) to fully exploit the potential of the LMM. Besides, trajectories are cautiously refined for each CAV while ensuring collision avoidance through safety constraints. A decentralized optimization strategy, facilitated by the Alternating Direction Method of Multipliers (ADMM) within the LMM framework, is proposed to drive the graph evolution of CAVs. Simulation results demonstrate the pivotal role and significant impact of LMM in optimizing CAV scheduling and enhancing the decentralized cooperative optimization process for each vehicle. This marks a substantial stride towards achieving practical, efficient, and safe AMoD systems that are poised to revolutionize urban transportation. The code is available at https://github.com/henryhcliu/LMMCoDrive.
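The abstract does not include code, but a minimal consensus-ADMM sketch illustrates the kind of decentralized update the method relies on. The quadratic local costs and variable names below are invented for illustration and stand in for each CAV's trajectory objective; they are not the paper's implementation.

```python
import numpy as np

# Consensus ADMM toy example: N agents each prefer a different point a_i
# but must agree on a shared decision z. Local cost: f_i(x) = 0.5 * ||x - a_i||^2.
rho = 1.0
a = np.array([[0.0, 0.0], [2.0, 1.0], [4.0, -1.0]])   # assumed local preferences
x = np.zeros_like(a)        # local copies, one per agent
u = np.zeros_like(a)        # scaled dual variables
z = np.zeros(2)             # consensus variable

for _ in range(50):
    # x-update: argmin_x 0.5*||x - a_i||^2 + (rho/2)*||x - z + u_i||^2 (closed form)
    x = (a + rho * (z - u)) / (1.0 + rho)
    z = np.mean(x + u, axis=0)          # z-update: average of local copies + duals
    u = u + x - z                        # dual update

print("consensus:", z)                   # converges to the mean of the a_i
```

Each agent only ever solves its own subproblem; the consensus and dual updates are the coupling that makes the scheme decentralized.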
Citations: 0
SLAM assisted 3D tracking system for laparoscopic surgery
Pub Date : 2024-09-18 DOI: arxiv-2409.11688
Jingwei Song, Ray Zhang, Wenwei Zhang, Hao Zhou, Maani Ghaffari
A major limitation of minimally invasive surgery is the difficulty in accurately locating the internal anatomical structures of the target organ due to the lack of tactile feedback and transparency. Augmented reality (AR) offers a promising solution to overcome this challenge. Numerous studies have shown that combining learning-based and geometric methods can achieve accurate preoperative and intraoperative data registration. This work proposes a real-time monocular 3D tracking algorithm for post-registration tasks. The ORB-SLAM2 framework is adopted and modified for prior-based 3D tracking. The primitive 3D shape is used for fast initialization of the monocular SLAM. A pseudo-segmentation strategy is employed to separate the target organ from the background for tracking purposes, and the geometric prior of the 3D shape is incorporated as an additional constraint in the pose graph. Experiments from in-vivo and ex-vivo tests demonstrate that the proposed 3D tracking system provides robust 3D tracking and effectively handles typical challenges such as fast motion, out-of-field-of-view scenarios, partial visibility, and "organ-background" relative motion.
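As a rough illustration of what "incorporating a geometric prior as an additional constraint" means in a least-squares setting, the sketch below adds a weighted prior residual to a toy trilateration problem. The landmarks, weights, and prior location are invented for the example and are not the paper's pose-graph formulation.

```python
import numpy as np
from scipy.optimize import least_squares

# Toy problem: estimate a 2D point from noisy distance measurements, with an
# extra "shape prior" residual pulling the estimate toward a known prior
# (a hypothetical stand-in for the paper's 3D shape prior).
landmarks = np.array([[0.0, 0.0], [4.0, 0.0], [0.0, 3.0]])
measured_dists = np.array([2.9, 2.3, 2.1])      # noisy distances to landmarks
shape_prior = np.array([2.0, 1.5])              # prior location from the model
prior_weight = 2.0                              # confidence in the prior

def residuals(p):
    meas_res = np.linalg.norm(landmarks - p, axis=1) - measured_dists
    prior_res = prior_weight * (p - shape_prior)   # additional prior constraint
    return np.concatenate([meas_res, prior_res])

sol = least_squares(residuals, x0=np.array([1.0, 1.0]))
print("estimate with prior:", sol.x)
```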
Citations: 0
Haptic-ACT: Bridging Human Intuition with Compliant Robotic Manipulation via Immersive VR
Pub Date : 2024-09-18 DOI: arxiv-2409.11925
Kelin Li, Shubham M Wagh, Nitish Sharma, Saksham Bhadani, Wei Chen, Chang Liu, Petar Kormushev
Robotic manipulation is essential for the widespread adoption of robots in industrial and home settings and has long been a focus within the robotics community. Advances in artificial intelligence have introduced promising learning-based methods to address this challenge, with imitation learning emerging as particularly effective. However, efficiently acquiring high-quality demonstrations remains a challenge. In this work, we introduce an immersive VR-based teleoperation setup designed to collect demonstrations from a remote human user. We also propose an imitation learning framework called Haptic Action Chunking with Transformers (Haptic-ACT). To evaluate the platform, we conducted a pick-and-place task and collected 50 demonstration episodes. Results indicate that the immersive VR platform significantly reduces demonstrator fingertip forces compared to systems without haptic feedback, enabling more delicate manipulation. Additionally, evaluations of the Haptic-ACT framework in both the MuJoCo simulator and on a real robot demonstrate its effectiveness in teaching robots more compliant manipulation compared to the original ACT. Additional materials are available at https://sites.google.com/view/hapticact.
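ACT-style policies predict a chunk of future actions per query and blend overlapping chunks at execution time (temporal ensembling). The sketch below shows that execution loop with a dummy policy standing in for a trained Haptic-ACT model; the chunk size, horizon, and uniform averaging are assumptions for illustration only.

```python
import numpy as np

CHUNK = 8       # actions predicted per policy query (assumed)
HORIZON = 40    # total control steps in the rollout (assumed)

def dummy_policy(obs, t):
    """Stand-in for a trained Haptic-ACT policy: predict CHUNK future actions."""
    return np.full((CHUNK, 2), float(t))          # shape (chunk_len, action_dim)

predictions = {}                                   # query step -> predicted chunk
for t in range(HORIZON):
    predictions[t] = dummy_policy(obs=None, t=t)
    # temporal ensembling: average every still-valid chunk's prediction for step t
    votes = [predictions[tq][t - tq] for tq in predictions if 0 <= t - tq < CHUNK]
    action = np.mean(votes, axis=0)
    # `action` would be sent to the robot controller here
    if t % 10 == 0:
        print(t, action)
```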
Citations: 0
Fusion in Context: A Multimodal Approach to Affective State Recognition
Pub Date : 2024-09-18 DOI: arxiv-2409.11906
Youssef Mohamed, Severin Lemaignan, Arzu Guneysu, Patric Jensfelt, Christian Smith
Accurate recognition of human emotions is a crucial challenge in affective computing and human-robot interaction (HRI). Emotional states play a vital role in shaping behaviors, decisions, and social interactions. However, emotional expressions can be influenced by contextual factors, leading to misinterpretations if context is not considered. Multimodal fusion, combining modalities like facial expressions, speech, and physiological signals, has shown promise in improving affect recognition. This paper proposes a transformer-based multimodal fusion approach that leverages facial thermal data, facial action units, and textual context information for context-aware emotion recognition. We explore modality-specific encoders to learn tailored representations, which are then fused using additive fusion and processed by a shared transformer encoder to capture temporal dependencies and interactions. The proposed method is evaluated on a dataset collected from participants engaged in a tangible tabletop Pacman game designed to induce various affective states. Our results demonstrate the effectiveness of incorporating contextual information and multimodal fusion for affective state recognition.
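A minimal PyTorch sketch of the fusion pattern the abstract describes follows: per-modality encoders, additive fusion, and a shared transformer encoder. The feature dimensions, layer counts, and three-class output are assumptions, not the paper's configuration.

```python
import torch
import torch.nn as nn

class AdditiveFusionModel(nn.Module):
    """Modality-specific encoders + additive fusion + shared transformer (sketch)."""
    def __init__(self, thermal_dim=64, au_dim=17, text_dim=300, d_model=128, n_classes=3):
        super().__init__()
        self.enc_thermal = nn.Linear(thermal_dim, d_model)
        self.enc_au = nn.Linear(au_dim, d_model)
        self.enc_text = nn.Linear(text_dim, d_model)
        layer = nn.TransformerEncoderLayer(d_model=d_model, nhead=4, batch_first=True)
        self.shared = nn.TransformerEncoder(layer, num_layers=2)
        self.head = nn.Linear(d_model, n_classes)

    def forward(self, thermal, aus, text):
        # each input: (batch, time, feature_dim)
        fused = self.enc_thermal(thermal) + self.enc_au(aus) + self.enc_text(text)
        h = self.shared(fused)             # temporal dependencies via self-attention
        return self.head(h.mean(dim=1))    # pooled affective-state logits

model = AdditiveFusionModel()
out = model(torch.randn(2, 10, 64), torch.randn(2, 10, 17), torch.randn(2, 10, 300))
print(out.shape)   # torch.Size([2, 3])
```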
Citations: 0
Uncertainty-Aware Visual-Inertial SLAM with Volumetric Occupancy Mapping
Pub Date : 2024-09-18 DOI: arxiv-2409.12051
Jaehyung Jung, Simon Boche, Sebastian Barbas Laina, Stefan Leutenegger
We propose visual-inertial simultaneous localization and mapping that tightly couples sparse reprojection errors, inertial measurement unit pre-integrals, and relative pose factors with dense volumetric occupancy mapping. Hereby depth predictions from a deep neural network are fused in a fully probabilistic manner. Specifically, our method is rigorously uncertainty-aware: first, we use depth and uncertainty predictions from a deep network not only from the robot's stereo rig, but we further probabilistically fuse motion stereo that provides depth information across a range of baselines, therefore drastically increasing mapping accuracy. Next, predicted and fused depth uncertainty propagates not only into occupancy probabilities but also into alignment factors between generated dense submaps that enter the probabilistic nonlinear least squares estimator. This submap representation offers globally consistent geometry at scale. Our method is thoroughly evaluated in two benchmark datasets, resulting in localization and mapping accuracy that exceeds the state of the art, while simultaneously offering volumetric occupancy directly usable for downstream robotic planning and control in real-time.
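One simple way to let depth uncertainty propagate into occupancy probabilities is to down-weight a cell's log-odds update when the depth estimate is noisy. The sketch below only illustrates that idea; the inverse sensor model and the weighting rule are assumptions, not the paper's probabilistic fusion.

```python
import numpy as np

def logodds(p):
    return np.log(p / (1.0 - p))

def update_cell(l_prior, hit, depth_sigma, sigma0=0.1):
    """Log-odds occupancy update where a noisier depth estimate (larger
    depth_sigma) contributes a weaker update. Illustrative only."""
    p_hit = 0.7 if hit else 0.35                          # assumed inverse sensor model
    weight = sigma0**2 / (sigma0**2 + depth_sigma**2)     # confidence weighting
    return l_prior + weight * logodds(p_hit)

l = 0.0                                   # unknown cell (p = 0.5)
for sigma in [0.05, 0.05, 0.5]:           # two confident hits, one noisy one
    l = update_cell(l, hit=True, depth_sigma=sigma)
p_occ = 1.0 / (1.0 + np.exp(-l))
print(f"occupancy probability: {p_occ:.2f}")
```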
Citations: 0
A machine learning framework for acoustic reflector mapping
Pub Date : 2024-09-18 DOI: arxiv-2409.12094
Usama Saqib, Letizia Marchegiani, Jesper Rindom Jensen
Sonar-based indoor mapping systems have been widely employed in robotics for several decades. While such systems are still the mainstream in underwater and pipe inspection settings, the vulnerability to noise reduced, over time, their general widespread usage in favour of other modalities (e.g., cameras, lidars), whose technologies were encountering, instead, extraordinary advancements. Nevertheless, mapping physical environments using acoustic signals and echolocation can bring significant benefits to robot navigation in adverse scenarios, thanks to their complementary characteristics compared to other sensors. Cameras and lidars, indeed, struggle in harsh weather conditions, when dealing with lack of illumination, or with non-reflective walls. Yet, for acoustic sensors to be able to generate accurate maps, noise has to be properly and effectively handled. Traditional signal processing techniques are not always a solution in those cases. In this paper, we propose a framework where machine learning is exploited to aid more traditional signal processing methods to cope with background noise, by removing outliers and artefacts from the generated maps using acoustic sensors. Our goal is to demonstrate that the performance of traditional echolocation mapping techniques can be greatly enhanced, even in particularly noisy conditions, facilitating the employment of acoustic sensors in state-of-the-art multi-modal robot navigation systems. Our simulated evaluation demonstrates that the system can reliably operate at an SNR of -10 dB. Moreover, we also show that the proposed method is capable of operating in different reverberant environments. In this paper, we also use the proposed method to map the outline of a simulated room using a robotic platform.
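Operating at an SNR of -10 dB means the noise power is ten times the signal power. The snippet below shows how such a condition can be synthesized for a toy sonar-like chirp; the chirp parameters are arbitrary and unrelated to the paper's simulation setup.

```python
import numpy as np

def add_noise_at_snr(signal, snr_db, seed=0):
    """Scale white Gaussian noise so the signal-to-noise ratio equals snr_db."""
    rng = np.random.default_rng(seed)
    p_signal = np.mean(signal ** 2)
    p_noise = p_signal / (10 ** (snr_db / 10.0))
    return signal + rng.normal(0.0, np.sqrt(p_noise), size=signal.shape)

t = np.linspace(0.0, 1.0, 8000, endpoint=False)
chirp = np.sin(2 * np.pi * (2000.0 + 3000.0 * t) * t)    # toy sonar-like sweep
noisy = add_noise_at_snr(chirp, snr_db=-10.0)            # noise power = 10x signal power
measured = 10 * np.log10(np.mean(chirp ** 2) / np.mean((noisy - chirp) ** 2))
print(f"empirical SNR: {measured:.1f} dB")
```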
Citations: 0
Discovering Conceptual Knowledge with Analytic Ontology Templates for Articulated Objects
Pub Date : 2024-09-18 DOI: arxiv-2409.11702
Jianhua Sun, Yuxuan Li, Longfei Xu, Jiude Wei, Liang Chai, Cewu Lu
Human cognition can leverage fundamental conceptual knowledge, like geometric and kinematic ones, to appropriately perceive, comprehend and interact with novel objects. Motivated by this finding, we aim to endow machine intelligence with an analogous capability through performing at the conceptual level, in order to understand and then interact with articulated objects, especially for those in novel categories, which is challenging due to the intricate geometric structures and diverse joint types of articulated objects. To achieve this goal, we propose Analytic Ontology Template (AOT), a parameterized and differentiable program description of generalized conceptual ontologies. A baseline approach called AOTNet driven by AOTs is designed accordingly to equip intelligent agents with these generalized concepts, and then empower the agents to effectively discover the conceptual knowledge on the structure and affordance of articulated objects. The AOT-driven approach yields benefits in three key perspectives: i) enabling concept-level understanding of articulated objects without relying on any real training data, ii) providing analytic structure information, and iii) introducing rich affordance information indicating proper ways of interaction. We conduct exhaustive experiments and the results demonstrate the superiority of our approach in understanding and then interacting with articulated objects.
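To make "a parameterized and differentiable template for articulated objects" concrete, the sketch below writes a generic analytic revolute-joint template (Rodrigues' rotation about a hinge axis) whose output is differentiable with respect to the joint angle. All parameters are invented for illustration; this is not the AOT formalism itself.

```python
import numpy as np

def revolute_template(angle, axis_origin, axis_dir, point_at_rest):
    """Analytic template for a revolute articulation (e.g., a cabinet door):
    given the joint angle, return where a point on the moving part ends up."""
    axis_dir = axis_dir / np.linalg.norm(axis_dir)
    # Rodrigues' rotation formula about the joint axis
    k, v = axis_dir, point_at_rest - axis_origin
    v_rot = (v * np.cos(angle)
             + np.cross(k, v) * np.sin(angle)
             + k * np.dot(k, v) * (1 - np.cos(angle)))
    return axis_origin + v_rot

handle = revolute_template(angle=np.deg2rad(60),
                           axis_origin=np.array([0.0, 0.0, 0.0]),
                           axis_dir=np.array([0.0, 0.0, 1.0]),
                           point_at_rest=np.array([0.6, 0.0, 0.9]))
print(handle)   # handle position after opening the door by 60 degrees
```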
Citations: 0
Online Refractive Camera Model Calibration in Visual Inertial Odometry
Pub Date : 2024-09-18 DOI: arxiv-2409.12074
Mohit Singh, Kostas Alexis
This paper presents a general refractive camera model and online co-estimation of odometry and the refractive index of unknown media. This enables operation in diverse and varying refractive fluids, given only the camera calibration in air. The refractive index is estimated online as a state variable of a monocular visual-inertial odometry framework in an iterative formulation using the proposed camera model. The method was verified on data collected using an underwater robot traversing inside a pool. The evaluations demonstrate convergence to the ideal refractive index for water despite significant perturbations in the initialization. Simultaneously, the approach enables on-par visual-inertial odometry performance in refractive media without prior knowledge of the refractive index or requirement of medium-specific camera calibration.
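A common simplification of refractive imaging is the paraxial flat-port model, in which refraction at a flat interface scales the effective focal length by the refractive index ratio (Snell's law in the small-angle regime). The sketch below uses that approximation to show how an assumed refractive index shifts a projected point; it is not the paper's general refractive camera model.

```python
import numpy as np

def project_flat_port(P, f_air, n_medium, n_air=1.0):
    """Paraxial flat-port approximation: the effective focal length is the
    in-air focal length scaled by n_medium / n_air. Illustrative only."""
    X, Y, Z = P
    f_eff = f_air * (n_medium / n_air)
    return np.array([f_eff * X / Z, f_eff * Y / Z])   # image-plane coordinates

P = np.array([0.2, 0.1, 2.0])                 # a point 2 m in front of the camera
print("in air:  ", project_flat_port(P, f_air=600.0, n_medium=1.0))
print("in water:", project_flat_port(P, f_air=600.0, n_medium=1.33))
```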
Citations: 0
Learning Task Planning from Multi-Modal Demonstration for Multi-Stage Contact-Rich Manipulation
Pub Date : 2024-09-18 DOI: arxiv-2409.11863
Kejia Chen, Zheng Shen, Yue Zhang, Lingyun Chen, Fan Wu, Zhenshan Bing, Sami Haddadin, Alois Knoll
Large Language Models (LLMs) have gained popularity in task planning for long-horizon manipulation tasks. To enhance the validity of LLM-generated plans, visual demonstrations and online videos have been widely employed to guide the planning process. However, for manipulation tasks involving subtle movements but rich contact interactions, visual perception alone may be insufficient for the LLM to fully interpret the demonstration. Additionally, visual data provides limited information on force-related parameters and conditions, which are crucial for effective execution on real robots. In this paper, we introduce an in-context learning framework that incorporates tactile and force-torque information from human demonstrations to enhance LLMs' ability to generate plans for new task scenarios. We propose a bootstrapped reasoning pipeline that sequentially integrates each modality into a comprehensive task plan. This task plan is then used as a reference for planning in new task configurations. Real-world experiments on two different sequential manipulation tasks demonstrate the effectiveness of our framework in improving LLMs' understanding of multi-modal demonstrations and enhancing the overall planning performance.
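The "bootstrapped reasoning pipeline" can be pictured as sequentially folding each modality's demonstration summary into an LLM prompt. The sketch below uses a hypothetical `query_llm` stand-in and invented observation strings; it only illustrates the staged-prompting pattern, not the paper's actual pipeline.

```python
def query_llm(prompt: str) -> str:
    """Hypothetical stand-in for an LLM call; replace with a real client."""
    return f"<plan refined using: {prompt[:40]}...>"

def bootstrapped_plan(task: str, modality_summaries: dict) -> str:
    """Sequentially integrate each modality's demonstration summary into the plan
    (illustrative staged prompting)."""
    plan = query_llm(f"Draft a step-by-step plan for: {task}")
    for name, summary in modality_summaries.items():   # e.g. vision, tactile, force-torque
        plan = query_llm(
            f"Task: {task}\nCurrent plan: {plan}\n"
            f"Refine the plan using this {name} observation from the demonstration:\n{summary}"
        )
    return plan

print(bootstrapped_plan(
    "insert a peg into a hole",
    {"vision": "peg aligned above hole before insertion",
     "tactile": "gentle contact detected at hole rim",
     "force-torque": "downward force capped near 5 N during insertion"},
))
```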
Citations: 0
WeHelp: A Shared Autonomy System for Wheelchair Users
Pub Date : 2024-09-18 DOI: arxiv-2409.12159
Abulikemu Abuduweili, Alice Wu, Tianhao Wei, Weiye Zhao
There is a large population of wheelchair users. Most of the wheelchair users need help with daily tasks. However, according to recent reports, their needs are not properly satisfied due to the lack of caregivers. Therefore, in this project, we develop WeHelp, a shared autonomy system aimed for wheelchair users. A robot with a WeHelp system has three modes: following mode, remote control mode and tele-operation mode. In the following mode, the robot follows the wheelchair user automatically via visual tracking. The wheelchair user can ask the robot to follow them from behind, by the left or by the right. When the wheelchair user asks for help, the robot will recognize the command via speech recognition, and then switch to the teleoperation mode or remote control mode. In the teleoperation mode, the wheelchair user takes over the robot with a joystick and controls the robot to complete some complex tasks for their needs, such as opening doors, moving obstacles on the way, reaching objects on a high shelf or on the low ground, etc. In the remote control mode, a remote assistant takes over the robot and helps the wheelchair user complete some complex tasks for their needs. Our evaluation shows that the pipeline is useful and practical for wheelchair users. Source code and demo of the paper are available at https://github.com/Walleclipse/WeHelp.
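The three modes and the speech-triggered switching described above lend themselves to a small state machine. The sketch below uses invented command phrases as a stand-in for WeHelp's actual speech-recognition vocabulary.

```python
from enum import Enum, auto

class Mode(Enum):
    FOLLOWING = auto()
    TELEOPERATION = auto()
    REMOTE_CONTROL = auto()

# Hypothetical mapping from recognized speech commands to mode switches;
# the actual WeHelp command set may differ.
COMMANDS = {
    "follow me": Mode.FOLLOWING,
    "i will control": Mode.TELEOPERATION,
    "call remote assistant": Mode.REMOTE_CONTROL,
}

def next_mode(current: Mode, recognized_speech: str) -> Mode:
    """Switch modes on a recognized command; otherwise keep the current mode."""
    return COMMANDS.get(recognized_speech.lower().strip(), current)

mode = Mode.FOLLOWING
for utterance in ["call remote assistant", "unrelated chatter", "follow me"]:
    mode = next_mode(mode, utterance)
    print(utterance, "->", mode.name)
```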
Citations: 0