
arXiv - CS - Robotics: Latest Publications

InterACT: Inter-dependency Aware Action Chunking with Hierarchical Attention Transformers for Bimanual Manipulation
Pub Date : 2024-09-12 DOI: arxiv-2409.07914
Andrew Lee, Ian Chuang, Ling-Yuan Chen, Iman Soltani
We present InterACT: Inter-dependency aware Action Chunking with Hierarchical Attention Transformers, a novel imitation learning framework for bimanual manipulation that integrates hierarchical attention to capture inter-dependencies between dual-arm joint states and visual inputs. InterACT consists of a Hierarchical Attention Encoder and a Multi-arm Decoder, both designed to enhance information aggregation and coordination. The encoder processes multi-modal inputs through segment-wise and cross-segment attention mechanisms, while the decoder leverages synchronization blocks to refine individual action predictions, providing the counterpart's prediction as context. Our experiments on a variety of simulated and real-world bimanual manipulation tasks demonstrate that InterACT significantly outperforms existing methods. Detailed ablation studies validate the contributions of key components of our work, including the impact of CLS tokens, cross-segment encoders, and synchronization blocks.
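To make the two components concrete, here is a minimal PyTorch sketch of the structure the abstract describes: segment-wise attention with a CLS summary per segment, cross-segment attention over those summaries, and a decoder whose synchronization step feeds each arm the counterpart's draft prediction as context. All dimensions, module choices, and the draft-then-refine rule are our illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn

class HierarchicalEncoder(nn.Module):
    """Segment-wise attention inside each input segment, then cross-segment
    attention over the segments' CLS summaries."""
    def __init__(self, dim=128, heads=4):
        super().__init__()
        self.segment_attn = nn.TransformerEncoderLayer(dim, heads, batch_first=True)
        self.cross_attn = nn.TransformerEncoderLayer(dim, heads, batch_first=True)
        self.cls = nn.Parameter(torch.zeros(1, 1, dim))

    def forward(self, segments):                     # list of (B, T_i, dim) tensors
        summaries = []
        for seg in segments:
            cls = self.cls.expand(seg.size(0), -1, -1)
            out = self.segment_attn(torch.cat([cls, seg], dim=1))
            summaries.append(out[:, 0])              # CLS token summarizes the segment
        return self.cross_attn(torch.stack(summaries, dim=1))   # (B, n_seg, dim)

class SynchronizedDecoder(nn.Module):
    """Each arm drafts an action chunk, then refines it with the counterpart
    arm's draft provided as context (the synchronization step)."""
    def __init__(self, dim=128, chunk=8, act_dim=7):
        super().__init__()
        self.draft = nn.Linear(dim, chunk * act_dim)
        self.refine = nn.Linear(dim + chunk * act_dim, chunk * act_dim)

    def forward(self, ctx):                          # ctx: (B, 2, dim), one row per arm
        left, right = ctx[:, 0], ctx[:, 1]
        d_left, d_right = self.draft(left), self.draft(right)
        a_left = self.refine(torch.cat([left, d_right], dim=-1))   # left sees right's draft
        a_right = self.refine(torch.cat([right, d_left], dim=-1))
        return a_left, a_right

enc, dec = HierarchicalEncoder(), SynchronizedDecoder()
segments = [torch.randn(2, 5, 128), torch.randn(2, 5, 128)]   # two per-arm segments
left_chunk, right_chunk = dec(enc(segments))
```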
Citations: 0
Universal Trajectory Optimization Framework for Differential-Driven Robot Class
Pub Date : 2024-09-12 DOI: arxiv-2409.07924
Mengke Zhang, Zhichao Han, Chao Xu, Fei Gao, Yanjun Cao
Differential-driven robots are widely used in various scenarios thanks to their straightforward principle, from household service robots to disaster response field robots. There are several different types of driving mechanisms in real-world applications, including two-wheeled, four-wheeled skid-steering, and tracked robots. The differences in the driving mechanism usually require specific kinematic modeling when precise control is desired. Furthermore, the nonholonomic dynamics and possible lateral slip lead to different degrees of difficulty in obtaining feasible and high-quality trajectories. Therefore, a comprehensive trajectory optimization framework to compute trajectories efficiently for various kinds of differential-driven robots is highly desirable. In this paper, we propose a universal trajectory optimization framework that can be applied to the differential-driven robot class, enabling the generation of high-quality trajectories within a restricted computational timeframe. We introduce a novel trajectory representation based on polynomial parameterization of motion states or their integrals, such as angular and linear velocities, which inherently matches robots' motion to the control principle of the differential-driven robot class. The trajectory optimization problem is formulated to minimize complexity while prioritizing safety and operational efficiency. We then build a full-stack autonomous planning and control system to show the feasibility and robustness. We conduct extensive simulations and real-world testing in crowded environments with three kinds of differential-driven robots to validate the effectiveness of our approach. We will release our method as an open-source package.
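The core representational idea — parameterizing motion states such as linear and angular velocity as polynomials in time, so that integrating the differential-drive kinematics yields the trajectory — can be sketched as follows. The cubic coefficients, unicycle model, and Euler integration below are illustrative assumptions, not the paper's solver.

```python
import numpy as np

def poly_eval(coeffs, t):
    """Evaluate a polynomial with coefficients [c0, c1, ...] at time t."""
    return sum(c * t**i for i, c in enumerate(coeffs))

def rollout(v_coeffs, w_coeffs, T=2.0, dt=0.01):
    """Integrate the unicycle model x' = v cos(theta), y' = v sin(theta),
    theta' = w, with v(t) and w(t) given as polynomial coefficients."""
    x = y = theta = 0.0
    path = [(x, y, theta)]
    for t in np.arange(0.0, T, dt):
        v, w = poly_eval(v_coeffs, t), poly_eval(w_coeffs, t)
        x += v * np.cos(theta) * dt
        y += v * np.sin(theta) * dt
        theta += w * dt
        path.append((x, y, theta))
    return np.array(path)

# Example: constant forward speed with a gentle, time-varying turn rate.
path = rollout(v_coeffs=[0.5], w_coeffs=[0.0, 0.4, -0.1])
print(path[-1])  # final (x, y, theta)
```

An optimizer would then adjust the polynomial coefficients directly, which is what makes the representation compact enough for restricted computational budgets.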
Citations: 0
Graph Inspection for Robotic Motion Planning: Do Arithmetic Circuits Help?
Pub Date : 2024-09-12 DOI: arxiv-2409.08219
Matthias Bentert, Daniel Coimbra Salomao, Alex Crane, Yosuke Mizutani, Felix Reidl, Blair D. Sullivan
We investigate whether algorithms based on arithmetic circuits are a viable alternative to existing solvers for Graph Inspection, a problem with direct application in robotic motion planning. Specifically, we seek to address the high memory usage of existing solvers. Aided by novel theoretical results enabling fast solution recovery, we implement a circuit-based solver for Graph Inspection which uses only polynomial space and test it on several realistic robotic motion planning datasets. In particular, we provide a comprehensive experimental evaluation of a suite of engineered algorithms for three key subroutines. While this evaluation demonstrates that circuit-based methods are not yet practically competitive for our robotics application, it also provides insights which may guide future efforts to bring circuit-based algorithms from theory to practice.
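For readers unfamiliar with the underlying object: an arithmetic circuit is a DAG whose gates compute sums and products, and evaluating it gate-by-gate needs memory only proportional to the number of gates — the kind of space behavior motivating the polynomial-space solver above. The toy circuit below is our own illustration, not the paper's construction.

```python
def eval_circuit(gates, inputs):
    """gates: list of ('in', name) | ('add', i, j) | ('mul', i, j), where
    i, j reference earlier gate indices; returns the last gate's value.
    Memory use is one value per gate, regardless of the formula's size."""
    vals = []
    for g in gates:
        if g[0] == 'in':
            vals.append(inputs[g[1]])
        elif g[0] == 'add':
            vals.append(vals[g[1]] + vals[g[2]])
        else:  # 'mul'
            vals.append(vals[g[1]] * vals[g[2]])
    return vals[-1]

# The circuit for (x + y) * x, evaluated at x=2, y=3 -> 10.
gates = [('in', 'x'), ('in', 'y'), ('add', 0, 1), ('mul', 2, 0)]
print(eval_circuit(gates, {'x': 2, 'y': 3}))
```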
Citations: 0
Towards Online Safety Corrections for Robotic Manipulation Policies
Pub Date : 2024-09-12 DOI: arxiv-2409.08233
Ariana Spalter, Mark Roberts, Laura M. Hiatt
Recent successes in applying reinforcement learning (RL) to robotics have shown it is a viable approach for constructing robotic controllers. However, RL controllers can produce many collisions in environments where new obstacles appear during execution. This poses a problem in safety-critical settings. We present a hybrid approach, called iKinQP-RL, that uses an Inverse Kinematics Quadratic Programming (iKinQP) controller to correct actions proposed by an RL policy at runtime. This ensures safe execution in the presence of new obstacles not present during training. Preliminary experiments illustrate that our iKinQP-RL framework completely eliminates collisions with new obstacles while maintaining a high task success rate.
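The hybrid loop can be sketched as follows: the RL policy proposes an action, and a kinematics-aware safety filter minimally corrects it before execution. The toy joint-space projection below merely stands in for solving the iKinQP quadratic program; the environment, obstacle model, and parameters are illustrative assumptions.

```python
import numpy as np

def safety_correct(q, action, obstacles, limit=0.1):
    """Return the action closest to the RL proposal that keeps the next
    joint configuration outside all obstacle margins (a crude stand-in
    for a quadratic program with collision constraints)."""
    candidate = np.clip(action, -limit, limit)
    q_next = q + candidate
    for center, radius in obstacles:
        d = np.linalg.norm(q_next - center)
        if d < radius:  # project the configuration back out to the margin
            q_next = center + (q_next - center) * (radius / max(d, 1e-9))
    return q_next - q

q = np.zeros(2)                                  # toy 2-DoF configuration
obstacles = [(np.array([0.3, 0.0]), 0.15)]       # (center, radius) in joint space
rng = np.random.default_rng(0)
for step in range(50):
    proposed = rng.uniform(-0.1, 0.1, size=2)    # stand-in for the RL policy
    q += safety_correct(q, proposed, obstacles)  # corrected action executed
```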
Citations: 0
Adaptive Language-Guided Abstraction from Contrastive Explanations
Pub Date : 2024-09-12 DOI: arxiv-2409.08212
Andi Peng, Belinda Z. Li, Ilia Sucholutsky, Nishanth Kumar, Julie A. Shah, Jacob Andreas, Andreea Bobu
Many approaches to robot learning begin by inferring a reward function from a set of human demonstrations. To learn a good reward, it is necessary to determine which features of the environment are relevant before determining how these features should be used to compute reward. End-to-end methods for joint feature and reward learning (e.g., using deep networks or program synthesis techniques) often yield brittle reward functions that are sensitive to spurious state features. By contrast, humans can often generalizably learn from a small number of demonstrations by incorporating strong priors about what features of a demonstration are likely meaningful for a task of interest. How do we build robots that leverage this kind of background knowledge when learning from new demonstrations? This paper describes a method named ALGAE (Adaptive Language-Guided Abstraction from [Contrastive] Explanations) which alternates between using language models to iteratively identify human-meaningful features needed to explain demonstrated behavior, then standard inverse reinforcement learning techniques to assign weights to these features. Experiments across a variety of both simulated and real-world robot environments show that ALGAE learns generalizable reward functions defined on interpretable features using only small numbers of demonstrations. Importantly, ALGAE can recognize when features are missing, then extract and define those features without any human input -- making it possible to quickly and efficiently acquire rich representations of user behavior.
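The alternation ALGAE describes can be sketched in a few lines: a language model proposes candidate human-meaningful features, and an inverse-RL step assigns them weights. Both the feature proposer (a stub in place of an actual language-model query) and the simple linear reward fit below are illustrative stand-ins, not the authors' method.

```python
import numpy as np

def propose_features(behavior_description):
    """Stub standing in for a language-model query that returns
    human-meaningful candidate features of a state."""
    return [lambda s: s[0],        # e.g., distance to goal
            lambda s: abs(s[1])]   # e.g., deviation from a preferred lane

def fit_weights(features, demo_states, random_states, lr=0.1, iters=200):
    """Fit linear reward weights so demonstrated states score higher than
    random ones (a crude stand-in for inverse reinforcement learning)."""
    phi = lambda S: np.array([[f(s) for f in features] for s in S])
    grad = phi(demo_states).mean(axis=0) - phi(random_states).mean(axis=0)
    w = np.zeros(len(features))
    for _ in range(iters):
        w += lr * (grad - 0.1 * w)   # ascent with decay so weights settle
    return w

demos = [(0.1, 0.0), (0.2, 0.1)]           # states an expert visited
randoms = [(0.9, 0.8), (0.7, -0.9)]        # states from random behavior
feats = propose_features("pour cereal while staying near the counter")
print(fit_weights(feats, demos, randoms))  # negative weights penalize both features
```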
Citations: 0
ReGentS: Real-World Safety-Critical Driving Scenario Generation Made Stable
Pub Date : 2024-09-12 DOI: arxiv-2409.07830
Yuan Yin, Pegah Khayatan, Éloi Zablocki, Alexandre Boulch, Matthieu Cord
Machine learning based autonomous driving systems often face challenges with safety-critical scenarios that are rare in real-world data, hindering their large-scale deployment. While increasing real-world training data coverage could address this issue, it is costly and dangerous. This work explores generating safety-critical driving scenarios by modifying complex real-world regular scenarios through trajectory optimization. We propose ReGentS, which stabilizes generated trajectories and introduces heuristics to avoid obvious collisions and optimization problems. Our approach addresses unrealistic diverging trajectories and unavoidable collision scenarios that are not useful for training a robust planner. We also extend the scenario generation framework to handle real-world data with up to 32 agents. Additionally, by using a differentiable simulator, our approach simplifies gradient descent-based optimization involving a simulator, paving the way for future advancements. The code is available at https://github.com/valeoai/ReGentS.
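The central mechanism — backpropagating a criticality objective through a differentiable rollout to perturb an adversary's trajectory, with a stabilizing term keeping it out of trivial overlap — can be sketched as follows. The straight-line ego path, the hinge loss, and all parameters are our illustrative assumptions, not ReGentS itself.

```python
import torch

# Fixed ego path and an optimizable adversary trajectory (waypoints).
ego = torch.tensor([[0.0, 0.0], [1.0, 0.0], [2.0, 0.0]])
adv = torch.tensor([[0.0, 3.0], [1.0, 3.0], [2.0, 3.0]], requires_grad=True)

opt = torch.optim.Adam([adv], lr=0.05)
for step in range(100):
    dist = (ego - adv).norm(dim=-1)          # per-waypoint ego-adversary gap
    # Shrink the closest gap to raise criticality, but a hinge term (the
    # stabilizing heuristic) keeps trajectories out of trivial overlap.
    loss = dist.min() + 10.0 * torch.relu(0.5 - dist.min())
    opt.zero_grad()
    loss.backward()                          # gradients flow through the rollout
    opt.step()
print(adv.detach())                          # a near-miss adversary trajectory
```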
Citations: 0
Hand-Object Interaction Pretraining from Videos
Pub Date : 2024-09-12 DOI: arxiv-2409.08273
Himanshu Gaurav Singh, Antonio Loquercio, Carmelo Sferrazza, Jane Wu, Haozhi Qi, Pieter Abbeel, Jitendra Malik
We present an approach to learn general robot manipulation priors from 3D hand-object interaction trajectories. We build a framework to use in-the-wild videos to generate sensorimotor robot trajectories. We do so by lifting both the human hand and the manipulated object in a shared 3D space and retargeting human motions to robot actions. Generative modeling on this data gives us a task-agnostic base policy. This policy captures a general yet flexible manipulation prior. We empirically demonstrate that finetuning this policy, with both reinforcement learning (RL) and behavior cloning (BC), enables sample-efficient adaptation to downstream tasks and simultaneously improves robustness and generalizability compared to prior approaches. Qualitative experiments are available at: https://hgaurav2k.github.io/hop/.
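The finetuning stage the abstract mentions can be sketched with a behavior-cloning loop over a pretrained base policy; network sizes, data, and the MSE objective below are illustrative stand-ins rather than the authors' setup.

```python
import torch
import torch.nn as nn

# Stand-in task-agnostic base policy; in practice its weights would be
# restored from hand-object interaction pretraining via load_state_dict.
base_policy = nn.Sequential(nn.Linear(16, 64), nn.ReLU(), nn.Linear(64, 7))

opt = torch.optim.Adam(base_policy.parameters(), lr=1e-4)
obs = torch.randn(256, 16)          # stand-in downstream observations
expert_act = torch.randn(256, 7)    # stand-in expert actions for the task
for epoch in range(10):
    loss = nn.functional.mse_loss(base_policy(obs), expert_act)
    opt.zero_grad()
    loss.backward()
    opt.step()                      # BC finetuning of the pretrained prior
```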
Citations: 0
Real-time Multi-view Omnidirectional Depth Estimation System for Robots and Autonomous Driving on Real Scenes
Pub Date : 2024-09-12 DOI: arxiv-2409.07843
Ming Li, Xiong Yang, Chaofan Wu, Jiaheng Li, Pinzhi Wang, Xuejiao Hu, Sidan Du, Yang Li
Omnidirectional depth estimation has broad application prospects in fields such as robotic navigation and autonomous driving. In this paper, we propose a robotic prototype system and corresponding algorithm designed to validate omnidirectional depth estimation for navigation and obstacle avoidance in real-world scenarios for both robots and vehicles. The proposed HexaMODE system captures 360° depth maps using six fisheye cameras arranged around the robot. We introduce a combined spherical sweeping method and optimize the model architecture of the proposed RtHexa-OmniMVS algorithm to achieve real-time omnidirectional depth estimation. To ensure high accuracy, robustness, and generalization in real-world environments, we employ a teacher-student self-training strategy, utilizing large-scale unlabeled real-world data for model training. The proposed algorithm demonstrates high accuracy in various complex real-world scenarios, both indoors and outdoors, achieving an inference speed of 15 fps on edge computing platforms.
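The teacher-student self-training strategy can be sketched as a pseudo-labeling loop: a frozen teacher predicts depth on unlabeled images, and the student regresses onto those predictions. The tiny convolutional stand-in networks and L1 objective below are illustrative assumptions, not the HexaMODE models.

```python
import torch
import torch.nn as nn

teacher = nn.Conv2d(3, 1, 3, padding=1)    # stand-in depth networks
student = nn.Conv2d(3, 1, 3, padding=1)
teacher.eval()                             # teacher stays frozen

opt = torch.optim.Adam(student.parameters(), lr=1e-4)
unlabeled = [torch.randn(4, 3, 64, 64) for _ in range(8)]  # stand-in batches
for images in unlabeled:
    with torch.no_grad():
        pseudo_depth = teacher(images)     # teacher predictions as labels
    loss = nn.functional.l1_loss(student(images), pseudo_depth)
    opt.zero_grad()
    loss.backward()
    opt.step()                             # student learns from pseudo labels
```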
Citations: 0
FIReStereo: Forest InfraRed Stereo Dataset for UAS Depth Perception in Visually Degraded Environments
Pub Date : 2024-09-12 DOI: arxiv-2409.07715
Devansh Dhrafani, Yifei Liu, Andrew Jong, Ukcheol Shin, Yao He, Tyler Harp, Yaoyu Hu, Jean Oh, Sebastian Scherer
Robust depth perception in visually-degraded environments is crucial for autonomous aerial systems. Thermal imaging cameras, which capture infrared radiation, are robust to visual degradation. However, due to the lack of a large-scale dataset, the use of thermal cameras for unmanned aerial system (UAS) depth perception has remained largely unexplored. This paper presents a stereo thermal depth perception dataset for autonomous aerial perception applications. The dataset consists of stereo thermal images, LiDAR, IMU, and ground truth depth maps captured in urban and forest settings under diverse conditions like day, night, rain, and smoke. We benchmark representative stereo depth estimation algorithms, offering insights into their performance in degraded conditions. Models trained on our dataset generalize well to unseen smoky conditions, highlighting the robustness of stereo thermal imaging for depth perception. We aim for this work to enhance robotic perception in disaster scenarios, allowing for exploration and operations in previously unreachable areas. The dataset and source code are available at https://firestereo.github.io.
Citations: 0
Relevance for Human Robot Collaboration
Pub Date : 2024-09-12 DOI: arxiv-2409.07753
Xiaotong Zhang, Dingcheng Huang, Kamal Youcef-Toumi
Effective human-robot collaboration (HRC) requires robots to possess human-like intelligence. Inspired by the human cognitive ability to selectively process and filter elements in complex environments, this paper introduces a novel concept and scene-understanding approach termed 'relevance', which identifies relevant components in a scene. To accurately and efficiently quantify relevance, we developed an event-based framework that selectively triggers relevance determination, along with a probabilistic methodology built on a structured scene representation. Simulation results demonstrate that the relevance framework and methodology accurately predict the relevance of a general HRC setup, achieving a precision of 0.99 and a recall of 0.94. Relevance can be broadly applied to several areas in HRC: it improves task planning time by 79.56% compared with pure planning for a cereal task, reduces perception latency by up to 26.53% for an object detector, improves HRC safety by up to 13.50%, and reduces the number of inquiries for HRC by 75.36%. A real-world demonstration showcases the relevance framework's ability to intelligently assist humans in everyday tasks.
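The event-based triggering idea can be sketched as follows: the comparatively expensive relevance determination runs only when a scene-change event fires, rather than on every frame. The set-difference event test and keyword-overlap scoring below are crude illustrative stand-ins for the paper's probabilistic methodology over a structured scene representation.

```python
def scene_changed(prev, curr, threshold=0.2):
    """Fire an event when enough objects enter or leave the scene."""
    prev, curr = set(prev), set(curr)
    changed = len(prev ^ curr)                       # objects added or removed
    return changed / max(len(prev | curr), 1) > threshold

def relevance_scores(objects, task_keywords):
    """Stand-in scoring: fraction of task keywords an object's label
    shares; a real system would reason over a structured scene graph."""
    return {o: sum(k in o for k in task_keywords) / len(task_keywords)
            for o in objects}

frames = [{"bowl", "cereal_box", "tv"},
          {"bowl", "cereal_box", "tv"},              # no change: no trigger
          {"bowl", "cereal_box", "milk", "tv"}]      # milk appears: trigger
task = ["cereal", "milk", "bowl"]
prev, relevant = set(), {}
for objects in frames:
    if scene_changed(prev, objects):                 # the event gate
        relevant = {o: s for o, s in relevance_scores(objects, task).items()
                    if s > 0}
    prev = objects
print(relevant)   # only task-relevant objects survive the filter
```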
Citations: 0