Composing Option Sequences by Adaptation: Initial Results

Charles A. Meehan, Paul Rademacher, Mark Roberts, Laura M. Hiatt

arXiv - CS - Robotics (arXiv:2409.08195), published 2024-09-12
Abstract
Robot manipulation in real-world settings often requires adapting the robot's behavior to the current situation, such as by changing the sequences in which policies execute to achieve the desired task. Problematically, however, we show that composing a novel sequence of five deep RL options to perform a pick-and-place task is unlikely to complete successfully, even if their initiation and termination conditions align. We propose a framework to determine a priori whether a sequence will succeed, and examine three approaches that adapt options to sequence successfully when it will not. Crucially, our adaptation methods consider the actual subset of points from which an option is trained or at which it ends: (1) train the second option to start where the first ends; (2) train the first option to reach the centroid of where the second starts; and (3) train the first option to reach the median of where the second starts. Our results show that our framework and adaptation methods have promise in adapting options to work in novel sequences.
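The abstract's two main ideas can be illustrated with a minimal sketch. This is not the paper's implementation; the ball-radius compatibility test, the state representation, and all function names here are illustrative assumptions. It shows (a) an a-priori check that every terminal state of the first option lies near some observed initiation state of the second, and (b) the centroid/median retraining goals of approaches (2) and (3):

```python
import numpy as np

def sequence_compatible(end_states, init_states, radius):
    """Illustrative a-priori check: approximate the second option's
    initiation set as a union of balls of the given radius around its
    observed start states, and ask whether every terminal state of the
    first option falls inside that union."""
    end_states = np.asarray(end_states, dtype=float)
    init_states = np.asarray(init_states, dtype=float)
    # Pairwise distances, shape (n_end, n_init).
    dists = np.linalg.norm(end_states[:, None, :] - init_states[None, :, :], axis=2)
    # Each end state must be within `radius` of at least one init state.
    return bool(np.all(dists.min(axis=1) <= radius))

def retraining_goal(init_states, statistic="centroid"):
    """Goal point for retraining the first option, as in approaches
    (2) and (3): the centroid or coordinate-wise median of the states
    where the second option starts."""
    init_states = np.asarray(init_states, dtype=float)
    if statistic == "centroid":
        return init_states.mean(axis=0)
    if statistic == "median":
        return np.median(init_states, axis=0)
    raise ValueError(f"unknown statistic: {statistic}")
```

For example, if the first option terminates near the origin but the second option's observed start states cluster around (1, 1), `sequence_compatible` flags the pair as incompatible, and `retraining_goal` supplies the target the first option would be retrained toward.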