SpotLight: Robotic Scene Understanding through Interaction and Affordance Detection
Tim Engelbracht, René Zurbrügg, Marc Pollefeys, Hermann Blum, Zuria Bauer
arXiv - CS - Robotics, 2024-09-18 (arXiv:2409.11870)
Abstract
Despite increasing research efforts in household robotics, robots intended for deployment in domestic settings still struggle with more complex tasks such as interacting with functional elements like drawers or light switches, largely due to limited task-specific understanding and interaction capabilities. These tasks require not only detection and pose estimation but also an understanding of the affordances these elements provide. To address these challenges and enhance robotic scene understanding, we introduce SpotLight: a comprehensive framework for robotic interaction with functional elements, specifically light switches. Furthermore, this framework enables robots to improve their environmental understanding through interaction. Leveraging VLM-based affordance prediction to estimate motion primitives for light-switch interaction, we achieve up to 84% operation success in real-world experiments. We further introduce a specialized dataset of 715 images as well as a custom detection model for light-switch detection. We demonstrate how the framework can facilitate robot learning through physical interaction by having the robot explore the environment and discover previously unknown relationships in a scene graph representation. Lastly, we propose an extension of the framework to accommodate other functional interactions, such as swing doors, showcasing its flexibility.

Videos and Code: timengelbracht.github.io/SpotLight/
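The abstract describes a pipeline of switch detection, VLM-based affordance prediction, motion-primitive execution, and interaction-driven scene-graph updates. The sketch below shows how such a loop could be wired together; it is a minimal illustration, and every name in it (Detection, SceneGraph, interact_and_update, and the injected callables) is an assumption for exposition, not the paper's actual API.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Optional


@dataclass
class Detection:
    """One detected light switch: 2D box plus an estimated 6-DoF pose."""
    box: tuple[float, float, float, float]  # pixel-space bounding box
    pose: tuple[float, ...]                 # position + orientation, robot frame


@dataclass
class SceneGraph:
    """Minimal scene graph: edges are (source, relation, target) triples."""
    edges: set[tuple[str, str, str]] = field(default_factory=set)

    def add_relation(self, src: str, rel: str, dst: str) -> None:
        self.edges.add((src, rel, dst))


def interact_and_update(
    observe: Callable[[], Any],                        # grab a camera image
    detect: Callable[[Any], list[Detection]],          # switch detector (hypothetical)
    affordance: Callable[[Any, Detection], str],       # VLM affordance query (hypothetical)
    execute: Callable[[Detection, str], bool],         # run a motion primitive
    changed_lamp: Callable[[Any, Any], Optional[str]], # which lamp toggled, if any
    graph: SceneGraph,
) -> None:
    """Press each detected switch once and record which lamp it controls
    as a previously unknown 'controls' edge in the scene graph."""
    image = observe()
    for i, det in enumerate(detect(image)):
        aff = affordance(image, det)  # e.g. "toggle-down" or "push-rocker-top"
        before = observe()
        if execute(det, aff):         # actuate the switch via the primitive
            after = observe()
            lamp = changed_lamp(before, after)
            if lamp is not None:
                graph.add_relation(f"switch_{i}", "controls", lamp)
```

Treating the detector, VLM query, and controller as injected callables is one way to keep the interaction loop testable in simulation before running it on a robot; the same structure would extend to other functional elements, such as swing doors, by swapping the detector and primitives.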