DLO Perceiver: Grounding Large Language Model for Deformable Linear Objects Perception
Authors: Alessio Caporali, Kevin Galassi, Gianluca Palli
Journal: IEEE Robotics and Automation Letters, vol. 9, no. 12, pp. 11385-11392
Publication date: 2024-11-04
Publication type: Journal Article
DOI: 10.1109/LRA.2024.3491428
URL: https://ieeexplore.ieee.org/document/10742556/
PDF: https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10742556
Open access: no
Impact factor: 4.6 (JCR Q2, Robotics; category: Computer Science, region 2)
Citation count: 0
Abstract
The perception of Deformable Linear Objects (DLOs) is a challenging task due to their complex and ambiguous appearance, lack of discernible features, typically small size, and deformability. Despite these challenges, robust and effective segmentation of DLOs is crucial for introducing robots into environments where they are currently underrepresented, such as domestic and complex industrial settings. In this context, integrating language-based inputs can simplify the perception task while also opening the way to robots as human companions. Therefore, this letter proposes a novel architecture for the perception of DLOs, wherein the input image is augmented with a text-based prompt guiding the segmentation of the target DLO. After the image and text are encoded separately, a Perceiver-inspired structure compresses the concatenated data through transformer layers and generates the output mask from a latent vector representation. The method is experimentally evaluated on real-world images of DLOs such as electrical cables and ropes, validating its efficacy and efficiency in practical scenarios.
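The data flow described in the abstract (separately encoded image and text tokens, concatenation, compression into a small latent array, mask generation from the latents) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the single-head attention, the one latent self-attention layer, the use of image tokens as decoder queries, and all names (`cross_attention`, `perceiver_segment`, `init_params`) and sizes are assumptions chosen for illustration, and the weights are random rather than trained.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(queries, keys_values, w_q, w_k, w_v):
    # Single-head scaled dot-product attention (hypothetical simplification).
    q = queries @ w_q            # (n_q, d)
    k = keys_values @ w_k        # (n_kv, d)
    v = keys_values @ w_v        # (n_kv, d)
    scores = q @ k.T / np.sqrt(q.shape[-1])
    return softmax(scores, axis=-1) @ v

def init_params(d, rng):
    # Random, untrained weights; a real model would learn these.
    def proj():
        return tuple(rng.normal(scale=d ** -0.5, size=(d, d)) for _ in range(3))
    return {"enc": proj(), "proc": proj(), "dec": proj(),
            "head": rng.normal(scale=d ** -0.5, size=(d, 1))}

def perceiver_segment(image_tokens, text_tokens, latents, params):
    # 1. Concatenate the separately encoded image and text tokens.
    inputs = np.concatenate([image_tokens, text_tokens], axis=0)
    # 2. Compress: a small latent array attends to the long input sequence.
    z = latents + cross_attention(latents, inputs, *params["enc"])
    # 3. Process: self-attention among the latents (one layer here).
    z = z + cross_attention(z, z, *params["proc"])
    # 4. Decode (assumption): each image token queries the latents,
    #    then a linear head and sigmoid give a per-token mask probability.
    out = cross_attention(image_tokens, z, *params["dec"])
    logits = (out @ params["head"])[:, 0]
    return 1.0 / (1.0 + np.exp(-logits))
```

With, say, 64 image tokens, 5 text tokens, and 16 latents of width 32, the attention cost in steps 2 and 4 scales with the product of the input length and the latent count rather than with the square of the input length, which is the usual motivation for a Perceiver-style bottleneck.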
About the journal:
The scope of this journal is to publish peer-reviewed articles that provide a timely and concise account of innovative research ideas and application results, reporting significant theoretical findings and application case studies in areas of robotics and automation.