VLM-MSGraph: Vision Language Model-enabled Multi-hierarchical Scene Graph for robotic assembly
Authors: Shufei Li, Zhijie Yan, Zuoxu Wang, Yiping Gao
Journal: Robotics and Computer-Integrated Manufacturing, volume 94, Article 102978
Publication date: 2025-02-16
Publication type: Journal Article
DOI: 10.1016/j.rcim.2025.102978
URL: https://www.sciencedirect.com/science/article/pii/S0736584525000328
Impact factor: 9.1 (JCR Q1, Computer Science, Interdisciplinary Applications)
Citations: 0
Abstract
Intelligent robotic assembly is becoming a pivotal component of the manufacturing sector, driven by growing demands for flexibility, sustainability, and resilience. Robots in manufacturing environments need perception, decision-making, and manipulation skills to support the flexible production of diverse products. However, traditional robotic assembly systems typically rely on time-consuming training processes specific to fixed settings, lacking generalization and zero-shot learning capabilities. To address these challenges, this paper introduces a Vision Language Model-enabled Multi-hierarchical Scene Graph (VLM-MSGraph) approach for robotic assembly, featuring generalized assembly sequence learning and 3D manipulation in open scenarios. The MSGraph incorporates high-level task planning structured as triplets, organized by multiple VLM agents. At a low level, the MSGraph retains 3D spatial relationships between industrial parts, enabling the robot to perform assembly tasks while accounting for object geometry for effective manipulation. Assembly drawings, physics simulations, and assembly tasks in a laboratory setting are used to evaluate the proposed system, advancing flexible automation in robotics.
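The two levels described in the abstract can be pictured with a small data structure. The sketch below is illustrative only and is not the authors' implementation: the class names (`PartNode`, `MSGraph`), the relation labels, the part names, and the coordinates are all assumptions made for this example. It stores high-level assembly steps as (subject, relation, object) triplets and keeps a 3D position and bounding-box size for each part at the low level, from which a simple spatial relation is derived.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

# A triplet is (subject, relation, object), e.g. ("gear", "insert_onto", "shaft").
Triplet = Tuple[str, str, str]
Vec3 = Tuple[float, float, float]


@dataclass
class PartNode:
    """Low-level node: one industrial part with assumed 3D geometry."""
    name: str
    position: Vec3  # part centre in metres; the reference frame is an assumption
    size: Vec3      # axis-aligned bounding-box extents in metres


@dataclass
class MSGraph:
    """Hypothetical two-level scene graph: task triplets plus 3D part nodes."""
    task_triplets: List[Triplet] = field(default_factory=list)
    parts: Dict[str, PartNode] = field(default_factory=dict)

    def add_part(self, part: PartNode) -> None:
        self.parts[part.name] = part

    def add_step(self, subject: str, relation: str, obj: str) -> None:
        # Steps are kept in insertion order, which stands in for the assembly sequence.
        self.task_triplets.append((subject, relation, obj))

    def offset(self, a: str, b: str) -> Vec3:
        """Vector from part b to part a."""
        pa, pb = self.parts[a].position, self.parts[b].position
        return (pa[0] - pb[0], pa[1] - pb[1], pa[2] - pb[2])

    def spatial_relation(self, a: str, b: str) -> Triplet:
        """Derive a coarse vertical relation from the z-offset alone."""
        dz = self.offset(a, b)[2]
        if dz > 0:
            relation = "above"
        elif dz < 0:
            relation = "below"
        else:
            relation = "level_with"
        return (a, relation, b)


# Example scene with made-up parts and coordinates.
graph = MSGraph()
graph.add_part(PartNode("shaft", (0.5, 0.0, 0.0), (0.02, 0.02, 0.10)))
graph.add_part(PartNode("gear", (0.5, 0.0, 0.25), (0.06, 0.06, 0.01)))
graph.add_step("gear", "insert_onto", "shaft")

print(graph.task_triplets)
print(graph.spatial_relation("gear", "shaft"))
```

In this sketch both levels use the same triplet shape, so a planner could read task steps and geometric relations through one interface. How the paper's VLM agents actually populate the graph is not described in the abstract and is not modelled here.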
About the journal:
The journal, Robotics and Computer-Integrated Manufacturing, focuses on sharing research applications that contribute to the development of new or enhanced robotics, manufacturing technologies, and innovative manufacturing strategies that are relevant to industry. Papers that combine theory and experimental validation are preferred, while review papers on current robotics and manufacturing issues are also considered. However, papers on traditional machining processes, modeling and simulation, supply chain management, and resource optimization are generally not within the scope of the journal, as there are more appropriate journals for these topics. Similarly, papers that are overly theoretical or mathematical will be directed to other suitable journals. The journal welcomes original papers in areas such as industrial robotics, human-robot collaboration in manufacturing, cloud-based manufacturing, cyber-physical production systems, big data analytics in manufacturing, smart mechatronics, machine learning, adaptive and sustainable manufacturing, and other fields involving unique manufacturing technologies.