Pub Date: 2025-11-10 | DOI: 10.1016/j.rcim.2025.103179
AI-empowered die design framework for giga-casting
Quanzhi Sun, Weipeng Liu, Tao Peng, Peng Zhao
Giga-casting has been developing rapidly in the automotive industry since 2019, showing great advantages in lightweighting and production efficiency. Among many influential factors, die design is particularly critical to the quality of giga-casting components, as it governs the molten metal filling and solidification process. However, die design for giga-casting components faces significant challenges due to their large size, complex structure, and stringent performance requirements. The corresponding filling and solidification processes have become increasingly complex to control, rendering traditional experience-based methods inadequate and leading to time-consuming yet unsatisfactory designs. Recent engineering applications of Artificial Intelligence (AI) demonstrate great potential in complex product design, but how to effectively realize AI-empowered die design has received little attention. This paper conducts a comprehensive review of die design, identifies the key challenges and enabling factors of AI in this context, and elaborates on the proposed technical framework. The two major contributions are: 1) a four-stage evolution of casting die design is systematically analyzed to highlight existing research gaps; 2) a three-component technical framework of AI-empowered die design for giga-casting is proposed. The key enabling technologies and challenges in this framework are carefully discussed. It is envisioned that this study will establish a new procedure to improve die design efficiency.
{"title":"AI-empowered die design framework for giga-casting","authors":"Quanzhi Sun , Weipeng Liu , Tao Peng , Peng Zhao","doi":"10.1016/j.rcim.2025.103179","DOIUrl":"10.1016/j.rcim.2025.103179","url":null,"abstract":"<div><div>Giga-casting has been rapidly developing in automotive industry since 2019, showing great advantages for lightweighting and production efficiency. Among many influential factors, die design is particularly critical for the quality of giga-casting components, as it governs the molten metal filling and solidification process. However, die design for giga-casting components faces significant challenges due to their large size, complex structure, and stringent performance requirements. The corresponding filling and solidification process have become increasingly complex to control, rendering traditional experience-based methods inadequate, which leads to time-consuming yet insufficient design. Recent engineering applications of Artificial Intelligence (AI) demonstrate great potential in complex product design, but how to effectively realize AI-empowered die design has received little attention. This paper conducts a comprehensive review of die design, identifies the key challenges and enabling factors of AI in this context, and elaborates on the proposed technical framework. The two major contributions are: 1) A four-stage evolution of casting die design is systematically analyzed to highlight existing research gaps. 2) A three-component technical framework of AI-empowered die design for giga-casting is proposed. The key enabling technologies and challenges in this framework are carefully discussed. It is envisioned that this study will establish a new procedure to improve die design efficiency.</div></div>","PeriodicalId":21452,"journal":{"name":"Robotics and Computer-integrated Manufacturing","volume":"98 ","pages":"Article 103179"},"PeriodicalIF":11.4,"publicationDate":"2025-11-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145485537","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2025-11-06 | DOI: 10.1016/j.rcim.2025.103175
Sparse-VMICP: A weak feature point cloud registration algorithm for robotic vision measurement of large complex components
Xiaozhi Feng, Tao Ding, Hao Wu, Di Li, Ning Jiang, Dahu Zhu
High-precision three-dimensional (3D) measurement of large complex components (LCCs) such as vehicle bodies provides a data benchmark for subsequent robotized manufacturing processes. A major challenge in LCC measurement is registering adjacent point clouds with only partial overlap, especially when the point cloud geometric features are weak. Although the existing sparse iterative closest point (Sparse-ICP) registration algorithm uses the lp norm to reduce the influence of non-overlapping points during registration, its sparse point pairs are prone to falling into local optima, so the registration accuracy is strongly affected by the initial pose. To overcome this problem, we inherit the advantage of the Sparse-ICP algorithm that the point-to-point distance can suppress tangential slip in smooth areas. On this basis, we introduce a point-to-plane distance variance minimization constraint under the sparse condition, which suppresses the incorrect registration inclination caused by uneven point cloud density, and present a hybrid algorithm termed Sparse-VMICP for weak feature point cloud registration. The proposed algorithm enhances robotic vision measurement accuracy by suppressing registration inclination to adjust the local optimal solution. Robotic vision measurement experiments on two typical LCCs, a high-speed rail body and a car bodywork, are conducted to verify the superiority of the proposed algorithm. The results demonstrate that, compared with other state-of-the-art algorithms, the proposed algorithm can effectively reduce accumulated registration errors in large-scale metrology, and the stitching measurement accuracy of LCCs can reach 0.012 mm.
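As a rough illustration of the ideas named in the abstract (the weighting and exact coupling below are assumptions, not the paper's formulation), the Sparse-ICP point-to-point data term and the added point-to-plane variance constraint can be sketched as:

```latex
% Illustrative sketch only; \lambda and the exact coupling are assumed, not taken from the paper.
\begin{aligned}
% Sparse-ICP point-to-point term with an l_p (0 < p \le 1) penalty to down-weight non-overlapping pairs
E_{\mathrm{pp}}(\mathbf{R},\mathbf{t}) &= \sum_{i} \left\| \mathbf{R}\,\mathbf{p}_i + \mathbf{t} - \mathbf{q}_i \right\|_2^{\,p}, \qquad 0 < p \le 1, \\
% signed point-to-plane distance of each correspondence (n_i: target surface normal)
d_i(\mathbf{R},\mathbf{t}) &= \mathbf{n}_i^{\top}\!\left( \mathbf{R}\,\mathbf{p}_i + \mathbf{t} - \mathbf{q}_i \right), \\
% variance-minimization term intended to suppress registration inclination
E_{\mathrm{var}}(\mathbf{R},\mathbf{t}) &= \frac{1}{N}\sum_{i}\left( d_i - \bar{d} \right)^2, \qquad \bar{d} = \frac{1}{N}\sum_{i} d_i, \\
% hybrid objective combining both effects
E_{\mathrm{hybrid}}(\mathbf{R},\mathbf{t}) &= E_{\mathrm{pp}} + \lambda\, E_{\mathrm{var}}.
\end{aligned}
```

Here the lp term retains Sparse-ICP's robustness to non-overlapping points, while the variance term penalizes systematic tilting of the point-to-plane residuals.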
{"title":"Sparse-VMICP: A weak feature point cloud registration algorithm for robotic vision measurement of large complex components","authors":"Xiaozhi Feng , Tao Ding , Hao Wu , Di Li , Ning Jiang , Dahu Zhu","doi":"10.1016/j.rcim.2025.103175","DOIUrl":"10.1016/j.rcim.2025.103175","url":null,"abstract":"<div><div>High-precision three-dimensional (3D) measurement of large complex components (LCCs) such as vehicle bodies provides data benchmark for subsequent robotized manufacturing processes. A huge challenge in LCCs measurement is to register the adjacent point clouds with partial overlap, especially when the point cloud geometric features are weak. Despite the existing sparse iterative closest point (Sparse-ICP) registration algorithm uses <em>l<sub>p</sub></em> norm to reduce the influence of non-overlapping point clouds during the registration process, however sparse point pairs are prone to fall into local optimum, which causes the registration accuracy to be greatly affected by the initial pose. To overcome the challenging problem, we inherit the advantage of the Sparse-ICP algorithm that the point-to-point distance can suppress tangential slip in the smooth areas. On this basis, we introduce the constraint of point-to-plane distance variance minimization under sparse condition that can suppress the incorrect registration inclination caused by uneven point cloud density, and then present a hybrid algorithm termed as Sparse-VMICP for weak feature point cloud registration. The proposed algorithm aims to enhance the robotic vision measurement accuracy by suppressing registration inclination to adjust the local optimal solution. Robotic vision measurement experiments on two typical LCCs, including high-speed rail body and car bodywork are conducted to verify the superiority of the proposed algorithm. The results demonstrate that the proposed algorithm can effectively reduce the accumulated registration errors in large-scale metrology, compared with other state-of-the-art algorithms, and the stitching measurement accuracy of LCCs can reach 0.012 mm.</div></div>","PeriodicalId":21452,"journal":{"name":"Robotics and Computer-integrated Manufacturing","volume":"98 ","pages":"Article 103175"},"PeriodicalIF":11.4,"publicationDate":"2025-11-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145447346","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2025-11-05 | DOI: 10.1016/j.rcim.2025.103176
From insight to autonomous execution: VLM-enhanced embodied agents towards digital twin-assisted human-robot collaborative assembly
Changchun Liu, JiaYe Song, Dunbing Tang, Liping Wang, Haihua Zhu, Qixiang Cai
In recent years, embodied intelligence has emerged as a practicable strategy for achieving human-level cognitive abilities, reasoning capacities, and execution capabilities in human-robot collaborative (HRC) assembly scenarios. As the physical instantiation of embodied intelligence, embodied agents remain largely in the exploratory phase; their practical application has yet to mature into a standardized paradigm. A key bottleneck lies in the lack of universally applicable enabling technologies, coupled with a disconnection from physical robot control systems. This deficiency necessitates repeated training of a variety of functional models when operating in dynamic HRC environments, significantly hindering the ability of embodied agents to adapt to complicated, dynamically changing collaborative settings. To address this challenge, this study proposes VLM-enhanced embodied agents, specifically tailored to support multimodal cognition, task reasoning, and autonomous execution in digital twin-assisted HRC assembly contexts. The framework is structured through several core steps to realize a full closed loop from insight to autonomous execution for robots supported by embodied intelligent agents. First, a precise mapping relation between the embodied agent and the physical cobot is constructed, enabling the digital characterization and functional encapsulation of embodied agents. Building on this agent-based framework, a VLM is developed that integrates domain-specific knowledge with real-time scenario information. This dual-driven design endows the VLM with enhanced perceptual capabilities, allowing it to rapidly recognize and respond to dynamic changes in HRC scenarios. To provide a simulation and deduction engine for embodied reasoning of the assembly task, a digital twin model of the HRC scenario is built to serve as the "embodied brain". Subsequently, these reasoning results are fed into the VLM as invoking parameters for the corresponding sub-functional code modules. This process facilitates the generation of complete robot motion code, enabling seamless physical execution and thus functioning as the "embodied neuron". Finally, comparative experiments are conducted in an actual HRC assembly environment. The experimental results demonstrate that the proposed VLM-enhanced embodied agents have competitive advantages in multimodal cognition, task reasoning, and autonomous execution.
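As a hypothetical sketch of the "embodied neuron" step (module names, parameter schema, and command strings are illustrative assumptions, not the interface described in the paper), reasoning results could be dispatched as invoking parameters to registered sub-functional code modules that emit robot motion code:

```python
# Hypothetical sketch: dispatching reasoning results to sub-functional code modules that
# emit robot motion commands. Names, parameters, and command syntax are assumptions only.
from typing import Callable, Dict, List

SUB_MODULES: Dict[str, Callable[..., List[str]]] = {}

def register(name: str):
    """Register a sub-functional code module under a name the VLM can invoke."""
    def wrapper(fn):
        SUB_MODULES[name] = fn
        return fn
    return wrapper

@register("move_to_pose")
def move_to_pose(x: float, y: float, z: float) -> List[str]:
    return [f"MOVEL X={x:.3f} Y={y:.3f} Z={z:.3f} V=0.2"]

@register("close_gripper")
def close_gripper(force: float) -> List[str]:
    return [f"GRIPPER CLOSE F={force:.1f}"]

# Reasoning results (e.g., from the digital-twin deduction plus the VLM), parsed into invocations.
reasoning_output = [
    {"module": "move_to_pose", "args": {"x": 0.45, "y": 0.10, "z": 0.30}},
    {"module": "close_gripper", "args": {"force": 20.0}},
]

motion_code: List[str] = []
for call in reasoning_output:
    motion_code += SUB_MODULES[call["module"]](**call["args"])
print("\n".join(motion_code))  # complete motion program handed to the physical cobot
```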
{"title":"From insight to autonomous execution: VLM-enhanced embodied agents towards digital twin-assisted human-robot collaborative assembly","authors":"Changchun Liu , JiaYe Song , Dunbing Tang , Liping Wang , Haihua Zhu , Qixiang Cai","doi":"10.1016/j.rcim.2025.103176","DOIUrl":"10.1016/j.rcim.2025.103176","url":null,"abstract":"<div><div>In recent years, embodied intelligence has emerged as a practicable strategy for accomplishing human-level cognitive abilities, reasoning capacities, and execution capabilities within human-robot collaborative (HRC) assembly scenarios. As the physical instantiation of embodied intelligence, embodied agents remain largely in the exploratory phase; their practical application has yet to mature into a standardized paradigm. A key bottleneck lies in the lack of universally applicable enabling technologies, coupled with a disconnection from physical robot control systems. This deficiency necessitates repetitious training for a variety of functional models when operating in dynamic HRC environments, significantly hindering the ability of embodied agents to acclimate to complicated, dynamically changing collaborative settings. To address this challenge, this study proposes VLM-enhanced embodied agents, specifically tailored to support multimodal cognition, task reasoning, and autonomous execution in digital twin-assisted HRC assembly contexts. The framework is structured through several core steps to realize the full process closed loop from insight to autonomous execution of robots supported by embodied intelligent agents. First, a precise epsilon map relation between the embodied agent and the physical cobot is constructed, thereby enabling the digital characterization and functional capsulation of embodied agents. Building on this agent-based framework, a VLM is developed that integrates domain-specific knowledge with real-time scenario information. This dual-driven design endows the VLM with enhanced perceptual capabilities, allowing it to rapidly recognize and respond to dynamic changes in HRC scenarios. To provide a simulation and deduction engine for embodied reasoning of the assembly task, a digital twin model of the HRC scenario is built to serve as the “embodied brain”. Subsequently, these reasoning results are fed into the VLM serving as invoking parameters for the homologous sub-functional code module. This process facilitates the generation of complete robot motion code, enabling seamless physical execution and thus functioning as the “embodied neuron”. Finally, comparable experiments are conducted in an actual HRC assembly environment. The experimental results demonstrate that the proposed VLM-enhanced embodied agents have competitive advantages in multimodal cognition, task reasoning, and autonomous execution.</div></div>","PeriodicalId":21452,"journal":{"name":"Robotics and Computer-integrated Manufacturing","volume":"98 ","pages":"Article 103176"},"PeriodicalIF":11.4,"publicationDate":"2025-11-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145441661","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2025-11-04 | DOI: 10.1016/j.rcim.2025.103172
Autonomous robotic screwdriving for high-mix manufacturing
Omey M. Manyar, Rutvik Patel, Satyandra K. Gupta
Screwdriving is a crucial task routinely performed during assembly, yet most current automation techniques focus on mass-manufacturing environments with typically low part variability. However, a substantial portion of manufacturing falls under high-mix production, which entails significant uncertainties due to limited fixtures and cost constraints on tooling, making such operations predominantly manual. In this paper, we present an autonomous mobile robotic screwdriving system suitable for high-mix, low-volume manufacturing applications and designed to operate under semi-structured conditions, handling hole pose uncertainties of up to 4 mm/3°. To enhance decision-making and operational efficiency, we develop a physics-informed machine-learning model that predicts nonlinear screw-tip dynamics in Cartesian space. Additionally, we propose a decision tree-based failure detection framework that identifies four distinct failure modes using force signals from the robot's end effector. We further introduce a novel fifth failure mode, a time-based threshold for unsuccessful insertions, where our dynamics model is used to determine when to reattempt screwdriving. This integration of predictive modeling, real-time failure detection, and alert generation for human-in-the-loop decision-making improves system resilience. Our failure detection method achieves an F1-score of 0.94 on validation data and a perfect recall of 1.0 on testing. We validate our approach through screwdriving experiments on 10 real-world industrial parts using three different screw types, demonstrating the system's robustness and adaptability in a high-mix setting.
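As a minimal sketch of decision-tree failure detection from force signals (the features, synthetic traces, and class definitions below are assumptions for illustration, not the authors' implementation or data):

```python
# Illustrative decision-tree failure classifier over summary features of force/torque traces.
# All data here is synthetic; feature choices and thresholds are assumptions, not the paper's.
import numpy as np
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import f1_score

rng = np.random.default_rng(0)

def force_features(fz, torque):
    """Summary statistics of one attempt's axial-force and driving-torque traces."""
    return [fz.max(), fz.mean(), fz.std(), torque.max(), torque.mean(),
            np.argmax(torque) / len(torque)]  # normalized time of peak torque

# Synthetic stand-in data: 200 attempts, 5 outcome classes (success + four failure modes).
X, y = [], []
for _ in range(200):
    label = rng.integers(0, 5)
    t = np.linspace(0, 1, 100)
    fz = 10 + 2 * label * t + rng.normal(0, 0.5, 100)                     # toy class-dependent force
    torque = np.sin(np.pi * t) * (1 + 0.3 * label) + rng.normal(0, 0.05, 100)
    X.append(force_features(fz, torque))
    y.append(label)

X_tr, X_te, y_tr, y_te = train_test_split(np.array(X), np.array(y), random_state=0)
clf = DecisionTreeClassifier(max_depth=5, class_weight="balanced", random_state=0).fit(X_tr, y_tr)
print("macro F1 on held-out toy data:", round(f1_score(y_te, clf.predict(X_te), average="macro"), 3))
```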
{"title":"Autonomous robotic screwdriving for high-mix manufacturing","authors":"Omey M. Manyar, Rutvik Patel, Satyandra K. Gupta","doi":"10.1016/j.rcim.2025.103172","DOIUrl":"10.1016/j.rcim.2025.103172","url":null,"abstract":"<div><div>Screwdriving is a crucial task routinely performed during assembly, yet most of the current automation techniques are focused on mass manufacturing environments where there is typically low part variability. However, a substantial portion of manufacturing falls under high-mix production that entails significant uncertainties due to limited fixtures and cost constraints on tooling, making them predominantly manual. In this paper, we present an autonomous mobile robotic screwdriving system suitable for high-mix, low-volume manufacturing applications and designed to operate under semi-structured conditions, handling hole pose uncertainties of up to 4 mm/3°in the hole pose. To enhance decision-making and operational efficiency, we develop a physics-informed machine-learning model that predicts nonlinear screw-tip dynamics in Cartesian space. Additionally, we propose a decision tree-based failure detection framework that identifies four distinct failure modes using force signals from the robot’s end effector. We further introduce a novel fifth failure mode, a time-based threshold for unsuccessful insertions, where our dynamics model is used to determine when to reattempt screwdriving. This integration of predictive modeling, real-time failure detection, and alert generation for human-in-the-loop decision-making improves system resilience. Our failure detection method achieves an F1-score of 0.94 on validation data and a perfect recall of 1.0 on testing. We validate our approach through screwdriving experiments on 10 real-world industrial parts using three different screw types, demonstrating the system’s robustness and adaptability in a high-mix setting.</div></div>","PeriodicalId":21452,"journal":{"name":"Robotics and Computer-integrated Manufacturing","volume":"98 ","pages":"Article 103172"},"PeriodicalIF":11.4,"publicationDate":"2025-11-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145435021","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2025-11-01 | DOI: 10.1016/j.rcim.2025.103173
Identification and three-dimensional absorption of time-varying potential chatter during robotic milling
Jiawei Wu, Rui Fu, Xiaowei Tang, Shihao Xin, Fangyu Peng, Chenyang Wang
Robotic milling constitutes an important component of robotized intelligent manufacturing, gaining increasing popularity for subtractive manufacturing of large components. Extensive efforts have been devoted to the analysis and suppression of robot chatter to enhance milling efficiency and quality. However, the dynamic characteristics of robots are highly pose-dependent, leading to time-varying low-frequency chatter. Meanwhile, the low-frequency chatter is continuously influenced by the action of vibration suppression devices, making it challenging to consistently track and suppress time-varying chatter. To address this, this paper proposes a new concept, the potential chatter mode, to more accurately describe the target mode that requires attention in online chatter suppression. Inspired by the modulation mechanism between modal vibrations and spindle rotation during robotic milling, a potential chatter mode identification framework is developed. By investigating the distribution pattern of vibration spectra under the modulation mechanism and integrating filtering, demodulation, signal decomposition, and vibration energy evaluation, it achieves online identification of the time-varying frequency of potential chatter. Furthermore, the potential chatter exhibits a three-dimensional time-varying direction, whereas existing suppression devices are generally designed to operate in one or two directions. This paper therefore develops a novel three-dimensional orthogonal adaptive vibration absorber (TO-AVA) based on magnetorheological elastomers (MRE). By incorporating a parallel negative stiffness mechanism and parameter design, the TO-AVA can handle the three-dimensional time-varying direction of potential chatter. Validation experiments of robotic milling are conducted, which involve various process parameters and time-varying potential chatter across different directions, frequencies, and states. The results demonstrate that the developed framework can accurately identify time-varying potential chatter and effectively suppress it using the TO-AVA.
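As a hedged illustration of the modulation pattern such an identification framework can exploit (this is the classical milling sideband relation in simplified form, assumed here for orientation, not the paper's exact formulation): a modal vibration at chatter frequency f_c measured during rotation at spindle frequency f_s with N_t cutter teeth typically appears in the spectrum as sidebands around the tooth-passing harmonics,

```latex
% Simplified, assumed form of the classical milling modulation/sideband pattern:
f_{\text{measured}} \in \left\{\, k\, f_{tp} \pm f_c \;:\; k \in \mathbb{Z}_{\ge 0} \,\right\},
\qquad f_{tp} = N_t\, f_s ,
```

so tracking the non-harmonic component f_c over time yields the time-varying frequency of the potential chatter mode.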
{"title":"Identification and three-dimensional absorption of time-varying potential chatter during robotic milling","authors":"Jiawei Wu , Rui Fu , Xiaowei Tang , Shihao Xin , Fangyu Peng , Chenyang Wang","doi":"10.1016/j.rcim.2025.103173","DOIUrl":"10.1016/j.rcim.2025.103173","url":null,"abstract":"<div><div>Robotic milling constitutes an important component of robotized intelligent manufacturing, gaining increasing popularity for subtractive manufacturing of large components. Extensive efforts have been devoted to the analysis and suppression of robot chatter to enhance milling efficiency and quality. However, the dynamic characteristics of robots are highly pose-dependent, leading to time-varying low-frequency chatter. Meanwhile, the low-frequency chatter is continuously influenced by the action of vibration suppression devices, making it challenging to consistently track and suppress time-varying chatter. To address this, this paper proposes a new concept, the potential chatter mode, to more accurately describe the target mode that requires attention in online chatter suppression. Inspired by the modulation mechanism between modal vibrations and spindle rotation during robotic milling, a potential chatter mode identification framework is developed. By investigating the distribution pattern of vibration spectra under the modulation mechanism, and integrating filtering, demodulation, signal decomposition, and vibration energy evaluation, it achieves the online identification of the time-varying frequency of potential chatter. Furthermore, the potential chatter exhibits a three-dimensional time-varying direction, whereas the existing suppression devices are generally designed to operate in one or two directions. This paper develops a novel three-dimensional orthogonal adaptive vibration absorber (TO-AVA) based on magnetorheological elastomers (MRE). By incorporating a parallel negative stiffness mechanism and parameter design, the TO-AVA can handle the three-dimensional time-varying direction of potential chatter. Validation experiments of robotic milling are conducted, which involves various process parameters and time-varying potential chatter across different directions, frequencies, and states. The results demonstrate that the developed framework can accurately identify time-varying potential chatter and effectively suppress it using the TO-AVA.</div></div>","PeriodicalId":21452,"journal":{"name":"Robotics and Computer-integrated Manufacturing","volume":"98 ","pages":"Article 103173"},"PeriodicalIF":11.4,"publicationDate":"2025-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145412100","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2025-10-30 | DOI: 10.1016/j.rcim.2025.103174
You are my eyes: Integrating human intelligence and LLMs in AR-assisted motion planning for industrial mobile robots
Shuguang Liu, Jiacheng Xie, Xuewen Wang, Xiaojun Qiao
Robot operation follows a perception–decision–execution loop, where motion planning is a critical stage of decision-making that occurs after task planning to ensure precise and efficient execution. Under the demands of smart manufacturing and flexible production, motion planning for industrial robots in dynamic and unstructured environments is particularly important. Large Language Models (LLMs), with strong capabilities in language understanding and logical reasoning, have shown potential in robot motion planning, particularly when combined with Vision-Language Models (VLMs). However, existing approaches rely on the models' intrinsic understanding, which is constrained by insufficient domain knowledge in industrial scenarios and often requires customized training and fine-tuning, resulting in high cost and poor generalizability. Industry 5.0 emphasizes a human-centric value orientation and a production model of human–robot collaboration. Against this backdrop, an Augmented Reality (AR)-assisted motion planning method for industrial mobile robots is proposed. The method transforms human perceptual results into geometric and semantic information about key task elements through manual AR annotation, which is then input into the LLM as known conditions to enable motion planning in complex scenarios. It fully leverages human advantages in spatial perception and fundamentally avoids the limitations of LLMs in understanding industrial environments. Furthermore, a two-level motion planning architecture for industrial mobile robots is proposed to serve as planning constraints for the LLM, improving planning efficiency. A proof of concept (PoC) on mechanical equipment maintenance demonstrates the method's feasibility and effectiveness in industrial tasks, while additional experiments substantiate its contributions of low cost, high reliability, and zero-shot transferability.
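As a hypothetical sketch of how AR annotations might be packaged as known conditions for an LLM planner (the payload schema, field names, constraint wording, and prompt are illustrative assumptions, not the interface described in the paper):

```python
# Hypothetical packaging of human AR annotations plus two-level planning constraints into an
# LLM prompt. Schema and wording are assumptions for illustration only.
import json

ar_annotations = {
    "target": {"name": "gearbox_cover", "pose_xyz_m": [1.20, 0.35, 0.80], "yaw_deg": 90},
    "obstacles": [
        {"name": "tool_cart", "bbox_min": [0.8, -0.2, 0.0], "bbox_max": [1.1, 0.3, 1.0]},
    ],
    "waypoint_hints": [[0.5, 0.0, 0.0], [0.9, 0.2, 0.0]],  # operator-suggested via-points
}

planning_constraints = (
    "Two-level planning: first output a coarse waypoint sequence for the mobile base, "
    "then a fine approach segment for the manipulator; keep 0.15 m clearance from obstacles."
)

prompt = (
    "You are a motion planner for an industrial mobile robot.\n"
    f"Scene annotations (from AR, human-verified):\n{json.dumps(ar_annotations, indent=2)}\n"
    f"Constraints:\n{planning_constraints}\n"
    "Return the plan as a JSON list of waypoints with brief justifications."
)
print(prompt)  # this string would be sent to the LLM in place of raw perception
```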
{"title":"You are my eyes: Integrating human intelligence and LLMs in AR-assisted motion planning for industrial mobile robots","authors":"Shuguang Liu , Jiacheng Xie , Xuewen Wang , Xiaojun Qiao","doi":"10.1016/j.rcim.2025.103174","DOIUrl":"10.1016/j.rcim.2025.103174","url":null,"abstract":"<div><div>Robot operation follows a perception–decision–execution loop, where motion planning is a critical stage of decision-making that occurs after task planning to ensure precise and efficient execution. Under the demands of smart manufacturing and flexible production, motion planning for industrial robots in dynamic and unstructured environments is particularly important. Large Language Models (LLMs), with strong capabilities in language understanding and logical reasoning, have shown potential in robot motion planning, particularly when combined with Vision-Language Models (VLMs). However, existing approaches rely on the models’ intrinsic understanding, which is constrained by insufficient domain knowledge in industrial scenarios and often requires customized training and fine-tuning, resulting in high cost and poor generalizability. Industry 5.0 emphasizes a human-centric value orientation and a production model of human–robot collaboration. Against this backdrop, an Augmented Reality (AR)-assisted motion planning method for industrial mobile robots is proposed. The method transforms human perceptual results into the geometric and semantic information of key task elements through AR manual annotation, which is then input into LLMs as known conditions to enable motion planning in complex scenarios. It fully leverages human advantages in spatial perception and fundamentally avoids the limitations of LLMs in understanding industrial environments. Furthermore, a two-level motion planning architecture for industrial mobile robots is proposed to serve as planning constraints for LLMs, improving planning efficiency. A proof of concept (PoC) on mechanical equipment maintenance demonstrates the method’s feasibility and effectiveness in industrial tasks, while additional experiments substantiate its contributions of low cost, high reliability, and zero-shot transferability.</div></div>","PeriodicalId":21452,"journal":{"name":"Robotics and Computer-integrated Manufacturing","volume":"98 ","pages":"Article 103174"},"PeriodicalIF":11.4,"publicationDate":"2025-10-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145396649","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2025-10-28 | DOI: 10.1016/j.rcim.2025.103170
Stable grasp generation enabled by part segmentation for real-world robotic applications
Zirui Guo, Xieyuanli Chen, Junkai Ren, Zhiqiang Zheng, Huimin Lu, Ruibin Guo
Robotic manipulation necessitates advanced perception and grasp generation capabilities. Previous approaches to object perception in manipulation mainly rely on raw point clouds captured from vision sensors, which exhibit inherent limitations in viewing perspective and lack further analysis of the sensor data. This research introduces implicit representation to facilitate part segmentation from imaging sensors, generating 3D models with structural information that provide grasp generation algorithms with more useful cues. Regarding robotic grasping, prior methods mostly rely on deep learning, which performs satisfactorily on particular datasets yet raises concerns about generalization. Instead, this article proposes a novel grasp generation method based on 3D part segmentation, which circumvents the reliance on deep learning techniques. Extensive experimental results show that our approach can proficiently generate approximate part segmentation and high-success-rate grasps for various objects. By integrating part segmentation with grasp generation, the robot achieves accurate autonomous manipulation, as shown in the supplementary video.
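As a generic, hedged illustration of deep-learning-free grasp candidate generation on a segmented part (a simple antipodal-pair check with toy data and assumed thresholds; this is not the grasp generation procedure proposed in the paper):

```python
# Generic antipodal grasp-candidate sketch on one segmented part's points and outward normals.
# Thresholds, gripper width, and the toy part are assumptions for illustration only.
import numpy as np

def antipodal_pairs(points, normals, max_width=0.08, angle_tol_deg=15.0):
    """Return index pairs (i, j) that form rough antipodal grasps within the gripper width."""
    cos_tol = np.cos(np.deg2rad(angle_tol_deg))
    pairs = []
    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            axis = points[j] - points[i]
            dist = np.linalg.norm(axis)
            if dist < 1e-6 or dist > max_width:
                continue
            axis /= dist
            # outward normals must oppose each other and align with the grasp axis
            if (np.dot(normals[i], -axis) > cos_tol and
                    np.dot(normals[j], axis) > cos_tol):
                pairs.append((i, j))
    return pairs

# Toy "part": two parallel faces 4 cm apart, normals pointing outward.
pts = np.array([[0.00, 0.00, 0.0], [0.04, 0.00, 0.0], [0.00, 0.02, 0.0], [0.04, 0.02, 0.0]])
nrm = np.array([[-1.0, 0, 0], [1.0, 0, 0], [-1.0, 0, 0], [1.0, 0, 0]])
print(antipodal_pairs(pts, nrm))  # [(0, 1), (2, 3)] -> opposing surface points the gripper can pinch
```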
{"title":"Stable grasp generation enabled by part segmentation for real-world robotic applications","authors":"Zirui Guo, Xieyuanli Chen, Junkai Ren, Zhiqiang Zheng, Huimin Lu, Ruibin Guo","doi":"10.1016/j.rcim.2025.103170","DOIUrl":"10.1016/j.rcim.2025.103170","url":null,"abstract":"<div><div>Robotic manipulation necessitates the capability of advanced perception and grasp generation. Previous approaches for object perception in manipulation mainly rely on original point clouds captured from vision sensors, which exhibit inherent limitations in view perspectives and lack of further analysis of the sensor data. This research introduces implicit representation to facilitate part segmentation from imaging sensors, generating 3D models with structural information that provide grasp generation algorithms with more useful information. Regarding the robotic grasp, prior methods mostly rely on deep learning, which presents satisfactory performance on particular datasets yet raises concerns considering their generalization performance. Instead, this article proposes a novel grasp generation method based on 3D part segmentation, which circumvents the reliance on deep learning techniques. Extensive experimental results show that our approach can proficiently generate approximate part segmentation and high success rate grasps for various objects. By integrating part segmentation with grasp generation, the robot achieves accurate autonomous manipulation as shown in the supplementary video.</div></div>","PeriodicalId":21452,"journal":{"name":"Robotics and Computer-integrated Manufacturing","volume":"98 ","pages":"Article 103170"},"PeriodicalIF":11.4,"publicationDate":"2025-10-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145382962","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2025-10-24 | DOI: 10.1016/j.rcim.2025.103171
A two-stage framework for learning human-to-robot object handover policy from 4D spatiotemporal flow
Ruirui Zhong, Bingtao Hu, Zhihao Liu, Qiang Qin, Yixiong Feng, Xi Vincent Wang, Lihui Wang, Jianrong Tan
Natural and safe Human-to-Robot (H2R) object handover is a critical capability for effective Human–Robot Collaboration (HRC). However, learning a robust handover policy for this task is often hindered by the prohibitive cost of collecting physical robot demonstrations and the limitations of simplistic state representations that inadequately capture the complex dynamics of the interaction. To address these challenges, a two-stage learning framework is proposed that synthesizes substantially augmented, synthetically diverse handover demonstrations without requiring a physical robot and subsequently learns a handover policy from a rich 4D spatiotemporal flow. First, an offline, physical robot-free data-generation pipeline is introduced that produces augmented and diverse handover demonstrations, thereby eliminating the need for costly physical data collection. Second, a novel 4D spatiotemporal flow is defined as a comprehensive representation consisting of a skeletal kinematic flow that captures high-level motion dynamics and a geometric motion flow that characterizes fine-grained surface interactions. Finally, a diffusion-based policy conditioned on this spatiotemporal representation is developed to generate coherent and anticipatory robot actions. Extensive experiments demonstrate that the proposed method significantly outperforms state-of-the-art baselines in task success, efficiency, and motion quality, thereby paving the way for safer and more intuitive collaborative robots.
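As a hedged, generic sketch of how a diffusion policy conditioned on the 4D flow could generate actions (a textbook DDPM-style reverse step; the paper's exact conditioning and parameterization are not reproduced here):

```latex
% Generic DDPM reverse (denoising) step for an action sequence a_t, conditioned on flow features c;
% standard form assumed for illustration, not the paper's exact parameterization.
\mathbf{a}_{t-1} \;=\; \frac{1}{\sqrt{\alpha_t}}\left( \mathbf{a}_t
  \;-\; \frac{1-\alpha_t}{\sqrt{1-\bar{\alpha}_t}}\,
  \epsilon_\theta\!\left(\mathbf{a}_t,\, t,\, \mathbf{c}\right) \right)
  \;+\; \sigma_t\, \mathbf{z},
\qquad \mathbf{z}\sim\mathcal{N}(\mathbf{0},\mathbf{I}),
```

where c would concatenate embeddings of the skeletal kinematic flow and the geometric motion flow, so the denoised action sequence is anticipatory with respect to the observed handover dynamics.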
{"title":"A two-stage framework for learning human-to-robot object handover policy from 4D spatiotemporal flow","authors":"Ruirui Zhong , Bingtao Hu , Zhihao Liu , Qiang Qin , Yixiong Feng , Xi Vincent Wang , Lihui Wang , Jianrong Tan","doi":"10.1016/j.rcim.2025.103171","DOIUrl":"10.1016/j.rcim.2025.103171","url":null,"abstract":"<div><div>Natural and safe Human-to-Robot (H2R) object handover is a critical capability for effective Human–Robot Collaboration (HRC). However, learning a robust handover policy for this task is often hindered by the prohibitive cost of collecting physical robot demonstrations and the limitations of simplistic state representations that inadequately capture the complex dynamics of the interaction. To address these challenges, a two-stage learning framework is proposed that synthesizes substantially augmented, synthetically diverse handover demonstrations without requiring a physical robot and subsequently learns a handover policy from a rich 4D spatiotemporal flow. First, an offline, physical robot-free data-generation pipeline is introduced that produces augmented and diverse handover demonstrations, thereby eliminating the need for costly physical data collection. Second, a novel 4D spatiotemporal flow is defined as a comprehensive representation consisting of a skeletal kinematic flow that captures high-level motion dynamics and a geometric motion flow that characterizes fine-grained surface interactions. Finally, a diffusion-based policy conditioned on this spatiotemporal representation is developed to generate coherent and anticipatory robot actions. Extensive experiments demonstrate that the proposed method significantly outperforms state-of-the-art baselines in task success, efficiency, and motion quality, thereby paving the way for safer and more intuitive collaborative robots.</div></div>","PeriodicalId":21452,"journal":{"name":"Robotics and Computer-integrated Manufacturing","volume":"98 ","pages":"Article 103171"},"PeriodicalIF":11.4,"publicationDate":"2025-10-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145362825","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2025-10-24 | DOI: 10.1016/j.rcim.2025.103163
Multimodal action recognition in human–robot collaborative assembly: A contrastive semantic query approach
Qi Gao, Zhenyu Liu, Mingjie Hou, Guodong Sa, Jianrong Tan
With the increasing demand for flexibility and adaptability in modern manufacturing systems, intelligent perception and recognition of human actions in human-robot collaborative assembly (HRCA) tasks have garnered significant attention. However, accurate action recognition in complex and dynamic environments remains difficult due to challenges in multimodal fusion and semantic understanding. To address these challenges, a semantically-contrastive action recognition network (SCAR) is proposed, which enhances fine-grained modeling and discrimination of assembly actions. SCAR integrates structural motion information from skeleton sequences with semantic and contextual features extracted from RGB images, thereby improving comprehensive scene perception. Furthermore, task-relevant textual descriptions are introduced as semantic priors to guide cross-modal feature learning. A contrastive learning strategy is employed to reinforce semantic alignment and discriminability across modalities, facilitating the learning of task-aware representations. Evaluations on the benchmark action dataset NTU RGB+D and practical HRCA tasks demonstrate that SCAR significantly outperforms mainstream methods in recognition accuracy. The advantage is particularly evident in scenarios involving ambiguous operations and semantically similar assembly tasks. Ablation studies further validate the efficacy of the semantic guidance mechanism and contrastive learning strategy in enhancing modality complementarity and system robustness.
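As a hedged sketch of the kind of contrastive alignment objective named in the abstract (a standard symmetric InfoNCE between two embedding sets, e.g., fused skeleton+RGB action features versus text-prior features; shapes, temperature, and the plain-NumPy form are illustrative assumptions, not the SCAR implementation):

```python
# Symmetric InfoNCE sketch for cross-modal semantic alignment. Illustrative only.
import numpy as np

def _cross_entropy_diag(logits: np.ndarray) -> float:
    """Mean cross-entropy of each row against its diagonal (positive-pair) entry."""
    shifted = logits - logits.max(axis=1, keepdims=True)            # numerical stability
    log_prob = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return -float(np.mean(np.diag(log_prob)))

def info_nce(a: np.ndarray, b: np.ndarray, temperature: float = 0.07) -> float:
    """a, b: (batch, dim) embeddings where a[i] and b[i] are a positive pair."""
    a = a / np.linalg.norm(a, axis=1, keepdims=True)
    b = b / np.linalg.norm(b, axis=1, keepdims=True)
    logits = a @ b.T / temperature                                    # pairwise similarities
    return 0.5 * (_cross_entropy_diag(logits) + _cross_entropy_diag(logits.T))

rng = np.random.default_rng(0)
motion_emb = rng.normal(size=(8, 128))   # fused skeleton+RGB action embeddings (toy)
text_emb = rng.normal(size=(8, 128))     # task-description text embeddings (toy)
print("contrastive alignment loss:", round(info_nce(motion_emb, text_emb), 3))
```

Minimizing such a loss pulls matched action and text embeddings together and pushes mismatched pairs apart, which is the mechanism behind the semantic alignment and discriminability the abstract describes.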
{"title":"Multimodal action recognition in human–robot collaborative assembly: A contrastive semantic query approach","authors":"Qi Gao , Zhenyu Liu , Mingjie Hou , Guodong Sa , Jianrong Tan","doi":"10.1016/j.rcim.2025.103163","DOIUrl":"10.1016/j.rcim.2025.103163","url":null,"abstract":"<div><div>With the increasing demand for flexibility and adaptability in modern manufacturing systems, intelligent perception and recognition of human actions in human-robot collaborative assembly (HRCA) tasks have garnered significant attention. However, accurate action recognition in complex and dynamic environments remains challenging due to challenges in multimodal fusion and semantic understanding. To address these challenges, a semantically-contrastive action recognition network (SCAR) is proposed, which enhances fine-grained modeling and discrimination of assembly actions. SCAR integrates structural motion information from skeleton sequences with semantic and contextual features extracted from RGB images, thereby improving comprehensive scene perception. Furthermore, task-relevant textual descriptions are introduced as semantic priors to guide cross-modal feature learning. A contrastive learning strategy is employed to reinforce semantic alignment and discriminability across modalities, facilitating the learning of task-aware representations. Evaluations on the benchmark action dataset NTU RGB+<em>D</em> and practical HRCA tasks demonstrate that SCAR significantly outperforms mainstream methods in recognition accuracy. The advantage is particularly evident in scenarios involving ambiguous operations and semantically similar assembly tasks. Ablation studies further validate the efficacy of the semantic guidance mechanism and contrastive learning strategy in enhancing modality complementarity and system robustness.</div></div>","PeriodicalId":21452,"journal":{"name":"Robotics and Computer-integrated Manufacturing","volume":"98 ","pages":"Article 103163"},"PeriodicalIF":11.4,"publicationDate":"2025-10-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145362827","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2025-10-23 | DOI: 10.1016/j.rcim.2025.103165
A gradual disturbance detection model of manufacturing cell: A digital twin driven perspective
Yaguang Zhou, Chao Zhang, Guanghui Zhou, Chong Han, Jiancong Liu, Hongwen Xing, Wei Wang, Ende Ge, Xiaonan Zhang, Asoke K. Nandi
As a modular component of discrete shop-floors, the manufacturing cell offers specific strengths in detecting operation time fluctuations induced by gradual disturbances in the multi-variety, small-batch production mode. Traditional research on abnormal production state detection in shop-floors typically relies on statistical analysis, machine learning, and deep learning methods. However, these methods demonstrate limitations in both comprehensiveness and effectiveness when applied to gradual disturbance detection. Moreover, such studies address only the detection of gradual disturbances, without providing insights into how detection contributes to improving the production process. To this end, this study adopts a digital twin-driven perspective not only to detect gradual disturbances, but also to associate disturbance detection with bottleneck alleviation and system performance enhancement. Grounded in the synchronization between the physical manufacturing cell in the physical space and its mirrored virtual counterpart in the virtual space, this study models production activities via actual and virtual dynamic graphs in the data space. Within the model space, we jointly employ a convolutional neural network and a graph convolutional network to extract both structured and graph features from production data. The integration across multiple spaces enables digital twin-driven gradual disturbance detection, contributing to bottleneck alleviation and performance enhancement at the system level. The comprehensiveness and effectiveness of the proposed model in detecting gradual disturbances are validated on both simulated and actual datasets. Additionally, experiments that inject gradual disturbances into real production scenarios verify that disturbance detection supports both bottleneck alleviation and overall system enhancement.
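As a minimal, hedged sketch of graph feature extraction over a production-activity graph (one normalized graph-convolution propagation step; the toy graph, node features, and weights are assumptions, not the paper's model):

```python
# Single-layer GCN propagation sketch over a toy production-activity graph. Illustrative only.
import numpy as np

def gcn_layer(adj: np.ndarray, feats: np.ndarray, weight: np.ndarray) -> np.ndarray:
    """One propagation step: H' = ReLU( D^{-1/2} (A + I) D^{-1/2} H W )."""
    a_hat = adj + np.eye(adj.shape[0])                   # add self-loops
    d_inv_sqrt = np.diag(1.0 / np.sqrt(a_hat.sum(axis=1)))
    norm_adj = d_inv_sqrt @ a_hat @ d_inv_sqrt           # symmetric normalization
    return np.maximum(norm_adj @ feats @ weight, 0.0)    # ReLU activation

# Toy graph: 4 workstations in a line, each with [operation time (s), queue length] features.
adj = np.array([[0, 1, 0, 0],
                [1, 0, 1, 0],
                [0, 1, 0, 1],
                [0, 0, 1, 0]], dtype=float)
feats = np.array([[12.0, 2], [15.0, 5], [11.0, 1], [14.0, 3]])
rng = np.random.default_rng(0)
weight = rng.normal(scale=0.1, size=(2, 4))              # 2 input -> 4 hidden features
print(gcn_layer(adj, feats, weight).shape)               # (4, 4) node embeddings
```

Such node embeddings (here for the virtual and actual dynamic graphs) are the kind of graph features that can then be compared to flag gradually drifting operation times.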
{"title":"A gradual disturbance detection model of manufacturing cell: A digital twin driven perspective","authors":"Yaguang Zhou , Chao Zhang , Guanghui Zhou , Chong Han , Jiancong Liu , Hongwen Xing , Wei Wang , Ende Ge , Xiaonan Zhang , Asoke K. Nandi","doi":"10.1016/j.rcim.2025.103165","DOIUrl":"10.1016/j.rcim.2025.103165","url":null,"abstract":"<div><div>As a modular component of discrete shop-floors, the manufacturing cell offers specific strengths in detecting operation time fluctuations induced by gradual disturbances in the multi-variety, small-batch production mode. Traditional research on abnormal production state detection in shop-floors typically relies on statistical analysis, machine learning, and deep learning methods. However, these methods demonstrate limitations in both comprehensiveness and effectiveness when applied to gradual disturbance detection. Moreover, these studies could solely address the limitations of gradual disturbance detection, without providing insights into how such detection contributes to improvements in the production process. To this end, this study adopts a digital twin driven perspective to not only detect gradual disturbances, but also to associate disturbance detection with bottleneck alleviation and system performance enhancement. Grounded in the synchronization between the physical manufacturing cell in the physical space and its mirrored virtual counterpart in the virtual space, this study models production activities via actual and virtual dynamic graphs in the data space. Within the model space, we jointly employ the convolutional neural network and the graph convolutional network to extract both structured and graph features from production data. The integration across multiple spaces enables digital twin driven of gradual disturbance detection, contributing to bottleneck alleviation and performance enhancement at the system level. This study's comprehensiveness and effectiveness in detecting gradual disturbances are validated on both simulation and actual datasets. Additionally, experiments that inject gradual disturbances into real production scenarios verify that disturbance detection supports both bottleneck alleviation and overall system enhancement.</div></div>","PeriodicalId":21452,"journal":{"name":"Robotics and Computer-integrated Manufacturing","volume":"98 ","pages":"Article 103165"},"PeriodicalIF":11.4,"publicationDate":"2025-10-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145362826","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}