
Latest publications in Robotics and Computer-integrated Manufacturing

A robotic framework for high-throughput and multi-view 3D digital image correlation (3D-DIC): Increasing measurement volume and versatility for deformation analysis
IF 11.4 · CAS Tier 1, Computer Science · Q1 COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS · Pub Date: 2026-06-01 · Epub Date: 2025-11-26 · DOI: 10.1016/j.rcim.2025.103187
Özgüç Bertuğ Çapunaman , Alale Mohseni , Dennis Dombrovskij , Kaiyang Yin , Benay Gürsoy , Max David Mylo
Three-dimensional digital image correlation (3D-DIC) is a widely applicable, non-contact optical imaging technique for accurately quantifying full-field surface displacements and strains in materials and structures. However, conventional 3D-DIC implementations relying on fixed stereo camera positions face trade-offs between field-of-view and spatial resolution and lack the high throughput needed for long-duration measurements. Here we present an integrated robotic 3D-DIC framework that employs an industrial robotic arm to autonomously and repeatedly reposition stereo cameras. This enables automated calibration, monitoring of multiple samples over extended periods, and expansion of the effective spatial coverage and data throughput, all while maintaining calibration stability and measurement fidelity. We validate this approach on rigid and deforming reference samples and demonstrate its ability to quantify material deformation of multiple bio-composite samples simultaneously during drying. Under robotic repositioning, rigid samples exhibit stable displacement and strain measurements while benefiting from significantly increased volumetric coverage and reduced manual oversight. Thus, the proposed system improves experimental efficiency and allows for the incorporation of advanced techniques, such as multi-view stitching, to characterize complex geometries with higher effective resolution. When applied to slowly deforming bio-composites, the system can capture time-lapse images from multiple viewpoints, providing a more comprehensive assessment of complex, evolving material behaviors. These enhancements in 3D-DIC further improve geometric accuracy, increase data density, and expand applicability to a broader range of materials and experimental conditions.
Ultimately, the proposed robot-assisted 3D-DIC system creates a robust, high-throughput monitoring framework for bio-fabrication, additive manufacturing, and advanced composite processing, paving the way for targeted programming of shape changes, among other applications.
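The interleaved, multi-sample monitoring described above amounts to a round-robin capture schedule for the repositioning arm. A minimal sketch, in which the sample names, number of viewpoints, and dwell time are all hypothetical and the robot/camera calls are reduced to schedule entries:

```python
from dataclasses import dataclass

@dataclass
class Capture:
    t: float        # scheduled time (s) from experiment start
    sample: str     # which specimen the arm visits
    viewpoint: int  # index into that sample's stereo poses

def roundrobin_schedule(samples, views_per_sample, dwell_s, cycles):
    """Interleave time-lapse captures of several samples from multiple
    viewpoints, as an arm repositioning a stereo rig might do."""
    schedule, t = [], 0.0
    for _ in range(cycles):
        for s in samples:
            for v in range(views_per_sample):
                schedule.append(Capture(t, s, v))
                t += dwell_s  # time to move, settle, and expose
    return schedule

sched = roundrobin_schedule(["biocomp_A", "biocomp_B"],
                            views_per_sample=3, dwell_s=20.0, cycles=2)
print(len(sched))  # 2 samples x 3 views x 2 cycles = 12 captures
```

In the actual system each schedule entry would trigger a robot move and a calibrated stereo acquisition; the sketch only shows how multi-view coverage and multi-sample throughput compose.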
{"title":"A robotic framework for high-throughput and multi-view 3D digital image correlation (3D-DIC): Increasing measurement volume and versatility for deformation analysis","authors":"Özgüç Bertuğ Çapunaman ,&nbsp;Alale Mohseni ,&nbsp;Dennis Dombrovskij ,&nbsp;Kaiyang Yin ,&nbsp;Benay Gürsoy ,&nbsp;Max David Mylo","doi":"10.1016/j.rcim.2025.103187","DOIUrl":"10.1016/j.rcim.2025.103187","url":null,"abstract":"<div><div>Three-dimensional digital image correlation (3D-DIC) is a widely applicable, non-contact optical imaging technique for accurately quantifying full-field surface displacements and strains in materials and structures. However, conventional 3D-DIC implementations relying on fixed stereo camera positions face trade-offs between the field-of-view and spatial resolution and lack high-throughput for long-duration measurements. Here we present an integrated robotic 3D-DIC framework that employs an industrial robotic arm to autonomously and repeatedly reposition stereo cameras. This enables automated calibration, monitoring of multiple samples over extended periods, and expansion of the effective spatial coverage and data throughput, all while maintaining calibration stability and measurement fidelity. We validate this approach on rigid and deforming reference samples and demonstrate its ability to quantify material deformation of bio-composite samples simultaneously during the drying process. Under robotic repositioning, rigid samples exhibit stable displacement and strain measurements while benefiting from significantly increased volumetric coverage and reduced manual oversight. Thus, the proposed system improves experimental efficiency and allows for the incorporation of advanced techniques, such as multi-view stitching, to characterize complex geometries with higher effective resolution. 
When applied to slowly deforming bio-composites, the system can capture time-lapse images from multiple viewpoints, providing a more comprehensive assessment of complex, evolving material behaviors. These enhancements in 3D-DIC further improve geometric accuracy, increase data density, and expand applicability to a broader range of materials and experimental conditions. Ultimately, the proposed robot-assisted 3D-DIC system creates a robust, high-throughput monitoring framework for bio-fabrication, additive manufacturing, and advanced composite processing, paving the way for targeted programming of shape changes, among other applications.</div></div>","PeriodicalId":21452,"journal":{"name":"Robotics and Computer-integrated Manufacturing","volume":"99 ","pages":"Article 103187"},"PeriodicalIF":11.4,"publicationDate":"2026-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145595008","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Graph-based multi-scale fusion learning for STEP-NC machining feature recognition
IF 11.4 · CAS Tier 1, Computer Science · Q1 COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS · Pub Date: 2026-06-01 · Epub Date: 2025-12-24 · DOI: 10.1016/j.rcim.2025.103210
Zichuan Chai , Wenlei Xiao , Gang Zhao , Tianze Qiu , Yan Liu , Songyuan Xue , Oluwasheyi Oyename , Zheng Shi
The integration of AI into next-generation CAM systems has attracted significant research interest. Within such systems, automatic feature recognition is a critical prerequisite for generating machining paths. Consequently, researchers have increasingly leveraged deep learning methodologies for geometric feature recognition from B-rep models. However, research targeting the recognition of machining features that ensure compatibility with downstream CAM toolpath generation remains limited. This paper proposes a multi-scale fusion graph neural network framework that embeds STEP-NC machining features to strengthen their utility for subsequent toolpath generation. First, feature semantics are extracted in accordance with the STEP-NC ISO 14649 standard, and a fusion network is constructed by integrating the adjacent-face aggregation of the GIN with the multi-head self-attention mechanism of the Graph Transformer. In the output layer, fine-grained label decomposition is performed based on standard definitions, enabling concurrent prediction of feature categories and their associated EXPRESS representations. Following pre-training, the model undergoes unsupervised fine-tuning on unlabeled real-world workpiece data to improve its generalization performance in practical manufacturing scenarios. Experiments achieve over 85% recognition accuracy for real-part machining features in automated manufacturing tasks.
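The adjacent-face aggregation that GIN performs can be illustrated on a toy face-adjacency graph. This is a pure-Python sketch, not the paper's model: the scalar features, the fixed scale factor standing in for the learned MLP, and the pocket example are all hypothetical.

```python
# Toy GIN-style update over a B-rep face-adjacency graph.
# GIN: h_v' = MLP((1 + eps) * h_v + sum of neighbor features);
# here a fixed scale w stands in for the learned MLP.
def gin_layer(h, adj, eps=0.1, w=0.5):
    out = {}
    for v, hv in h.items():
        agg = (1.0 + eps) * hv + sum(h[u] for u in adj[v])
        out[v] = w * agg  # "MLP" placeholder
    return out

# Four faces of a pocket feature: a floor adjacent to three walls.
adj = {"floor": ["wall1", "wall2", "wall3"],
       "wall1": ["floor"], "wall2": ["floor"], "wall3": ["floor"]}
h0 = {"floor": 1.0, "wall1": 0.5, "wall2": 0.5, "wall3": 0.5}
h1 = gin_layer(h0, adj)
print(h1["floor"])  # 0.5 * (1.1*1.0 + 1.5) = 1.3
```

Stacking such layers lets each face embedding absorb its neighborhood topology, which the framework then fuses with the Graph Transformer's self-attention over the whole part.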
{"title":"Graph-based multi-scale fusion learning for STEP-NC machining feature recognition","authors":"Zichuan Chai ,&nbsp;Wenlei Xiao ,&nbsp;Gang Zhao ,&nbsp;Tianze Qiu ,&nbsp;Yan Liu ,&nbsp;Songyuan Xue ,&nbsp;Oluwasheyi Oyename ,&nbsp;Zheng Shi","doi":"10.1016/j.rcim.2025.103210","DOIUrl":"10.1016/j.rcim.2025.103210","url":null,"abstract":"<div><div>The integration of AI into next-generation CAM systems has attracted significant research interest. Wherein, automatic feature recognition is a critical prerequisite before machining paths could be generated accordingly. Consequently, researchers have increasingly leveraged deep learning methodologies for geometric feature recognition from B-rep models. However, research targeting the recognition of machining features that ensure compatibility with downstream CAM toolpath generation remains limited. This paper proposes a multi-scale fusion graph neural network framework that embeds STEP-NC machining features to enhance their potency on the subsequent toolpath generation. Initially, feature semantics are extracted in accordance with the STEP-NC ISO 14649 standard, and a fusion network is constructed by integrating the adjacent-face aggregation of the GIN with the multi-head self-attention mechanism of the Graph Transformer. In the output layer, fine-grained label decomposition is performed based on standard definitions, enabling concurrent prediction of feature categories and their associated EXPRESS representations. Following pre-training, the model undergoes unsupervised fine-tuning on unlabeled real-world workpiece data to improve its generalization performance in practical manufacturing scenarios. 
Experimental results achieve over 85% recognition accuracy for real-part machining features in the automated manufacturing tasks.</div></div>","PeriodicalId":21452,"journal":{"name":"Robotics and Computer-integrated Manufacturing","volume":"99 ","pages":"Article 103210"},"PeriodicalIF":11.4,"publicationDate":"2026-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145823067","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Time–torque coordinated optimization for trajectory planning of industrial robots
IF 11.4 · CAS Tier 1, Computer Science · Q1 COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS · Pub Date: 2026-06-01 · Epub Date: 2025-12-01 · DOI: 10.1016/j.rcim.2025.103199
Zeyun Xiao, Danfeng Sun, Donglai Zhu, Yong Wang, Yi Yan, Huifeng Wu
Trajectory optimization is vital for improving the operational efficiency and reliability of industrial robotic arms. However, increasing task execution speed frequently induces sharp fluctuations in joint torque, which elevates energy consumption, places excessive stress on actuators, and excites structural vibrations. To tackle these issues, this study introduces a unified time–torque optimization framework that combines predictive modeling with heuristic search. The framework adopts a neural network incorporating frequency-domain correlation and a delay-aware mechanism to model dynamic torque variations. Based on the predicted torque profiles, the robot’s motion trajectory is parameterized by quintic polynomials, and a multi-objective loss function is constructed to jointly minimize execution time and torque variation rate. Particle Swarm Optimization (PSO) is employed to perform a global search for intermediate joint velocities and accelerations, improving convergence toward near-optimal solutions. Experiments on a six-axis industrial robotic platform demonstrate that the proposed method effectively reduces execution time and smooths torque transitions, confirming its practicality for industrial applications.
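The quintic parameterization used above has a closed form for a rest-to-rest joint move (zero velocity and acceleration at both ends); the example joint displacement and duration below are hypothetical, and the PSO search over intermediate velocities and accelerations is omitted.

```python
def quintic(q0, qT, T):
    """Quintic polynomial q(t) with zero velocity and acceleration at
    both ends: q(t) = q0 + (qT - q0)*(10 s^3 - 15 s^4 + 6 s^5), s = t/T."""
    dq = qT - q0
    def q(t):
        s = t / T
        return q0 + dq * (10*s**3 - 15*s**4 + 6*s**5)
    return q

traj = quintic(q0=0.0, qT=1.2, T=2.0)  # joint moves 1.2 rad in 2 s
print(round(traj(0.0), 6), round(traj(1.0), 6), round(traj(2.0), 6))
# 0.0 0.6 1.2  (midpoint of a symmetric rest-to-rest move)
```

Because the polynomial's derivatives vanish at both endpoints, concatenated segments keep velocity and acceleration continuous, which is what allows the multi-objective loss to trade execution time against torque-variation rate smoothly.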
{"title":"Time–torque coordinated optimization for trajectory planning of industrial robots","authors":"Zeyun Xiao,&nbsp;Danfeng Sun,&nbsp;Donglai Zhu,&nbsp;Yong Wang,&nbsp;Yi Yan,&nbsp;Huifeng Wu","doi":"10.1016/j.rcim.2025.103199","DOIUrl":"10.1016/j.rcim.2025.103199","url":null,"abstract":"<div><div>Trajectory optimization is vital for improving the operational efficiency and reliability of industrial robotic arms. However, increasing task execution speed frequently induces sharp fluctuations in joint torque, which elevates energy consumption, places excessive stress on actuators, and excites structural vibrations. To tackle these issues, this study introduces a unified time–torque optimization framework that combines predictive modeling with heuristic search. The framework adopts a neural network incorporating frequency-domain correlation and a delay-aware mechanism to model dynamic torque variations. Based on the predicted torque profiles, the robot’s motion trajectory is parameterized by quintic polynomials, and a multi-objective loss function is constructed to jointly minimize execution time and torque variation rate. Particle Swarm Optimization (PSO) is employed to perform a global search for intermediate joint velocities and accelerations, improving convergence toward near-optimal solutions. 
Experiments on a six-axis industrial robotic platform demonstrate that the proposed method effectively reduces execution time and smooths torque transitions, confirming its practicality for industrial applications.</div></div>","PeriodicalId":21452,"journal":{"name":"Robotics and Computer-integrated Manufacturing","volume":"99 ","pages":"Article 103199"},"PeriodicalIF":11.4,"publicationDate":"2026-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145651051","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Target-oriented collision-free robot grasping using task-attendance teachers-student knowledge distillation for various dense-clutter scenarios
IF 11.4 · CAS Tier 1, Computer Science · Q1 COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS · Pub Date: 2026-06-01 · Epub Date: 2025-12-10 · DOI: 10.1016/j.rcim.2025.103201
Shaodong Li , Cheng Xiang , Wei Du , Xi Liu , Huajian Song , Feng Shuang
Target-oriented collision-free robot grasping in dense-clutter scenarios has made significant progress. However, prior studies primarily address the complexity introduced by increasing object quantity, while the diversity of scenarios poses a further challenge: skill models inevitably bloat as more subtasks appear. Our previous study presented a multi-teacher distillation strategy that effectively avoids such bloat. Unfortunately, its execution capability still degrades when scenarios contain more, and highly similar, subtasks. The root cause is the distilled student's inadequate capability for subtask identification. Building the subtask classifier in the traditional data-driven way is impractical when human experience fails to provide labels. Therefore, we propose a dataset-free classifier that migrates the classification capability of the teachers to the student. We further propose a task-attendance teachers-student knowledge distillation strategy to handle scenarios involving more, and highly similar, subtasks, significantly enhancing grasping performance. We also leverage language instructions, through an LLM, to obtain the mask map of the target object, improving the intuitiveness of human-robot cooperation. Extensive comparative experiments verify the advantages of our framework, and we quantify both the capability of teachers-student distillation and the value of the dataset-free classifier.
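Teacher-to-student distillation of this kind typically trains the student against temperature-softened teacher outputs. A minimal sketch of that generic mechanism, not the paper's specific loss; the logits and temperature below are invented for illustration.

```python
import math

def softmax_T(logits, T=1.0):
    """Temperature-softened softmax: higher T exposes the teacher's
    'dark knowledge' in its non-argmax classes."""
    exps = [math.exp(z / T) for z in logits]
    s = sum(exps)
    return [e / s for e in exps]

def distill_loss(teacher_logits, student_logits, T=4.0):
    """Cross-entropy between softened teacher and student distributions."""
    p = softmax_T(teacher_logits, T)  # soft teacher targets
    q = softmax_T(student_logits, T)
    return -sum(pi * math.log(qi) for pi, qi in zip(p, q))

t = [6.0, 1.0, 0.5]  # teacher fairly sure the scene is subtask 0
s = [2.0, 1.5, 0.5]  # student still uncertain
print(distill_loss(t, s, T=4.0))
```

Minimizing this loss pulls the student's subtask distribution toward the teacher's, which is also how a dataset-free classifier can inherit the teachers' identification ability without labeled data.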
{"title":"Target-oriented collision-free robot grasping using task-attendance teachers-student knowledge distillation for various dense-clutter scenarios","authors":"Shaodong Li ,&nbsp;Cheng Xiang ,&nbsp;Wei Du ,&nbsp;Xi Liu ,&nbsp;Huajian Song ,&nbsp;Feng Shuang","doi":"10.1016/j.rcim.2025.103201","DOIUrl":"10.1016/j.rcim.2025.103201","url":null,"abstract":"<div><div>The target-oriented collision-free robot grasping in dense-clutter scenarios has made significant progress. However, the prior studies primarily focus on the complex problem brought about by an increase in object quantity. Actually, the various scenarios will also result in the challenge. That means skill models will inevitably start to obtain bloat, as more subtasks appear. Our previous study presents a multi-teacher distillation strategy, thereby effectively avoiding bloat. Unfortunately, it still has trouble in execution capability when the scenarios have more subtasks and highly similar subtasks. The root cause is attributed to the inadequate capability of distilled student in subtask identification. It is naive to develop the subtask classifier based on the traditional data-driven way when human experience fails to work. Therefore, we propose a dataset-free classifier by migrating the classification capability of teachers to the student. Then, we further propose a task-attendance teachers-student knowledge distillation strategy to afford the various scenarios involving more and highly similar subtasks, thus significantly enhancing the performance of grasping. And we also leverage the language instruction to ensure the mask map of target object through LLM, improving the intuitiveness of human-robot cooperation. Extensive comparative experiment verifies the advantage of our framework. We measure the capability of teachers-student distillation, and value of dataset-free classifier in our framework. 
Importantly, the performance of segmentation strategy is tested under ambiguous instruction, visual occlusion, and color conflict. Furthermore, the excellent generalization and robustness are exhibited according to the real-world experiments involving unseen object, unseen task modality, and disturbance existing.</div></div>","PeriodicalId":21452,"journal":{"name":"Robotics and Computer-integrated Manufacturing","volume":"99 ","pages":"Article 103201"},"PeriodicalIF":11.4,"publicationDate":"2026-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145731770","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
From drawings to decisions: A hybrid vision-language framework for parsing 2D engineering drawings into structured manufacturing knowledge
IF 11.4 · CAS Tier 1, Computer Science · Q1 COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS · Pub Date: 2026-06-01 · Epub Date: 2025-11-28 · DOI: 10.1016/j.rcim.2025.103186
Muhammad Tayyab Khan , Lequn Chen , Zane Yong , Jun Ming Tan , Wenhe Feng , Seung Ki Moon
Efficient and accurate extraction of key information from 2D engineering drawings is essential for advancing digital manufacturing workflows. This information includes elements such as geometric dimensioning and tolerancing (GD&T), measures, material specifications, and textual annotations. Manual extraction remains slow and labor-intensive, while generic optical character recognition (OCR) models often fail to interpret 2D drawings accurately due to complex layouts, engineering symbols, and rotated annotations. These limitations result in incomplete and unreliable outputs. To address these challenges, this paper proposes a hybrid vision-language framework that integrates a rotation-aware object detection model (YOLOv11-obb) with a transformer-based vision-language parser. We introduce a structured parsing pipeline that first applies YOLOv11-obb to localize annotations and extract oriented bounding box (OBB) image patches, which are subsequently parsed into structured outputs using a fine-tuned, lightweight vision-language model (VLM). To develop and evaluate this pipeline, we curate a dataset of 1367 2D mechanical drawings manually annotated across nine key categories: GD&Ts, General Tolerances, Measures, Materials, Notes, Radii, Surface Roughness, Threads, and Title Blocks. YOLOv11-obb is trained on this dataset to detect OBBs and extract annotation patches. These image patches are then parsed using two fine-tuned open-source VLMs. The first is Donut, a transformer-based model that combines a Swin-B visual encoder with a BART text decoder, enabling end-to-end parsing directly from images without relying on OCR. The second is Florence-2, a prompt-driven encoder–decoder model that integrates a DaViT vision backbone and supports structured output generation through multimodal token alignment. Both models are lightweight and well-suited for specialized industrial tasks under limited computational overhead. 
Following fine-tuning of both models on the curated dataset of image patches paired with structured annotation labels, a comparative experiment is conducted to evaluate parsing performance across four key metrics. Donut outperforms Florence-2, achieving 89.2 % precision, 99.2 % recall, and a 94 % F1-score, with a hallucination rate of 10.8 %. Finally, a case study demonstrates how the extracted structured information supports downstream manufacturing tasks such as process and tool selection, showcasing the practical utility of the proposed framework in modernizing 2D drawing interpretation.
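The reported F1-score follows from the reported precision and recall via the harmonic mean; a quick consistency check using the Donut figures above:

```python
def f1(precision, recall):
    """F1 is the harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

# Figures reported for the Donut parser above.
p, r = 0.892, 0.992
print(round(f1(p, r), 3))  # 0.939, consistent with the reported 94 % F1-score
```

The near-perfect recall with lower precision suggests the parser rarely misses annotations but occasionally emits spurious fields, which matches the separately reported 10.8 % hallucination rate.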
{"title":"From drawings to decisions: A hybrid vision-language framework for parsing 2D engineering drawings into structured manufacturing knowledge","authors":"Muhammad Tayyab Khan ,&nbsp;Lequn Chen ,&nbsp;Zane Yong ,&nbsp;Jun Ming Tan ,&nbsp;Wenhe Feng ,&nbsp;Seung Ki Moon","doi":"10.1016/j.rcim.2025.103186","DOIUrl":"10.1016/j.rcim.2025.103186","url":null,"abstract":"<div><div>Efficient and accurate extraction of key information from 2D engineering drawings is essential for advancing digital manufacturing workflows. This information includes elements such as geometric dimensioning and tolerancing (GD&amp;T), measures, material specifications, and textual annotations. Manual extraction remains slow and labor-intensive, while generic optical character recognition (OCR) models often fail to interpret 2D drawings accurately due to complex layouts, engineering symbols, and rotated annotations. These limitations result in incomplete and unreliable outputs. To address these challenges, this paper proposes a hybrid vision-language framework that integrates a rotation-aware object detection model (YOLOv11-obb) with a transformer-based vision-language parser. We introduce a structured parsing pipeline that first applies YOLOv11-obb to localize annotations and extract oriented bounding box (OBB) image patches, which are subsequently parsed into structured outputs using a fine-tuned, lightweight vision-language model (VLM). To develop and evaluate this pipeline, we curate a dataset of 1367 2D mechanical drawings manually annotated across nine key categories: GD&amp;Ts, General Tolerances, Measures, Materials, Notes, Radii, Surface Roughness, Threads, and Title Blocks. YOLOv11-obb is trained on this dataset to detect OBBs and extract annotation patches. These image patches are then parsed using two fine-tuned open-source VLMs. 
The first is Donut, a transformer-based model that combines a Swin-B visual encoder with a BART text decoder, enabling end-to-end parsing directly from images without relying on OCR. The second is Florence-2, a prompt-driven encoder–decoder model that integrates a DaViT vision backbone and supports structured output generation through multimodal token alignment. Both models are lightweight and well-suited for specialized industrial tasks under limited computational overhead. Following fine-tuning of both models on the curated dataset of image patches paired with structured annotation labels, a comparative experiment is conducted to evaluate parsing performance across four key metrics. Donut outperforms Florence-2, achieving 89.2 % precision, 99.2 % recall, and a 94 % F1-score, with a hallucination rate of 10.8 %. Finally, a case study demonstrates how the extracted structured information supports downstream manufacturing tasks such as process and tool selection, showcasing the practical utility of the proposed framework in modernizing 2D drawing interpretation.</div></div>","PeriodicalId":21452,"journal":{"name":"Robotics and Computer-integrated Manufacturing","volume":"99 ","pages":"Article 103186"},"PeriodicalIF":11.4,"publicationDate":"2026-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145614041","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Dual-service combination optimization of manufacturing and logistics: models for self-managed and third-party logistics in cloud manufacturing
IF 11.4 · CAS Tier 1, Computer Science · Q1 COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS · Pub Date: 2026-06-01 · Epub Date: 2025-11-29 · DOI: 10.1016/j.rcim.2025.103178
Chunhua Tang , Shuangyao Zhao , Ting Huang , Mark Goh
Service combination (SC) is a critical technique in cloud manufacturing, enabling the integration of multiple services to deliver value-added solutions. Logistics plays a pivotal role in SC by ensuring seamless coordination across various manufacturing stages, thereby maximizing the efficiency of production flows. This implies that the SC process must integrate both manufacturing services (MSs) and logistics services (LSs) to determine the optimal combination strategy. Prior research has focused mainly on MS performance, often overlooking the critical impact of logistics on SC outcomes. Although some studies have incorporated logistics considerations, they have largely treated logistics attributes as secondary components of MS evaluations or adopted linear aggregation methods to jointly configure MSs and LSs. These approaches fail to capture the dynamic nature of logistics performance and the interdependencies between MSs and LSs. To address these gaps, this study develops two optimization models for SC that integrate both MSs and LSs, tailored for self-managed and third-party logistics modes. In particular, an innovative bi-level optimization model is introduced to capture the sequential dependencies and dynamic interactions between MSs and LSs in logistics outsourcing, ensuring seamless integration. The upper level focuses on optimizing the MS selection, while the lower level identifies the optimal LSs based on the determined MSs. Improved genetic algorithms incorporating adaptive and parallel mechanisms are developed to solve the models, dynamically adjusting parameters to improve solution accuracy and efficiency. Case studies and numerical experiments validate the effectiveness of the proposed models and algorithms, offering actionable managerial insights grounded in the results.
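The bi-level structure (the upper level fixes the MS choice, the lower level then picks the best compatible LS) can be shown with a tiny brute-force example. The candidate services and costs are entirely hypothetical, and the paper uses improved genetic algorithms rather than exhaustive search:

```python
# Hypothetical candidate services with costs.
ms_candidates = {"MS1": 10.0, "MS2": 12.0}
# Logistics cost depends on which manufacturing service was chosen.
ls_cost = {("MS1", "LS_a"): 5.0, ("MS1", "LS_b"): 3.0,
           ("MS2", "LS_a"): 2.0, ("MS2", "LS_b"): 6.0}

def lower_level(ms):
    """Given a fixed MS choice, return the cheapest compatible LS."""
    return min((c, ls) for (m, ls), c in ls_cost.items() if m == ms)

def bi_level():
    """Upper level searches over MSs; each candidate evaluation
    solves the lower-level LS problem conditioned on that MS."""
    best = None
    for ms, mc in ms_candidates.items():
        lc, ls = lower_level(ms)
        total = mc + lc
        if best is None or total < best[0]:
            best = (total, ms, ls)
    return best

print(bi_level())  # (13.0, 'MS1', 'LS_b')
```

Note that MS2 has the cheapest logistics option yet loses overall: scoring MSs and LSs independently, as linear aggregation does, would miss exactly this interdependence.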
{"title":"Dual-service combination optimization of manufacturing and logistics: models for self-managed and third-party logistics in cloud manufacturing","authors":"Chunhua Tang ,&nbsp;Shuangyao Zhao ,&nbsp;Ting Huang ,&nbsp;Mark Goh","doi":"10.1016/j.rcim.2025.103178","DOIUrl":"10.1016/j.rcim.2025.103178","url":null,"abstract":"<div><div>Service combination (SC) is a critical technique in cloud manufacturing, enabling the integration of multiple services to deliver value-added solutions. Logistics plays a pivotal role in SC by ensuring seamless coordination across various manufacturing stages, thereby maximizing the efficiency of production flows. This implies that the SC process must integrate both manufacturing services (MSs) and logistics services (LSs) to determine the optimal combination strategy. Prior research has focused mainly on MS performance, often overlooking the critical impact of logistics on SC outcomes. Although some studies have incorporated logistics considerations, they have largely treated logistics attributes as secondary components of MS evaluations or adopted linear aggregation methods to jointly configure MSs and LSs. These approaches fail to capture the dynamic nature of logistics performance and the interdependencies between MSs and LSs. To address these gaps, this study develops two optimization models for SC that integrate both MSs and LSs, tailored for self-managed and third-party logistics modes. In particular, an innovative bi-level optimization model is introduced to capture the sequential dependencies and dynamic interactions between MSs and LSs in logistics outsourcing, ensuring seamless integration. The upper level focuses on optimizing the MS selection, while the lower level identifies the optimal LSs based on the determined MSs. Improved genetic algorithms incorporating adaptive and parallel mechanisms are developed to address the models, dynamically adjusting parameters to improve solution accuracy and efficiency. 
Case studies and numerical experiments validate the effectiveness of the proposed models and algorithms, offering actionable managerial insights grounded in the results.</div></div>","PeriodicalId":21452,"journal":{"name":"Robotics and Computer-integrated Manufacturing","volume":"99 ","pages":"Article 103178"},"PeriodicalIF":11.4,"publicationDate":"2026-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145614037","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
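The bi-level structure described in the abstract (an upper level choosing manufacturing services, a lower level choosing logistics services conditioned on that choice) can be sketched as a toy nested search. All service names and cost figures below are invented, and exhaustive enumeration stands in for the paper's improved genetic algorithms.

```python
from itertools import product

# Hypothetical candidate pools: two manufacturing stages, each with two
# manufacturing-service (MS) options, plus per-link logistics-service (LS) costs.
MS_POOL = {"stage1": ["MS-A", "MS-B"], "stage2": ["MS-C", "MS-D"]}
MS_COST = {"MS-A": 10.0, "MS-B": 8.0, "MS-C": 12.0, "MS-D": 9.0}
# The LS cost depends on which pair of MSs it must connect (the sequential
# dependency that a linear aggregation of MS and LS scores would miss).
LS_COST = {
    ("MS-A", "MS-C"): 3.0, ("MS-A", "MS-D"): 6.0,
    ("MS-B", "MS-C"): 7.0, ("MS-B", "MS-D"): 2.0,
}

def lower_level(ms_plan):
    """Given a fixed MS plan, return the cost of its optimal logistics link."""
    return LS_COST[tuple(ms_plan)]

def upper_level():
    """Enumerate MS plans; each is scored with its best lower-level LS cost."""
    best_plan, best_cost = None, float("inf")
    for plan in product(*MS_POOL.values()):
        cost = sum(MS_COST[m] for m in plan) + lower_level(plan)
        if cost < best_cost:
            best_plan, best_cost = plan, cost
    return best_plan, best_cost

plan, cost = upper_level()
print(plan, cost)  # cheapest MS combination together with its induced LS cost
```

Note how the cheapest MSs in isolation need not win: the plan's logistics link can dominate, which is exactly why the lower level must be re-solved for every upper-level candidate.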
Citations: 0
Graph-driven Single-Robot Multi-Cognitive Agent System architecture for human–robot collaborative disassembly
IF 11.4, CAS Tier 1 (Computer Science), Q1 COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS. Pub Date: 2026-06-01; Epub Date: 2025-12-18; DOI: 10.1016/j.rcim.2025.103207
Jianhao Lv, Jiahui Si, Wenchao Li, Ding Gao, Jinsong Bao
Single-agent systems have inherent limitations in tackling complex tasks, while traditional multi-agent paradigms are inefficient: task decomposition requires distribution among multiple robots, resulting in resource redundancy and escalated costs. To address this critical constraint, a graph-driven Single-Robot Multi-Cognitive Agent System architecture is proposed. Firstly, scene graphs are constructed to transform unstructured visual data from the environment into graph-based triplets. By aligning these triplets with pre-constructed knowledge graphs, historical memories are activated through graph matching to inform system decision-making with precedented insights. Then, an attention-driven collaboration mechanism dynamically designates leader and supporter roles among the different agents, ensuring adaptive role assignment based on contextual demands. Complementing this, a global optimization framework facilitates the collective evolution of the Single-Robot Multi-Cognitive Agent System, enhancing both individual agent performance and inter-agent collaboration. Finally, the Model Context Protocol orchestrates robotic execution by harmonizing external resource utilization with computational processes, ensuring seamless translation of decision outputs into physical actions. Experimental results demonstrate that the method exhibits strong robustness and generalizability in dynamic disassembly queries.
{"title":"Graph-driven Single-Robot Multi-Cognitive Agent System architecture for human–robot collaborative disassembly","authors":"Jianhao Lv,&nbsp;Jiahui Si,&nbsp;Wenchao Li,&nbsp;Ding Gao,&nbsp;Jinsong Bao","doi":"10.1016/j.rcim.2025.103207","DOIUrl":"10.1016/j.rcim.2025.103207","url":null,"abstract":"<div><div>The inherent limitations of single-agent systems in tackling complex tasks, combined with the inefficiencies of traditional multi-agent paradigms—where task decomposition requires distribution among multiple robots, resulting in resource redundancy and escalated costs. To address this critical constraint, a graph-driven Single-Robot Multi-Cognitive Agent System architecture is proposed. Firstly, scene graphs are constructed to transform unstructured visual data from the environment into graph-based triplets. By aligning these triplets with pre-constructed knowledge graphs, historical memories are activated through graph matching to inform system decision-making with precedented insights. Then, an attention-driven collaboration mechanism dynamically designates leader and supporter roles among the different agents, ensuring adaptive role assignment based on contextual demands. Complementing this, a global optimization framework facilitates the collective evolution of the Single-Robot Multi-Cognitive Agent System, enhancing both individual agent performance and inter-agent collaboration. Finally, the Model Context Protocol orchestrates robotic execution by harmonizing external resource utilization with computational processes, ensuring seamless translation of decision outputs into physical actions. 
Experimental results demonstrate that the method exhibits strong robustness and generalizability in dynamic disassembly queries.</div></div>","PeriodicalId":21452,"journal":{"name":"Robotics and Computer-integrated Manufacturing","volume":"99 ","pages":"Article 103207"},"PeriodicalIF":11.4,"publicationDate":"2026-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145785002","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
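The graph-matching step above (aligning scene-graph triplets with a pre-constructed knowledge graph to activate historical memories) can be illustrated with plain set intersection; the triplets and stored memories below are hypothetical.

```python
# Scene graph extracted from vision, as (subject, relation, object) triplets.
scene = {
    ("screw", "fastens", "cover"),
    ("cover", "on_top_of", "battery"),
}

# Pre-constructed knowledge graph: each triplet is keyed to stored memories
# (e.g., disassembly steps that worked before). All entries are invented.
knowledge = {
    ("screw", "fastens", "cover"): ["unscrew before lifting cover"],
    ("clip", "holds", "panel"): ["pry clip with plastic tool"],
}

def activate_memories(scene_triplets, kg):
    """Return memories whose knowledge-graph triplet appears in the scene."""
    hits = scene_triplets & kg.keys()
    return sorted(m for t in hits for m in kg[t])

print(activate_memories(scene, knowledge))
```

A production system would use approximate or embedding-based matching rather than exact triplet equality, but the retrieval pattern is the same: observed structure keys into stored experience.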
Citations: 0
A unified framework for large language model-guided reinforcement learning in digital twin industrial environments
IF 11.4, CAS Tier 1 (Computer Science), Q1 COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS. Pub Date: 2026-06-01; Epub Date: 2025-12-31; DOI: 10.1016/j.rcim.2025.103215
Haolin Fan , Edward Chow , Thomas Lu , Jerry Ying Hsi Fuh , Wen Feng Lu , Bingbing Li
Digital twin (DT) optimization in industrial environments faces persistent challenges, including sample inefficiency, extensive training requirements, and limited cross-domain adaptability. This paper presents a unified three-phase framework that integrates large language models (LLMs) with reinforcement learning (RL) via imitation learning (IL). The proposed approach comprises three key components: (1) offline expert demonstration collection using LLM-generated multi-agent coordination strategies, (2) offline and supervised IL to clone these strategies using a centralized training and decentralized execution (CTDE) architecture, and (3) lightweight RL fine-tuning to optimize the pre-trained policy. The system resolves equipment assignment conflicts and leverages coordination history for adaptive decision-making. Experiments in multi-agent industrial scenarios, including human–machine collaboration and fatigue-aware maintenance, demonstrate that our IL+RL hybrid reduces online training time by up to 96% while maintaining over 66% of optimal task performance, using only 4% of the training episodes required by standard RL. The approach also achieves 30%–40% task completion in zero-shot cross-domain settings (e.g., warehouse, manufacturing), and up to 99.7% with minimal fine-tuning. Conceptually, the framework establishes a new paradigm of "language-conditioned IL", where reasoning from general-purpose LLMs serves as an adaptive prior for efficient multi-agent coordination in DT. The results highlight how LLM-guided demonstrations can bridge symbolic reasoning and adaptive learning, offering both conceptual and practical advances for scalable, sample-efficient decision-making in Industry 5.0 systems.
{"title":"A unified framework for large language model-guided reinforcement learning in digital twin industrial environments","authors":"Haolin Fan ,&nbsp;Edward Chow ,&nbsp;Thomas Lu ,&nbsp;Jerry Ying Hsi Fuh ,&nbsp;Wen Feng Lu ,&nbsp;Bingbing Li","doi":"10.1016/j.rcim.2025.103215","DOIUrl":"10.1016/j.rcim.2025.103215","url":null,"abstract":"<div><div>Digital twin (DT) optimization in industrial environments faces persistent challenges, including sample inefficiency, extensive training requirements, and limited cross-domain adaptability. This paper presents a unified three-phase framework that integrates large language models (LLMs) with reinforcement learning (RL) via imitation learning (IL). The proposed approach comprises three key components: (1) offline expert demonstration collection using LLM-generated multi-agent coordination strategies, (2) offline and supervised IL to clone these strategies using a centralized training and decentralized execution (CTDE) architecture, and (3) lightweight RL fine-tuning to optimize the pre-trained policy. The system resolves equipment assignment conflicts and leverages coordination history for adaptive decision-making. Experiments in multi-agent industrial scenarios, including human–machine collaboration and fatigue-aware maintenance, demonstrate that our IL+RL hybrid reduces online training time by up to 96% while maintaining over 66% of optimal task performance, using only 4% of the training episodes required by standard RL. The approach also achieves 30%–40% task completion in zero-shot cross-domain settings (e.g., warehouse, manufacturing), and up to 99.7% with minimal fine-tuning. Conceptually, the framework establishes a new paradigm of ”language-conditioned IL,” where reasoning from general-purpose LLMs serves as an adaptive prior for efficient multi-agent coordination in DT. 
The results highlight how LLM-guided demonstrations can bridge symbolic reasoning and adaptive learning, offering both conceptual and practical advances for scalable, sample-efficient decision-making in Industry 5.0 systems.</div></div>","PeriodicalId":21452,"journal":{"name":"Robotics and Computer-integrated Manufacturing","volume":"99 ","pages":"Article 103215"},"PeriodicalIF":11.4,"publicationDate":"2026-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145883839","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
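The three phases (demonstration collection, imitation learning, lightweight RL fine-tuning) can be mimicked on a toy chain task. A scripted policy stands in for the LLM-generated demonstrations and a Q-table stands in for the CTDE networks, so this is only a structural sketch of the pipeline, not the paper's implementation.

```python
import random

random.seed(0)

# Toy 5-state chain: action 1 moves right, action 0 stays; reward at state 4.
N_STATES, ACTIONS, GOAL = 5, (0, 1), 4

def step(s, a):
    s2 = min(s + a, GOAL)
    return s2, (1.0 if s2 == GOAL else 0.0)

# Phase 1: "expert" demonstrations (standing in for LLM-generated strategies).
demos = [(s, 1) for s in range(GOAL)]

# Phase 2: imitation learning -- clone the demos into an initial Q-table by
# giving demonstrated actions an optimistic head start.
Q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
for s, a in demos:
    Q[(s, a)] = 0.5

# Phase 3: lightweight RL fine-tuning (epsilon-greedy Q-learning).
alpha, gamma, eps = 0.5, 0.9, 0.1
for _ in range(200):
    s = 0
    while s != GOAL:
        if random.random() < eps:
            a = random.choice(ACTIONS)
        else:
            a = max(ACTIONS, key=lambda x: Q[(s, x)])
        s2, r = step(s, a)
        Q[(s, a)] += alpha * (r + gamma * max(Q[(s2, b)] for b in ACTIONS) - Q[(s, a)])
        s = s2

greedy = [max(ACTIONS, key=lambda a: Q[(s, a)]) for s in range(GOAL)]
print(greedy)  # fine-tuned greedy policy for states 0..3
```

The pre-seeded Q-values mean the fine-tuning loop starts near expert behavior instead of from scratch, which is the mechanism behind the sample-efficiency gains the abstract reports.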
Citations: 0
A hierarchical spatial–aware algorithm with efficient reinforcement learning for human–robot task planning and allocation in production
IF 11.4, CAS Tier 1 (Computer Science), Q1 COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS. Pub Date: 2026-04-01; Epub Date: 2025-10-10; DOI: 10.1016/j.rcim.2025.103159
Jintao Xue, Xiao Li, Nianmin Zhang
In advanced manufacturing systems, humans and robots collaborate to conduct the production process. Effective task planning and allocation (TPA) is crucial for achieving high production efficiency, yet it remains challenging in complex and dynamic manufacturing environments. The dynamic nature of humans and robots, particularly the need to consider spatial information (e.g., humans’ real-time position and the distance they need to move to complete a task), substantially complicates TPA. To address the above challenges, we decompose production tasks into manageable subtasks. We then implement a real-time hierarchical human–robot TPA algorithm, including a high-level agent for task planning and a low-level agent for task allocation. For the high-level agent, we propose an efficient buffer-based deep Q-learning method (EBQ), which reduces training time and enhances performance in production problems with long-term and sparse reward challenges. For the low-level agent, a path planning-based spatially aware method (SAP) is designed to allocate tasks to the appropriate human–robot resources, thereby achieving the corresponding sequential subtasks. We conducted experiments on a complex real-time production process in a 3D simulator. The results demonstrate that our proposed EBQ&SAP method effectively addresses human–robot TPA problems in complex and dynamic production processes.
{"title":"A hierarchical spatial–aware algorithm with efficient reinforcement learning for human–robot task planning and allocation in production","authors":"Jintao Xue,&nbsp;Xiao Li,&nbsp;Nianmin Zhang","doi":"10.1016/j.rcim.2025.103159","DOIUrl":"10.1016/j.rcim.2025.103159","url":null,"abstract":"<div><div>In advanced manufacturing systems, humans and robots collaborate to conduct the production process. Effective task planning and allocation (TPA) is crucial for achieving high production efficiency, yet it remains challenging in complex and dynamic manufacturing environments. The dynamic nature of humans and robots, particularly the need to consider spatial information (e.g., humans’ real-time position and the distance they need to move to complete a task), substantially complicates TPA. To address the above challenges, we decompose production tasks into manageable subtasks. We then implement a real-time hierarchical human–robot TPA algorithm, including a high-level agent for task planning and a low-level agent for task allocation. For the high-level agent, we propose an efficient buffer-based deep Q-learning method (EBQ), which reduces training time and enhances performance in production problems with long-term and sparse reward challenges. For the low-level agent, a path planning-based spatially aware method (SAP) is designed to allocate tasks to the appropriate human–robot resources, thereby achieving the corresponding sequential subtasks. We conducted experiments on a complex real-time production process in a 3D simulator. 
The results demonstrate that our proposed EBQ&amp;SAP method effectively addresses human–robot TPA problems in complex and dynamic production processes.</div></div>","PeriodicalId":21452,"journal":{"name":"Robotics and Computer-integrated Manufacturing","volume":"98 ","pages":"Article 103159"},"PeriodicalIF":11.4,"publicationDate":"2026-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145261988","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
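The division of labor above (a high-level agent sequencing subtasks, a low-level agent allocating them with spatial awareness) can be sketched by the allocation step alone. Manhattan distance stands in for the paper's path-planning costs, and all positions below are invented.

```python
# Positions of human/robot resources and pending subtasks on a shop-floor grid.
agents = {"human_1": (0, 0), "robot_1": (5, 5), "robot_2": (9, 1)}
tasks = {"fetch_part": (1, 1), "fasten_bolt": (6, 4), "inspect": (8, 2)}

def manhattan(p, q):
    return abs(p[0] - q[0]) + abs(p[1] - q[1])

def allocate(tasks, agents):
    """Low-level spatially aware allocation: greedily assign each subtask to
    the closest still-free resource (a crude stand-in for path-planning cost)."""
    free = dict(agents)
    assignment = {}
    for name, pos in tasks.items():          # order fixed by the high-level plan
        best = min(free, key=lambda a: manhattan(free[a], pos))
        assignment[name] = best
        free.pop(best)                       # each resource takes one subtask
    return assignment

print(allocate(tasks, agents))
```

The key point the abstract makes survives even in this reduction: allocation quality depends on where each human or robot currently is, so positions must be re-read every time the high-level agent emits a new subtask.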
Citations: 0
A lightweight object detection approach for precision gripping in multiple peg-in-hole assembly tasks
IF 11.4, CAS Tier 1 (Computer Science), Q1 COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS. Pub Date: 2026-04-01; Epub Date: 2025-11-21; DOI: 10.1016/j.rcim.2025.103185
Jianjun Jiao , Zonggang Li , Guangqing Xia , Guoping Wang , Yinjuan Chen , Ruibing Gao
Automating product assembly using manipulators in manufacturing remains challenging. This is mainly because detection and gripping prior to component assembly still depend heavily on manual operations and traditional teaching methods, resulting in a low overall level of automation. The primary difficulty in detection and gripping arises from the precise recognition of rotation angles and the complex demands for accuracy, real-time performance, and stability. This paper presents an improved lightweight model, IDPC-YOLOv8, for multiple peg-in-hole workpiece detection and gripping to address these challenges. The proposed approach integrates adaptive image preprocessing to enhance visual clarity under varying lighting conditions and employs an efficient network architecture that jointly exploits global and local features to improve detection precision and computational efficiency. In addition, a rotation-aware detection strategy is introduced to enable accurate prediction of object orientation. Moreover, a network optimization scheme further reduces model parameters, making the system suitable for real-time deployment. Experimental results reveal that the IDPC-YOLOv8 model achieves an accuracy of 97.8% and a detection speed of 126.59 FPS, representing improvements of 4% and 8.3%, respectively, over the original YOLOv8-OBB model. Compared to several state-of-the-art rotation detection models, IDPC-YOLOv8 demonstrates superior integration and generalization capabilities. The effectiveness of the proposed method is further validated through excellent gripping success rates achieved in real-world experiments using the AUBO-i5 manipulator.
{"title":"A lightweight object detection approach for precision gripping in multiple peg-in-hole assembly tasks","authors":"Jianjun Jiao ,&nbsp;Zonggang Li ,&nbsp;Guangqing Xia ,&nbsp;Guoping Wang ,&nbsp;Yinjuan Chen ,&nbsp;Ruibing Gao","doi":"10.1016/j.rcim.2025.103185","DOIUrl":"10.1016/j.rcim.2025.103185","url":null,"abstract":"<div><div>Automating product assembly using manipulators in manufacturing remains challenging. This is mainly because detection and gripping prior to component assembly still depend heavily on manual operations and traditional teaching methods, resulting in a low overall level of automation. The primary difficulty in detection and gripping arises from the precise recognition of rotation angles and the complex demands for accuracy, real-time performance, and stability. This paper presents an improved lightweight model, IDPC-YOLOv8, for multiple peg-in-hole workpiece detection and gripping to address these challenges. The proposed approach integrates adaptive image preprocessing to enhance visual clarity under varying lighting conditions and employs an efficient network architecture that jointly exploits global and local features to improve detection precision and computational efficiency. In addition, a rotation-aware detection strategy is introduced to enable accurate prediction of object orientation. Moreover, a network optimization scheme further reduces model parameters, making the system suitable for real-time deployment. Experimental results reveal that the IDPC-YOLOv8 model achieves an accuracy of 97.8% and a detection speed of 126.59 FPS, representing improvements of 4% and 8.3%, respectively, over the original YOLOv8-OBB model. Compared to several state-of-the-art rotation detection models, IDPC-YOLOv8 demonstrates superior integration and generalization capabilities. 
The effectiveness of the proposed method is further validated through excellent gripping success rates achieved in real-world experiments using the AUBO-i5 manipulator.</div></div>","PeriodicalId":21452,"journal":{"name":"Robotics and Computer-integrated Manufacturing","volume":"98 ","pages":"Article 103185"},"PeriodicalIF":11.4,"publicationDate":"2026-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145567450","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
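Turning a rotation-aware detection into a grasp is a small geometry step: given an oriented bounding box (center, width, height, angle), grip across the shorter side. The conversion below is a generic sketch, not the paper's method, and the box values are made up.

```python
def grip_pose(cx, cy, w, h, theta_deg):
    """Map an oriented bounding box to a top-down grasp: close the jaws
    across the shorter side, perpendicular to the box's long axis."""
    # If the box is taller than wide, its long axis is rotated 90 degrees
    # relative to the reported angle.
    long_axis = theta_deg if w >= h else theta_deg + 90.0
    jaw_angle = (long_axis + 90.0) % 180.0   # gripper closes across the width
    opening = min(w, h)                      # jaw opening spans the short side
    return (cx, cy, jaw_angle, opening)

# Hypothetical detection: a 40x12 px peg centred at (120, 80), rotated 30 deg.
print(grip_pose(120.0, 80.0, 40.0, 12.0, 30.0))
```

The modulo-180 wrap reflects the symmetry of a parallel-jaw gripper: rotating the jaws half a turn yields the same grasp, which is also why oriented-box detectors only need to predict angles within a half-circle.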
Citations: 0