Advanced Engineering Informatics: Latest Articles

Integrated optimization of non-permutation flow shop scheduling and maintenance under time-varying operating conditions considering quality control
IF 9.9 Zone 1 (Engineering Technology) Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Pub Date: 2026-01-28 DOI: 10.1016/j.aei.2026.104388
Zhijie Yang, Xinkai Hu, Yibing Li, Kaipu Wang, Shunsheng Guo, Zao Liu
Production, maintenance, and quality are key components influencing the performance of manufacturing systems and should be considered in an integrated manner. However, the production dimension of previous integrated studies has primarily focused on inventory and lot sizing, overlooking the impact of production scheduling on maintenance and quality. In fact, product quality depends on the degradation state of the quality-related components (QRC) in the corresponding machine at different times. Therefore, effective production scheduling needs to incorporate quality considerations. Meanwhile, maintenance should be coordinated with scheduling to maintain high reliability of both machines and QRC without interrupting product processing. Thus, this study first establishes a machine deterioration model and a product quality loss model, both considering QRC under time-varying conditions. Based on these, a mixed integer linear programming (MILP) model for non-permutation flow shop scheduling and maintenance is constructed. An improved multi-objective co-evolutionary artificial bee colony algorithm (IMOCABC) is proposed. It uses six heuristic rules to generate a high-quality initial population. Four crossover operators and six problem-specific neighborhood search operators are applied to improve both global and local search ability and promote cooperative evolution. The effectiveness of the proposed improvement strategy was verified on 20 cases, and the results indicate that IMOCABC outperforms four advanced metaheuristic algorithms. The proposed model and algorithm are applied to an automotive engine manufacturing workshop, reducing the combined cost of preventive maintenance and quality loss from 25,806 (traditional scheme) to 11,877 (integrated scheme), that is, to about 46% of the traditional-scheme cost (a reduction of roughly 54%).
Advanced Engineering Informatics, vol. 71, Article 104388.
Citations: 0
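The multi-objective search described in the abstract above has to compare candidate schedules that trade one objective against another. A minimal sketch of the usual building block, a Pareto (non-dominated) filter, is given here. It is not the authors' IMOCABC implementation; the two-objective tuple and the sample values are hypothetical, and both objectives are assumed to be minimized.

```python
from typing import List, Tuple

# Hypothetical objective vector: (objective_1, objective_2), both minimized.
Objectives = Tuple[float, float]


def dominates(a: Objectives, b: Objectives) -> bool:
    """True if a is no worse than b in every objective and strictly better in one."""
    no_worse = all(x <= y for x, y in zip(a, b))
    strictly_better = any(x < y for x, y in zip(a, b))
    return no_worse and strictly_better


def pareto_front(points: List[Objectives]) -> List[Objectives]:
    """Return the points that no other point dominates, in input order."""
    return [
        p for p in points
        if not any(dominates(q, p) for q in points if q != p)
    ]


if __name__ == "__main__":
    # Illustrative values only; they are not taken from the paper.
    candidates = [(10.0, 5.0), (8.0, 7.0), (12.0, 4.0), (9.0, 9.0), (8.0, 6.0)]
    print(pareto_front(candidates))
```

The quadratic scan is adequate for small populations; larger populations would normally use a faster non-dominated sorting routine.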
First things first: Effects of sequential AR/VR exposure on skill acquisition in industrial training
IF 9.9 Zone 1 (Engineering Technology) Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Pub Date: 2026-01-27 DOI: 10.1016/j.aei.2026.104328
Varun Phadke, Casper Harteveld, Kemi Jona, Mohsen Moghaddam
This paper explores the distinctive and collective affordances of augmented reality (AR) and virtual reality (VR) for industrial training, with a focus on their integrated use and deployment strategies. AR and VR applications were developed to conduct a two-stage, between-subjects user study on a real-life cold spray additive manufacturing task, with varying orders of exposure to AR/VR training. Results reveal nuanced adaptation patterns, indicating that VR-first training reduces the cognitive load during subsequent AR-guided training, thereby enhancing confidence and task efficiency. Conversely, AR-first training supports procedural grounding but presents challenges when transitioning to the immersive spatial demands of VR training. Interestingly, task completion times were found to be independent of the order of exposure, highlighting the flexibility of deployment strategies. Clustering analysis further identifies distinct participant response patterns, offering deeper insights into workload, learning effectiveness, retention, and types of errors. These findings emphasize the importance of leveraging task understanding before the deployment of AR and VR to maximize learning outcomes in complex psychomotor tasks.
Advanced Engineering Informatics, vol. 71, Article 104328.
Citations: 0
Multi-label sewer defect classification based on CLIP with fine-to-coarse contextual representations
IF 9.9 Zone 1 (Engineering Technology) Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Pub Date: 2026-01-27 DOI: 10.1016/j.aei.2026.104362
Yisu Ge, Jialuo Guo, Zhihao Yang, Zhaomin Chen, Liyan Chen, Guodao Zhang
Sewer defect recognition, which analyzes in-sewer video to locate problems, is a critical foundation for maintaining urban drainage systems. The Contrastive Language-Image Pre-training (CLIP) model performs well on general vision tasks but misses fine-grained structural variations and localized defect features, resulting in limited performance in practical sewer defect classification. Therefore, a CLIP-based multi-label sewer defect classification method is proposed, which leverages the transfer capability of large language models and integrates fine-grained visual-linguistic features. To tackle the problem of insufficient fine-grained defect feature extraction, the Prompt-based Contextual Representation Construction (PCRC) module is designed, leveraging learnable prompts and a two-stage modeling strategy to capture fine-to-coarse contextual representations for each category. Furthermore, the Feature-Level Matching (FLM) module is introduced to align fine-grained image-text features and improve defect recognition accuracy. Finally, ablation studies and extensive comparisons with advanced methods on the public dataset Sewer-ML are presented. Experimental results demonstrate that the proposed approach achieves state-of-the-art performance, with mAP and F1-score reaching 75.02% and 80.08%, respectively.
Advanced Engineering Informatics, vol. 71, Article 104362.
Citations: 0
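The abstract above matches image features against per-category text features to assign several defect labels at once. A minimal sketch of that general idea follows, using plain cosine similarity with a fixed threshold. This is only an illustration, not the PCRC or FLM modules; the label names, vectors, and the 0.5 threshold are all hypothetical.

```python
import math
from typing import Dict, List, Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two non-zero vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def predict_labels(
    image_vec: Sequence[float],
    label_vecs: Dict[str, Sequence[float]],
    threshold: float = 0.5,  # assumed cut-off, not from the paper
) -> List[str]:
    """Return every label whose text embedding is similar enough to the image."""
    return [
        name for name, vec in label_vecs.items()
        if cosine(image_vec, vec) >= threshold
    ]


if __name__ == "__main__":
    # Toy 3-dimensional embeddings; real CLIP embeddings are much larger.
    labels = {"crack": [1.0, 0.0, 0.0], "roots": [0.0, 1.0, 0.0], "deposit": [0.0, 0.0, 1.0]}
    print(predict_labels([0.8, 0.6, 0.0], labels))
```

Because each label is thresholded independently, an image can receive zero, one, or several labels, which is what distinguishes multi-label from single-label classification.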
Enhancing aviation safety with artificial intelligence: A systematic literature review on recent advances, challenges and future perspectives
IF 9.9 Zone 1 (Engineering Technology) Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Pub Date: 2026-01-27 DOI: 10.1016/j.aei.2026.104378
Cho Yin Yiu, Wen-Chin Li, Kam K.H. Ng, Chia-Fen Chi, Jens Schiefele
Global air traffic is projected to grow significantly in the coming decades, leading to denser airspace and higher operational complexity. Therefore, academics and practitioners are now unleashing the potential of artificial intelligence (AI), particularly the recent advances in large language models (LLM), computer vision, and speech recognition, to enhance aviation safety through advanced cockpit design, AI assistants, human performance monitoring, and support for air accident investigations. These applications demonstrate significant promise in enhancing aviation safety. Nevertheless, there are still challenges in applying safe and reliable AI to support these safety-critical domains. Indeed, many aviation safety issues, such as accident analysis, human factors, and preventive system designs, are interconnected rather than standalone. This systematic literature review explores the recent advances, challenges, and future perspectives on leveraging AI to enhance aviation safety from a macro perspective. To that end, a framework is established to review relevant studies. First, we identify the relevant literature through initial search, inspection, and screening. After that, we analyse the domains applied and the models leveraged in aviation safety enhancement across the 175 selected studies using content analysis. Then, thematic analysis is applied to reveal the challenges of applying safe and reliable AI in aviation safety. Given the challenges identified, this review recommends future work to incorporate explainable AI, develop AI certification frameworks, design based on hybrid intelligence, and adopt diversified datasets for generalisation.
Advanced Engineering Informatics, vol. 71, Article 104378.
Citations: 0
A unified LLM-KG framework for low-annotation urban rail transit signal system operation: knowledge acquisition and dynamic update
IF 9.9 Zone 1 (Engineering Technology) Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Pub Date: 2026-01-24 DOI: 10.1016/j.aei.2026.104327
Wei Cai, Xiaomin Zhu, Zeyu Sun, Aihui Ye, Guanhua Fu, Runtong Zhang
Intelligent operation and maintenance (O&M) of urban rail transit signal systems (URTSS) is essential for ensuring train safety and operational efficiency. However, most O&M data exist as unstructured and sparsely labeled texts, posing major challenges for reliable knowledge extraction, semantic reasoning, and dynamic knowledge management. To address these issues, this paper proposes a unified large language model-knowledge graph framework (ULLM-KG) tailored for low-annotation, knowledge-intensive O&M environments. Firstly, a bidirectional knowledge graph construction mechanism (BKGC) is introduced to rapidly build a domain-specific initial knowledge graph. Secondly, a KG-enhanced distantly supervised entity and event extraction method (KG-DS3E) is designed to enhance critical knowledge extraction accuracy from unstructured texts. Thirdly, a prompt-driven knowledge-enhanced reasoning method (PD-KER) is proposed to improve semantic quality in fault diagnosis and maintenance recommendations. Lastly, a dynamic knowledge graph updating mechanism with temporal awareness and conflict resolution (DKG-UCF) is used to ensure efficient and accurate knowledge evolution. Based on real-world URTSS O&M data, experimental evaluations are conducted on state-of-the-art LLMs (GPT-4o, DeepSeek-V3, and Qwen3-32B). On datasets with varying annotation ratios and rare faults, ULLM-KG demonstrates significantly superior performance in knowledge extraction and reasoning tasks compared to other state-of-the-art methods. Its ability to dynamically update knowledge is also verified to be excellent. ULLM-KG provides a general solution for the intelligent O&M of URTSS under low-annotation conditions.
Advanced Engineering Informatics, vol. 71, Article 104327.
Citations: 0
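The abstract above mentions a knowledge graph update mechanism with temporal awareness and conflict resolution. One simple way such a rule could look is sketched here: when two facts disagree about the same subject and relation, the newer one wins. This is an assumed policy for illustration, not the DKG-UCF mechanism itself; the `Fact` and `TemporalKG` names, the example entities, and the single-valued-relation assumption are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass(frozen=True)
class Fact:
    subject: str
    relation: str
    obj: str
    timestamp: int  # e.g. a Unix time or a log sequence number


class TemporalKG:
    """Toy triple store that keeps one object per (subject, relation) pair."""

    def __init__(self) -> None:
        self._facts: Dict[Tuple[str, str], Fact] = {}

    def upsert(self, fact: Fact) -> bool:
        """Store the fact unless an equally new or newer one exists; report whether it was stored."""
        key = (fact.subject, fact.relation)
        current = self._facts.get(key)
        if current is None or fact.timestamp > current.timestamp:
            self._facts[key] = fact
            return True
        return False  # stale or conflicting older fact is rejected

    def lookup(self, subject: str, relation: str) -> Optional[str]:
        fact = self._facts.get((subject, relation))
        return fact.obj if fact else None


if __name__ == "__main__":
    kg = TemporalKG()
    kg.upsert(Fact("switch_machine_12", "status", "faulty", timestamp=100))
    kg.upsert(Fact("switch_machine_12", "status", "repaired", timestamp=200))
    kg.upsert(Fact("switch_machine_12", "status", "faulty", timestamp=150))  # ignored
    print(kg.lookup("switch_machine_12", "status"))
```

A production system would likely also keep the superseded facts as history and handle multi-valued relations, both of which this sketch leaves out.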
FACTS: Training-free zero-shot diffusion framework for facade texture restoration in 3D urban models
IF 9.9 Zone 1 (Engineering Technology) Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Pub Date: 2026-01-24 DOI: 10.1016/j.aei.2026.104385
Juexiao Cheng, Xiangru Huang, Guanzhou Chen, Tong Wang, Jiaqi Wang, Xiaoliang Tan, Aiyi Jiang, Xiaodong Zhang
High-fidelity facade texture restoration is crucial for the realism and utility of 3D urban models in digital twin applications. Low-quality textures can compromise visualization, simulation accuracy, and decision-making. This challenge is particularly evident in Level of Detail 1 and 2 (LoD-1 and LoD-2) models, which represent buildings as basic massing models. In these models, textures baked from complex 3D mesh sources often suffer from geometric distortions, occlusions, and inconsistent illumination. To address these issues, we introduce FACTS (Facade Automated Correction and Texture Synthesis), a novel zero-shot, training-free framework for facade texture restoration. FACTS operates as an automated pipeline, taking a 3D mesh as input and producing geometrically and photometrically corrected models. Its key innovations are as follows: (1) a prompt-guided, occlusion-aware inpainting module that uses semantic guidance to repair missing texture regions; (2) a multi-scale edge-feature-guided diffusion process that enforces geometric consistency by leveraging structural priors extracted from the image; and (3) an efficient illumination harmonization method in the CIELAB color space to resolve lighting inconsistencies across texture patches. Recognizing that conventional metrics fail to assess architectural integrity, we propose three novel metrics: the Edge Straightness Score (ESS), Hough Transform Line Consistency (HTLC), and Linearity Index (LI). Our experiments on the SFDB and RUF-3D datasets show significant improvements over baselines. Specifically, FACTS improved ESS, HTLC, and LI scores on degraded textures by 40.69%, 11.16%, and 54.76%, respectively. The framework processes a 2.5-megapixel texture in approximately 58.8 s on a single consumer-grade GPU. This work provides a scalable and interpretable solution for the automated restoration of defective facade textures, thereby enhancing the visual realism and structural accuracy of existing 3D urban models. Code and data are available at https://github.com/CVEO/FACTS.
Advanced Engineering Informatics, vol. 71, Article 104385.
Citations: 0
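The abstract above lists illumination harmonization in the CIELAB color space as one of its components. A common, generic way to even out lighting between patches is to shift and scale the lightness (L*) channel so each patch matches a reference mean and standard deviation. The sketch below shows that statistics-matching idea only; it is a swapped-in textbook technique, not the method implemented in FACTS, and it assumes the L* values (range 0 to 100) have already been extracted from the image. The function name and sample values are hypothetical.

```python
from statistics import mean, pstdev
from typing import List, Sequence


def harmonize_lightness(
    patch_l: Sequence[float], ref_mean: float, ref_std: float
) -> List[float]:
    """Rescale L* values so the patch matches the reference mean and spread."""
    if not patch_l:
        return []
    m = mean(patch_l)
    s = pstdev(patch_l)

    def clamp(v: float) -> float:
        # L* is defined on [0, 100].
        return min(100.0, max(0.0, v))

    if s == 0:
        # A flat patch carries no contrast to rescale; move it to the target mean.
        return [clamp(ref_mean) for _ in patch_l]
    return [clamp((v - m) * ref_std / s + ref_mean) for v in patch_l]


if __name__ == "__main__":
    dark_patch = [20.0, 30.0, 40.0]  # illustrative values only
    print(harmonize_lightness(dark_patch, ref_mean=50.0, ref_std=5.0))
```

Only the lightness channel is touched, so the a* and b* chroma channels (and therefore the perceived hue) are left unchanged, which is the usual reason for working in CIELAB rather than RGB.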
A novel discriminative joint adversarial network for quantitatively detecting wheel polygonization of heavy-haul locomotives across variable running conditions
IF 9.9 Zone 1 (Engineering Technology) Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Pub Date: 2026-01-24 DOI: 10.1016/j.aei.2026.104377
Maoyong Dong, Shiqian Chen, Hongbing Wang, Wanming Zhai
Timely quantitative detection of wheel polygonal wear is of great significance for railway maintenance and improving the train running quality. However, existing deep learning-based detection methods struggle with speed variation-induced feature distribution shifts, exhibiting weak transferability and failing to achieve quantitative diagnosis of wheel defects. To address these issues, a novel discriminative joint adversarial network (NDJAN) for polygonal fault detection under varying running speeds is proposed in this paper. A multi-branch parallel ResNet is first developed to extract sensitive features from raw signals using shortcut connections, which can preserve critical wear amplitude-related information and alleviate gradient vanishing problems. Then, a two-level discriminative feature fusion (TLDFF) scheme is designed with a hybrid attention mechanism and lightweight depthwise separable convolutions. The former is employed to amplify discriminative features, while the latter achieves intelligent fusion of multi-branch features through learnable weighting coefficients, ensuring optimal integration of complementary information from different branches. Finally, an implicit-explicit joint distribution alignment (IEJDA) strategy is presented to address fundamental transfer distribution discrepancies under variable operating conditions. This module accomplishes global distribution matching and fine-grained adaptation of decision boundaries by acting on the feature layer and regression decision layer, respectively. Both dynamics simulations and field tests are carried out to demonstrate that the proposed NDJAN approach can effectively and accurately detect the polygonal wear amplitudes.
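The abstract above describes depthwise separable convolutions as lightweight. The reason can be checked with a short parameter count: a standard convolution learns one kernel per input-output channel pair, whereas the separable form learns one kernel per input channel plus a 1x1 pointwise mixing layer. The sketch below assumes 1D convolutions without bias terms; the kernel size and channel counts are hypothetical and not taken from the paper.

```python
def standard_conv_params(kernel_size: int, c_in: int, c_out: int) -> int:
    """Weights in a standard 1D convolution (bias omitted)."""
    return kernel_size * c_in * c_out


def depthwise_separable_params(kernel_size: int, c_in: int, c_out: int) -> int:
    """Weights in a depthwise conv (one kernel per input channel) plus a pointwise 1x1 conv."""
    depthwise = kernel_size * c_in
    pointwise = c_in * c_out
    return depthwise + pointwise


if __name__ == "__main__":
    k, c_in, c_out = 3, 64, 128  # illustrative layer shape
    std = standard_conv_params(k, c_in, c_out)
    sep = depthwise_separable_params(k, c_in, c_out)
    print(std, sep, round(sep / std, 3))
```

In general the ratio is 1/c_out + 1/kernel_size, so the saving grows with both the kernel size and the number of output channels.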
Advanced Engineering Informatics, Volume 71, Article 104377.
Citations: 0
Reinforcement learning-based hyper-heuristic algorithm for multi-warehouse and multi-machine agricultural machinery operation scheduling problem considering soil conditions and carbon emissions
IF 9.9 Zone 1 (Engineering Technology) Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Pub Date: 2026-01-24 DOI: 10.1016/j.aei.2026.104346
Tengfei Wu, Lanyue Zhang, YingLu He, Liangcheng Zhou, Yiming Chen, Qing Yuan, Xingyun Duan, Xiaorong Lv
Multi-machine coordinated scheduling systems are a key technology for efficient production in unmanned farms and play an important role in advancing agricultural informatization. However, existing agricultural machinery scheduling studies often neglect cross-regional operational requirements in China’s hilly and mountainous areas and overlook the influence of soil moisture on machinery performance. To address these challenges, this study formulates a multi-warehouse and multi-machine agricultural machinery scheduling model and develops a tri-objective optimization framework to simultaneously minimize travel distance, operation time, and carbon emissions. A reinforcement learning-based hyper-heuristic algorithm (RLHHA) is proposed, which adopts a five-layer encoding scheme tailored to the problem structure and integrates eight customized low-level heuristics with a Q-learning controller. Hypervolume and spacing metrics are employed as state features to guide the adaptive selection of heuristics, thereby improving the accuracy and stability of scheduling decisions. Extensive experiments are conducted on six benchmark instances of different scales. Comparative results with three classical algorithms and two advanced hybrid algorithms demonstrate that the proposed RLHHA achieves superior performance in terms of solution accuracy, convergence quality, and robustness. The results indicate that the proposed model and algorithm can effectively support accurate, reliable, and sustainable decision-making for cross-regional agricultural machinery scheduling in real-world scenarios.
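The idea of a Q-learning controller that picks among low-level heuristics can be sketched as follows. This is a generic tabular Q-learning selector, not the RLHHA implementation: the state discretization (the abstract only says hypervolume and spacing serve as state features), the reward, the epsilon-greedy policy, and the name `HeuristicSelector` are all assumptions made for illustration.

```python
import random


class HeuristicSelector:
    """Tabular Q-learning over (state, low-level heuristic) pairs."""

    def __init__(self, n_states, n_heuristics, alpha=0.1, gamma=0.9,
                 epsilon=0.1, seed=0):
        self.q = [[0.0] * n_heuristics for _ in range(n_states)]
        self.alpha = alpha      # learning rate
        self.gamma = gamma      # discount factor
        self.epsilon = epsilon  # exploration probability
        self.rng = random.Random(seed)

    def choose(self, state):
        """Epsilon-greedy choice of a heuristic index for this state."""
        row = self.q[state]
        if self.rng.random() < self.epsilon:
            return self.rng.randrange(len(row))
        return max(range(len(row)), key=row.__getitem__)

    def update(self, state, action, reward, next_state):
        """Standard Q-learning update toward reward + gamma * max Q(next)."""
        target = reward + self.gamma * max(self.q[next_state])
        self.q[state][action] += self.alpha * (target - self.q[state][action])


# Hypothetical setup: 2 coarse states, 8 low-level heuristics.
selector = HeuristicSelector(n_states=2, n_heuristics=8,
                             alpha=0.5, epsilon=0.0)
selector.update(state=0, action=3, reward=1.0, next_state=1)
best = selector.choose(0)  # greedy choice now favors heuristic 3
```

In a hyper-heuristic loop, the reward would typically reflect how much the chosen heuristic improved the current solution set, so heuristics that help in a given search state are selected more often over time.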
Advanced Engineering Informatics, Volume 71, Article 104346.
Citations: 0
Generative AI-driven data augmentation and object-guided vision-language reasoning for PPE compliance analysis in work-at-height
IF 9.9 Zone 1 (Engineering Technology) Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Pub Date: 2026-01-24 DOI: 10.1016/j.aei.2026.104364
Wenyu Xu, Wen Yi, Yi Tan
PPE compliance is a fundamental prerequisite for ensuring safety in work-at-height. Although computer vision has advanced PPE detection, two challenges remain: dataset scarcity, which limits generalization, and weak semantic reasoning, which hinders reliable compliance verification. To address these limitations, this paper presents generative AI-driven data augmentation and an object-guided vision-language model (VLM) for analyzing PPE compliance in work-at-height. Safety standards on work-at-height and PPE (e.g., GB 80-2016, GB 2811-2019) are formalized via ChatGPT 4o into a variable pool and structured prompts, which are used as inputs to a text-to-image (T2I) generation model to generate a synthetic dataset. An object detection model is employed to detect PPE elements, and its structured outputs are integrated with the VLM, enabling vision-language reasoning that combines object detection with natural language understanding. Experimental results demonstrate that DALL·E 3 produces a more realistic synthetic dataset than other image generation models, with the hybrid dataset significantly improving detection performance ([email protected]=88.5%, Small Object [email protected]=75.8%). Using YOLOv11 detections as structured inputs, Qwen2.5-VL-7B achieves reliable compliance reasoning (CRA=87.6%, SC=0.83, EQ=4.2), and these advances are consolidated in an integrated platform supporting automated reporting and interactive analysis. This framework enhances work-at-height safety by alleviating data scarcity through generative augmentation and strengthening PPE compliance reasoning.
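One way to picture how structured detector outputs might be handed to a VLM is to serialize them into the text part of the prompt. The sketch below is only an illustration under stated assumptions: the detection record layout, the labels `helmet` and `harness`, the 0.5 score threshold, and the function names are hypothetical, and no detector or VLM API is called.

```python
import json


def missing_items(detections, required_items, min_score=0.5):
    """Required PPE labels with no detection at or above min_score."""
    seen = {d["label"] for d in detections if d["score"] >= min_score}
    return sorted(set(required_items) - seen)


def build_compliance_prompt(detections, required_items, min_score=0.5):
    """Embed the detector output as JSON in a text prompt for a VLM."""
    payload = {
        "detections": detections,
        "required_items": sorted(required_items),
        "not_detected": missing_items(detections, required_items, min_score),
    }
    return (
        "Detector output for one work-at-height image (JSON):\n"
        + json.dumps(payload, indent=2)
        + "\nUsing the image and this JSON, state whether the worker is "
        "PPE-compliant and explain which items support the decision."
    )


# Hypothetical detector output: a confident helmet, a low-score harness.
detections = [
    {"label": "helmet", "score": 0.91, "box": [120, 40, 180, 95]},
    {"label": "harness", "score": 0.32, "box": [110, 100, 200, 260]},
]
prompt = build_compliance_prompt(detections, ["helmet", "harness"])
```

Passing explicit labels, scores, and boxes gives the language model concrete evidence to cite, instead of relying on it to find small PPE items in the image unaided.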
Advanced Engineering Informatics, Volume 71, Article 104364.
Citations: 0
Spatio-temporal motion-aware intelligent robotic grasping with velocity estimation for moving objects
IF 9.9 Zone 1 (Engineering Technology) Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Pub Date: 2026-01-24 DOI: 10.1016/j.aei.2026.104367
Qing Jiao, Weifei Hu, Tingjie Wang, Geyu Shao, Ning Tang, Jiayi Wang, Long Fang
Dynamic grasping capabilities, i.e., grasping moving objects in unstructured environments, could render robotic systems more competitive in both industrial and daily life applications. However, previous studies mostly relied on restrictive assumptions, such as static objects subject to slight perturbations or pre-learned object motion patterns, which severely limited adaptability to unknown trajectories. While recent learning-based methods relax these assumptions, they prioritize object or grasp tracking to ensure smooth robot motion over future grasp pose prediction. The scarcity of dynamic grasp datasets further hinders the advancement of learning-based methods. To address these challenges, this paper presents a moving-object grasp prediction method based on Conv-T (Convolutional Transformer), a hierarchical architecture that fuses spatiotemporal features for motion-aware dynamic grasping. By integrating velocity estimation, this method models the dynamics of the latent motion trajectories from time-series depth images to predict future grasp poses. The Conv-T is built based on a proposed SLiding Window Multi-head Self-Attention (SLW-MSA) mechanism, which balances computational efficiency with performance by integrating the properties of convolutional operations and self-attention mechanisms. Additionally, a dynamic grasp dataset generation pipeline combining data synthesis with data expansion techniques is developed to efficiently embed temporal motion cues into the training data. The proposed method is validated on the constructed dynamic grasp datasets as well as in simulated and real‐world robotic environments. Experimental results demonstrate that our Conv-T-based method not only outperforms state-of-the-art networks on datasets but also exhibits superior robustness compared to other baselines when grasping moving objects.
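The general idea behind sliding-window self-attention, where each position attends only to a local neighborhood much as a convolution sees only a local patch, can be sketched in a few lines. This is not the paper's SLW-MSA: it is a single-head, 1-D version with identity query/key/value projections, all of which are simplifying assumptions, and `sliding_window_attention` is a hypothetical name.

```python
import math


def sliding_window_attention(tokens, window):
    """Single-head self-attention restricted to a local window.

    tokens: list of equal-length float vectors (used as q, k and v).
    window: odd window size; token i attends to tokens within window // 2.
    """
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd integer")
    half = window // 2
    dim = len(tokens[0])
    scale = 1.0 / math.sqrt(dim)
    outputs = []
    for i, query in enumerate(tokens):
        lo, hi = max(0, i - half), min(len(tokens), i + half + 1)
        neighbors = tokens[lo:hi]
        # Scaled dot-product scores against neighbors only.
        scores = [scale * sum(a * b for a, b in zip(query, key))
                  for key in neighbors]
        peak = max(scores)
        exps = [math.exp(s - peak) for s in scores]
        total = sum(exps)
        # Softmax-weighted average of the neighbor vectors.
        outputs.append([
            sum(e / total * value[d] for e, value in zip(exps, neighbors))
            for d in range(dim)
        ])
    return outputs


tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, 0.5]]
local = sliding_window_attention(tokens, window=3)
```

With n tokens and window size w, the cost is on the order of n·w score computations instead of n² for full attention, which is the usual efficiency argument for windowed variants.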
Advanced Engineering Informatics, Volume 71, Article 104367.
Citations: 0