
Advanced Engineering Informatics: Latest Articles

A novel hybrid neural network for high-accuracy vehicle-to-infrastructure network traffic prediction
IF 9.9; Zone 1 (Engineering Technology); Q1 (Computer Science, Artificial Intelligence); Pub Date: 2026-04-01; Epub Date: 2026-02-06; DOI: 10.1016/j.aei.2026.104423
Xiaosheng Ni, Jingpu Duan, Xiong Li, Xin Zhang
To address the challenges in Vehicle-to-Infrastructure (V2I) network traffic prediction, this study first establishes a paradigm that integrates physical models to systematically convert publicly available vehicle trajectory data into V2I traffic data. On this basis, a gCNN–BiLSTM–MHA deep learning model is constructed. It uses a lightweight GhostNet-based convolutional network (gCNN) to improve computational efficiency, and combines a bidirectional long short-term memory network (BiLSTM) with a multi-head attention mechanism (MHA) to balance prediction efficiency and accuracy. Compared with baseline models such as LSTM, the model shows significant advantages across a series of key evaluation metrics (running time, MBD, MAE, MAPE, RMSE, and R²), achieving balanced overall performance. The model also performs well on multiple benchmark datasets, confirming its robustness and applicability to complex V2I network traffic prediction tasks.
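The error metrics named in the abstract can be illustrated with a small sketch. This is not the authors' evaluation code: the function name `regression_metrics` and the sample values are hypothetical, and MBD is assumed here to mean the mean bias deviation (predicted minus actual), which the abstract does not define.

```python
import math

def regression_metrics(actual, predicted):
    """Return MBD, MAE, MAPE (%), RMSE and R^2 for two equal-length sequences."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(actual)
    errors = [p - a for a, p in zip(actual, predicted)]
    mean_actual = sum(actual) / n
    ss_res = sum(e * e for e in errors)                   # residual sum of squares
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)  # total sum of squares
    return {
        # Assumption: MBD = mean bias deviation, signed, predicted minus actual.
        "MBD": sum(errors) / n,
        "MAE": sum(abs(e) for e in errors) / n,
        # MAPE is undefined when an actual value is zero; this sketch assumes none are.
        "MAPE": 100.0 * sum(abs(e / a) for a, e in zip(actual, errors)) / n,
        "RMSE": math.sqrt(ss_res / n),
        # R^2 is undefined when all actual values are identical (ss_tot == 0).
        "R2": 1.0 - ss_res / ss_tot,
    }

# Hypothetical traffic values, purely for illustration.
metrics = regression_metrics([10.0, 20.0, 30.0, 40.0], [12.0, 18.0, 33.0, 39.0])
```

Reporting a signed bias (MBD) next to MAE and RMSE helps separate systematic over- or under-prediction from scatter, which may be why the abstract lists them together.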
Volume 71, Article 104423
Citations: 0
Enabling AI-driven modular building design: an auto-decoder approach for IFC 3D geometry representation
IF 9.9; Zone 1 (Engineering Technology); Q1 (Computer Science, Artificial Intelligence); Pub Date: 2026-04-01; Epub Date: 2026-01-13; DOI: 10.1016/j.aei.2026.104326
Sang Du, Lei Hou, Guomin (Kevin) Zhang, Yang Zou, Haosen Chen
Modular building design requires numerous context-dependent component variants that traditional constraint-based methods cannot exhaustively enumerate. Industry Foundation Classes (IFC) models encode rich spatial and semantic context from completed modular projects. This context could enable Artificial Intelligence (AI) models to generate component variants and complement constraint-based methods. However, the IFC 3D geometry that carries spatial context is not directly usable by AI models because of IFC's complex data structure. To address this limitation, this paper proposes a readily deployable auto-decoder method that produces AI-compatible vectors from IFC geometry. First, an IFC export strategy that retains component spatial context is employed. Second, a sampling method that pairs 3D points with their distances to the nearest surface is applied. Third, an auto-decoder neural network that jointly optimises per-component vectors and the model weights is presented, yielding context-aware representation vectors for modular components. Finally, an octree-based decoder is employed for accurate geometry recovery from vectors. Experiments on real-world modular project data demonstrate that the resulting vectors preserve geometric fidelity and support component variant generation. Geometric fidelity is confirmed by mean and maximum surface reconstruction errors of 14.57 mm and 51.94 mm, which are sufficient for modular building design analysis. Support for component variant generation is evidenced by a geometric interpolation linearity exceeding 0.98 out of 1. This method makes IFC spatial context accessible to AI-driven modular design methods, transforming Design for Manufacture and Assembly (DfMA) data into actionable knowledge. The code is available on GitHub.
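The second step, pairing 3D points with their distance to the nearest surface, can be sketched as follows. This is only an illustration under stated assumptions: an axis-aligned box stands in for a modular component, the distance is taken as signed (negative inside), and the names `box_signed_distance` and `sample_point_distance_pairs` are hypothetical. The abstract does not say whether the paper uses signed or unsigned distances, or how points are drawn.

```python
import math
import random

def box_signed_distance(point, center, half_extents):
    """Signed distance from a 3D point to an axis-aligned box (negative inside)."""
    # Per-axis distance beyond the box faces; negative when inside along that axis.
    q = [abs(p - c) - h for p, c, h in zip(point, center, half_extents)]
    outside = math.sqrt(sum(max(v, 0.0) ** 2 for v in q))  # > 0 only outside the box
    inside = min(max(q), 0.0)                              # < 0 only inside the box
    return outside + inside

def sample_point_distance_pairs(center, half_extents, count, margin=0.5, seed=0):
    """Draw uniform points around the box and pair each with its signed distance."""
    rng = random.Random(seed)
    pairs = []
    for _ in range(count):
        point = tuple(
            rng.uniform(c - h - margin, c + h + margin)
            for c, h in zip(center, half_extents)
        )
        pairs.append((point, box_signed_distance(point, center, half_extents)))
    return pairs

# Hypothetical component: a box centred at the origin.
pairs = sample_point_distance_pairs(
    center=(0.0, 0.0, 0.0), half_extents=(1.0, 2.0, 0.5), count=1000
)
```

Point-distance pairs of this kind are the usual training targets for a network that maps a per-shape vector plus a query point to a distance, which matches the auto-decoder description in the abstract at a high level.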
Volume 71, Article 104326
Citations: 0
CRFPI-Net: context-aware risk feature perception and inference network for pixel-level urban traffic risk mapping
IF 9.9; Zone 1 (Engineering Technology); Q1 (Computer Science, Artificial Intelligence); Pub Date: 2026-04-01; Epub Date: 2026-01-10; DOI: 10.1016/j.aei.2025.104299
Wentong Guo, Wenzhu Xu, Chengcheng Yang, Zhijian Zhao, Xi Gao, WenBin Yao, Sheng Jin
Urban traffic accidents result in significant casualties and property losses. Conducting traffic risk mapping and inference for urban areas provides substantial benefits for accident prevention as well as future planning and governance. However, pixel-level fine-grained inference of urban traffic risk maps remains challenging, primarily due to the complex layout of urban road networks, the temporal variability of traffic dynamics, and the heterogeneity of spatial semantic information. In this study, we propose an end-to-end Context-Aware Risk Feature Perception and Inference Network (CRFPI-Net) based on multimodal data to achieve fine-grained inference of urban traffic risk maps. In CRFPI-Net, three separate branches are designed to capture risk features from satellite remote sensing imagery, spatiotemporal traffic sequences, and area-of-interest (AOI) semantic information. The risk-aware features from each branch are integrated using a gated fusion mechanism to eliminate redundant information, and the fused features are further processed by context-aware multi-scale correlation analysis to reduce the adverse impact of heterogeneous variations in risk regions on risk perception. Finally, CRFPI-Net produces pixel-level inference maps of urban traffic accident risk, enabling effective and low-cost guidance for traffic accident prevention. The proposed model is quantitatively evaluated on real-world datasets and achieves state-of-the-art performance. Ablation experiments further demonstrate the rationality and effectiveness of the designed modules. The code and pretrained models for urban traffic risk mapping are publicly available at https://github.com/gwt-ZJU/CRFPI-Net.
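As a rough illustration of what a gated fusion of three branch features can look like, the sketch below scores each branch with a linear gate, normalises the scores with a softmax, and returns the weighted sum. The abstract does not specify the form of CRFPI-Net's gate, so this generic softmax gate, the function names, and all numbers are assumptions; the authors' actual implementation is in the repository linked in the abstract.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def gated_fusion(branch_features, gate_weights, gate_biases):
    """Fuse equal-length feature vectors using one scalar gate per branch."""
    # One linear score per branch, computed from that branch's own features.
    scores = [
        sum(w * x for w, x in zip(weights, features)) + bias
        for features, weights, bias in zip(branch_features, gate_weights, gate_biases)
    ]
    gates = softmax(scores)  # gates are positive and sum to 1
    dim = len(branch_features[0])
    fused = [sum(g * f[i] for g, f in zip(gates, branch_features)) for i in range(dim)]
    return fused, gates

# Hypothetical 2-D features for the imagery, traffic-sequence and AOI branches.
features = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
zero_weights = [[0.0, 0.0]] * 3  # untrained gate: every branch scores the same
fused, gates = gated_fusion(features, zero_weights, [0.0, 0.0, 0.0])
```

With equal scores the fusion reduces to a plain average; training the gate parameters lets the model down-weight a branch whose features are redundant for a given input.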
Volume 71, Article 104299
Citations: 0
Depth-guided cross-modal fusion and diffusion-based enhancement for robust pavement defect segmentation
IF 9.9; Zone 1 (Engineering Technology); Q1 (Computer Science, Artificial Intelligence); Pub Date: 2026-04-01; Epub Date: 2026-01-14; DOI: 10.1016/j.aei.2026.104339
Yihui Shan, Wei Li, Jiaqi Shi, Yansong Wang, Zhenzhen Xing, Jiangang Ding, Lili Pei
Accurate and efficient perception of pavement conditions is essential for maintaining transportation infrastructure and ensuring driving safety. In real-world road environments, visual and depth data collected by inspection or autonomous vehicles are often affected by modality imbalance, sensor noise, and image degradation, which compromise the reliability of defect segmentation. To address these challenges, this study proposes a cross-modal segmentation framework that integrates depth-guided fusion with generative latent feature enhancement to achieve robust pavement defect perception under diverse conditions. A defect-centric and class-aware depth prompting strategy is developed to transform geometric priors into explicit guidance for the intensity stream, enabling background suppression before encoding and boundary refinement within intermediate layers. In parallel, a latent feature enhancement module aligns the Segment Anything Model (SAM) feature space with a pretrained diffusion latent space and performs efficient one-step denoising, restoring structural consistency while avoiding the heavy overhead of iterative diffusion sampling. The overall design preserves the efficiency and generalization of SAM while introducing lightweight trainable adapters and low-rank diffusion updates. Experimental evaluations on multimodal pavement datasets demonstrate that the proposed approach achieves higher segmentation accuracy and robustness compared with state-of-the-art fusion methods. The results highlight the potential of the proposed framework to support intelligent pavement inspection, condition assessment, and maintenance decision-making.
Volume 71, Article 104339
Citations: 0
Large language models enable semantic-guided hierarchical games for intelligent battery coordination
IF 9.9; Zone 1 (Engineering Technology); Q1 (Computer Science, Artificial Intelligence); Pub Date: 2026-04-01; Epub Date: 2026-01-08; DOI: 10.1016/j.aei.2026.104312
Yuntao Zou, Zihui Lin, Qianqi Zhang, Zhichun Liu, Zeling Xu
The battery energy consumption system of lunar exploration rovers, as mission-critical equipment, confronts severe challenges under extreme environmental constraints. However, existing modeling methods face fundamental dilemmas: dynamic uncertainty leads to highly ambiguous constraint boundaries, making it difficult for traditional mathematical languages to describe complex coupling relationships; even when mathematical representations are constructed, high-dimensional nonlinear optimization problems become computationally intractable, with existing algorithms unable to address complexity barriers and lacking interpretability. In response to these challenges, this paper innovatively proposes a hierarchical Stackelberg game optimization framework based on semantic embedding. This framework transcends traditional optimization paradigms by deeply integrating the cognitive intelligence of large language models with the mathematical precision of game theory: large language models acknowledge that overall behavior cannot be predicted from simple combinations of parts, processing fuzzy constraints and cross-domain knowledge integration through semantic understanding; the hierarchical structure of Stackelberg games naturally adapts to the hierarchical decision-making requirements of battery allocation, with multi-agent game frameworks effectively handling coordination and competition relationships between batteries. Through semantic embedding technology, natural language constraints are automatically transformed into mathematical objects comprehensible to game participants, with cognitive intelligence handling the “incomputable” complexity components while game theory ensures “provable” mathematical convergence, synergistically achieving the important paradigm transition from “perfect rationality” to “bounded rationality,” thereby providing a theoretically rigorous and practically viable unified solution for intelligent decision-making in mission-critical systems.
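The leader-follower structure of a Stackelberg game can be made concrete with a toy example. The sketch below is not the paper's model: the payoff functions, the shared `CAPACITY` value, and the action grids are hypothetical, and the semantic-embedding and multi-agent parts are left out entirely. It only shows the solution pattern, in which the leader commits first while anticipating the follower's best response.

```python
def follower_best_response(leader_action, follower_grid, follower_payoff):
    """Follower observes the leader's action and maximises its own payoff."""
    return max(follower_grid, key=lambda y: follower_payoff(leader_action, y))

def solve_stackelberg(leader_grid, follower_grid, leader_payoff, follower_payoff):
    """Grid search: the leader evaluates each action under the follower's response."""
    best = None
    for x in leader_grid:
        y = follower_best_response(x, follower_grid, follower_payoff)
        value = leader_payoff(x, y)
        if best is None or value > best[2]:
            best = (x, y, value)
    return best  # (leader action, follower action, leader payoff)

CAPACITY = 8.0  # hypothetical shared budget, not taken from the paper

def leader_payoff(x, y):
    # Toy payoff: own draw times the budget left after both draws.
    return x * (CAPACITY - x - y)

def follower_payoff(x, y):
    return y * (CAPACITY - x - y)

leader_grid = [i * 0.5 for i in range(17)]     # 0.0, 0.5, ..., 8.0
follower_grid = [i * 0.25 for i in range(33)]  # 0.0, 0.25, ..., 8.0
solution = solve_stackelberg(leader_grid, follower_grid, leader_payoff, follower_payoff)
```

In this toy setting the follower's best response is y = (CAPACITY - x) / 2, so the leader effectively maximises x(CAPACITY - x) / 2 and the search returns x = 4, y = 2. The first mover secures the larger share, which is the asymmetry that a hierarchical allocation scheme relies on.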
Volume 71, Article 104312
Citations: 0
Large language model-empowered dynamic scheduling for intelligent hybrid flow shop using multi-agent deep reinforcement learning
IF 9.9; Zone 1 (Engineering Technology); Q1 (Computer Science, Artificial Intelligence); Pub Date: 2026-04-01; Epub Date: 2026-01-06; DOI: 10.1016/j.aei.2025.104294
Wenbin Gu, Yushang Cao, Yuxin Li, Nuandong Li, Lei Wang, Na Tang, Minghai Yuan, Fengque Pei
With the emergence of personalized and small-batch production modes, multi-agent manufacturing systems (MAMS) have become a research hotspot for intelligent workshop owing to their self‑organizing capabilities. The hybrid flow shop scheduling problem with unrelated parallel machines (HFSP-UPM) presents significant decision-making challenges due to its heterogeneous resources and dynamic environment. Meanwhile, multi-agent deep reinforcement learning (MADRL) is a prevalent method for addressing complex decision‑making problems. Therefore, this paper proposes a pre-trained large language model (LLM) empowered MADRL method for HFSP-UPM considering stage-wise coordination to minimize the makespan. Specifically, a novel MAMS is developed first, where each processing stage is modeled as an agent to enable high autonomy and reduce decision dimensionality. Then, a multi-agent collaborative scheduling framework based on the centralized training with decentralized execution paradigm (CTDE) is proposed, and the communication mechanism among agents is proposed to promote coordination and collaboration. Through structured prompt engineering, an LLM empowered state space and action selection are designed to enhance semantic understanding and policy updates. Finally, the LLM empowered multi-agent proximal policy optimization (LLM-MAPPO) is employed to train the scheduling model. Experimental results on 330 instances show the superiority of the proposed method over scheduling rules, genetic programming (GP) rules, several advanced DRL-based methods, as well as the baseline MAPPO, achieving over 8% performance improvement in most instances. Furthermore, the generalization experiment demonstrates that the proposed method has self-adjustment capability in response to production scenario changes, and an example verification is provided to verify the proposed method and the experiment platform.
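The abstract builds on multi-agent proximal policy optimization (MAPPO). As background, the clipped surrogate objective at the core of standard PPO can be sketched in a few lines. This is the generic PPO formula, not the paper's LLM-MAPPO: the LLM-designed state space, the inter-agent communication, and the centralized critic are not modelled, and the function name and sample numbers are hypothetical.

```python
def ppo_clipped_objective(ratios, advantages, clip_eps=0.2):
    """Mean PPO clipped surrogate over a batch.

    ratios:     new-policy / old-policy probability of each sampled action
    advantages: advantage estimate for each sampled action
    """
    terms = []
    for ratio, adv in zip(ratios, advantages):
        clipped_ratio = min(max(ratio, 1.0 - clip_eps), 1.0 + clip_eps)
        # Taking the minimum gives a pessimistic bound, so a large policy
        # change cannot inflate the objective.
        terms.append(min(ratio * adv, clipped_ratio * adv))
    return sum(terms) / len(terms)

# Hypothetical batch of three sampled actions.
objective = ppo_clipped_objective([1.5, 0.5, 1.0], [2.0, -1.0, 3.0])
```

In the centralized-training, decentralized-execution setting described in the abstract, each stage agent would typically optimise an objective of this form with advantages from a shared critic, though the abstract does not give those details.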
Volume 71, Article 104294
Citations: 0
Uncertainty-Aware Long-Term Multi-Worker Behavior Anticipation for Proactive Human-Robot Collaboration in Construction
IF 9.9; Zone 1 (Engineering Technology); Q1 (Computer Science, Artificial Intelligence); Pub Date: 2026-04-01; Epub Date: 2025-12-15; DOI: 10.1016/j.aei.2025.104203
Pan Zaolin, Yu Yantao
Proactively anticipating human behavior is crucial for assistive robots collaborating with humans in dynamic environments like construction. Given a short visual observation (e.g., 3 s), we aim to predict long-term future behavior sequences for all workers to enable timely and context-aware robot assistance. The key technical challenge is that worker behavior is uncertain and multimodal: multiple socially plausible future behaviors can arise from a limited context. To address this challenge, we propose an uncertainty-aware framework integrating (1) cross-granular hyperbolic learning, which mitigates ambiguity by predicting high-level shared goals when low-level predictions are uncertain, and (2) explicit task-constraint integration, which ensures predictions are socially consistent and contextually viable. Validated on real-world scaffolding assembly video and human-robot collaborative board games, our approach eliminates task-dependency violations and reduces task completion time and human-robot task conflicts, enabling smoother, more robust human-robot collaboration. Our primary contribution is an uncertainty-aware model for socially consistent, long-horizon multi-worker behavior prediction in construction.
Advanced Engineering Informatics, Volume 71, Article 104203.
Citations: 0
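The abstract above describes two ideas: falling back to a high-level shared goal when the low-level prediction is uncertain, and enforcing task constraints on the predictions. The sketch below illustrates both on a toy probability table. It is not the authors' method: the hyperbolic embedding is replaced by a plain entropy threshold, and the action names, the `ACTION_TO_GOAL` hierarchy, the `PREREQUISITES` table, and the `max_entropy` value are all hypothetical.

```python
import math

# Hypothetical label hierarchy: low-level action -> high-level goal.
ACTION_TO_GOAL = {
    "fetch_tube": "prepare_materials",
    "fetch_coupler": "prepare_materials",
    "fasten_coupler": "assemble_frame",
}

# Hypothetical task constraints: action -> actions that must already be done.
PREREQUISITES = {"fasten_coupler": {"fetch_tube", "fetch_coupler"}}


def entropy(probs):
    """Shannon entropy (in nats) of a probability table."""
    return -sum(p * math.log(p) for p in probs.values() if p > 0)


def mask_infeasible(probs, done):
    """Drop actions whose prerequisites are unmet, then renormalise."""
    feasible = {a: p for a, p in probs.items()
                if PREREQUISITES.get(a, set()) <= done}
    total = sum(feasible.values())
    if total == 0:
        raise ValueError("no feasible action has non-zero probability")
    return {a: p / total for a, p in feasible.items()}


def anticipate(probs, done, max_entropy=0.5):
    """Return ("action", name) when confident, otherwise ("goal", name)."""
    feasible = mask_infeasible(probs, done)
    if entropy(feasible) <= max_entropy:
        return "action", max(feasible, key=feasible.get)
    # Too uncertain at the action level: pool probability mass per goal.
    goal_probs = {}
    for action, p in feasible.items():
        goal = ACTION_TO_GOAL[action]
        goal_probs[goal] = goal_probs.get(goal, 0.0) + p
    return "goal", max(goal_probs, key=goal_probs.get)
```

With nothing done yet, `fasten_coupler` is masked out and the two remaining fetch actions are nearly tied, so the sketch reports the shared goal `prepare_materials` rather than guessing between them.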
Iteratively modified variational mode extraction (IMVME): A noise-robust transient feature nonlinear extraction approach for aero-engine fault diagnosis
IF 9.9 Zone 1 (Engineering & Technology) Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Pub Date: 2026-04-01 Epub Date: 2025-12-29 DOI: 10.1016/j.aei.2025.104284
Duxi Shang, Rui Yuan, Yong Lv, Hongan Wu, Hengyu Liu, Zhuyun Chen
Due to the complex structure and harsh operating conditions of aero-engines, the transient features caused by bearing damage can be easily masked by various interference noises, which brings great challenges to transient feature extraction and aero-engine fault diagnosis. Although optimal bandpass filtering methods are widely used for transient feature extraction, they cannot effectively suppress in-band noise, which diminishes the sensitivity of feature indicators and may cause extraction failure. To address these challenges, this paper proposes a noise-robust transient feature nonlinear extraction approach called iteratively modified variational mode extraction (IMVME). Initially, a noise-robust filter, the enhanced variational Wiener filter (EVWF), is proposed. EVWF performs narrowband demodulation while suppressing in-band noise through amplitude reconstruction, thereby enhancing local transient features and facilitating the extraction of weak transients. Subsequently, the modulation spectral density function (MSDF) is introduced as a feature indicator to distinguish fault transients from interference noise and to guide the EVWF in selecting the optimal magnitude order. Finally, IMVME adopts an adaptive filter parameter iterative optimization framework to solve for the optimal EVWF by maximizing MSDF, thereby enabling more accurate fault frequency band localization, robust transient feature extraction under complex noise conditions, and greater adaptability and flexibility in filter design. Through validation on multiple scenarios, including simulation signals and aero-engine fault signals, the superiority of IMVME is demonstrated through its ability to adaptively and accurately extract transient features while maintaining robustness to noise and interference.
Advanced Engineering Informatics, Volume 71, Article 104284.
Citations: 0
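The abstract above describes choosing a filter parameter by maximizing a feature indicator. The sketch below shows only that general pattern, with loud stand-ins: kurtosis replaces MSDF (whose definition the abstract does not give), repeated first-differencing replaces the EVWF, and an exhaustive search over four candidate orders replaces the iterative optimization. The test signal is synthetic and hypothetical.

```python
import math


def kurtosis(x):
    """Fourth standardised moment; large values indicate impulsive content."""
    n = len(x)
    mean = sum(x) / n
    m2 = sum((v - mean) ** 2 for v in x) / n
    m4 = sum((v - mean) ** 4 for v in x) / n
    return m4 / (m2 ** 2)


def difference(x, order):
    """Apply a first-difference (crude high-pass) filter `order` times."""
    for _ in range(order):
        x = [b - a for a, b in zip(x, x[1:])]
    return x


def select_filter(signal, orders=(0, 1, 2, 3)):
    """Return the candidate order whose output maximises the indicator."""
    scores = {k: kurtosis(difference(signal, k)) for k in orders}
    best = max(scores, key=scores.get)
    return best, scores


# Synthetic signal: strong slow interference plus weak periodic impulses.
signal = [
    5.0 * math.sin(2 * math.pi * i / 250) + (1.0 if i % 50 == 25 else 0.0)
    for i in range(500)
]
best, scores = select_filter(signal)
```

On this signal the unfiltered candidate (order 0) scores low because the slow sinusoid dominates, while the differenced candidates suppress it and expose the impulses, so the search picks a non-zero order.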
Spatio-temporal motion-aware intelligent robotic grasping with velocity estimation for moving objects
IF 9.9 Zone 1 (Engineering & Technology) Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Pub Date: 2026-04-01 Epub Date: 2026-01-24 DOI: 10.1016/j.aei.2026.104367
Qing Jiao, Weifei Hu, Tingjie Wang, Geyu Shao, Ning Tang, Jiayi Wang, Long Fang
Dynamic grasping capabilities, i.e., grasping moving objects in unstructured environments, could render robotic systems more competitive in both industrial and daily life applications. However, previous studies mostly relied on restrictive assumptions, such as static objects subject to slight perturbations or pre-learned object motion patterns, which severely limited adaptability to unknown trajectories. While recent learning-based methods relax these assumptions, they prioritize object or grasp tracking, which ensures smooth robot motion, over predicting future grasp poses. The scarcity of dynamic grasp datasets further hinders the advancement of learning-based methods. To address these challenges, this paper presents a moving-object grasp prediction method based on Conv-T (Convolutional Transformer), a hierarchical architecture that fuses spatiotemporal features for motion-aware dynamic grasping. By integrating velocity estimation, this method models the dynamics of the latent motion trajectories from time-series depth images to predict future grasp poses. Conv-T is built on a proposed Sliding Window Multi-head Self-Attention (SLW-MSA) mechanism, which balances computational efficiency with performance by integrating the properties of convolutional operations and self-attention mechanisms. Additionally, a dynamic grasp dataset generation pipeline combining data synthesis with data expansion techniques is developed to efficiently embed temporal motion cues into the training data. The proposed method is validated on the constructed dynamic grasp datasets as well as in simulated and real-world robotic environments. Experimental results demonstrate that our Conv-T-based method not only outperforms state-of-the-art networks on datasets but also exhibits superior robustness compared to other baselines when grasping moving objects.
Advanced Engineering Informatics, Volume 71, Article 104367.
Citations: 0
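The abstract above relies on velocity estimation to predict where a moving object will be when the gripper arrives. The paper learns this from depth image sequences with Conv-T; the sketch below is a much simpler hand-written stand-in, a least-squares constant-velocity fit, and assumes the object's centroid positions have already been extracted. The function names `fit_velocity` and `predict_position` are hypothetical.

```python
def fit_velocity(times, positions):
    """Least-squares constant-velocity fit per axis.

    Returns a list of (p0, v) pairs so that p(t) is approximated by p0 + v * t.
    """
    n = len(times)
    t_mean = sum(times) / n
    denom = sum((t - t_mean) ** 2 for t in times)
    if denom == 0:
        raise ValueError("need at least two distinct timestamps")
    model = []
    for d in range(len(positions[0])):
        p_mean = sum(p[d] for p in positions) / n
        v = sum((t - t_mean) * (p[d] - p_mean)
                for t, p in zip(times, positions)) / denom
        model.append((p_mean - v * t_mean, v))
    return model


def predict_position(model, t_future):
    """Extrapolate the fitted motion to a future time."""
    return tuple(p0 + v * t_future for p0, v in model)


# Object centroid observed at four timestamps (seconds, metres).
times = [0.0, 0.1, 0.2, 0.3]
positions = [(0.00, 1.0), (0.02, 1.0), (0.04, 1.0), (0.06, 1.0)]
model = fit_velocity(times, positions)
target = predict_position(model, 0.5)
```

Here the object moves at 0.2 m/s along x and is stationary along y, so the extrapolated target at t = 0.5 s is approximately (0.1, 1.0). A constant-velocity fit breaks down for curved or accelerating trajectories, which is the situation a learned predictor is meant to handle.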
Towards transparent object detection models for construction sites: explainable AI and error classification
IF 9.9 Zone 1 (Engineering & Technology) Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Pub Date: 2026-04-01 Epub Date: 2025-12-18 DOI: 10.1016/j.aei.2025.104245
Junghoon Kim, Yue Gong, Seokho Chi, Jung In Kim, JoonOh Seo
Construction site monitoring is essential for ensuring projects are executed as planned and achieving goals in productivity, safety, and quality. However, traditional manual monitoring methods are time-consuming, error-prone, and lack scalability. Deep learning-based object detection offers a promising alternative, but its “black-box” nature hinders understanding of detection failures. This study proposes a Grad-CAM-based explainable AI framework to diagnose and classify detection errors systematically. The framework consists of three main processes: (1) defining major types of detection errors, (2) collecting failed images for each error type, and (3) developing a machine learning-based classification model using Grad-CAM features and detection metrics. Unlike previous approaches that relied on qualitative interpretations, this study converts Grad-CAM heatmaps into quantitative features (e.g., GT influence ratio, activation-to-box distance, cluster counts), enabling automated error classification. Errors were categorized into abnormal viewpoint, small size, occlusion, complex background, and lighting variation, achieving 94% classification accuracy on synthetic data, 85% on real images, and 88% on AI-generated data. This framework enhances transparency and interpretability while supporting model optimization and adaptive deployment for real-world construction site applications.
Advanced Engineering Informatics, Volume 71, Article 104245.
Citations: 0
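The abstract above mentions turning Grad-CAM heatmaps into quantitative features such as GT influence ratio, activation-to-box distance, and cluster counts, but does not define them. The sketch below gives one plausible reading of each on a small heatmap stored as nested lists. The definitions, the inclusive box convention `(x0, y0, x1, y1)`, and the 0.5 threshold are assumptions, not the paper's formulas.

```python
import math


def gt_influence_ratio(heatmap, box):
    """Share of total activation that falls inside the ground-truth box."""
    x0, y0, x1, y1 = box  # inclusive pixel bounds (assumed convention)
    total = sum(sum(row) for row in heatmap)
    inside = sum(heatmap[y][x]
                 for y in range(y0, y1 + 1) for x in range(x0, x1 + 1))
    return inside / total if total else 0.0


def activation_to_box_distance(heatmap, box):
    """Distance from the activation-weighted centroid to the box centre."""
    x0, y0, x1, y1 = box
    total = sum(sum(row) for row in heatmap)
    if total == 0:
        return float("inf")
    cx = sum(x * v for row in heatmap for x, v in enumerate(row)) / total
    cy = sum(y * v for y, row in enumerate(heatmap) for v in row) / total
    return math.hypot(cx - (x0 + x1) / 2, cy - (y0 + y1) / 2)


def cluster_count(heatmap, threshold=0.5):
    """Number of 4-connected regions whose activation exceeds the threshold."""
    h, w = len(heatmap), len(heatmap[0])
    seen = set()
    count = 0
    for y in range(h):
        for x in range(w):
            if heatmap[y][x] <= threshold or (y, x) in seen:
                continue
            count += 1
            stack = [(y, x)]
            seen.add((y, x))
            while stack:  # flood-fill one region
                cy, cx = stack.pop()
                for ny, nx in ((cy - 1, cx), (cy + 1, cx),
                               (cy, cx - 1), (cy, cx + 1)):
                    if (0 <= ny < h and 0 <= nx < w
                            and (ny, nx) not in seen
                            and heatmap[ny][nx] > threshold):
                        seen.add((ny, nx))
                        stack.append((ny, nx))
    return count


heatmap = [
    [0.0, 0.0, 0.0, 0.9],
    [0.0, 1.0, 1.0, 0.0],
    [0.0, 1.0, 1.0, 0.0],
    [0.0, 0.0, 0.0, 0.0],
]
box = (1, 1, 2, 2)
features = (
    gt_influence_ratio(heatmap, box),
    activation_to_box_distance(heatmap, box),
    cluster_count(heatmap),
)
```

In this toy case most of the activation sits inside the box, but a second hot spot in the top-right corner pulls the centroid away and raises the cluster count to 2. A feature vector of this kind is what a downstream classifier could use to separate error types.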