Pub Date: 2026-04-01 | Epub Date: 2026-02-06 | DOI: 10.1016/j.aei.2026.104423
Xiaosheng Ni, Jingpu Duan, Xiong Li, Xin Zhang
To address the challenges in Vehicle-to-Infrastructure (V2I) network traffic prediction, this study proposes an innovative solution. We first establish a novel paradigm that integrates physical models to systematically convert publicly available vehicle trajectory data into V2I traffic data. On this basis, a gCNN–BiLSTM–MHA deep learning model is constructed. Its core advantage lies in its use of a lightweight GhostNet-based convolutional network (gCNN) to improve computational efficiency, while leveraging the synergy of a bidirectional long short-term memory network (BiLSTM) and a multi-head attention mechanism (MHA) to balance prediction efficiency and accuracy. The model's superiority is comprehensively validated: compared to baseline models such as LSTM, it demonstrates significant advantages across a series of key evaluation metrics (running time, MBD, MAE, MAPE, RMSE, and R²), achieving an overall balanced performance. Furthermore, the model exhibits excellent performance on multiple benchmark datasets, confirming its strong robustness and high applicability for complex V2I network traffic prediction tasks.
Title: A novel hybrid neural network for high-accuracy vehicle-to-infrastructure network traffic prediction
Advanced Engineering Informatics, Vol. 71, Article 104423.
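For reference, the evaluation metrics named in the abstract (MBD, MAE, MAPE, RMSE, R²) follow their standard definitions; a minimal sketch with hypothetical traffic-volume values, not drawn from the paper's experiments:

```python
import numpy as np

# Hypothetical ground truth and predictions for a short traffic series.
y_true = np.array([120.0, 135.0, 150.0, 142.0, 160.0])
y_pred = np.array([118.0, 140.0, 147.0, 145.0, 155.0])

mbd = np.mean(y_pred - y_true)                             # Mean Bias Deviation
mae = np.mean(np.abs(y_pred - y_true))                     # Mean Absolute Error
mape = np.mean(np.abs((y_pred - y_true) / y_true)) * 100   # in percent
rmse = np.sqrt(np.mean((y_pred - y_true) ** 2))            # Root Mean Squared Error

ss_res = np.sum((y_true - y_pred) ** 2)
ss_tot = np.sum((y_true - y_true.mean()) ** 2)
r2 = 1.0 - ss_res / ss_tot                                 # coefficient of determination
```

A model compared on these metrics should improve most of them without sacrificing running time, which is the balance the abstract emphasizes.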
Pub Date: 2026-04-01 | Epub Date: 2026-01-13 | DOI: 10.1016/j.aei.2026.104326
Sang Du, Lei Hou, Guomin (Kevin) Zhang, Yang Zou, Haosen Chen
Modular building design requires numerous context-dependent component variants that traditional constraint-based methods cannot exhaustively enumerate. Industry Foundation Classes (IFC) models encode rich spatial and semantic context from completed modular projects, context that could enable Artificial Intelligence (AI) models to generate component variants and complement constraint-based methods. However, IFC 3D geometry that carries spatial context is not directly usable by AI models, owing to IFC's complex data structure. To address this limitation, this paper proposes a readily deployable auto-decoder method that produces AI-compatible vectors from IFC geometry. First, an IFC export strategy that retains component spatial context is employed. Second, a sampling method that pairs 3D points with their distances to the nearest surface is applied. Third, an auto-decoder neural network that jointly optimises per-component vectors and the model weights is presented, yielding context-aware representation vectors for modular components. Finally, an octree-based decoder is employed for accurate geometry recovery from the vectors. Experiments on real-world modular project data demonstrate that the resulting vectors preserve geometric fidelity and support component variant generation. Geometric fidelity is confirmed by mean and maximum surface reconstruction errors of 14.57 mm and 51.94 mm, sufficient for modular building design analysis. Support for component variant generation is evidenced by geometric interpolation linearity exceeding 0.98 (out of 1), showing excellent suitability for variant generation. This method makes IFC spatial context accessible to AI-driven modular design methods, transforming Design for Manufacture and Assembly (DfMA) data into actionable knowledge. Code is available on GitHub.
Title: Enabling AI-driven modular building design: an auto-decoder approach for IFC 3D geometry representation
Advanced Engineering Informatics, Vol. 71, Article 104326.
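The sampling step described above, pairing 3D points with their distances to the nearest surface, is the usual signed-distance-style supervision for auto-decoders; a minimal sketch using an analytic unit sphere as a stand-in for exported IFC component geometry (the sphere, bounds, and sample count are illustrative assumptions, not the paper's data):

```python
import numpy as np

rng = np.random.default_rng(0)

# Sample query points in a box around the (stand-in) component.
points = rng.uniform(-1.5, 1.5, size=(1000, 3))

# For a unit sphere centred at the origin, the signed distance of a point p
# is simply |p| - 1: negative inside the surface, positive outside.
sdf = np.linalg.norm(points, axis=1) - 1.0

# (x, y, z, distance) pairs form the training set for the decoder network.
samples = np.column_stack([points, sdf])
```

With real IFC geometry, the analytic distance would be replaced by a point-to-mesh distance query against the exported component surface.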
Pub Date: 2026-04-01 | Epub Date: 2026-01-10 | DOI: 10.1016/j.aei.2025.104299
Wentong Guo, Wenzhu Xu, Chengcheng Yang, Zhijian Zhao, Xi Gao, WenBin Yao, Sheng Jin
Urban traffic accidents result in significant casualties and property losses. Conducting traffic risk mapping and inference for urban areas provides substantial benefits for accident prevention as well as future planning and governance. However, pixel-level fine-grained inference of urban traffic risk maps remains challenging, primarily due to the complex layout of urban road networks, the temporal variability of traffic dynamics, and the heterogeneity of spatial semantic information. In this study, we propose an end-to-end Context-Aware Risk Feature Perception and Inference Network (CRFPI-Net) based on multimodal data to achieve fine-grained inference of urban traffic risk maps. In CRFPI-Net, three separate branches are designed to capture risk features from satellite remote sensing imagery, spatiotemporal traffic sequences, and area-of-interest (AOI) semantic information. The risk-aware features from each branch are integrated using a gated fusion mechanism to eliminate redundant information, and the fused features are further processed by context-aware multi-scale correlation analysis to reduce the adverse impact of heterogeneous variations in risk regions on risk perception. Finally, CRFPI-Net produces pixel-level inference maps of urban traffic accident risk, enabling effective and low-cost guidance for traffic accident prevention. The proposed model is quantitatively evaluated on real-world datasets and achieves state-of-the-art performance. Ablation experiments further demonstrate the rationality and effectiveness of the designed modules. The code and pretrained models for urban traffic risk mapping are publicly available at https://github.com/gwt-ZJU/CRFPI-Net.
Title: CRFPI-Net: context-aware risk feature perception and inference network for pixel-level urban traffic risk mapping
Advanced Engineering Informatics, Vol. 71, Article 104299.
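A gated fusion of branch features, as described above, typically blends modalities with a learned sigmoid gate; the sketch below uses random vectors and random weights as stand-ins for the trained branches (all names, shapes, and values are illustrative assumptions, not CRFPI-Net's actual parameters):

```python
import numpy as np

rng = np.random.default_rng(42)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

f_img = rng.normal(size=8)       # stand-in for the imagery-branch feature
f_traffic = rng.normal(size=8)   # stand-in for the spatiotemporal-branch feature

# Illustrative gate parameters (learned in the real model).
W_g = rng.normal(size=(8, 16))
b_g = np.zeros(8)

# The gate decides, per feature channel, how much each modality contributes,
# which is what lets the fusion discard redundant information.
gate = sigmoid(W_g @ np.concatenate([f_img, f_traffic]) + b_g)
fused = gate * f_img + (1.0 - gate) * f_traffic
```

Because the gate lies strictly in (0, 1), each fused channel is a convex blend of the two branch features.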
Pub Date: 2026-04-01 | Epub Date: 2026-01-14 | DOI: 10.1016/j.aei.2026.104339
Yihui Shan, Wei Li, Jiaqi Shi, Yansong Wang, Zhenzhen Xing, Jiangang Ding, Lili Pei
Accurate and efficient perception of pavement conditions is essential for maintaining transportation infrastructure and ensuring driving safety. In real-world road environments, visual and depth data collected by inspection or autonomous vehicles are often affected by modality imbalance, sensor noise, and image degradation, which compromise the reliability of defect segmentation. To address these challenges, this study proposes a cross-modal segmentation framework that integrates depth-guided fusion with generative latent feature enhancement to achieve robust pavement defect perception under diverse conditions. A defect-centric and class-aware depth prompting strategy is developed to transform geometric priors into explicit guidance for the intensity stream, enabling background suppression before encoding and boundary refinement within intermediate layers. In parallel, a latent feature enhancement module aligns the Segment Anything Model (SAM) feature space with a pretrained diffusion latent space and performs efficient one-step denoising, restoring structural consistency while avoiding the heavy overhead of iterative diffusion sampling. The overall design preserves the efficiency and generalization of SAM while introducing lightweight trainable adapters and low-rank diffusion updates. Experimental evaluations on multimodal pavement datasets demonstrate that the proposed approach achieves higher segmentation accuracy and robustness compared with state-of-the-art fusion methods. The results highlight the potential of the proposed framework to support intelligent pavement inspection, condition assessment, and maintenance decision-making.
Title: Depth-guided cross-modal fusion and diffusion-based enhancement for robust pavement defect segmentation
Advanced Engineering Informatics, Vol. 71, Article 104339.
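The depth-prompting idea, using geometric priors to suppress background before encoding, can be illustrated with a crude depth-threshold mask; the plane estimate, threshold, and array sizes below are invented for illustration and are not the paper's actual prompting strategy:

```python
import numpy as np

rng = np.random.default_rng(3)

# Toy depth map in metres: a near-planar road surface plus one far-away
# background pixel that should be suppressed before encoding.
depth = rng.uniform(0.9, 1.1, size=(4, 4))
depth[0, 0] = 3.0                      # background (e.g. sky or roadside)
intensity = np.ones((4, 4))            # stand-in for the intensity image

# Approximate the road plane by the median depth, then keep only pixels
# that lie close to it.
road_plane = np.median(depth)
mask = (np.abs(depth - road_plane) < 0.3).astype(float)
suppressed = intensity * mask          # background contribution zeroed out
```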
Pub Date: 2026-04-01 | Epub Date: 2026-01-08 | DOI: 10.1016/j.aei.2026.104312
Yuntao Zou, Zihui Lin, Qianqi Zhang, Zhichun Liu, Zeling Xu
The battery energy consumption system of lunar exploration rovers, as mission-critical equipment, confronts severe challenges under extreme environmental constraints. Existing modeling methods face fundamental dilemmas: dynamic uncertainty leads to highly ambiguous constraint boundaries, making it difficult for traditional mathematical languages to describe the complex coupling relationships; even when mathematical representations can be constructed, the resulting high-dimensional nonlinear optimization problems are computationally intractable, and existing algorithms can neither overcome this complexity barrier nor offer interpretability. In response, this paper proposes a hierarchical Stackelberg game optimization framework based on semantic embedding. The framework transcends traditional optimization paradigms by deeply integrating the cognitive intelligence of large language models with the mathematical precision of game theory: the large language models process fuzzy constraints and cross-domain knowledge integration through semantic understanding, accommodating emergent behavior that cannot be predicted from simple combinations of parts, while the hierarchical structure of Stackelberg games naturally matches the hierarchical decision-making requirements of battery allocation, with the multi-agent game framework handling coordination and competition among batteries.
Through semantic embedding, natural language constraints are automatically transformed into mathematical objects comprehensible to the game participants: cognitive intelligence handles the "incomputable" complexity components while game theory ensures "provable" mathematical convergence, together achieving the paradigm transition from "perfect rationality" to "bounded rationality" and providing a theoretically rigorous, practically viable unified solution for intelligent decision-making in mission-critical systems.
Title: Large language models enable semantic-guided hierarchical games for intelligent battery coordination
Advanced Engineering Informatics, Vol. 71, Article 104312.
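The leader-follower structure of a Stackelberg game can be made concrete with a toy pricing-and-allocation example: the leader commits to a price, each follower battery best-responds, and the leader anticipates those responses. The payoff functions, capacities, and demand below are invented for illustration and are unrelated to the paper's formulation:

```python
import numpy as np

def follower_best_response(p, capacity):
    # Each follower maximises p*x - x^2 (revenue minus quadratic wear cost)
    # over 0 <= x <= capacity; the unconstrained optimum is x = p / 2.
    return min(max(p / 2.0, 0.0), capacity)

def leader_payoff(p, capacities, demand=3.0):
    # The leader wants total discharge to track demand, knowing that each
    # follower will best-respond to the announced price p.
    supply = sum(follower_best_response(p, c) for c in capacities)
    return -abs(supply - demand)

capacities = [1.0, 2.0, 1.5]           # three heterogeneous batteries

# Backward induction over a price grid: evaluate the leader's payoff given
# the followers' best responses, then pick the best price.
prices = np.linspace(0.0, 5.0, 501)
best_p = max(prices, key=lambda p: leader_payoff(p, capacities))
```

At the equilibrium price each follower discharges p/2 (capped by capacity), and total supply meets the demand of 3.0 exactly.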
Pub Date: 2026-04-01 | Epub Date: 2026-01-06 | DOI: 10.1016/j.aei.2025.104294
Wenbin Gu, Yushang Cao, Yuxin Li, Nuandong Li, Lei Wang, Na Tang, Minghai Yuan, Fengque Pei
With the emergence of personalized, small-batch production modes, multi-agent manufacturing systems (MAMS) have become a research hotspot for intelligent workshops owing to their self-organizing capabilities. The hybrid flow shop scheduling problem with unrelated parallel machines (HFSP-UPM) presents significant decision-making challenges due to its heterogeneous resources and dynamic environment, and multi-agent deep reinforcement learning (MADRL) is a prevalent method for such complex decision-making problems. This paper therefore proposes a pre-trained large language model (LLM)-empowered MADRL method for HFSP-UPM that exploits stage-wise coordination to minimize the makespan. Specifically, a novel MAMS is first developed in which each processing stage is modeled as an agent, enabling high autonomy and reducing decision dimensionality. A multi-agent collaborative scheduling framework based on the centralized-training, decentralized-execution (CTDE) paradigm is then proposed, together with a communication mechanism among agents to promote coordination and collaboration. Through structured prompt engineering, an LLM-empowered state space and action selection are designed to enhance semantic understanding and policy updates. Finally, LLM-empowered multi-agent proximal policy optimization (LLM-MAPPO) is employed to train the scheduling model. Experimental results on 330 instances show the superiority of the proposed method over scheduling rules, genetic programming (GP) rules, several advanced DRL-based methods, and the baseline MAPPO, achieving over 8% performance improvement in most instances. Furthermore, a generalization experiment demonstrates that the proposed method self-adjusts in response to production scenario changes, and an example verification is provided to validate the proposed method and the experiment platform.
Title: Large language model-empowered dynamic scheduling for intelligent hybrid flow shop using multi-agent deep reinforcement learning
Advanced Engineering Informatics, Vol. 71, Article 104294.
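The makespan objective for a hybrid flow shop with unrelated parallel machines can be evaluated with a simple greedy earliest-completion-time rule standing in for the learned LLM-MAPPO policy; the stage layout and processing times below are illustrative, not from the paper's 330 instances:

```python
# Minimal makespan sketch for HFSP-UPM: each stage owns machines whose
# processing time is job-specific ("unrelated" parallel machines).
# proc[stage][machine][job] -> processing time
proc = [
    [[3, 5, 4], [4, 3, 6]],  # stage 0: two machines, three jobs
    [[2, 6, 3], [5, 2, 4]],  # stage 1
]

n_jobs = 3
job_ready = [0.0] * n_jobs   # time each job leaves the previous stage

for machines in proc:
    machine_free = [0.0] * len(machines)
    # Dispatch jobs in order of availability at this stage; assign each to
    # the machine that would finish it earliest (greedy stand-in policy).
    for job in sorted(range(n_jobs), key=lambda j: job_ready[j]):
        finish = [max(machine_free[m], job_ready[job]) + machines[m][job]
                  for m in range(len(machines))]
        m_best = min(range(len(machines)), key=lambda m: finish[m])
        machine_free[m_best] = finish[m_best]
        job_ready[job] = finish[m_best]

makespan = max(job_ready)    # completion time of the last job
```

A learned policy replaces the greedy assignment but is scored by exactly this kind of makespan evaluation.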
Pub Date: 2026-04-01 | Epub Date: 2025-12-15 | DOI: 10.1016/j.aei.2025.104203
Pan Zaolin, Yu Yantao
Proactively anticipating human behavior is crucial for assistive robots collaborating with humans in dynamic environments like construction. Given a short visual observation (e.g., 3 s), we aim to predict long-term future behavior sequences for all workers to enable timely and context-aware robot assistance. The key technical challenge is the uncertain, multimodal nature of worker behavior: multiple socially plausible future behaviors can arise from a limited context. To address this challenge, we propose an uncertainty-aware framework integrating: (1) cross-granular hyperbolic learning, which mitigates ambiguity by predicting high-level shared goals when low-level predictions are uncertain, and (2) explicit task-constraint integration, which ensures predictions are socially consistent and contextually viable. Validated on real-world scaffolding assembly video and human-robot collaborative board games, our approach eliminates task-dependency violations and reduces task completion time and human-robot task conflicts, enabling smoother and more robust human-robot collaboration. Our primary contribution is an uncertainty-aware model for socially consistent, long-horizon multi-worker behavior prediction in construction.
Title: Uncertainty-Aware Long-Term Multi-Worker Behavior Anticipation for Proactive Human-Robot Collaboration in Construction
Advanced Engineering Informatics, Vol. 71, Article 104203.
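Cross-granular hyperbolic learning typically embeds behaviors in the Poincaré ball, whose distance function is standard; the embedding coordinates below are invented for illustration and are not the paper's learned representations:

```python
import numpy as np

def poincare_distance(u, v):
    # Distance in the Poincaré ball model of hyperbolic space, the usual
    # setting for hierarchical (coarse-goal vs. fine-behavior) embeddings.
    diff = np.sum((u - v) ** 2)
    denom = (1 - np.sum(u ** 2)) * (1 - np.sum(v ** 2))
    return np.arccosh(1 + 2 * diff / denom)

low_level = np.array([0.70, 0.10])   # hypothetical fine-grained behavior
goal = np.array([0.20, 0.05])        # hypothetical coarse shared goal
origin = np.zeros(2)

# Distances grow rapidly toward the ball's boundary, so coarse goals can sit
# near the centre while fine-grained behaviors fan out near the rim; falling
# back to the goal level is a move toward the centre when predictions are
# uncertain.
d_low = poincare_distance(origin, low_level)
d_goal = poincare_distance(origin, goal)
```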
Due to the complex structure and harsh operating conditions of aero-engines, the transient features caused by bearing damage can easily be masked by various interference noises, which poses great challenges for transient feature extraction and aero-engine fault diagnosis. Although optimal bandpass filtering methods are widely used for transient feature extraction, they cannot effectively suppress in-band noise, which diminishes the sensitivity of feature indicators and may cause extraction to fail. To address these challenges, this paper proposes a noise-robust transient feature nonlinear extraction approach called iteratively modified variational mode extraction (IMVME). Initially, a noise-robust filter, the enhanced variational Wiener filter (EVWF), is proposed. The EVWF performs narrowband demodulation while suppressing in-band noise through amplitude reconstruction, thereby enhancing local transient features and facilitating the extraction of weak transients. Subsequently, the modulation spectral density function (MSDF) is introduced as a feature indicator to distinguish fault transients from interference noise and to guide the EVWF in selecting the optimal magnitude order. Finally, IMVME adopts an adaptive iterative filter-parameter optimization framework that solves for the optimal EVWF by maximizing the MSDF, enabling more accurate fault frequency band localization, robust transient feature extraction under complex noise conditions, and greater adaptability and flexibility in filter design. Through validation on multiple scenarios, including simulation signals and aero-engine fault signals, the superiority of IMVME is demonstrated by its ability to adaptively and accurately extract transient features while maintaining robustness to noise and interference.
{"title":"Iteratively modified variational mode extraction (IMVME): A noise-robust transient feature nonlinear extraction approach for aero-engine fault diagnosis","authors":"Duxi Shang , Rui Yuan , Yong Lv , Hongan Wu , Hengyu Liu , Zhuyun Chen","doi":"10.1016/j.aei.2025.104284","DOIUrl":"10.1016/j.aei.2025.104284","url":null,"abstract":"<div><div>Due to the complex structure and harsh operating conditions of aero-engines, the transient features caused by bearing damage can easily be masked by various interference noises, which poses great challenges for transient feature extraction and aero-engine fault diagnosis. Although optimal bandpass filtering methods are widely used for transient feature extraction, they cannot effectively suppress in-band noise, which diminishes the sensitivity of feature indicators and may cause extraction to fail. To address these challenges, this paper proposes a noise-robust nonlinear transient feature extraction approach called iteratively modified variational mode extraction (IMVME). First, a noise-robust filter, the enhanced variational Wiener filter (EVWF), is proposed. The EVWF performs narrowband demodulation while suppressing in-band noise through amplitude reconstruction, thereby enhancing local transient features and facilitating weak transient extraction. Next, the modulation spectral density function (MSDF) is introduced as a feature indicator to distinguish fault transients from interference noise and to guide the EVWF in selecting the optimal magnitude order. Finally, IMVME adopts an adaptive iterative filter-parameter optimization framework that solves for the optimal EVWF by maximizing the MSDF, thereby enabling more accurate fault frequency band localization, robust transient feature extraction under complex noise conditions, and greater adaptability and flexibility in filter design. Validation on multiple scenarios, including simulated signals and aero-engine fault signals, demonstrates the superiority of IMVME: it adaptively and accurately extracts transient features while remaining robust to noise and interference.</div></div>","PeriodicalId":50941,"journal":{"name":"Advanced Engineering Informatics","volume":"71 ","pages":"Article 104284"},"PeriodicalIF":9.9,"publicationDate":"2026-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145884959","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
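The IMVME entry describes an iterative search for the frequency band that maximizes a feature indicator (the MSDF). Neither the MSDF nor the EVWF is defined in the abstract, so the sketch below is only an illustration of the band-selection loop: it substitutes kurtosis of the band-filtered signal for the MSDF and an ideal FFT brick-wall bandpass for the EVWF. It is not the authors' method.

```python
import numpy as np

def bandpass_fft(x, fs, f_lo, f_hi):
    """Ideal (brick-wall) bandpass via the FFT -- a simple stand-in
    for the enhanced variational Wiener filter in the paper."""
    X = np.fft.rfft(x)
    freqs = np.fft.rfftfreq(len(x), d=1.0 / fs)
    X[(freqs < f_lo) | (freqs > f_hi)] = 0.0   # zero out-of-band bins
    return np.fft.irfft(X, n=len(x))

def kurtosis(x):
    """Impulsiveness indicator used here as a placeholder for the MSDF."""
    x = x - x.mean()
    return np.mean(x ** 4) / (np.mean(x ** 2) ** 2 + 1e-12)

def select_band(x, fs, bandwidth=200.0, step=100.0):
    """Grid search over centre frequencies, keeping the band that
    maximises the indicator (the paper instead iterates filter
    parameters adaptively)."""
    best_band, best_score = None, -np.inf
    half = bandwidth / 2
    for fc in np.arange(half, fs / 2 - half, step):
        y = bandpass_fft(x, fs, fc - half, fc + half)
        score = kurtosis(y)
        if score > best_score:
            best_band, best_score = (fc - half, fc + half), score
    return best_band, best_score
```

For a 300 Hz tone, a band covering 250–350 Hz passes the signal unchanged, while a 500–600 Hz band removes it entirely.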
Pub Date: 2026-04-01. Epub Date: 2026-01-24. DOI: 10.1016/j.aei.2026.104367
Qing Jiao , Weifei Hu , Tingjie Wang , Geyu Shao , Ning Tang , Jiayi Wang , Long Fang
Dynamic grasping capabilities, i.e., grasping moving objects in unstructured environments, could render robotic systems more competitive in both industrial and daily-life applications. However, previous studies mostly relied on restrictive assumptions, such as static objects subject to slight perturbations or pre-learned object motion patterns, which severely limited adaptability to unknown trajectories. While recent learning-based methods relax these assumptions, they prioritize object or grasp tracking, which ensures smooth robot motion, over predicting future grasp poses. The scarcity of dynamic grasp datasets further hinders the advancement of learning-based methods. To address these challenges, this paper presents a moving-object grasp prediction method based on Conv-T (Convolutional Transformer), a hierarchical architecture that fuses spatiotemporal features for motion-aware dynamic grasping. By integrating velocity estimation, the method models the dynamics of latent motion trajectories from time-series depth images to predict future grasp poses. Conv-T is built on a proposed SLiding Window Multi-head Self-Attention (SLW-MSA) mechanism, which balances computational efficiency with performance by combining the properties of convolutional operations and self-attention mechanisms. Additionally, a dynamic grasp dataset generation pipeline combining data synthesis with data expansion techniques is developed to efficiently embed temporal motion cues into the training data. The proposed method is validated on the constructed dynamic grasp datasets as well as in simulated and real-world robotic environments. Experimental results demonstrate that the Conv-T-based method not only outperforms state-of-the-art networks on these datasets but also exhibits superior robustness compared to other baselines when grasping moving objects.
{"title":"Spatio-temporal motion-aware intelligent robotic grasping with velocity estimation for moving objects","authors":"Qing Jiao , Weifei Hu , Tingjie Wang , Geyu Shao , Ning Tang , Jiayi Wang , Long Fang","doi":"10.1016/j.aei.2026.104367","DOIUrl":"10.1016/j.aei.2026.104367","url":null,"abstract":"<div><div>Dynamic grasping capabilities, i.e., grasping moving objects in unstructured environments, could render robotic systems more competitive in both industrial and daily-life applications. However, previous studies mostly relied on restrictive assumptions, such as static objects subject to slight perturbations or pre-learned object motion patterns, which severely limited adaptability to unknown trajectories. While recent learning-based methods relax these assumptions, they prioritize object or grasp tracking, which ensures smooth robot motion, over predicting future grasp poses. The scarcity of dynamic grasp datasets further hinders the advancement of learning-based methods. To address these challenges, this paper presents a moving-object grasp prediction method based on Conv-T (Convolutional Transformer), a hierarchical architecture that fuses spatiotemporal features for motion-aware dynamic grasping. By integrating velocity estimation, the method models the dynamics of latent motion trajectories from time-series depth images to predict future grasp poses. Conv-T is built on a proposed <strong>SL</strong>iding <strong>W</strong>indow <strong>M</strong>ulti-head <strong>S</strong>elf-<strong>A</strong>ttention (SLW-MSA) mechanism, which balances computational efficiency with performance by combining the properties of convolutional operations and self-attention mechanisms. Additionally, a dynamic grasp dataset generation pipeline combining data synthesis with data expansion techniques is developed to efficiently embed temporal motion cues into the training data. The proposed method is validated on the constructed dynamic grasp datasets as well as in simulated and real-world robotic environments. Experimental results demonstrate that the Conv-T-based method not only outperforms state-of-the-art networks on these datasets but also exhibits superior robustness compared to other baselines when grasping moving objects.</div></div>","PeriodicalId":50941,"journal":{"name":"Advanced Engineering Informatics","volume":"71 ","pages":"Article 104367"},"PeriodicalIF":9.9,"publicationDate":"2026-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146023132","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
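The Conv-T entry hinges on sliding-window self-attention. The numpy sketch below shows only the locality constraint for a single head: each query position attends exclusively to keys within a fixed window. The actual SLW-MSA block additionally splits multiple heads and integrates convolutional features, which are omitted here.

```python
import numpy as np

def sliding_window_attention(Q, K, V, window):
    """Single-head scaled dot-product attention where position i may
    only attend to positions j with |i - j| <= window. A sketch of the
    locality idea behind SLW-MSA, not the paper's full block."""
    n, d = Q.shape
    scores = Q @ K.T / np.sqrt(d)                     # (n, n) attention logits
    idx = np.arange(n)
    mask = np.abs(idx[:, None] - idx[None, :]) > window
    scores[mask] = -np.inf                            # block out-of-window pairs
    # numerically stable row-wise softmax
    w = np.exp(scores - scores.max(axis=1, keepdims=True))
    w /= w.sum(axis=1, keepdims=True)
    return w @ V
```

With `window=0` each position attends only to itself, so the output equals `V`; with `window >= n - 1` this reduces to ordinary full attention.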
Pub Date: 2026-04-01. Epub Date: 2025-12-18. DOI: 10.1016/j.aei.2025.104245
Junghoon Kim , Yue Gong , Seokho Chi , Jung In Kim , JoonOh Seo
Construction site monitoring is essential for ensuring projects are executed as planned and achieving goals in productivity, safety, and quality. However, traditional manual monitoring methods are time-consuming, error-prone, and lack scalability. Deep learning-based object detection offers a promising alternative, but its “black-box” nature hinders understanding of detection failures. This study proposes a Grad-CAM-based explainable AI framework to diagnose and classify detection errors systematically. The framework consists of three main processes: (1) defining major types of detection errors, (2) collecting failed images for each error type, and (3) developing a machine learning-based classification model using Grad-CAM features and detection metrics. Unlike previous approaches that relied on qualitative interpretations, this study converts Grad-CAM heatmaps into quantitative features (e.g., GT influence ratio, activation-to-box distance, cluster counts), enabling automated error classification. Errors were categorized into abnormal viewpoint, small size, occlusion, complex background, and lighting variation, achieving 94% classification accuracy on synthetic data, 85% on real images, and 88% on AI-generated data. This framework enhances transparency and interpretability while supporting model optimization and adaptive deployment for real-world construction site applications.
{"title":"Towards transparent object detection models for construction sites: explainable AI and error classification","authors":"Junghoon Kim , Yue Gong , Seokho Chi , Jung In Kim , JoonOh Seo","doi":"10.1016/j.aei.2025.104245","DOIUrl":"10.1016/j.aei.2025.104245","url":null,"abstract":"<div><div>Construction site monitoring is essential for ensuring projects are executed as planned and achieving goals in productivity, safety, and quality. However, traditional manual monitoring methods are time-consuming, error-prone, and lack scalability. Deep learning-based object detection offers a promising alternative, but its “black-box” nature hinders understanding of detection failures. This study proposes a Grad-CAM-based explainable AI framework to diagnose and classify detection errors systematically. The framework consists of three main processes: (1) defining major types of detection errors, (2) collecting failed images for each error type, and (3) developing a machine learning-based classification model using Grad-CAM features and detection metrics. Unlike previous approaches that relied on qualitative interpretations, this study converts Grad-CAM heatmaps into quantitative features (e.g., GT influence ratio, activation-to-box distance, cluster counts), enabling automated error classification. Errors were categorized into abnormal viewpoint, small size, occlusion, complex background, and lighting variation, achieving 94% classification accuracy on synthetic data, 85% on real images, and 88% on AI-generated data. 
This framework enhances transparency and interpretability while supporting model optimization and adaptive deployment for real-world construction site applications.</div></div>","PeriodicalId":50941,"journal":{"name":"Advanced Engineering Informatics","volume":"71 ","pages":"Article 104245"},"PeriodicalIF":9.9,"publicationDate":"2026-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145792164","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
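The Grad-CAM entry converts heatmaps into scalar features such as GT influence ratio, activation-to-box distance, and cluster counts. The sketch below shows plausible reconstructions of those three features (the abstract does not give the authors' exact formulas): activation mass inside the ground-truth box, distance from the heatmap peak to the box centre, and connected regions above a threshold.

```python
import numpy as np
from collections import deque

def heatmap_features(heat, box, thresh=0.5):
    """Turn a Grad-CAM heatmap into scalar features for error
    classification. Feature definitions are assumptions, not the
    paper's formulas. `box` is a ground-truth box (x0, y0, x1, y1)."""
    heat = np.asarray(heat, dtype=float)
    x0, y0, x1, y1 = box
    # GT influence ratio: share of activation mass inside the GT box
    gt_influence = heat[y0:y1, x0:x1].sum() / (heat.sum() + 1e-12)
    # activation-to-box distance: heatmap peak to box centre
    py, px = np.unravel_index(np.argmax(heat), heat.shape)
    cx, cy = (x0 + x1) / 2, (y0 + y1) / 2
    peak_to_box = float(np.hypot(px - cx, py - cy))
    # cluster count: 4-connected components above `thresh` (BFS flood fill)
    above = heat >= thresh
    seen = np.zeros_like(above, dtype=bool)
    H, W = heat.shape
    clusters = 0
    for sy, sx in zip(*np.nonzero(above)):
        if seen[sy, sx]:
            continue
        clusters += 1
        q = deque([(sy, sx)])
        seen[sy, sx] = True
        while q:
            y, x = q.popleft()
            for ny, nx in ((y + 1, x), (y - 1, x), (y, x + 1), (y, x - 1)):
                if 0 <= ny < H and 0 <= nx < W and above[ny, nx] and not seen[ny, nx]:
                    seen[ny, nx] = True
                    q.append((ny, nx))
    return {"gt_influence": gt_influence, "peak_to_box": peak_to_box,
            "clusters": clusters}
```

A heatmap with one activation inside the box and one far outside it, for example, yields a GT influence ratio of 0.5 and two clusters.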