Bridge Expansion Joints (BEJs) are crucial for bridge safety, yet their acoustic signals are complex and easily disturbed by traffic noise, limiting traditional identification accuracy. To address this, an intelligent monitoring system based on voiceprint features and deep learning is developed. Its key contributions include: (1) a cloud-edge collaborative voiceprint monitoring device that integrates audio sampling, embedded processing, a cloud server, and wireless transmission, enabling long-term data collection and remote diagnosis in noisy environments; (2) the use of first- and second-order differential Mel Frequency Cepstral Coefficients (MFCC) for feature extraction, improving discriminability; and (3) the Hybrid Attention Fusion Network (HAFNet), built on a pre-trained convolutional backbone with multi-scale attention, achieving high-precision recognition of typical BEJ faults, with testing accuracies of 97.99% and 99.00% for two vehicle types. Field experiments demonstrate the system's stability, reliability, and feasibility for real-time BEJ monitoring.
{"title":"Automated diagnosis of bridge expansion joint defects using voiceprint features and deep learning","authors":"Yixuan Chen , Hongzhe Zhao , Yichao Xu , Yufeng Zhang , Jian Zhang","doi":"10.1016/j.autcon.2025.106739","DOIUrl":"10.1016/j.autcon.2025.106739","url":null,"abstract":"<div><div>Bridge Expansion Joints (BEJs) are crucial for bridge safety, yet their acoustic signals are complex and easily disturbed by traffic noise, limiting traditional identification accuracy. To address this, an intelligent monitoring system based on voiceprint features and deep learning is developed. Its key contributions include: (1) a cloud-edge collaborative voiceprint monitoring device that integrates audio sampling, embedded processing, cloud server and wireless transmission, enabling long-term data collection and remote diagnosis under noisy environments; (2) the use of first- and second-order differential Mel Frequency Cepstral Coefficients (MFCC) for feature extraction, improving discriminability; and (3) the Hybrid Attention Fusion Network (HAFNet), built on a pre-trained convolutional backbone with multi-scale attention, achieving high-precision recognition of typical BEJ faults, with testing accuracies of 97.99% and 99.00% for two vehicle types. Field experiments demonstrate the system's stability, reliability, and feasibility for real-time BEJ monitoring.</div></div>","PeriodicalId":8660,"journal":{"name":"Automation in Construction","volume":"183 ","pages":"Article 106739"},"PeriodicalIF":11.5,"publicationDate":"2026-01-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146071951","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2026-01-27 | DOI: 10.1016/j.autcon.2026.106792
Jinxin Yi, Xuan Kong, Hao Tang, Jie Zhang, Zhenming Chen, Lu Deng
Recent advances in computer vision have provided new solutions for intelligent welding. However, existing vision-based weld seam extraction techniques exhibit limited adaptability to various workpieces in unstructured environments. Therefore, this paper proposes a three-dimensional vision-based method tailored for weld seam extraction and path generation. The proposed method combines a deep learning-based point cloud segmentation technique with an improved multi-scale point cloud registration algorithm to reconstruct the complete point cloud model of all weld regions in the workpieces. Subsequently, the welding paths and torch poses are calculated using an optimized multi-plane fitting algorithm integrated with a geometric model of the weld seam. Experimental validation on four workpieces demonstrates that the proposed method achieves good accuracy and outperforms existing techniques in terms of efficiency and applicability, offering a robust solution for automated welding of steel structures.
{"title":"Weld seam extraction and path generation for robotic welding of steel structures based on 3D vision","authors":"Jinxin Yi , Xuan Kong , Hao Tang , Jie Zhang , Zhenming Chen , Lu Deng","doi":"10.1016/j.autcon.2026.106792","DOIUrl":"10.1016/j.autcon.2026.106792","url":null,"abstract":"<div><div>Recent advances in computer vision have provided new solutions for intelligent welding. However, existing vision-based weld seam extraction techniques exhibit limited adaptability to various workpieces in unstructured environments. Therefore, this paper proposes a three-dimensional vision-based method tailored for weld seam extraction and path generation. The proposed method synergizes a deep learning-based point cloud segmentation technique with an improved multi-scale point cloud registration algorithm to reconstruct the complete point cloud model of all weld regions in the workpieces. Subsequently, the welding paths and torch poses are calculated using an optimized multi-plane fitting algorithm integrated with geometry model of weld seam. Experimental validation on four workpieces demonstrates that the proposed method achieves good accuracy and outperforms the existing techniques in terms of efficiency and applicability, offering a robust solution for automated welding of steel structures.</div></div>","PeriodicalId":8660,"journal":{"name":"Automation in Construction","volume":"183 ","pages":"Article 106792"},"PeriodicalIF":11.5,"publicationDate":"2026-01-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146071953","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2026-01-27 | DOI: 10.1016/j.autcon.2026.106791
Yuandong Pan, Mudan Wang, Linjun Lu, Rabindra Lamsal, Erika Pärn, Sisi Zlatanova, Ioannis Brilakis
Digital twins are increasingly used in the Architecture, Engineering, and Construction (AEC) industry, but their adoption is often hindered by the need for specialised knowledge, such as database querying. This paper presents Graph-DT-GPT, a multi-agent framework that integrates Large Language Models (LLMs) with graph-based digital twins to enable natural language interaction. The framework is designed with modular agents, including decision, query generation, and answer extraction, and grounds all LLMs’ outputs in structured graph data to improve response reliability and reduce hallucinations. The framework is evaluated on two use cases: a city-level graph with over 40,000 building nodes and room-level apartment layout graphs. Graph-DT-GPT achieves 100% and 95.5% answer correctness using Claude Sonnet 4.5 and GPT-4o, respectively, in the city-scale case, and 100% correctness in the room-level case, significantly outperforming baseline methods including LangChain Neo4j pipelines by approximately 40% and 10%, respectively. These results demonstrate its scalability and potential to enhance accessible, accurate information retrieval in AEC digital twin applications.
{"title":"LLM-enabled multi-agent framework for natural language interaction with graph-based digital twins","authors":"Yuandong Pan , Mudan Wang , Linjun Lu , Rabindra Lamsal , Erika Pärn , Sisi Zlatanova , Ioannis Brilakis","doi":"10.1016/j.autcon.2026.106791","DOIUrl":"10.1016/j.autcon.2026.106791","url":null,"abstract":"<div><div>Digital twins are increasingly used in the Architecture, Engineering, and Construction (AEC) industry, but their adoption is often hindered by the need for specialised knowledge, such as database querying. This paper presents Graph-DT-GPT, a multi-agent framework that integrates Large Language Models (LLMs) with graph-based digital twins to enable natural language interaction. The framework is designed with modular agents, including decision, query generation, and answer extraction, and grounds all LLMs’ outputs in structured graph data to improve response reliability and reduce hallucinations. The framework is evaluated on two use cases: a city-level graph with over 40,000 building nodes and room-level apartment layout graphs. Graph-DT-GPT achieves 100% and 95.5% answer correctness using Claude Sonnet 4.5 and GPT-4o, respectively, in the city-scale case, and 100% correctness in the room-level case, significantly outperforming baseline methods including LangChain Neo4j pipelines by approximately 40% and 10%, respectively. These results demonstrate its scalability and potential to enhance accessible, accurate information retrieval in AEC digital twin applications.</div></div>","PeriodicalId":8660,"journal":{"name":"Automation in Construction","volume":"183 ","pages":"Article 106791"},"PeriodicalIF":11.5,"publicationDate":"2026-01-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146071735","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2026-01-27 | DOI: 10.1016/j.autcon.2025.106754
Yiming Liu, Christiane M. Herr
As Artificial Intelligence transforms design through decentralised and self-organising generative systems, Cellular Automata (CA) exemplify a foundational yet underexplored paradigm capable of bridging rule-based emergence and computational creativity in architecture and urbanism. Driven by simple local rules, CA produce spatially responsive and systemic patterns well-suited to capturing the dynamics of complex interrelated systems, making them valuable for generative design exploration. This review systematically investigates control strategies for guiding CA-based generative processes. Through bibliometric analysis, it identifies temporal logic methods for adjusting CA behaviour. The review further examines control factors, computational control, and human-mediated control, analysing their impact on the adaptability of CA design processes at each stage through content-based synthesis. The results reveal the advantages of different control strategies in guiding goal-directed CA generation. This study advances the understanding of CA-based design mechanisms and highlights opportunities to develop intelligent control and process-oriented design tools integrating data-driven and AI technologies.
{"title":"Control strategies for Cellular Automata-based generative design in architecture and urbanism","authors":"Yiming Liu, Christiane M. Herr","doi":"10.1016/j.autcon.2025.106754","DOIUrl":"10.1016/j.autcon.2025.106754","url":null,"abstract":"<div><div>As Artificial Intelligence transforms design through decentralised and self-organising generative systems, Cellular Automata (CA) exemplify a foundational yet underexplored paradigm capable of bridging rule-based emergence and computational creativity in architecture and urbanism. Driven by simple local rules, CA produce spatially responsive and systemic patterns well-suited to capturing the dynamics of complex interrelated systems, making them valuable for generative design exploration. This review systematically investigates control strategies for guiding CA-based generative processes. It identifies temporal logic methods for adjusting CA behaviour through bibliometric analysis. The review further demonstrates control factors, computational control, and human-mediated control, analysing their impact on the adaptability of CA design processes at each stage through the content-based synthesis. The results reveal the advantages of different control strategies in guiding goal-directed CA generation. This study advances the understanding of CA-based design mechanisms and highlights opportunities to develop intelligent control, process-oriented design tools integrating data-driven and AI technologies.</div></div>","PeriodicalId":8660,"journal":{"name":"Automation in Construction","volume":"183 ","pages":"Article 106754"},"PeriodicalIF":11.5,"publicationDate":"2026-01-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146071952","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2026-01-26 | DOI: 10.1016/j.autcon.2026.106794
Yuxin Zhang, Xi Wang, Mo Hu, Zhenyu Zhang
Information retrieval and question answering from safety regulations are essential for automated construction compliance checking but are hindered by the linguistic and structural complexity of regulatory text. Many queries are multi-hop, requiring synthesis across interlinked clauses. To address the challenge, this paper introduces BifrostRAG, a dual-graph retrieval-augmented generation (RAG) system that models both linguistic relationships and document structure. The proposed architecture supports a hybrid retrieval mechanism that combines graph traversal with vector-based semantic search, enabling large language models to reason over both the content and the structure of the text. On a multi-hop question dataset, BifrostRAG achieves 92.8% precision, 85.5% recall, and an F1 score of 87.3%. These results significantly outperform vector-only and graph-only RAG baselines, establishing BifrostRAG as a robust knowledge engine for LLM-driven compliance checking. The dual-graph, hybrid retrieval mechanism presented in this paper offers a transferable blueprint for navigating complex technical documents across knowledge-intensive engineering domains.
{"title":"Bridging dual knowledge graphs for multi-hop question answering in construction safety","authors":"Yuxin Zhang, Xi Wang, Mo Hu, Zhenyu Zhang","doi":"10.1016/j.autcon.2026.106794","DOIUrl":"10.1016/j.autcon.2026.106794","url":null,"abstract":"<div><div>Information retrieval and question answering from safety regulations are essential for automated construction compliance checking but are hindered by the linguistic and structural complexity of regulatory text. Many queries are multi-hop, requiring synthesis across interlinked clauses. To address the challenge, this paper introduces BifrostRAG, a dual-graph retrieval-augmented generation (RAG) system that models both linguistic relationships and document structure. The proposed architecture supports a hybrid retrieval mechanism that combines graph traversal with vector-based semantic search, enabling large language models to reason over both the content and the structure of the text. On a multi-hop question dataset, BifrostRAG achieves 92.8% precision, 85.5% recall, and an F1 score of 87.3%. These results significantly outperform vector-only and graph-only RAG baselines, establishing BifrostRAG as a robust knowledge engine for LLM-driven compliance checking. The dual-graph, hybrid retrieval mechanism presented in this paper offers a transferable blueprint for navigating complex technical documents across knowledge-intensive engineering domains.</div></div>","PeriodicalId":8660,"journal":{"name":"Automation in Construction","volume":"183 ","pages":"Article 106794"},"PeriodicalIF":11.5,"publicationDate":"2026-01-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146048561","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2026-01-24 | DOI: 10.1016/j.autcon.2026.106799
Difeng Hu, You Dong, Mingkai Li, Hanmo Wang, Tao Wang
BIM-derived point clouds are valuable for semantic segmentation and BIM modeling, but distribution discrepancies between BIM and real-world scans significantly degrade segmentation performance. To mitigate this issue, this paper develops a margin-aware maximum classifier discrepancy (MMCD) method, which extends the conventional MCD framework by incorporating a margin-aware mechanism. Task-specific classifiers act as discriminators to encourage the feature generator to learn domain-invariant yet discriminative features for unlabeled real point clouds, improving BIM-to-scan distribution alignment and segmentation accuracy. A margin-aware discrepancy loss is formulated to enforce sufficient margin between features and classification boundaries, improving robustness to domain shift. In addition, a training strategy is proposed to support MMCD optimization. Finally, a refined RandLA-Net with an attention-based upsampling module is constructed as the backbone for validation. Experiments demonstrate that the proposed approach achieves superior performance, with an IoU of 72.79% and an overall accuracy of 87.99%, outperforming RandLA-Net variants with or without MCD.
{"title":"Margin-aware maximum classifier discrepancy for BIM-to-scan semantic segmentation of building point clouds","authors":"Difeng Hu , You Dong , Mingkai Li , Hanmo Wang , Tao Wang","doi":"10.1016/j.autcon.2026.106799","DOIUrl":"10.1016/j.autcon.2026.106799","url":null,"abstract":"<div><div>BIM-derived point clouds are valuable for semantic segmentation and BIM modeling, but distribution discrepancies between BIM and real-world scans significantly degrade segmentation performance. To mitigate this issue, this paper develops a margin-aware maximum classifier discrepancy (MMCD) method, which extends the conventional MCD framework by incorporating a margin-aware mechanism. Task-specific classifiers act as discriminators to encourage the feature generator to learn domain-invariant yet discriminative features for unlabeled real point clouds, improving BIM-to-scan distribution alignment and segmentation accuracy. A margin-aware discrepancy loss is formulated to enforce sufficient margin between features and classification boundaries, improving robustness to domain shift. In addition, a training strategy is proposed to support MMCD optimization. Finally, a refined RandLA-Net with an attention-based upsampling module is constructed as the backbone for validation. Experiments demonstrate that the proposed approach achieves superior performance, with an IoU of 72.79% and an overall accuracy of 87.99%, outperforming RandLA-Net variants with or without MCD.</div></div>","PeriodicalId":8660,"journal":{"name":"Automation in Construction","volume":"183 ","pages":"Article 106799"},"PeriodicalIF":11.5,"publicationDate":"2026-01-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146036034","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2026-01-24 | DOI: 10.1016/j.autcon.2026.106795
Yufei Zhang, Gang Li, Runjie Shen
Infrastructure defect detection is vital for public safety and sustainable societal development. In recent years, advances in computer vision have gradually promoted the intelligence and automation of infrastructure defect detection. This paper provides a comprehensive overview of research progress and emerging trends in computer vision-based detection of diverse defect types across multiple infrastructure scenarios, including datasets, evaluation metrics, and methods. A classification framework is introduced that centers on single and multiple visual modalities. The former includes traditional image processing, machine learning, and deep learning techniques, reflecting the evolution of the field. The latter focuses on data-level, feature-level, and decision-level fusion strategies, highlighting opportunities to improve detection performance with multiple visual modalities. Methods are further categorized according to their characteristics and model architectures. Finally, existing challenges are summarized, and promising research directions are outlined based on the strengths and limitations of current methods.
{"title":"Computer vision for infrastructure defect detection: Methods and trends","authors":"Yufei Zhang , Gang Li , Runjie Shen","doi":"10.1016/j.autcon.2026.106795","DOIUrl":"10.1016/j.autcon.2026.106795","url":null,"abstract":"<div><div>Infrastructure defect detection is vital for public safety and sustainable societal development. In recent years, advances in computer vision have gradually promoted the intelligence and automation of infrastructure defect detection. This paper provides a comprehensive overview of research progress and emerging trends in computer vision-based detection of diverse defect types across multiple infrastructure scenarios, including datasets, evaluation metrics, and methods. A classification framework is introduced that centers on single and multiple visual modalities. The former includes traditional image processing, machine learning, and deep learning techniques, reflecting the evolution of the field. The latter focuses on data-level, feature-level, and decision-level fusion strategies, highlighting opportunities to improve detection performance with multiple visual modalities. Methods are further categorized according to their characteristics and model architectures. Finally, existing challenges are summarized, and promising research directions are outlined based on the strengths and limitations of current methods.</div></div>","PeriodicalId":8660,"journal":{"name":"Automation in Construction","volume":"183 ","pages":"Article 106795"},"PeriodicalIF":11.5,"publicationDate":"2026-01-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146033310","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2026-01-23 | DOI: 10.1016/j.autcon.2026.106788
Cong Chen, Shenghan Zhang
Unmanned Aerial Vehicles (UAVs) have emerged as essential tools for building façade inspection. However, due to the repeating patterns on façades, automatically registering UAV-captured images to Building Information Modeling (BIM) models, though important for building maintenance, remains challenging. Existing methods often rely on GPS data, which lack sufficient accuracy in urban environments. This paper proposes a GPS-free automated framework to register UAV-captured image sequences to BIM models by leveraging information from overlapping images. The framework comprises three key components: (1) extracting semantic key points from images using Grounded SAM 2; (2) implementing a virtual UAV camera model to enable bidirectional projection of key points between BIM coordinates and image coordinates; and (3) developing a particle filter motion model to achieve image-to-BIM registration using image sequences. The proposed method registers various data types to BIM models, including overlapping visual image sequences, infrared (IR)-visual pairs, and façade defects.
{"title":"GPS-free automated registration of UAV-captured façade image sequences to BIM using semantic key points","authors":"Cong Chen, Shenghan Zhang","doi":"10.1016/j.autcon.2026.106788","DOIUrl":"10.1016/j.autcon.2026.106788","url":null,"abstract":"<div><div>Unmanned Aerial Vehicles (UAVs) have emerged as essential tools for building façade inspection. However, due to the repeating patterns on façades, automatically registering images taken by UAV to Building Information Modeling (BIM) models, though important for building maintenance, remains challenging. Existing methods often rely on GPS data, which lack sufficient accuracy in urban environments. This paper proposes a GPS-free automated framework to register UAV-captured image sequences to BIM models by leveraging information from overlapping images. The framework comprises three key components: (1) extracting semantic key points from images using the Grounded SAM 2; (2) implementing a virtual UAV camera model to enable bidirectional projection of key points between BIM coordinates and image coordinates; and (3) developing a particle filter motion model to achieve image-to-BIM registration using image sequences. The proposed method registers various data types to BIM models, including overlapping visual image sequences, infrared (IR)-visual pairs, and façade defects.</div></div>","PeriodicalId":8660,"journal":{"name":"Automation in Construction","volume":"183 ","pages":"Article 106788"},"PeriodicalIF":11.5,"publicationDate":"2026-01-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146033314","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2026-01-22 | DOI: 10.1016/j.autcon.2026.106796
Qingwei Zeng, Shunxin Yang, Chang Xu, Jitong Ding, Qiwei Chen, Guoyang Lu
Pavement management data often suffers from severe class imbalance, and existing project-level maintenance and rehabilitation (M&R) decision-making models generally lack post-maintenance evaluation mechanisms. To address these issues, this paper proposes a project-level automated pavement M&R decision-making framework that considers data imbalance and incorporates post-maintenance evaluation (PMDNN). First, a Conditional Tabular Generative Adversarial Network (CTGAN) is developed to augment imbalanced M&R datasets. Next, two deep neural networks (DNNs) are constructed: one for pavement performance prediction and one for M&R decision-making. Finally, these two DNNs are nested to enable post-maintenance evaluation, supporting iterative adjustment of suboptimal M&R plans. Results demonstrate that the CTGAN effectively addresses data imbalance and accurately simulates the distribution of the original data. Compared with other data augmentation models, the CTGAN generates data with 4.7%–18.1% higher quality. Additionally, relative to multiple baseline frameworks, the proposed PMDNN framework achieves a 1.91%–4.71% higher overall decision accuracy. These findings indicate that PMDNN can support pavement management systems in making decisions more closely aligned with expert judgment.
{"title":"Project-level automated pavement maintenance and rehabilitation decision-making with data imbalance mitigation and post-maintenance evaluation","authors":"Qingwei Zeng , Shunxin Yang , Chang Xu , Jitong Ding , Qiwei Chen , Guoyang Lu","doi":"10.1016/j.autcon.2026.106796","DOIUrl":"10.1016/j.autcon.2026.106796","url":null,"abstract":"<div><div>Pavement management data often suffers from severe class imbalance, and existing project-level maintenance and rehabilitation (M&R) decision-making models generally lack post-maintenance evaluation mechanisms. To address these issues, this paper proposes a project-level automated pavement M&R decision-making framework that considers data imbalance and incorporates post-maintenance evaluation (PMDNN). First, a Conditional Tabular Generative Adversarial Network (CTGAN) is developed to augment imbalanced M&R datasets. Next, two deep neural networks (DNNs) are constructed, for pavement performance prediction and for M&R decision-making, respectively. Finally, these two DNNs are nested to enable post-maintenance evaluation, supporting iterative adjustment of suboptimal M&R plans. Results demonstrate that the CTGAN effectively addresses data imbalance and accurately simulates the distribution of the original data. Compared with other data augmentation models, the CTGAN generates data with 4.7%–18.1% higher quality. Additionally, relative to multiple baseline frameworks, the proposed PMDNN framework achieves a 1.91%–4.71% higher overall decision accuracy. These findings indicate that PMDNN can support pavement management systems in making decisions more closely aligned with expert judgment.</div></div>","PeriodicalId":8660,"journal":{"name":"Automation in Construction","volume":"183 ","pages":"Article 106796"},"PeriodicalIF":11.5,"publicationDate":"2026-01-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146033334","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2026-01-21 | DOI: 10.1016/j.autcon.2026.106786
Insoo Jeong, Junghoon Kim, Seungmo Lim, Jeongbin Hwang, Seokho Chi
This paper proposes a lightweight domain adaptation framework for construction safety monitoring by fine-tuning a pretrained text-to-image diffusion model using Low-Rank Adaptation (LoRA). To simulate high-risk construction environments underrepresented in training data, the model was adapted to environmental features and specific hazards, focusing on visually dominant scenarios including falls, struck-by, and caught-in incidents. To address data scarcity, Multi-LoRA fine-tuning was conducted using 20 images per hazard type (totaling 60 across three hazards) and 30 background images, enabling both contextual and hazard-specific adaptation. The generated images achieved the highest semantic consistency, yielding the top mean Contrastive Language-Image Pre-training (CLIP) scores with minimal variance, and improved visual realism by reducing the Fréchet Inception Distance (FID) by 86.72 points. Furthermore, a YOLOv8 model trained exclusively on these synthetic images achieved a mean average precision (mAP@0.5:0.95) of 94.1% on real-world frames, comparable to a baseline model trained on real data.
{"title":"Addressing data scarcity in construction safety monitoring using low-rank adaptation (LoRA)-tuned domain-specific image generation","authors":"Insoo Jeong , Junghoon Kim , Seungmo Lim , Jeongbin Hwang , Seokho Chi","doi":"10.1016/j.autcon.2026.106786","DOIUrl":"10.1016/j.autcon.2026.106786","url":null,"abstract":"<div><div>This paper proposes a lightweight domain adaptation framework for construction safety monitoring by fine-tuning a pretrained text-to-image diffusion model using Low-Rank Adaptation (LoRA). To simulate high-risk construction environments underrepresented in training data, the model was adapted to environmental features and specific hazards, focusing on visually dominant scenarios including falls, struck-by, and caught-in incidents. To address data scarcity, Multi-LoRA fine-tuning was conducted using 20 images per hazard type (totaling 60 across three hazards) and 30 background images, enabling both contextual and hazard-specific adaptation. The generated images achieved the highest semantic consistency, yielding the top mean Contrastive Language-Image Pre-training (CLIP) scores with minimal variance, and improved visual realism by reducing the Fréchet Inception Distance (FID) by 86.72 points. Furthermore, a YOLOv8 model trained exclusively on these synthetic images achieved a mean average precision ([email protected]:0.95) of 94.1% on real-world frames, comparable to a baseline model trained on real data.</div></div>","PeriodicalId":8660,"journal":{"name":"Automation in Construction","volume":"183 ","pages":"Article 106786"},"PeriodicalIF":11.5,"publicationDate":"2026-01-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146014826","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}