Pub Date: 2026-01-17 DOI: 10.1016/j.autcon.2026.106782
Fuhao Zu, Xueqing Zhang
Effective reuse of creative ideas from value engineering (VE) workshops is crucial for cost-effective, innovative design. Conventional methods like post-project reviews and keyword searches often lack context, real-time availability, and semantic relevance, limiting the practical reuse of past insights. This paper addresses the fundamental question of how knowledge generated during VE workshops can be effectively captured and reused to support future idea generation. To this end, it proposes an integrated methodology combining BIM-based live capture with a hybrid retrieval system. The system uses structured attributes and semantic similarity based on Bidirectional Encoder Representations from Transformers (BERT) to ensure context-aware reuse. A prototype Revit plug-in was developed for structured capture and semantic search. Evaluation demonstrated strong performance, superiority over baseline methods, and high user acceptance. This paper provides a practical framework and tool for structured documentation and intelligent knowledge reuse, thereby enhancing creativity support for construction VE practices.
{"title":"Real-time knowledge management for construction value engineering: Live capture and BERT-aided case-based retrieval","authors":"Fuhao Zu, Xueqing Zhang","doi":"10.1016/j.autcon.2026.106782","journal":{"name":"Automation in Construction","volume":"183","pages":"Article 106782"},"PeriodicalIF":11.5,"publicationDate":"2026-01-17","publicationTypes":"Journal Article"}
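The hybrid retrieval idea in the abstract above (structured attributes combined with BERT-based semantic similarity) can be illustrated with a minimal sketch. Everything below is an assumption for illustration only: the `VECase` record, the attribute names, the 0.4/0.6 weights, and the tiny three-dimensional vectors that stand in for real sentence embeddings. The abstract does not specify how the two signals are combined, so the weighted sum is a placeholder, not the authors' method.

```python
import math
from dataclasses import dataclass


@dataclass
class VECase:
    """Hypothetical record for one captured VE idea."""
    case_id: str
    attributes: dict   # structured attributes, e.g. {"element": "facade"}
    embedding: list    # precomputed text embedding (stand-in for a BERT vector)
    idea: str


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def attribute_match(query_attrs, case_attrs):
    """Fraction of query attributes that the case matches exactly."""
    if not query_attrs:
        return 0.0
    hits = sum(1 for k, v in query_attrs.items() if case_attrs.get(k) == v)
    return hits / len(query_attrs)


def retrieve(query_attrs, query_embedding, cases, w_attr=0.4, w_sem=0.6, top_k=3):
    """Rank cases by an assumed weighted sum of attribute and semantic scores."""
    scored = []
    for case in cases:
        score = (w_attr * attribute_match(query_attrs, case.attributes)
                 + w_sem * cosine(query_embedding, case.embedding))
        scored.append((score, case))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_k]


# Illustrative data only; real embeddings would have hundreds of dimensions.
cases = [
    VECase("A", {"element": "facade", "phase": "design"}, [0.9, 0.1, 0.0],
           "Swap curtain wall for precast panels"),
    VECase("B", {"element": "foundation", "phase": "design"}, [0.1, 0.9, 0.0],
           "Use a raft instead of piles"),
    VECase("C", {"element": "facade", "phase": "construction"}, [0.7, 0.3, 0.1],
           "Standardise panel sizes"),
]
results = retrieve({"element": "facade", "phase": "design"}, [1.0, 0.0, 0.0],
                   cases, top_k=2)
for score, case in results:
    print(case.case_id, round(score, 3), case.idea)
```

In this toy data, the structured score keeps a semantically close case from a different project phase ranked below a case that matches both attributes, which is the kind of context awareness the abstract attributes to the hybrid design.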
Pub Date: 2026-01-17 DOI: 10.1016/j.autcon.2026.106773
Wahib Saif, Omar Doukari, Mohamad Kassem
Construction Digital Twins (CDTs) are increasingly recognised for their potential to improve construction project management. However, successful implementation requires more than just deploying technology; it demands a stakeholder-centric, whole-system lifecycle approach. Existing frameworks are largely technocentric, focusing on technical demonstrations in isolated use cases and offering limited guidance on stakeholders' roles, interactions, and system lifecycle considerations. To address these gaps, this paper introduces a socio-technical CDT framework spanning five lifecycle stages: Define, Design, Deploy, Refine, and Decommission. Grounded in an eight-month longitudinal industrial case study and informed by a CDT triad taxonomy (applications, data, technologies), the framework guides CDT development and maps stakeholder engagement throughout its lifecycle. Stakeholders are categorised into four actor groups: Strategic, Advisory, Technical, and Operational, whose interdependencies are conceptualised through an actor role model. The framework extends CDT applicability beyond controlled demonstrations to real project contexts, while emphasising the need for validation across diverse organisational settings.
{"title":"Stakeholder-centric whole-lifecycle framework for guiding the development and implementation of construction digital twins","authors":"Wahib Saif, Omar Doukari, Mohamad Kassem","doi":"10.1016/j.autcon.2026.106773","journal":{"name":"Automation in Construction","volume":"183","pages":"Article 106773"},"PeriodicalIF":11.5,"publicationDate":"2026-01-17","publicationTypes":"Journal Article"}
This paper proposes a lightweight semantic segmentation framework utilizing 3D point cloud data to enable automatic and rapid construction progress monitoring in high-rise building projects. This study centers on developing an efficient L-PointNet++ model that integrates self-attention mechanisms and MobileNetV3 modules, significantly reducing computational complexity and achieving a 95.63 % reduction in total training time compared to traditional PointNet++. A dual-stage training strategy is adopted to effectively address class imbalance, resulting in high segmentation accuracy with mean Intersection over Union (mIoU) values of 0.9308 for edge points and 0.9300 for corner points. Experimental results indicate that the developed framework can significantly enhance the speed and adaptability of as-built BIM model reconstruction and provide substantial improvements in decision-making efficiency and project management through the implementation of a visualization-based progress monitoring and early-warning system. Overall, the proposed approach demonstrates notable advantages in 3D reconstruction accuracy, speed, and project control, providing a robust solution for real-time construction progress monitoring applications.
{"title":"Lightweight semantic segmentation for construction progress monitoring using 3D point clouds","authors":"Jinting Huang, Zhonghua Xiao, Ankang Ji, Limao Zhang","doi":"10.1016/j.autcon.2026.106765","journal":{"name":"Automation in Construction","volume":"183","pages":"Article 106765"},"PeriodicalIF":11.5,"publicationDate":"2026-01-17","publicationTypes":"Journal Article"}
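The point-cloud segmentation abstract above reports its accuracy as mean Intersection over Union (mIoU). As a reminder of what that metric measures, here is a minimal sketch of per-class IoU averaged over classes for per-point labels. The function name, the label encoding, and the toy labels are assumptions for illustration; the paper's exact evaluation protocol (for example, how edge and corner points are grouped) is not described in the abstract.

```python
def mean_iou(y_true, y_pred, num_classes):
    """Mean IoU over classes for per-point integer labels.

    Classes that appear in neither the ground truth nor the prediction
    are skipped so they do not distort the average.
    """
    ious = []
    for c in range(num_classes):
        inter = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        union = sum(1 for t, p in zip(y_true, y_pred) if t == c or p == c)
        if union:
            ious.append(inter / union)
    return sum(ious) / len(ious) if ious else 0.0


# Toy labels for six points and three classes (illustrative only).
y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 1, 1, 1, 2, 0]
print(mean_iou(y_true, y_pred, num_classes=3))
```

Because each class contributes equally to the average, mIoU penalises poor performance on rare classes, which is why it is a common choice when class imbalance is a concern.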
Pub Date: 2026-01-17 DOI: 10.1016/j.autcon.2026.106784
Yuanyuan Li, Runze Zhao, Yin Liu, Hongnan Li, Qingrui Yue, Hongbing Chen
Vibration monitoring of engineering infrastructure is indispensable for structural safety and scientific maintenance. Distributed acoustic sensing (DAS) has been increasingly adopted in the engineering field, owing to its advantages over conventional point-based transducers, including high spatial resolution, spatial continuity, non-invasiveness and superior stability. These advantages align well with instrumentation requirements for long-term, widely distributed vibration monitoring in large-scale infrastructure. Accordingly, this paper provides a systematic review of the DAS technique with respect to sensing mechanisms, deployment strategies, signal analysis, and typical applications. This review is structured around a complete operational workflow that explicates what the technology is, how it works, and what it enables in practice. Furthermore, current challenges and promising directions are discussed to envisage the widespread implementation of DAS systems, with the ultimate goal of automated infrastructure monitoring. This review also aims to provide an exhaustive reference for researchers, professionals, and engineering inspectors seeking the state of the art in DAS research.
{"title":"Distributed acoustic sensing for monitoring engineering infrastructure: Mechanisms, signal analytics, and applications","authors":"Yuanyuan Li, Runze Zhao, Yin Liu, Hongnan Li, Qingrui Yue, Hongbing Chen","doi":"10.1016/j.autcon.2026.106784","journal":{"name":"Automation in Construction","volume":"183","pages":"Article 106784"},"PeriodicalIF":11.5,"publicationDate":"2026-01-17","publicationTypes":"Journal Article"}
Pub Date: 2026-01-16 DOI: 10.1016/j.autcon.2026.106766
Ghang Lee, Sejin Park, Soo-in Yang
This paper investigates “mouseless design” feasibility, replacing traditional mouse-based interfaces with natural language interaction, in professional design practice. A three-month experiment tested LLMs for developing a sports complex project for competition. Through triangulation analysis of 2162 conversation turns, 1281 messages, and an 84-page design journal, this study established a quantitative baseline for LLM performance across professional design workflows. It revealed 86.9% unsuccessful individual interactions despite successful project completion and identified inconsistent spatial reasoning and geometry handling as the main weaknesses. Two methodological breakthroughs using conversational programming overcame these limitations: the “artifact-driven” approach repositioning LLMs as custom digital tool creators rather than direct design generators, and self-learning approaches extending complex BIM functionality. A statistical analysis (χ²(90) = 156, Cramér's V = 0.120) shows that terminology alignment serves as a success multiplier when combined with other strategies. These contributions provide empirical evidence for natural language-driven design while identifying critical requirements for successful AI integration.
{"title":"Artifact-driven LLM integration for mouseless design workflows","authors":"Ghang Lee, Sejin Park, Soo-in Yang","doi":"10.1016/j.autcon.2026.106766","journal":{"name":"Automation in Construction","volume":"183","pages":"Article 106766"},"PeriodicalIF":11.5,"publicationDate":"2026-01-16","publicationTypes":"Journal Article"}
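The mouseless-design abstract above reports a chi-square statistic together with Cramér's V as an effect size. The standard relation is V = sqrt(χ² / (n · min(r − 1, c − 1))) for an r × c contingency table with n observations. The sketch below implements that formula. The table shape and sample size used in the example are assumptions: the abstract gives only the degrees of freedom (90) and does not state the table dimensions or which count served as n.

```python
import math


def cramers_v(chi2, n, n_rows, n_cols):
    """Cramér's V for an n_rows x n_cols contingency table with n observations."""
    k = min(n_rows - 1, n_cols - 1)
    if n <= 0 or k <= 0:
        raise ValueError("need n > 0 and at least a 2 x 2 table")
    return math.sqrt(chi2 / (n * k))


# Hypothetical inputs: a 6 x 19 table has (6 - 1) * (19 - 1) = 90 degrees of
# freedom, and n = 2162 is the number of conversation turns quoted in the
# abstract. Neither choice is confirmed by the source.
v = cramers_v(chi2=156, n=2162, n_rows=6, n_cols=19)
print(round(v, 3))
```

Under these assumed inputs the result is close to the reported 0.120, but other table shapes with 90 degrees of freedom would give different values, so this is a consistency check rather than a reconstruction of the authors' analysis.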
Pub Date: 2026-01-14 DOI: 10.1016/j.autcon.2026.106776
Seungkeun Yeom, Juui Kim, Seungwon Seo, Seongkyun Ahn, Choongwan Koo, Taehoon Hong
This paper investigates how personality traits and psychological-cognitive states influence task performance, safety, and physiological responses of novice tower crane operators through a virtual reality (VR) simulation integrated with continuous biometric monitoring. Fifty participants completed object lifting, obstacle navigation, and precision placement tasks while personality profiles and biosignals (ECG, EDA) were collected and analyzed using principal component analysis, cluster-based classification, and additional statistical methods. High extraversion and situational awareness enhanced speed and accuracy, whereas high openness, stress sensitivity, and acrophobia led to longer durations and reduced accuracy. High conscientiousness shortened task times by 19.12% but increased collisions approximately threefold, revealing a trade-off between efficiency and safety. By integrating behavioral, cognitive, and physiological data, this work advances technology-enabled, data-driven safety management. The proposed approach enables automated operator risk profiling, intelligent task allocation, and proactive safety interventions for high-rise construction projects involving crane operations.
{"title":"Virtual reality-based experimental analysis of personality and cognitive traits on task performance and safety in novice tower crane operators","authors":"Seungkeun Yeom, Juui Kim, Seungwon Seo, Seongkyun Ahn, Choongwan Koo, Taehoon Hong","doi":"10.1016/j.autcon.2026.106776","journal":{"name":"Automation in Construction","volume":"183","pages":"Article 106776"},"PeriodicalIF":11.5,"publicationDate":"2026-01-14","publicationTypes":"Journal Article"}
UAV imagery is now widely and effectively used for high-fidelity 3D building reconstruction in architectural engineering; however, planning a flight trajectory that maximizes reconstruction quality with minimal flight time remains a critical challenge. This paper proposes a universal co-optimization framework that bridges reconstruction objectives with flight dynamics through an integrated planning paradigm. The proposed approach performs initial flight planning by solving a Traveling Salesman Problem over candidate viewpoints and updating them according to the unit-length contribution criterion. An adaptive radius is then determined, and a sphere-based corridor is constructed to force the trajectory to pass all updated viewpoints within the corresponding spatial tolerances. Next, an optimal control problem is formulated and solved using a nonlinear solver to obtain the final flight trajectory satisfying both dynamic and safety constraints. Experimental comparisons with state-of-the-art methods on three public scenes and two real scenes captured by the authors demonstrate that the proposed approach significantly improves flight efficiency, reducing travel distance and flight duration by approximately 10% to 40% with comparable or superior reconstruction quality.
{"title":"Efficient UAV trajectory optimization for fine-detailed 3D building reconstruction","authors":"Tianrui Shen, Lai Kang, Yingmei Wei, Shanshan Wan, Haixuan Wang, Chao Zuo","doi":"10.1016/j.autcon.2026.106775","journal":{"name":"Automation in Construction","volume":"183","pages":"Article 106775"},"PeriodicalIF":11.5,"publicationDate":"2026-01-14","publicationTypes":"Journal Article"}
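The UAV abstract above starts its planning pipeline by solving a Traveling Salesman Problem over candidate viewpoints. The abstract does not say which TSP solver is used, so the sketch below swaps in two textbook heuristics purely for illustration: a nearest-neighbour construction followed by 2-opt improvement on an open path. The viewpoint coordinates and function names are hypothetical, and the later stages (viewpoint updating, corridor construction, optimal control) are not covered.

```python
import math


def path_length(points, order):
    """Total Euclidean length of the open path visiting points in the given order."""
    return sum(math.dist(points[order[i]], points[order[i + 1]])
               for i in range(len(order) - 1))


def nearest_neighbour(points, start=0):
    """Greedy visiting order: always fly to the closest unvisited viewpoint."""
    unvisited = set(range(len(points))) - {start}
    order = [start]
    while unvisited:
        last = points[order[-1]]
        nxt = min(unvisited, key=lambda i: math.dist(last, points[i]))
        unvisited.remove(nxt)
        order.append(nxt)
    return order


def two_opt(points, order):
    """Reverse segments while doing so shortens the path; the start stays fixed."""
    best = order[:]
    improved = True
    while improved:
        improved = False
        for i in range(1, len(best) - 1):
            for j in range(i + 1, len(best)):
                candidate = best[:i] + best[i:j + 1][::-1] + best[j + 1:]
                if path_length(points, candidate) < path_length(points, best) - 1e-12:
                    best = candidate
                    improved = True
    return best


# Hypothetical candidate viewpoints (x, y, z) in metres around a building.
viewpoints = [(0, 0, 10), (10, 0, 10), (10, 10, 12), (0, 10, 12), (5, 5, 20)]
initial = nearest_neighbour(viewpoints)
refined = two_opt(viewpoints, initial)
print(initial, round(path_length(viewpoints, initial), 2))
print(refined, round(path_length(viewpoints, refined), 2))
```

Both heuristics return a visiting order only; turning that order into a dynamically feasible trajectory is the job of the optimal control step described in the abstract.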
Pub Date: 2026-01-14 DOI: 10.1016/j.autcon.2025.106748
Hamed Hasani, Francesco Freddi
This study presents an AI-powered framework for automated structural health monitoring that integrates modal identification, anomaly detection, and damage localization under varying environmental and operational conditions. The approach combines stochastic subspace identification with frequency–spatial domain decomposition for automated modal extraction and a condition-aware anomaly detector based on a conditional variational autoencoder (CVAE). A secondary SSA–OC-SVM module verifies and localizes damage. The methodology is validated on a laboratory-scale structure through 500 one-hour tests under temperature variations up to 35 °C and diverse loading conditions. The identified modes exhibit MAC = 0.99–1.00, confirming reliable automated identification. The CVAE reconstructs healthy-state modal frequencies with MAPE = 0.23%, RMSE = 0.027 Hz, and R² = 0.836, effectively distinguishing environmental effects (≤ 0.27 pp) from genuine structural changes. The integrated framework further accurately localizes all induced damage scenarios across nine structural zones, demonstrating high accuracy, robustness, and scalability for next-generation SHM automation.
{"title":"Condition-aware AI framework for automated structural health monitoring","authors":"Hamed Hasani, Francesco Freddi","doi":"10.1016/j.autcon.2025.106748","journal":{"name":"Automation in Construction","volume":"183","pages":"Article 106748"},"PeriodicalIF":11.5,"publicationDate":"2026-01-14","publicationTypes":"Journal Article"}
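The structural health monitoring abstract above quotes four standard metrics: the Modal Assurance Criterion (MAC) for mode-shape agreement, and MAPE, RMSE, and R² for reconstructed modal frequencies. The sketch below gives their usual textbook definitions for real-valued data. The function names and sample numbers are illustrative assumptions and are not taken from the paper.

```python
import math


def mac(phi_a, phi_b):
    """Modal Assurance Criterion for two real mode-shape vectors (0 to 1)."""
    num = sum(a * b for a, b in zip(phi_a, phi_b)) ** 2
    den = sum(a * a for a in phi_a) * sum(b * b for b in phi_b)
    return num / den


def mape(y_true, y_pred):
    """Mean absolute percentage error, in percent."""
    return 100.0 * sum(abs((t - p) / t) for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root-mean-square error, in the units of the data (Hz for frequencies)."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def r2(y_true, y_pred):
    """Coefficient of determination."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Made-up modal frequencies in Hz (illustrative only).
measured = [4.10, 4.12, 4.08, 4.15, 4.11]
reconstructed = [4.11, 4.11, 4.09, 4.14, 4.12]
print(round(mape(measured, reconstructed), 3),
      round(rmse(measured, reconstructed), 4),
      round(r2(measured, reconstructed), 3))
print(mac([1.0, 2.0, 3.0], [2.0, 4.0, 6.0]))
```

MAC is insensitive to the scaling of the mode shapes, so the two proportional vectors in the last line are reported as a perfect match; that property is what makes values near 1.00 a meaningful sign of consistent mode identification.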
Pub Date: 2026-01-13 DOI: 10.1016/j.autcon.2026.106772
Minkyu Koo, Taegeon Kim, Minhyun Lee, Kinam Kim, Hongjo Kim
Automating construction site monitoring through deep learning–based segmentation presents challenges due to the high cost of pixel-wise annotations. This paper introduces a weakly and self-supervised learning framework that enhances segmentation accuracy while reducing annotation burden. Human-annotated bounding-box ground truth is used as prompts for the Segment Anything Model (SAM) to generate high-quality polygon mask labels, which are further refined through self-training. Compared to fully supervised learning models, the framework integrates Transfer Learning, Pseudo-Label Refinement, and the Noisy Student technique, improving mask mean Average Precision (Mask mAP) by 3–63% across seven target domains and achieving a Mask mAP of 72.27%. The approach also outperforms existing weakly supervised techniques, including BoxSnake and BoxTeacher, by 18% and 25.95%, respectively, and exceeds the performance of point-based methods such as PointWSSIS by 48.78%.
{"title":"Domain-adaptive instance segmentation for far-field object monitoring using SAM-based weak supervision and noisy student self-training","authors":"Minkyu Koo, Taegeon Kim, Minhyun Lee, Kinam Kim, Hongjo Kim","doi":"10.1016/j.autcon.2026.106772","journal":{"name":"Automation in Construction","volume":"182","pages":"Article 106772"},"PeriodicalIF":11.5,"publicationDate":"2026-01-13","publicationTypes":"Journal Article"}
Pub Date: 2026-01-13 DOI: 10.1016/j.autcon.2025.106759
Wenshang Yan, Hongnan Li
Improving the accuracy and robustness of deep-learning-based crack-segmentation models remains a significant challenge, primarily because of the insufficient quantity and diversity of the available pixel-level annotated data. To address this issue, this paper proposes a controllable Crack Reference-based Diffusion Model (CRDM). The proposed model can accurately synthesize realistic cracks on crack-free background images by leveraging predefined masks and reference images. Notably, it effectively transfers crack features from reference images to generated images while maintaining high semantic accuracy. Extensive experiments demonstrate the advantages of CRDM in producing high-quality, diverse crack images with precise controllability. The dataset augmented with the CRDM-generated images improves the performance of crack-segmentation models by approximately 1% IoU across various scenarios. Further performance gains are achieved through a refined label-filtering strategy. The proposed CRDM exhibits strong potential for crack-segmentation tasks, effectively reducing the time and cost of data annotation and acquisition.
{"title":"Controllable reference-based semantic crack-image generation using diffusion model for intelligent infrastructure inspection","authors":"Wenshang Yan, Hongnan Li","doi":"10.1016/j.autcon.2025.106759","journal":{"name":"Automation in Construction","volume":"182","pages":"Article 106759"},"PeriodicalIF":11.5,"publicationDate":"2026-01-13","publicationTypes":"Journal Article"}