From raw to refined: Data preprocessing for construction machine learning (ML), deep learning (DL), and reinforcement learning (RL) models
SeyedeZahra Golazad, Abbas Mohammadi, Abbas Rashidi, Mohammad Ilbeigi
Automation in Construction, volume 168, Article 105844. Published 2024-10-24. Journal Article. Impact factor 9.6 (JCR Q1, Construction & Building Technology).
DOI: 10.1016/j.autcon.2024.105844
https://www.sciencedirect.com/science/article/pii/S0926580524005806
Citations: 0
Abstract
As the use of predictive models in construction rapidly increases, the need for preprocessing raw construction data has become more critical. This systematic review investigates data preprocessing techniques for machine learning (ML), deep learning (DL), and reinforcement learning (RL) models in the construction domain. Through a comprehensive analysis of 457 studies, the prevalence of six data types (i.e., tabular, image, video frame, time series, text, and point cloud) and their respective preprocessing methods are examined. Key findings reveal data transformation, cleaning, reduction, augmentation, and scaling as fundamental preprocessing categories, with applications varying across data types. The paper highlights knowledge gaps, including limited synthetic data adoption, lack of standardized annotation practices, absence of comprehensive preprocessing frameworks, and need for automated labeling. Furthermore, critical considerations regarding data privacy, security, sharing, and management practices are discussed. The review underscores the pivotal role of robust data preprocessing in enabling reliable predictive models.
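As a rough illustration of two of the preprocessing categories named above (cleaning and scaling) applied to tabular data, here is a minimal sketch in plain Python. It is not taken from the review: the function names, the choice of min-max scaling and standardization, and the sample records (crew size, daily output) are all assumptions made for the example.

```python
from statistics import mean, pstdev


def drop_incomplete(rows):
    """Cleaning: keep only rows that have no missing (None) values."""
    return [row for row in rows if all(v is not None for v in row)]


def min_max_scale(column):
    """Scaling: map values linearly onto the range [0, 1]."""
    lo, hi = min(column), max(column)
    if hi == lo:
        # A constant column carries no spread; map it to zeros.
        return [0.0 for _ in column]
    return [(v - lo) / (hi - lo) for v in column]


def standardize(column):
    """Scaling: shift to zero mean and unit population standard deviation."""
    mu, sigma = mean(column), pstdev(column)
    if sigma == 0:
        return [0.0 for _ in column]
    return [(v - mu) / sigma for v in column]


# Hypothetical records: (crew size, daily output in cubic metres).
raw = [(4, 10.0), (6, None), (8, 30.0), (6, 20.0)]

clean = drop_incomplete(raw)          # the row with a missing output is dropped
crew, output = zip(*clean)            # split the cleaned rows into columns
scaled_output = min_max_scale(output)
standardized_crew = standardize(crew)
```

In practice the scaling statistics (minimum, maximum, mean, standard deviation) would normally be computed on the training split only and then reused on validation and test data, so that no information leaks from held-out records.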
About the journal:
Automation in Construction is an international journal that focuses on publishing original research papers related to the use of Information Technologies in various aspects of the construction industry. The journal covers topics such as design, engineering, construction technologies, and the maintenance and management of constructed facilities.
The scope of Automation in Construction is extensive and covers all stages of the construction life cycle. This includes initial planning and design, construction of the facility, operation and maintenance, as well as the eventual dismantling and recycling of buildings and engineering structures.