Negar Tajziyehchi, Mohammad Moshirpour, George Jergeas, F. Sadeghpour
A Predictive Model of Cost Growth in Construction Projects Using Feature Selection
2020 IEEE Third International Conference on Artificial Intelligence and Knowledge Engineering (AIKE), published 2020-12-01. DOI: 10.1109/AIKE48582.2020.00029 (https://doi.org/10.1109/AIKE48582.2020.00029)
Abstract
The construction industry spends billions of dollars on large-scale projects annually, and these projects typically experience cost overruns. Addressing this problem requires identifying the key factors that contribute to project cost growth. This study uses data provided by the Construction Owners Association of Alberta (COAA) and the Construction Industry Institute (CII). The data show that Alberta’s average cost growth is much higher than that of similar projects in the United States, so improving Alberta’s project performance is desirable. The Alberta data set contains 139 samples and is high dimensional, which makes it difficult to extract useful information for cost growth prediction. Dimensionality reduction techniques, such as feature selection, help identify the features that most affect cost growth. This study identified 16 significant features out of 281, selected in two steps. First, LASSO selected 21 features. The R² score and RMSE were then calculated for five different models under three train/test splits. Random forest achieved the highest score when more than 80 percent of the data was used for training. The permutation importance of each feature was then calculated using random forest, and 16 variables were extracted. Finally, these features were used as input to five machine learning algorithms to evaluate their predictive ability.
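The abstract names the evaluation metrics (R² and RMSE) and permutation importance without spelling them out. The following is a minimal, standard-library-only sketch of those three ideas, not the authors' implementation. Everything concrete in it is an assumption made for illustration: the data are synthetic (139 rows to echo the sample count only), a fixed linear `predict` function stands in for the trained random forest, and the `0.01` selection threshold is hypothetical.

```python
import math
import random


def rmse(y_true, y_pred):
    """Root mean squared error between targets and predictions."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def permutation_importance(predict, X, y, n_repeats=10, seed=0):
    """Mean drop in R^2 when a single feature column is shuffled.

    `predict` maps one feature row to one prediction; any fitted model
    (a random forest in the paper) could be wrapped this way.
    """
    rng = random.Random(seed)
    baseline = r2_score(y, [predict(row) for row in X])
    importances = []
    for j in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)  # break the link between feature j and y
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            permuted = r2_score(y, [predict(row) for row in X_perm])
            drops.append(baseline - permuted)
        importances.append(sum(drops) / n_repeats)
    return importances


# Synthetic stand-in data (NOT the COAA/CII data): three features, of which
# only the first two influence the target.
data_rng = random.Random(42)
X = [[data_rng.uniform(0, 1) for _ in range(3)] for _ in range(139)]
y = [3.0 * row[0] + 0.5 * row[1] + data_rng.gauss(0, 0.05) for row in X]


def predict(row):
    """Hypothetical fitted model: a fixed linear function, not a random forest."""
    return 3.0 * row[0] + 0.5 * row[1]


scores = permutation_importance(predict, X, y)

# Keep features whose shuffling costs more than a hypothetical 0.01 in R^2.
selected = [j for j, s in enumerate(scores) if s > 0.01]
print("importances:", scores)
print("selected feature indices:", selected)
```

A feature the model never uses scores zero, because shuffling it leaves every prediction unchanged. This is the property that lets an importance ranking prune a candidate set, as in the reduction from 21 to 16 features described above.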