A Predictive Model of Cost Growth in Construction Projects Using Feature Selection

Negar Tajziyehchi, Mohammad Moshirpour, George Jergeas, F. Sadeghpour
Published in: 2020 IEEE Third International Conference on Artificial Intelligence and Knowledge Engineering (AIKE), 2020-12-01
DOI: 10.1109/AIKE48582.2020.00029 (https://doi.org/10.1109/AIKE48582.2020.00029)

Abstract

The construction industry spends billions of dollars on large-scale projects annually, and these projects typically experience cost overruns. To address this problem, it is essential to identify the key factors that contribute to project cost growth. This study uses data provided by the Construction Owners Association of Alberta (COAA) and the Construction Industry Institute (CII). The data show that Alberta’s average cost growth is much higher than that of similar projects in the United States, so improving Alberta’s project performance is desirable. The Alberta data set contains 139 project samples and is high dimensional, which makes it difficult to extract useful information for cost growth prediction. Dimensionality reduction techniques, such as feature selection, help identify the features that most affect cost growth. This study identified 16 significant features out of 281, selected in two steps. First, LASSO selected 21 features. The R² score and RMSE were then calculated for five different models under three train/test splits. Random forest had the highest score, using more than 80 percent of the data for training. The permutation importance of each feature was then calculated using random forest, and 16 variables were extracted. These features were applied as input to five machine learning algorithms to evaluate the variables’ predictive ability.
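
The two-step selection described in the abstract (LASSO first, then random forest permutation importance) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the COAA/CII data are not reproduced here, so a synthetic data set with the same shape (139 samples, 281 features) stands in for them. The 80/20 split, the cross-validated LASSO penalty, the number of trees, and the "importance greater than zero" cutoff are all assumptions chosen for the example; the abstract does not state how the 16 final variables were thresholded.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.inspection import permutation_importance
from sklearn.linear_model import LassoCV
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the project data (assumption): 139 samples, 281 features.
X, y = make_regression(
    n_samples=139, n_features=281, n_informative=15, noise=10.0, random_state=0
)

# Hold out 20% for testing (assumed split; the paper compares three splits).
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# Step 1: LASSO. Features with a nonzero coefficient survive.
scaler = StandardScaler().fit(X_train)
lasso = LassoCV(cv=5, max_iter=10000, random_state=0)
lasso.fit(scaler.transform(X_train), y_train)
lasso_idx = np.flatnonzero(lasso.coef_)

# Step 2: fit a random forest on the LASSO-selected features and score it.
rf = RandomForestRegressor(n_estimators=200, random_state=0)
rf.fit(X_train[:, lasso_idx], y_train)
pred = rf.predict(X_test[:, lasso_idx])
r2 = r2_score(y_test, pred)
rmse = float(np.sqrt(mean_squared_error(y_test, pred)))

# Permutation importance: drop in test score when one feature is shuffled.
perm = permutation_importance(
    rf, X_test[:, lasso_idx], y_test, n_repeats=10, random_state=0
)

# Keep features whose mean importance is positive (assumed cutoff).
final_idx = lasso_idx[perm.importances_mean > 0]

print(f"LASSO kept {len(lasso_idx)} of {X.shape[1]} features")
print(f"Permutation importance kept {len(final_idx)} features")
print(f"Random forest on held-out data: R2={r2:.3f}, RMSE={rmse:.3f}")
```

Computing permutation importance on held-out data rather than on the training set is a design choice made here so that features the forest merely memorized are not rewarded; with only 139 samples the estimates are noisy, which is why the shuffling is repeated several times and averaged.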