Effort estimation of ETL projects using Forward Stepwise Regression

Raza Rasool, Ali Afzal Malik
Published in: 2015 International Conference on Emerging Technologies (ICET)
Publication date: 2015-12-01
DOI: 10.1109/ICET.2015.7389209 (https://doi.org/10.1109/ICET.2015.7389209)
Citations: 4

Abstract

Effort estimation is a key component of planning a software development project. Estimation methods for traditional applications have been researched extensively but, unfortunately, these methods do not apply to Extract Transform Load (ETL) projects. Producing a systematic effort estimate for an ETL project is challenging because ETL development does not follow the traditional Software Development Life Cycle (SDLC): traditional application development is requirements-driven, whereas ETL application development is data-driven. This paper describes the development of an effort estimation model for ETL projects and compares it with the most widely used algorithmic effort estimation model, COCOMO II. The model is built using Forward Stepwise Regression on a dataset of 220 industrial projects from five different software houses. After eliminating 20 outliers from this dataset, the adjusted R² (i.e., goodness of fit) of our model is 0.87. The training and prediction accuracy of the model are measured using de facto standard accuracy measures such as MMRE and PRED(25). On a training dataset of 200 projects, PRED(25) is 81.16% and MMRE is 0.16. Results show that the proposed model provides considerably better estimation accuracy than COCOMO II: on a validation dataset of 58 projects, PRED(25) is 49% for our model compared to 21% for COCOMO II, and MMRE is 0.31 compared to 0.99 for COCOMO II.
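The measures named in the abstract can be illustrated with a short sketch. MMRE is the mean of |actual - estimated| / actual over all projects, and PRED(25) is the fraction of projects whose relative error is at most 0.25. The abstract does not state the entry criterion or the candidate predictors used in the stepwise procedure, so the selection loop below is only an assumption: it adds, one at a time, the predictor that most improves adjusted R², and stops when the gain falls below a hypothetical threshold `min_gain`. All function and parameter names are illustrative and are not taken from the paper.

```python
import numpy as np

def mmre(actual, predicted):
    """Mean Magnitude of Relative Error: mean(|actual - predicted| / actual)."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return float(np.mean(np.abs(actual - predicted) / actual))

def pred(actual, predicted, level=0.25):
    """PRED(level): fraction of projects whose relative error is <= level."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    mre = np.abs(actual - predicted) / actual
    return float(np.mean(mre <= level))

def adjusted_r2(y, y_hat, n_predictors):
    """Adjusted R^2 for a linear model with an intercept."""
    n = len(y)
    ss_res = float(np.sum((y - y_hat) ** 2))
    ss_tot = float(np.sum((y - np.mean(y)) ** 2))
    r2 = 1.0 - ss_res / ss_tot
    return 1.0 - (1.0 - r2) * (n - 1) / (n - n_predictors - 1)

def fit_ols(X, y):
    """Ordinary least squares with an intercept; returns (coefficients, fitted values)."""
    A = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef, A @ coef

def forward_stepwise(X, y, min_gain=1e-3):
    """Greedy forward selection of column indices of X.

    Assumption: a predictor enters only if it raises adjusted R^2 by at
    least min_gain. Classical stepwise regression usually uses an F-test
    or p-value threshold instead; the paper's criterion is not given here.
    """
    selected, remaining = [], list(range(X.shape[1]))
    best_score = -np.inf
    while remaining:
        # Score every candidate model formed by adding one more predictor.
        scores = []
        for j in remaining:
            cols = selected + [j]
            _, y_hat = fit_ols(X[:, cols], y)
            scores.append((adjusted_r2(y, y_hat, len(cols)), j))
        score, j = max(scores)
        if score - best_score < min_gain:
            break  # no candidate improves the fit enough
        selected.append(j)
        remaining.remove(j)
        best_score = score
    return selected, best_score
```

In this sketch, `forward_stepwise(X, y)` would be run on the training projects, the chosen columns refit with `fit_ols`, and `mmre` and `pred` then computed separately on the training and validation sets, mirroring the two sets of figures reported in the abstract.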