Empirical Evaluation of Cost Overrun Prediction with Imbalance Data
Masateru Tsunoda, Akito Monden, Jun-ichiro Shibata, Ken-ichi Matsumoto
2011 10th IEEE/ACIS International Conference on Computer and Information Science, 2011-05-16
DOI: 10.1109/ICIS.2011.71 (https://doi.org/10.1109/ICIS.2011.71)
Citations: 3
Abstract
To prevent cost overruns in software projects, project managers need to identify projects at high risk of cost overrun in the early phase. So far, discriminant methods such as linear discriminant analysis and logistic regression have been used to predict cost overrun projects. However, the accuracy of discriminant methods often drops when the dataset used for prediction is imbalanced, i.e., when there is a large difference between the number of cost overrun projects and the number of non-cost-overrun projects. In this paper, we compared the accuracy of five methods (linear discriminant analysis, logistic regression, classification tree, the Mahalanobis-Taguchi method, and collaborative filtering) while varying the percentage of cost overrun projects in the dataset. The results showed that collaborative filtering achieved the highest accuracy among the five methods. When the numbers of cost overrun and non-cost-overrun projects in the dataset were balanced, linear discriminant analysis had the second-highest accuracy; when they were not balanced, the Mahalanobis-Taguchi method had the second-highest accuracy.
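The experimental idea of changing the percentage of cost overrun projects in a dataset can be illustrated with a small sketch. The abstract does not describe the actual resampling procedure or the evaluation metric, so everything below is an assumption made for illustration: the function names (`subsample_to_ratio`, `overrun_ratio`, `accuracy`, `recall`), the choice to drop overrun projects at random, and the toy data are all hypothetical. The sketch also shows why imbalance matters: a trivial predictor that never flags an overrun still scores a high plain accuracy on an imbalanced set.

```python
import random


def subsample_to_ratio(projects, target_ratio, seed=0):
    """Return a subset in which about `target_ratio` of projects are overruns.

    `projects` is a list of (features, is_overrun) pairs. All non-overrun
    projects are kept; overrun projects are dropped at random (an assumed
    procedure, not necessarily the one used in the paper).
    """
    if not 0.0 < target_ratio < 1.0:
        raise ValueError("target_ratio must be strictly between 0 and 1")
    rng = random.Random(seed)
    overrun = [p for p in projects if p[1]]
    normal = [p for p in projects if not p[1]]
    # Solve k / (k + len(normal)) = target_ratio for k.
    k = round(target_ratio * len(normal) / (1.0 - target_ratio))
    k = min(k, len(overrun))  # cannot keep more overruns than exist
    subset = normal + rng.sample(overrun, k)
    rng.shuffle(subset)
    return subset


def overrun_ratio(projects):
    """Fraction of projects labelled as cost overrun."""
    return sum(1 for _, y in projects if y) / len(projects)


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(1 for t, p in zip(y_true, y_pred) if t == p) / len(y_true)


def recall(y_true, y_pred, positive=True):
    """Fraction of `positive` cases that were predicted as `positive`."""
    total = sum(1 for t in y_true if t == positive)
    hits = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    return hits / total if total else 0.0


# Toy data: 100 non-overrun and 100 overrun projects (features are placeholders).
data = [((i,), False) for i in range(100)] + [((i,), True) for i in range(100)]

for ratio in (0.5, 0.2, 0.1):
    subset = subsample_to_ratio(data, ratio)
    y_true = [y for _, y in subset]
    never_overrun = [False] * len(subset)  # trivial baseline predictor
    print(
        f"overrun share={overrun_ratio(subset):.2f} "
        f"baseline accuracy={accuracy(y_true, never_overrun):.2f} "
        f"baseline overrun recall={recall(y_true, never_overrun):.2f}"
    )
```

In a comparison like the one described, each prediction method would be trained and evaluated on every such subset, so that its accuracy can be tracked as the overrun share shrinks.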