Prediction Bounds for General-Error-Regression Cost-Estimating Relationships
Stephen A. Book
Journal of Cost Analysis and Parametrics, 2012
DOI: https://doi.org/10.1080/1941658X.2012.682935
Citations: 2
Abstract
Estimating the cost of a system under development is essentially trying to predict the future, which means that any such estimate contains uncertainty. When estimating using a cost-estimating relationship (CER), a portion of this uncertainty arises from the possibility that the cost-estimating form to which regression analysis is applied may be the incorrect one. That is, the data may have been fit to a linear form, but some curvilinear relationship may more appropriately model the data. Assuming the algebraic model being used is the correct one, the CER’s uncertainty is described by its standard error of the estimate (SEE), which is essentially the standard deviation of the errors (residuals) made in applying that CER to estimate the (known) costs of the systems comprising the historical database. The SEE depends primarily on the extent to which those (known) costs fit the CER that purports to model them. Finally, additional uncertainty associated with a specific CER arises from the location of the particular cost-driver value (x) inside or outside the range of cost-driver values for programs comprising the historical cost database. For example, if x were located near the center of the range of its historical values, the CER would provide a more precise measure of the element’s cost than if x were located toward the edges or even outside the data range. The total uncertainty of CER-based estimates is a combination of all sources of uncertainty. The first kind of uncertainty mentioned, which questions the particular CER shape involved, cannot be measured without redoing the regression analysis for a wide variety of algebraic and other kinds of CER forms. Once we have decided upon a definite CER form, the SEE, represented by only one number characteristic of the CER, is fairly easy to measure for any CER shape or error model using known algebraic formulas.
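As a rough illustration of the SEE described above, the following Python sketch fits a linear CER by OLS and computes the SEE as the square root of the sum of squared residuals divided by n − 2. The data values and the function names `fit_linear_cer` and `standard_error_of_estimate` are hypothetical and are not taken from the article; the n − 2 divisor assumes a two-coefficient, additive-error CER.

```python
import math


def fit_linear_cer(xs, ys):
    """Fit cost = a + b * x by ordinary least squares; return (a, b)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = y_bar - b * x_bar
    return a, b


def standard_error_of_estimate(xs, ys, a, b):
    """SEE = sqrt(SSE / (n - 2)), the residual standard deviation
    adjusted for the two fitted coefficients."""
    sse = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    return math.sqrt(sse / (len(xs) - 2))


# Hypothetical historical database: cost-driver values and known costs.
driver = [10.0, 20.0, 30.0, 40.0, 50.0]
cost = [25.0, 41.0, 62.0, 79.0, 98.0]

a, b = fit_linear_cer(driver, cost)
see = standard_error_of_estimate(driver, cost, a, b)
print(f"CER: cost = {a:.2f} + {b:.3f} * x, SEE = {see:.3f}")
```

The SEE here is a single number for the whole CER; it does not yet reflect where a particular cost-driver value falls relative to the historical data.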
The second kind of uncertainty associated with a specific CER, which assesses both the CER itself and the value of the cost-driving parameter, is more complicated, and the way to account for it is completely understood only in the case of classical linear regression, i.e., ordinary least squares (OLS). As a result, explicit formulas exist for “prediction intervals” that bound cost estimates based on CERs that have been derived by applying OLS to historical cost data. For CERs, even linear ones, derived by other statistical methods, there appears to be no general method of solution described in the theoretical statistical literature. This report illustrates the application of bootstrap statistical sampling, a 34-year-old statistical process (Casella, 2003), to the problem of estimating prediction bounds for multiplicative-error and other CERs derived by non-OLS methods. After the bootstrap method is shown to be capable of yielding prediction bounds that approximate the known OLS bounds fairly
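
To make the bootstrap idea concrete, here is a minimal Python sketch of one common variant, a residual-resampling percentile bootstrap, applied to an additive-error linear CER. It is an assumption for illustration only and is not claimed to be the procedure used in the article; the data, the function names, the 80% level, and the number of replicates are all hypothetical.

```python
import random


def fit_linear_cer(xs, ys):
    """Fit cost = a + b * x by ordinary least squares; return (a, b)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b = sxy / sxx
    return y_bar - b * x_bar, b


def bootstrap_prediction_bounds(xs, ys, x0, level=0.80, n_boot=2000, seed=0):
    """Approximate prediction bounds at cost-driver value x0 by
    resampling residuals (assumed variant, additive errors)."""
    rng = random.Random(seed)
    a, b = fit_linear_cer(xs, ys)
    fitted = [a + b * x for x in xs]
    residuals = [y - f for y, f in zip(ys, fitted)]

    simulated = []
    for _ in range(n_boot):
        # Synthetic database: fitted costs plus residuals drawn with replacement.
        ys_star = [f + rng.choice(residuals) for f in fitted]
        a_star, b_star = fit_linear_cer(xs, ys_star)
        # Simulated future cost at x0: refitted CER plus one more resampled residual.
        simulated.append(a_star + b_star * x0 + rng.choice(residuals))

    simulated.sort()
    tail = (1.0 - level) / 2.0
    lower = simulated[int(tail * (n_boot - 1))]
    upper = simulated[int((1.0 - tail) * (n_boot - 1))]
    return lower, upper


# Hypothetical historical database: cost-driver values and known costs.
driver = [10.0, 20.0, 30.0, 40.0, 50.0]
cost = [25.0, 41.0, 62.0, 79.0, 98.0]

for x0 in (30.0, 60.0):  # center of the data range vs. outside it
    lower, upper = bootstrap_prediction_bounds(driver, cost, x0)
    print(f"x0 = {x0}: 80% bounds approx. [{lower:.1f}, {upper:.1f}]")
```

For an OLS-derived CER, such bounds can be checked against the classical prediction interval, ŷ0 ± t · SEE · sqrt(1 + 1/n + (x0 − x̄)² / Sxx). Note that this sketch does not rescale the residuals for degrees of freedom, so with very small samples its bounds may be somewhat narrower than the classical ones.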