Prediction for Big Data Through Kriging: Small Sequential and One-Shot Designs
J. Kleijnen, Wim C. M. van Beers
American Journal of Mathematical and Management Sciences, vol. 39, no. 1, pp. 199-213. Published 2020-01-30.
DOI: https://doi.org/10.1080/01966324.2020.1716281
Citations: 13
Abstract
Kriging, or Gaussian process (GP) modeling, is an interpolation method that assumes the outputs (responses) are more strongly correlated the closer the inputs (explanatory or independent variables) are to each other. Such a GP has unknown (hyper)parameters that are usually estimated through maximum likelihood. Big data, however, make it computationally problematic to estimate these parameters and to compute the corresponding Kriging predictor and its predictor variance. To solve this problem, some authors select a relatively small subset from the big set of previously observed “old” data. These selection methods are sequential and depend on the variance of the Kriging predictor; this variance requires a specific Kriging model and estimates of its parameters. The resulting designs turn out to be “local”; i.e., most selected old input combinations are concentrated around the new combination to be predicted. We develop a simpler one-shot (fixed-sample, non-sequential) design; i.e., from the big data set we select a small subset consisting of the nearest neighbors of the new combination. To compare our designs and the sequential designs empirically, we use the squared prediction errors in several numerical experiments. These experiments show that our design may yield reasonable performance.
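The one-shot idea in the abstract (keep only the nearest neighbors of the new input combination, fit Kriging on that small subset, then score the prediction by its squared error) can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: the function names, the Gaussian correlation function, the fixed value of `theta` (the paper estimates such parameters by maximum likelihood), the small `nugget`, and the toy test function are all assumptions made here for the example.

```python
import heapq
import math


def nearest_neighbors(X_old, y_old, x_new, n_sub):
    """One-shot design: keep the n_sub old points closest to x_new (Euclidean)."""
    idx = heapq.nsmallest(n_sub, range(len(X_old)),
                          key=lambda i: math.dist(X_old[i], x_new))
    return [X_old[i] for i in idx], [y_old[i] for i in idx]


def gauss_corr(a, b, theta):
    """Assumed Gaussian correlation: closer inputs give correlation nearer 1."""
    return math.exp(-theta * math.dist(a, b) ** 2)


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in reversed(range(n)):
        s = sum(M[r][k] * x[k] for k in range(r + 1, n))
        x[r] = (M[r][n] - s) / M[r][r]
    return x


def kriging_predict(X, y, x_new, theta=1.0, nugget=1e-10):
    """Ordinary-Kriging point prediction with a fixed (not estimated) theta."""
    n = len(X)
    R = [[gauss_corr(X[i], X[j], theta) + (nugget if i == j else 0.0)
          for j in range(n)] for i in range(n)]
    Rinv_y = solve(R, y)
    Rinv_1 = solve(R, [1.0] * n)
    mu = sum(Rinv_y) / sum(Rinv_1)  # generalized least-squares constant mean
    r = [gauss_corr(x, x_new, theta) for x in X]
    # mu + r' R^{-1} (y - mu * 1), reusing the two solves above
    return mu + sum(ri * (a - mu * b) for ri, a, b in zip(r, Rinv_y, Rinv_1))


# Toy "old" data: y = x1^2 + x2^2 on a 21 x 21 grid (illustrative only).
X_old = [(i / 20, j / 20) for i in range(21) for j in range(21)]
y_old = [a * a + b * b for a, b in X_old]
x_new = (0.33, 0.52)

X_sub, y_sub = nearest_neighbors(X_old, y_old, x_new, n_sub=8)
y_hat = kriging_predict(X_sub, y_sub, x_new, theta=100.0)
sq_err = (y_hat - (x_new[0] ** 2 + x_new[1] ** 2)) ** 2
print(f"prediction={y_hat:.4f}, squared error={sq_err:.2e}")
```

Because the subset is chosen from distances alone, the selection step needs no Kriging model or parameter estimates, which is the contrast the abstract draws with the sequential, variance-based designs. Only the small `n_sub` by `n_sub` correlation matrix is ever solved.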