Prediction for Big Data Through Kriging: Small Sequential and One-Shot Designs

American Journal of Mathematical and Management Sciences (JCR Q3, Business, Management and Accounting). Pub Date: 2020-01-30. DOI: 10.1080/01966324.2020.1716281

J. Kleijnen, Wim C. M. van Beers

American Journal of Mathematical and Management Sciences, Vol. 39, No. 1, pp. 199-213. Publication type: Journal Article.

Citations: 13

Abstract

Kriging, or Gaussian process (GP) modeling, is an interpolation method that assumes the outputs (responses) are more strongly correlated the closer their inputs (explanatory or independent variables) are. Such a GP has unknown (hyper)parameters that are usually estimated by maximum likelihood. With big data, however, computing these parameter estimates, the corresponding Kriging predictor, and the predictor variance becomes problematic. To solve this problem, some authors select a relatively small subset from the big set of previously observed “old” data. These selection methods are sequential and depend on the variance of the Kriging predictor; this variance requires a specific Kriging model and estimates of its parameters. The resulting designs turn out to be “local”; i.e., most selected old input combinations are concentrated around the new combination to be predicted. We develop a simpler one-shot (fixed-sample, non-sequential) design: from the big data set we select a small subset consisting of the nearest neighbors of the new combination. To compare our design with the sequential designs empirically, we use the squared prediction errors in several numerical experiments. These experiments show that our design may yield reasonable performance.
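The one-shot idea described in the abstract can be illustrated with a minimal sketch: pick the nearest neighbors of the new input combination from the big "old" data set, then apply an ordinary Kriging predictor to that small subset only. This is not the authors' code. The function names, the Gaussian correlation function, the test function, and the fixed value of `theta` are all assumptions made for illustration; the abstract says the (hyper)parameters are usually estimated by maximum likelihood, which this sketch skips.

```python
import numpy as np


def nearest_neighbor_design(X_old, x_new, n_sub):
    """Indices of the n_sub old input combinations closest to x_new (Euclidean)."""
    dist = np.linalg.norm(X_old - x_new, axis=1)
    return np.argsort(dist)[:n_sub]


def gaussian_corr(A, B, theta):
    """Gaussian correlation exp(-theta * squared distance) between rows of A and B."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-theta * d2)


def ordinary_kriging_predict(X, y, x_new, theta=25.0, nugget=1e-8):
    """Ordinary Kriging predictor and predictor variance at x_new.

    Assumption: theta is fixed by hand instead of being estimated by
    maximum likelihood; the small nugget only stabilizes the linear solves.
    """
    n = len(y)
    ones = np.ones(n)
    R = gaussian_corr(X, X, theta) + nugget * np.eye(n)
    r = gaussian_corr(X, x_new[None, :], theta).ravel()
    Ri_1 = np.linalg.solve(R, ones)
    Ri_r = np.linalg.solve(R, r)
    mu = (ones @ np.linalg.solve(R, y)) / (ones @ Ri_1)  # estimated constant mean
    resid = y - mu
    Ri_resid = np.linalg.solve(R, resid)
    pred = mu + r @ Ri_resid
    sigma2 = (resid @ Ri_resid) / n                       # estimated process variance
    var = sigma2 * (1.0 - r @ Ri_r + (1.0 - ones @ Ri_r) ** 2 / (ones @ Ri_1))
    return pred, max(var, 0.0)


# Hypothetical "big" data set: 5000 old observations of a smooth test function.
rng = np.random.default_rng(0)
X_old = rng.uniform(0.0, 1.0, size=(5000, 2))
y_old = np.sin(2 * np.pi * X_old[:, 0]) + X_old[:, 1] ** 2

x_new = np.array([0.3, 0.7])
y_true = np.sin(2 * np.pi * x_new[0]) + x_new[1] ** 2

idx = nearest_neighbor_design(X_old, x_new, n_sub=30)    # one-shot design
y_hat, s2 = ordinary_kriging_predict(X_old[idx], y_old[idx], x_new)
print("squared prediction error:", (y_hat - y_true) ** 2)
print("predictor variance:", s2)
```

Because the design depends only on distances in the input space, it needs no Kriging model or parameter estimates at the selection stage. The sequential designs mentioned in the abstract, by contrast, rely on the predictor variance to choose each point.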




Source journal

American Journal of Mathematical and Management Sciences. Subject area: Business, Management and Accounting (all)

CiteScore: 2.70

Self-citation rate: 0.00%

Number of articles published