Error estimation for surrogate models with noisy small-sized training sets
J. Wackers, Hayriye Pehlivan Solak, Riccardo Pellegrini, A. Serani, M. Diez
XI International Conference on Adaptive Modeling and Simulation
DOI: 10.23967/admos.2023.007 (https://doi.org/10.23967/admos.2023.007)
Abstract
Simulation-driven shape optimization often relies on surrogate models, i.e. approximate models fitted to a dataset of simulation results for a limited number of designs. The shape optimization is then performed on this surrogate model. For efficiency, modern approaches often construct the dataset adaptively, adding simulation points one by one where they are most likely to reveal the optimum design [3]. Estimating the uncertainty of the surrogate model is essential to guide the choice of new sample points: underestimating the uncertainty leads to sampling in suboptimal regions and missing the true optimum. Gaussian process regression naturally provides uncertainty estimates [4], while Stochastic Radial Basis Function (SRBF) surrogate models estimate the uncertainty from the spread of RBF fits with different kernels [5]. In the context of SRBF, this paper discusses two issues with uncertainty estimation. The first is that most existing techniques rely on knowledge of the global behaviour of the data, such as spatial correlations, whereas the number of data points can be too small to reconstruct this global information from the data. We argue that in this situation a user-provided estimate of the function behaviour is a better choice (section 3). The second issue is that the dataset may contain noise, i.e. random errors without spatial correlation. Surrogate models can filter out this noise, but filtering introduces two separate uncertainties: the optimum amount of noise filtering is unknown, and for a small dataset, even with perfect noise filtering, the local mean of the data may not correspond to the true simulation response. In section 4 we introduce estimators for both uncertainties.
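The SRBF idea mentioned in the abstract, estimating uncertainty from the spread of RBF fits with different kernels, can be illustrated with a minimal Python sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the power kernel phi(r) = r^eps, the chosen exponents, the restriction to 1D noise-free data, the helper names (`solve`, `rbf_fit`, `srbf_predict`), and the use of max minus min as the spread measure.

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Augmented matrix [A | b], copied so the inputs are not modified.
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < 1e-14:
            raise ValueError("singular interpolation matrix")
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    # Back substitution.
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x


def rbf_fit(xs, ys, eps):
    """Interpolating RBF fit in 1D with the (assumed) kernel phi(r) = r**eps."""
    A = [[abs(xi - xj) ** eps for xj in xs] for xi in xs]
    w = solve(A, ys)
    return lambda x: sum(wj * abs(x - xj) ** eps for wj, xj in zip(w, xs))


def srbf_predict(xs, ys, x, exponents=(1.0, 1.25, 1.5, 1.75)):
    """Return (mean prediction, spread) over fits with different kernels.

    The exponents are hypothetical; they stay in (0, 2) so that the
    interpolation matrix is nonsingular for distinct training points.
    """
    preds = [rbf_fit(xs, ys, eps)(x) for eps in exponents]
    mean = sum(preds) / len(preds)
    spread = max(preds) - min(preds)  # assumed uncertainty measure
    return mean, spread


# Toy usage: three noise-free samples of a bump-shaped response.
xs = [0.0, 0.5, 1.0]
ys = [0.0, 1.0, 0.0]
for x in (0.25, 0.5):
    mean, spread = srbf_predict(xs, ys, x)
    print(f"x={x}: prediction={mean:.3f}, spread={spread:.3f}")
```

In this toy setup the spread is zero (up to round-off) at the training points, because every fit interpolates the data, and it is nonzero between them. The sketch does not cover the noisy case the paper addresses, where the fits would need to filter the data rather than interpolate it.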