On Variability due to Local Minima and K-fold Cross-validation

C. Marzban, Jueyi Liu, P. Tissot
Journal: Artificial Intelligence for the Earth Systems
DOI: 10.1175/aies-d-21-0004.1
Published: 2022-10-06 (Journal Article)
Citations: 1

Abstract

Resampling methods such as cross-validation or the bootstrap are often employed to estimate the uncertainty in a loss function due to sampling variability, usually for the purpose of model selection. But in models that require nonlinear optimization, the existence of local minima in the loss-function landscape introduces an additional source of variability, which is confounded with sampling variability. In other words, some portion of the variability in the loss function across different resamples is due to local minima. Given that statistically sound model selection is based on an examination of variance, it is important to disentangle these two sources of variability. To that end, a methodology is developed for estimating each, specifically in the context of K-fold cross-validation and neural networks (NNs) whose training leads to different local minima. Random effects models are used to estimate the two variance components: one due to sampling and one due to local minima. The results are examined as a function of the number of hidden nodes and the variance of the initial weights, with the latter controlling the "depth" of local minima. The main goal of the methodology is to increase statistical power in model selection and/or model comparison. Using both simulated and realistic data, it is shown that the two sources of variability can be comparable, casting doubt on model selection methods that ignore the variability due to local minima. Furthermore, the methodology is sufficiently flexible to allow assessment of the effect of any other NN parameters on variability.
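The decomposition described above can be sketched with a one-way random-effects (ANOVA-style) estimator: losses from K folds, each with R random restarts of NN training, give a between-fold variance (sampling) and a within-fold variance (local minima). The code below is a minimal method-of-moments illustration, not the authors' implementation; the array shapes and variance values are invented for the example.

```python
import numpy as np

def variance_components(loss):
    """One-way random-effects (method-of-moments) decomposition.

    loss: (K, R) array of validation losses from K cross-validation
    folds, each with R random restarts of NN training. Variation
    between fold means reflects sampling variability; variation
    across restarts within a fold reflects local minima.
    """
    K, R = loss.shape
    fold_means = loss.mean(axis=1)
    grand_mean = loss.mean()
    # Mean square within folds: spread across restarts (local minima)
    ms_within = ((loss - fold_means[:, None]) ** 2).sum() / (K * (R - 1))
    # Mean square between fold means
    ms_between = R * ((fold_means - grand_mean) ** 2).sum() / (K - 1)
    var_local = ms_within
    # Clamp at zero, since the moment estimator can go negative
    var_sampling = max((ms_between - ms_within) / R, 0.0)
    return var_sampling, var_local

# Synthetic check: known variance components are roughly recovered.
rng = np.random.default_rng(0)
K, R = 10, 20
fold_effect = rng.normal(0.0, 0.05, size=(K, 1))     # sd 0.05 -> var 0.0025
restart_effect = rng.normal(0.0, 0.10, size=(K, R))  # sd 0.10 -> var 0.01
loss = 1.0 + fold_effect + restart_effect
var_sampling, var_local = variance_components(loss)
```

In practice the loss matrix would come from retraining the same NN architecture R times per fold with different initial weights; the paper's random effects models generalize this simple balanced-design estimator.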