{"title":"K-Fold交叉验证的渐近性","authors":"Jessie Li","doi":"10.1613/jair.1.13974","DOIUrl":null,"url":null,"abstract":"This paper investigates the asymptotic distribution of the K-fold cross validation error in an i.i.d. setting. As the number of observations n goes to infinity while keeping the number of folds K fixed, the K-fold cross validation error is √ n-consistent for the expected out-of-sample error and has an asymptotically normal distribution. A consistent estimate of the asymptotic variance is derived and used to construct asymptotically valid confidence intervals for the expected out-of-sample error. A hypothesis test is developed for comparing two estimators’ expected out-of-sample errors and a subsampling procedure is used to obtain critical values. Monte Carlo simulations demonstrate the asymptotic validity of our confidence intervals for the expected out-of-sample error and investigate the size and power properties of our test. In our empirical application, we use our estimator selection test to compare the out-of-sample predictive performance of OLS, Neural Networks, and Random Forests for predicting the sale price of a domain name in a GoDaddy expiry auction.","PeriodicalId":54877,"journal":{"name":"Journal of Artificial Intelligence Research","volume":"49 5","pages":"0"},"PeriodicalIF":4.5000,"publicationDate":"2023-11-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Asymptotics of K-Fold Cross Validation\",\"authors\":\"Jessie Li\",\"doi\":\"10.1613/jair.1.13974\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"This paper investigates the asymptotic distribution of the K-fold cross validation error in an i.i.d. setting. As the number of observations n goes to infinity while keeping the number of folds K fixed, the K-fold cross validation error is √ n-consistent for the expected out-of-sample error and has an asymptotically normal distribution. A consistent estimate of the asymptotic variance is derived and used to construct asymptotically valid confidence intervals for the expected out-of-sample error. A hypothesis test is developed for comparing two estimators’ expected out-of-sample errors and a subsampling procedure is used to obtain critical values. Monte Carlo simulations demonstrate the asymptotic validity of our confidence intervals for the expected out-of-sample error and investigate the size and power properties of our test. In our empirical application, we use our estimator selection test to compare the out-of-sample predictive performance of OLS, Neural Networks, and Random Forests for predicting the sale price of a domain name in a GoDaddy expiry auction.\",\"PeriodicalId\":54877,\"journal\":{\"name\":\"Journal of Artificial Intelligence Research\",\"volume\":\"49 5\",\"pages\":\"0\"},\"PeriodicalIF\":4.5000,\"publicationDate\":\"2023-11-14\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Journal of Artificial Intelligence Research\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1613/jair.1.13974\",\"RegionNum\":3,\"RegionCategory\":\"计算机科学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q2\",\"JCRName\":\"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Journal of Artificial Intelligence Research","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1613/jair.1.13974","RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}
This paper investigates the asymptotic distribution of the K-fold cross validation error in an i.i.d. setting. As the number of observations n goes to infinity while keeping the number of folds K fixed, the K-fold cross validation error is √ n-consistent for the expected out-of-sample error and has an asymptotically normal distribution. A consistent estimate of the asymptotic variance is derived and used to construct asymptotically valid confidence intervals for the expected out-of-sample error. A hypothesis test is developed for comparing two estimators’ expected out-of-sample errors and a subsampling procedure is used to obtain critical values. Monte Carlo simulations demonstrate the asymptotic validity of our confidence intervals for the expected out-of-sample error and investigate the size and power properties of our test. In our empirical application, we use our estimator selection test to compare the out-of-sample predictive performance of OLS, Neural Networks, and Random Forests for predicting the sale price of a domain name in a GoDaddy expiry auction.
期刊介绍:
JAIR(ISSN 1076 - 9757) covers all areas of artificial intelligence (AI), publishing refereed research articles, survey articles, and technical notes. Established in 1993 as one of the first electronic scientific journals, JAIR is indexed by INSPEC, Science Citation Index, and MathSciNet. JAIR reviews papers within approximately three months of submission and publishes accepted articles on the internet immediately upon receiving the final versions. JAIR articles are published for free distribution on the internet by the AI Access Foundation, and for purchase in bound volumes by AAAI Press.