The Cross-Validated Adaptive Epsilon-Net Estimator

M. van der Laan, S. Dudoit, A. van der Vaart
Statistics & Decisions. DOI: 10.1524/STND.2006.24.3.373. Citations: 138.

Abstract

Suppose that we observe a sample of independent and identically distributed realizations of a random variable, and that a parameter of interest can be defined as the minimizer, over a suitably defined parameter set, of the expectation of a (loss) function of a candidate parameter value and the random variable. Examples include squared-error loss in regression and negative log-density loss in density estimation. Minimizing the empirical risk (i.e., the empirical mean of the loss function) over the entire parameter set may result in ill-defined or overly variable estimators of the parameter of interest. In this article, we propose a cross-validated ε-net estimation method, which uses a collection of submodels and a collection of ε-nets over each submodel. For each submodel s and each resolution level ε, the minimizer of the empirical risk over the corresponding ε-net is a candidate estimator. We then select from these estimators (i.e., select the pair (s, ε)) by multi-fold cross-validation. We derive a finite-sample inequality showing that the resulting estimator performs as well as an oracle estimator that uses the best submodel and resolution level for the unknown true parameter. We also address the implementation of the estimation procedure, and, in the context of a linear regression model, we present results of a preliminary simulation study comparing the cross-validated ε-net estimator to the cross-validated L1-penalized least squares estimator (LASSO) and the least angle regression estimator (LARS).
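The procedure described above — minimize the empirical risk over an ε-net within each submodel, then pick the pair (s, ε) by V-fold cross-validation — can be sketched in code. The sketch below is a simplified illustration for linear regression with squared-error loss, not the paper's implementation: the submodels are taken to be the first s covariates, and each ε-net is a naive grid of spacing ε over a bounded coefficient box (the paper allows far more general submodels and nets). All function and parameter names (`epsilon_net`, `cv_epsilon_net`, `bound`, `n_folds`) are hypothetical.

```python
import itertools
import numpy as np

def epsilon_net(dim, bound, eps):
    """A crude epsilon-net: grid points with spacing eps over [-bound, bound]^dim."""
    axis = np.arange(-bound, bound + eps / 2, eps)
    return [np.array(b) for b in itertools.product(axis, repeat=dim)]

def empirical_risk(beta, X, y):
    """Empirical mean of the squared-error loss for a candidate beta."""
    return np.mean((y - X[:, :len(beta)] @ beta) ** 2)

def cv_epsilon_net(X, y, submodels, eps_grid, bound=2.0, n_folds=5, seed=0):
    """Select (s, eps) by V-fold cross-validation, then refit on the full sample.

    submodels: list of submodel sizes s (number of leading covariates used).
    eps_grid:  list of resolution levels eps.
    """
    rng = np.random.default_rng(seed)
    folds = rng.permutation(len(y)) % n_folds
    best = (None, None, np.inf)
    for s in submodels:
        for eps in eps_grid:
            net = epsilon_net(s, bound, eps)
            cv_risk = 0.0
            for v in range(n_folds):
                tr, va = folds != v, folds == v
                # candidate estimator: empirical risk minimizer over the net,
                # computed on the training fold only
                beta_hat = min(net, key=lambda b: empirical_risk(b, X[tr], y[tr]))
                # validated risk on the held-out fold
                cv_risk += empirical_risk(beta_hat, X[va], y[va]) / n_folds
            if cv_risk < best[2]:
                best = (s, eps, cv_risk)
    s_star, eps_star, _ = best
    # refit: minimize empirical risk over the selected net on the full sample
    net = epsilon_net(s_star, bound, eps_star)
    beta_final = min(net, key=lambda b: empirical_risk(b, X, y))
    return s_star, eps_star, beta_final
```

The brute-force grid search over the net is exponential in s and is only feasible here because the example is low-dimensional; the paper's implementation section is concerned precisely with making this minimization tractable.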