Relaxing the assumptions of knockoffs by conditioning

Dongming Huang, Lucas Janson
{"title":"Relaxing the assumptions of knockoffs by conditioning","authors":"Dongming Huang, Lucas Janson","doi":"10.1214/19-AOS1920","DOIUrl":null,"url":null,"abstract":"The recent paper Cand\\`es et al. (2018) introduced model-X knockoffs, a method for variable selection that provably and non-asymptotically controls the false discovery rate with no restrictions or assumptions on the dimensionality of the data or the conditional distribution of the response given the covariates. The one requirement for the procedure is that the covariate samples are drawn independently and identically from a precisely-known (but arbitrary) distribution. The present paper shows that the exact same guarantees can be made without knowing the covariate distribution fully, but instead knowing it only up to a parametric model with as many as $\\Omega(n^{*}p)$ parameters, where $p$ is the dimension and $n^{*}$ is the number of covariate samples (which may exceed the usual sample size $n$ of labeled samples when unlabeled samples are also available). The key is to treat the covariates as if they are drawn conditionally on their observed value for a sufficient statistic of the model. Although this idea is simple, even in Gaussian models conditioning on a sufficient statistic leads to a distribution supported on a set of zero Lebesgue measure, requiring techniques from topological measure theory to establish valid algorithms. We demonstrate how to do this for three models of interest, with simulations showing the new approach remains powerful under the weaker assumptions.","PeriodicalId":186390,"journal":{"name":"arXiv: Methodology","volume":"22 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2019-03-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"25","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"arXiv: Methodology","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1214/19-AOS1920","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 25

Abstract

The recent paper Cand\`es et al. (2018) introduced model-X knockoffs, a method for variable selection that provably and non-asymptotically controls the false discovery rate with no restrictions or assumptions on the dimensionality of the data or the conditional distribution of the response given the covariates. The one requirement for the procedure is that the covariate samples are drawn independently and identically from a precisely-known (but arbitrary) distribution. The present paper shows that the exact same guarantees can be made without knowing the covariate distribution fully, but instead knowing it only up to a parametric model with as many as $\Omega(n^{*}p)$ parameters, where $p$ is the dimension and $n^{*}$ is the number of covariate samples (which may exceed the usual sample size $n$ of labeled samples when unlabeled samples are also available). The key is to treat the covariates as if they are drawn conditionally on their observed value for a sufficient statistic of the model. Although this idea is simple, even in Gaussian models conditioning on a sufficient statistic leads to a distribution supported on a set of zero Lebesgue measure, requiring techniques from topological measure theory to establish valid algorithms. We demonstrate how to do this for three models of interest, with simulations showing the new approach remains powerful under the weaker assumptions.
查看原文
分享 分享
微信好友 朋友圈 QQ好友 复制链接
本刊更多论文
通过调节放松对仿冒品的假设
最近的论文Cand ' es等人(2018)介绍了model-X仿真品,这是一种变量选择方法,可证明且非渐近地控制错误发现率,对数据的维度或给定协变量的响应的条件分布没有限制或假设。该过程的一个要求是,协变量样本是从一个精确已知(但任意)的分布中独立且相同地抽取的。本文表明,在不完全知道协变量分布的情况下,可以做出完全相同的保证,而是只知道一个参数模型,参数多达$\Omega(n^{*}p)$,其中$p$是维度,$n^{*}$是协变量样本的数量(当没有标记的样本也可用时,它可能超过标记样本的通常样本量$n$)。关键是要对待协变量,就好像它们是有条件地根据模型的充分统计量的观察值绘制的。虽然这个想法很简单,但即使在高斯模型中,在一个足够的统计量的条件下,也会导致一个分布支持在一组零勒贝格测度上,这需要来自拓扑测度理论的技术来建立有效的算法。我们演示了如何为三个感兴趣的模型做到这一点,模拟表明,在较弱的假设下,新方法仍然强大。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 去求助
来源期刊
自引率
0.00%
发文量
0
期刊最新文献
Revisiting Empirical Bayes Methods and Applications to Special Types of Data Flexible Bayesian modelling of concomitant covariate effects in mixture models A Critique of Differential Abundance Analysis, and Advocacy for an Alternative Post-Processing of MCMC Conditional variance estimator for sufficient dimension reduction
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
现在去查看 取消
×
提示
确定
0
微信
客服QQ
Book学术公众号 扫码关注我们
反馈
×
意见反馈
请填写您的意见或建议
请填写您的手机或邮箱
已复制链接
已复制链接
快去分享给好友吧!
我知道了
×
扫码分享
扫码分享
Book学术官方微信
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术
文献互助 智能选刊 最新文献 互助须知 联系我们:info@booksci.cn
Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。
Copyright © 2023 Book学术 All rights reserved.
ghs 京公网安备 11010802042870号 京ICP备2023020795号-1