
Latest publications in arXiv: Methodology

Cross-classified multilevel models
Pub Date : 2019-07-04 DOI: 10.1007/978-94-007-0753-5_100821
G. Leckie
Citations: 3
Causal Inference from Possibly Unbalanced Split-Plot Designs: A Randomization-based Perspective.
Pub Date : 2019-06-20 DOI: 10.5705/SS.202020.0149
R. Mukerjee, Tirthankar Dasgupta
Split-plot designs find wide applicability in multifactor experiments with randomization restrictions. Practical considerations often warrant the use of unbalanced designs. This paper investigates randomization-based causal inference in split-plot designs that are possibly unbalanced. Extension of ideas from the recently studied balanced case yields an expression for the sampling variance of a treatment contrast estimator, as well as a conservative estimator of that sampling variance. However, the bias of this variance estimator does not vanish even when the treatment effects are strictly additive. A careful and involved matrix analysis is employed to overcome this difficulty, resulting in a new variance estimator, which becomes unbiased under milder conditions. A construction procedure that generates such an estimator with minimax bias is proposed.
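The randomization-based flavor of inference described here can be illustrated with the classical Neymanian difference-in-means estimator and its conservative variance estimate. This is a minimal completely-randomized sketch of the baseline the paper improves on, not the paper's split-plot-specific estimator; the function name and toy data are illustrative:

```python
from statistics import mean, variance

def neyman_estimate(y_treat, y_ctrl):
    """Difference-in-means effect estimate with the classical conservative
    (Neyman) variance estimator s1^2/n1 + s0^2/n0, which is upwardly biased
    unless treatment effects are additive."""
    n1, n0 = len(y_treat), len(y_ctrl)
    tau_hat = mean(y_treat) - mean(y_ctrl)
    var_hat = variance(y_treat) / n1 + variance(y_ctrl) / n0
    return tau_hat, var_hat

est, v = neyman_estimate([5.1, 4.8, 6.0, 5.5], [4.0, 4.2, 3.9, 4.5])
print(est, v)
```

The paper's contribution is precisely a variance estimator whose bias vanishes under conditions milder than strict additivity.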
Citations: 2
An Approximate Bayesian Approach to Model-assisted Survey Estimation with Many Auxiliary Variables.
Pub Date : 2019-06-11 DOI: 10.5705/ss.202019.0239
S. Sugasawa, Jae Kwang Kim
Model-assisted estimation with complex survey data is an important practical problem in survey sampling. When there are many auxiliary variables, selecting significant variables associated with the study variable would be necessary to achieve efficient estimation of population parameters of interest. In this paper, we formulate a regularized regression estimator in the framework of Bayesian inference using the penalty function as the shrinkage prior for model selection. The proposed Bayesian approach enables us to get not only efficient point estimates but also reasonable credible intervals. Results from two limited simulation studies are presented to facilitate comparison with existing frequentist methods.
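As a toy illustration of model assistance with shrinkage, here is a frequentist GREG-style estimator of a population total with a single auxiliary variable and a ridge penalty. This is a sketch of the general idea only, not the paper's Bayesian formulation; all names are illustrative:

```python
def ridge_greg_total(y, x, d, tx, lam=1.0):
    """Model-assisted (GREG-style) estimate of a population total with a
    ridge-shrunk slope. y, x: sample responses and auxiliary values;
    d: design weights; tx: known population total of x; lam: ridge penalty."""
    beta = sum(di * xi * yi for di, xi, yi in zip(d, x, y)) / (
        sum(di * xi * xi for di, xi in zip(d, x)) + lam)
    ht_y = sum(di * yi for di, yi in zip(d, y))  # Horvitz-Thompson total of y
    ht_x = sum(di * xi for di, xi in zip(d, x))
    return ht_y + (tx - ht_x) * beta             # calibration correction

# with no penalty and y exactly 2x, the estimate recovers 2*tx
print(ridge_greg_total([2, 4, 6], [1, 2, 3], [2, 2, 2], tx=10, lam=0.0))  # -> 20.0
```

Increasing `lam` shrinks the calibration correction toward the plain Horvitz-Thompson estimate, mirroring the role the shrinkage prior plays in the Bayesian formulation.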
Citations: 0
Non-asymptotic inference in a class of optimization problems
Pub Date : 2019-05-17 DOI: 10.1920/WP.CEM.2019.2319
S. Lee, J. Horowitz
This paper describes a method for carrying out non-asymptotic inference on partially identified parameters that are solutions to a class of optimization problems. The optimization problems arise in applications in which grouped data are used for estimation of a model's structural parameters. The parameters are characterized by restrictions that involve the population means of observed random variables in addition to the structural parameters of interest. Inference consists of finding confidence intervals for the structural parameters. Our method is non-asymptotic in the sense that it provides a finite-sample bound on the difference between the true and nominal probabilities with which a confidence interval contains the true but unknown value of a parameter. We contrast our method with an alternative non-asymptotic method based on the median-of-means estimator of Minsker (2015). The results of Monte Carlo experiments and an empirical example illustrate the usefulness of our method.
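The Minsker (2015) comparator mentioned above, the median-of-means estimator, is short enough to sketch directly (equal-size blocks; leftover observations are dropped for simplicity):

```python
from statistics import mean, median

def median_of_means(data, k):
    """Split data into k equal blocks, average each block, and return the
    median of the block means; robust to heavy-tailed observations."""
    m = len(data) // k                       # block size (leftovers dropped)
    blocks = [data[i * m:(i + 1) * m] for i in range(k)]
    return median(mean(b) for b in blocks)

# one wild outlier barely moves the estimate
print(median_of_means([1, 2, 3, 100, 5, 6], k=3))  # -> 5.5
```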
Citations: 1
Simulation study of estimating between-study variance and overall effect in meta-analyses of log-response-ratio for normal data
Pub Date : 2019-05-03 DOI: 10.31222/osf.io/3bnxs
Ilyas Bakbergenuly, D. Hoaglin, E. Kulinskaya
Methods for random-effects meta-analysis require an estimate of the between-study variance, $\tau^2$. The performance of estimators of $\tau^2$ (measured by bias and coverage) affects their usefulness in assessing heterogeneity of study-level effects, and also the performance of related estimators of the overall effect. For the effect measure log-response-ratio (LRR, also known as the logarithm of the ratio of means, RoM), we review four point estimators of $\tau^2$ (the popular methods of DerSimonian-Laird (DL), restricted maximum likelihood, and Mandel and Paule (MP), and the less-familiar method of Jackson), four interval estimators for $\tau^2$ (profile likelihood, Q-profile, Biggerstaff and Jackson, and Jackson), five point estimators of the overall effect (the four related to the point estimators of $\tau^2$ and an estimator whose weights use only study-level sample sizes), and seven interval estimators for the overall effect (four based on the point estimators for $\tau^2$, the Hartung-Knapp-Sidik-Jonkman (HKSJ) interval, a modification of HKSJ that uses the MP estimator of $\tau^2$ instead of the DL estimator, and an interval based on the sample-size-weighted estimator). We obtain empirical evidence from extensive simulations of data from normal distributions. Simulations from lognormal distributions are in a separate report, Bakbergenuly et al. 2019b.
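Of the point estimators reviewed, the DerSimonian-Laird moment estimator is the simplest to state. A minimal sketch taking study-level effect estimates and their within-study variances (with the usual truncation at zero):

```python
def dersimonian_laird_tau2(effects, variances):
    """DerSimonian-Laird moment estimator of the between-study variance:
    tau2 = max(0, (Q - (k - 1)) / c), where Q is Cochran's Q computed with
    inverse-variance weights w and c = sum(w) - sum(w^2)/sum(w)."""
    w = [1.0 / v for v in variances]
    sw = sum(w)
    ybar = sum(wi * yi for wi, yi in zip(w, effects)) / sw
    q = sum(wi * (yi - ybar) ** 2 for wi, yi in zip(w, effects))
    k = len(effects)
    c = sw - sum(wi * wi for wi in w) / sw
    return max(0.0, (q - (k - 1)) / c)

print(dersimonian_laird_tau2([0.0, 2.0], [1.0, 1.0]))  # -> 1.0
```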
Citations: 4
A penalized likelihood approach for efficiently estimating a partially linear additive transformation model with current status data
Pub Date : 2019-04-23 DOI: 10.1214/21-EJS1820
Yan Liu, Minggen Lu, C. McMahan
Current status data are commonly encountered in medical and epidemiological studies in which the failure time for study units is the outcome variable of interest. Data of this form are characterized by the fact that the failure time is not directly observed but rather is known only relative to an observation time; i.e., the failure times are either left- or right-censored. Due to its structure, the analysis of such data can be challenging. To circumvent these challenges and to provide a flexible modeling construct that can be used to analyze current status data, herein a partially linear additive transformation model is proposed. In the formulation of this model, constrained $B$-splines are employed to model the monotone transformation function and nonlinear covariate effects. To provide more efficient estimates, a penalization technique is used to regularize the estimation of all unknown functions. An easy-to-implement hybrid algorithm is developed for model fitting, and a simple estimator of the large-sample variance-covariance matrix is proposed. It is shown theoretically that the proposed estimators of the finite-dimensional regression coefficients are root-$n$ consistent, asymptotically normal, and achieve the semi-parametric information bound, while the estimators of the nonparametric components attain the optimal rate of convergence. The finite-sample performance of the proposed methodology is evaluated through extensive numerical studies and is further demonstrated through the analysis of uterine leiomyomata data.
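The data structure described above leads to a simple observed-data likelihood: each unit contributes $F(c)$ if failure occurred by its observation time $c$ and $1-F(c)$ otherwise. A minimal sketch of that likelihood (not the paper's penalized spline machinery; names are illustrative):

```python
from math import exp, log

def current_status_loglik(obs, F):
    """Observed-data log-likelihood for current status data: obs is a list of
    (c, delta) pairs, where c is the observation time and delta = 1 if the
    failure had already occurred by time c; F is the failure-time CDF."""
    return sum(d * log(F(c)) + (1 - d) * log(1.0 - F(c)) for c, d in obs)

# toy check against a unit-rate exponential model, F(t) = 1 - exp(-t)
ll = current_status_loglik([(1.0, 1), (2.0, 0)], lambda t: 1.0 - exp(-t))
print(ll)
```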
Citations: 1
Practical valid inferences for the two-sample binomial problem
Pub Date : 2019-04-10 DOI: 10.1214/21-SS131
M. Fay, S. Hunsberger
Our interest is whether two binomial parameters differ, which parameter is larger, and by how much. This apparently simple problem was addressed by Fisher in the 1930s, and has been the subject of many review papers since then. Yet there continues to be new work on this issue and no consensus solution. Previous reviews have focused primarily on testing and the properties of validity and power, or primarily on confidence intervals, their coverage, and expected length. Here we evaluate both. For example, we consider whether a p-value and its matching confidence interval are compatible, meaning that the p-value rejects at level $\alpha$ if and only if the $1-\alpha$ confidence interval excludes all null parameter values. For focus, we only examine non-asymptotic inferences, so that most of the p-values and confidence intervals are valid (i.e., exact) by construction. Within this focus, we review different methods emphasizing many of the properties and interpretational aspects we desire from applied frequentist inference: validity, accuracy, good power, equivariance, compatibility, coherence, and parameterization and direction of effect. We show that no one method can meet all the desirable properties and give recommendations based on which properties are given more importance.
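As a concrete example of an inference that is valid by construction, the one-sided Fisher exact p-value can be computed directly from the hypergeometric distribution using only the standard library:

```python
from math import comb

def fisher_exact_one_sided(x1, n1, x2, n2):
    """One-sided Fisher exact p-value for the two-sample binomial problem:
    conditional on the total number of successes, the probability that the
    first sample shows x1 or more successes under equal proportions
    (hypergeometric upper tail)."""
    s, n = x1 + x2, n1 + n2
    denom = comb(n, s)
    return sum(comb(n1, k) * comb(n2, s - k)
               for k in range(x1, min(s, n1) + 1)) / denom

print(fisher_exact_one_sided(3, 3, 0, 3))  # -> 0.05
```

This conditional test is one of the exact procedures such reviews cover; its conservatism is part of the validity/power trade-off the abstract alludes to.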
Citations: 8
Relaxing the assumptions of knockoffs by conditioning
Pub Date : 2019-03-07 DOI: 10.1214/19-AOS1920
Dongming Huang, Lucas Janson
The recent paper Candès et al. (2018) introduced model-X knockoffs, a method for variable selection that provably and non-asymptotically controls the false discovery rate with no restrictions or assumptions on the dimensionality of the data or the conditional distribution of the response given the covariates. The one requirement for the procedure is that the covariate samples are drawn independently and identically from a precisely-known (but arbitrary) distribution. The present paper shows that the exact same guarantees can be made without knowing the covariate distribution fully, but instead knowing it only up to a parametric model with as many as $\Omega(n^{*}p)$ parameters, where $p$ is the dimension and $n^{*}$ is the number of covariate samples (which may exceed the usual sample size $n$ of labeled samples when unlabeled samples are also available). The key is to treat the covariates as if they are drawn conditionally on their observed value for a sufficient statistic of the model. Although this idea is simple, even in Gaussian models conditioning on a sufficient statistic leads to a distribution supported on a set of zero Lebesgue measure, requiring techniques from topological measure theory to establish valid algorithms. We demonstrate how to do this for three models of interest, with simulations showing the new approach remains powerful under the weaker assumptions.
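For context, the selection step that follows knockoff construction (shared by the original model-X procedure and this relaxation) applies the knockoff+ threshold to feature statistics $W_j$, where positive $W_j$ favors a real feature over its knockoff. The threshold itself is a short computation; a sketch:

```python
def knockoff_threshold(W, q):
    """Knockoff+ data-dependent threshold: the smallest t among the |W_j| with
    (1 + #{j: W_j <= -t}) / max(1, #{j: W_j >= t}) <= q.
    The selected variables are then {j: W_j >= t}."""
    for t in sorted(abs(w) for w in W if w != 0):
        neg = sum(1 for w in W if w <= -t)   # knockoff wins at level t
        pos = sum(1 for w in W if w >= t)    # real-feature wins at level t
        if (1 + neg) / max(1, pos) <= q:
            return t
    return float("inf")                      # no threshold: select nothing

print(knockoff_threshold([5, 4, 3, 2, -1], q=0.3))  # -> 2
```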
Citations: 25
A primer on statistically validated networks
Pub Date : 2019-02-19 DOI: 10.3254/190007
S. Miccichè, R. Mantegna
In this contribution we discuss approaches to network analysis that provide information about single links or single nodes with respect to a null hypothesis that takes into account the empirically observed heterogeneity of the system. With this approach, a selection of nodes and links is feasible when the null hypothesis is statistically rejected. We focus our discussion on approaches using (i) the so-called disparity filter and (ii) statistically validated networks in bipartite networks. For both methods we discuss the importance of multiple hypothesis test correction. Specific applications of statistically validated networks are discussed. We also discuss how statistically validated networks can be used to (i) pre-process large sets of data and (ii) detect cores of communities that form the most close-knit and stable subsets of clusters of nodes present in a complex system.
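The disparity filter mentioned in (i) reduces to a one-line p-value per edge: for a node of degree $k$, an edge carrying a fraction $p$ of the node's strength gets $\alpha = (1-p)^{k-1}$ under the null of uniformly spread weight. A sketch:

```python
def disparity_pvalues(weights):
    """Disparity-filter p-values for the edges incident to one node: under the
    null that the node's strength is spread uniformly over its k edges, an
    edge with normalized weight p has alpha = (1 - p)**(k - 1); small alpha
    marks an edge carrying a statistically surprising share of the strength."""
    k = len(weights)
    s = sum(weights)
    return [(1.0 - w / s) ** (k - 1) for w in weights]

# one dominant edge among three stands out (alpha = 0.04 vs 0.81)
pv = disparity_pvalues([8, 1, 1])
print(pv)
```

In practice one thresholds these alphas only after a multiple-testing correction (e.g. Bonferroni over all tested edges), as the contribution emphasizes.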
Citations: 8
5. Optimality criteria for probabilistic numerical methods
Pub Date : 2019-01-14 DOI: 10.1515/9783110635461-005
C. Oates, J. Cockayne, D. Prangle, T. Sullivan, M. Girolami
It is well understood that Bayesian decision theory and average case analysis are essentially identical. However, if one is interested in performing uncertainty quantification for a numerical task, it can be argued that the decision-theoretic framework is neither appropriate nor sufficient. To this end, we consider an alternative optimality criterion from Bayesian experimental design and study its implied optimal information in the numerical context. This information is demonstrated to differ, in general, from the information that would be used in an average-case-optimal numerical method. The explicit connection to Bayesian experimental design suggests several distinct regimes in which optimal probabilistic numerical methods can be developed.
Citations: 7