Bernoulli: latest articles

A presmoothing approach for estimation in the semiparametric Cox mixture cure model
IF 1.5 Zone 2 (Mathematics) Q2 STATISTICS & PROBABILITY Pub Date: 2022-11-01 DOI: 10.3150/21-bej1434
Eni Musta, V. Patilea, I. Van Keilegom
A challenge when dealing with survival analysis data is accounting for a cure fraction, meaning that some subjects will never experience the event of interest. Mixture cure models have frequently been used to estimate both the probability of being cured and the time to event for susceptible subjects, usually assuming a parametric (logistic) form for the incidence. We propose a new estimation procedure for a parametric cure rate that relies on a preliminary smooth estimator and is independent of the model assumed for the latency. In a second stage, one can assume a semiparametric model for the latency and also estimate the survival distribution of the uncured subjects. For the particular case of the logistic/Cox model, we investigate the theoretical properties of the estimators and show through simulations that presmoothing leads to more accurate results than the maximum likelihood estimator. To illustrate its practical use, we apply the new estimation procedure to two studies of melanoma survival data.
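The mixture cure structure can be made concrete with a short numerical sketch. It assumes a logistic incidence $p(x) = P(\text{uncured} \mid x)$ and a Cox-type latency $S_u(t \mid z) = S_0(t)^{\exp(\beta^T z)}$, combined into the population survival function $S(t \mid x, z) = 1 - p(x) + p(x)\,S_u(t \mid z)$. The Weibull baseline, the function names and all parameter values are hypothetical choices for illustration; the paper leaves the Cox baseline unspecified, and this is not its presmoothing estimator.

```python
import numpy as np

def uncured_probability(x, gamma):
    """Logistic incidence: P(subject is susceptible | x), intercept gamma[0]."""
    eta = gamma[0] + np.dot(x, gamma[1:])
    return 1.0 / (1.0 + np.exp(-eta))

def latency_survival(t, z, beta, shape=1.5, scale=1.0):
    """Cox latency S_u(t|z) = S_0(t)^exp(beta'z); the Weibull S_0 is a hypothetical baseline."""
    s0 = np.exp(-(np.asarray(t, dtype=float) / scale) ** shape)
    return s0 ** np.exp(np.dot(z, beta))

def population_survival(t, x, z, gamma, beta):
    """Mixture cure survival: cured subjects never fail, so S(t) plateaus at 1 - p(x)."""
    p = uncured_probability(x, gamma)
    return 1.0 - p + p * latency_survival(t, z, beta)

t = np.linspace(0.0, 10.0, 201)
surv = population_survival(t, x=np.array([0.5]), z=np.array([0.3]),
                           gamma=np.array([0.2, 1.0]), beta=np.array([-0.5]))
```

The plateau of `surv` at $1 - p(x)$ is the cure fraction; the latency model only shapes how quickly the susceptible part decays toward it.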
Citations: 4
Locally polynomial Hilbertian additive regression
IF 1.5 Zone 2 (Mathematics) Q2 STATISTICS & PROBABILITY Pub Date: 2022-08-01 DOI: 10.3150/21-bej1410
Jeong Min Jeon, Young K. Lee, E. Mammen, B. Park
Summary: In this paper a new additive regression technique is developed for response variables that take values in general Hilbert spaces. The proposed method is based on the idea of smooth backfitting, which has been developed mainly for real-valued responses. It adopts local polynomial smoothing, which carries over the advantages this device is known to have in classical univariate kernel regression with real-valued responses. The new technique is shown to remove many of the limitations of existing methods. In contrast to existing techniques, the proposed approach estimates the derivatives as well as the regression function itself, and provides options to make the estimated regression function free from boundary effects and to give it oracle properties. A comprehensive theory is presented for the proposed method, including rates of convergence in various modes and the asymptotic distributions of the estimators. The efficiency of the proposed method is also demonstrated in a simulation study and illustrated through real data applications.
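For readers unfamiliar with backfitting, the sketch below fits an additive model $E[Y \mid X] = m_0 + \sum_j f_j(X_j)$ with vector-valued responses in $\mathbb{R}^m$, the simplest example of a Hilbert space. It is ordinary backfitting with Nadaraya–Watson (local constant) smoothers and a Gaussian kernel, not the smooth backfitting with local polynomials developed in the paper; the bandwidth, iteration count, simulated data and function names are hypothetical.

```python
import numpy as np

def nw_smoother(x, bandwidth):
    """Row-stochastic Nadaraya-Watson smoother matrix with a Gaussian kernel."""
    u = (x[:, None] - x[None, :]) / bandwidth
    w = np.exp(-0.5 * u ** 2)
    return w / w.sum(axis=1, keepdims=True)

def backfit(X, Y, bandwidth=0.1, n_iter=50):
    """Classical backfitting for responses Y (n x m) on covariates X (n x d)."""
    n, d = X.shape
    intercept = Y.mean(axis=0)
    smoothers = [nw_smoother(X[:, j], bandwidth) for j in range(d)]
    f = np.zeros((d, n, Y.shape[1]))  # f[j] holds the fitted f_j at the data points
    for _ in range(n_iter):
        for j in range(d):
            # partial residual: remove the intercept and every component except f_j
            partial = Y - intercept - f.sum(axis=0) + f[j]
            fj = smoothers[j] @ partial
            f[j] = fj - fj.mean(axis=0)  # center each component for identifiability
    return intercept, f

rng = np.random.default_rng(0)
X = rng.uniform(size=(300, 2))
signal = np.column_stack([np.sin(2 * np.pi * X[:, 0]) + X[:, 1] ** 2,
                          np.cos(2 * np.pi * X[:, 0]) - X[:, 1]])
Y = signal + 0.1 * rng.standard_normal(signal.shape)
intercept, f = backfit(X, Y)
fitted = intercept + f.sum(axis=0)
```

Because the response enters only through linear smoothing and averaging, the same loop works unchanged for any finite-dimensional vector response; boundary bias of the local constant smoother is one of the limitations that local polynomial fitting is meant to address.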
Citations: 7
Central limit theorems and asymptotic independence for local U-statistics on diverging halfspaces
IF 1.5 Zone 2 (Mathematics) Q2 STATISTICS & PROBABILITY Pub Date: 2022-07-22 DOI: 10.3150/23-bej1583
A. Thomas
We consider the stochastic behavior of a class of local $U$-statistics of Poisson processes (which include subgraph and simplex counts as special cases and amount to quantifying clustering behavior) for point clouds lying in diverging halfspaces. We provide limit theorems for distributions with light and heavy tails. In particular, we prove finite-dimensional central limit theorems. In the light-tailed case we investigate tails that decay at least as slowly as exponential tails and at least as fast as Gaussian tails. As a corollary, these results show that $U$-statistics for halfspaces diverging at different angles are asymptotically independent, and that there is no asymptotic independence for heavy-tailed densities. Using state-of-the-art bounds derived from recent breakthroughs combining Stein's method and Malliavin calculus, we quantify the rate of this convergence in terms of the Kolmogorov distance. We also investigate the behavior of local $U$-statistics of a Poisson process conditioned to lie in a diverging halfspace, and show that the lighter the tail of the density, the faster the rate of convergence in the Kolmogorov distance.
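The simplest local $U$-statistic of this kind is an edge count: the number of unordered pairs of points within distance $r$ of each other, i.e. the order-2 subgraph count of a geometric graph. The sketch below samples a Poisson point cloud with a Gaussian (light-tailed) intensity and counts edges among the points in a halfspace $\{x : \langle x, u \rangle \ge t\}$; letting $t$ grow mimics a diverging halfspace. The function names, the radius and the intensity are hypothetical, and this is only a simulation aid, not the paper's method.

```python
import numpy as np

def edge_count(points, radius):
    """Order-2 local U-statistic: unordered pairs of points at distance <= radius."""
    pts = np.asarray(points, dtype=float)
    if len(pts) < 2:
        return 0
    diff = pts[:, None, :] - pts[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    i, j = np.triu_indices(len(pts), k=1)
    return int(np.count_nonzero(dist[i, j] <= radius))

def poisson_gaussian_cloud(intensity, d, rng=None):
    """Poisson process whose intensity measure is `intensity` times the standard Gaussian law."""
    rng = np.random.default_rng(rng)
    return rng.standard_normal((rng.poisson(intensity), d))

def halfspace_edge_count(points, direction, threshold, radius):
    """Edge count restricted to {x : <x, u> >= threshold}, with u = direction / |direction|."""
    u = np.asarray(direction, dtype=float)
    u = u / np.linalg.norm(u)
    return edge_count(points[points @ u >= threshold], radius)

cloud = poisson_gaussian_cloud(intensity=500, d=2, rng=1)
counts = [halfspace_edge_count(cloud, [1.0, 0.0], t, radius=0.2) for t in (0.0, 1.0, 2.0)]
```

Repeating the simulation over many seeds and standardizing `counts` gives an empirical picture of the normal approximation as the halfspace moves into the tail.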
Citations: 0
Inference in latent factor regression with clusterable features
IF 1.5 Zone 2 (Mathematics) Q2 STATISTICS & PROBABILITY Pub Date: 2022-05-01 DOI: 10.3150/21-bej1374
Xin Bing, F. Bunea, M. Wegkamp
Regression models in which the observed features $X \in \mathbb{R}^p$ and the response $Y \in \mathbb{R}$ depend jointly on a lower-dimensional, unobserved latent vector $Z \in \mathbb{R}^K$, with $K \ll p$, are popular in a large array of applications and are mainly used to predict a response from correlated features. In contrast, methodology and theory for inference on the regression coefficient $\beta \in \mathbb{R}^K$ relating $Y$ to $Z$ are scarce, since the unobservable factor $Z$ is typically hard to interpret. Furthermore, determining the asymptotic variance of an estimator of $\beta$ is a long-standing problem, with solutions known only in a few particular cases. To address some of these outstanding questions, we develop inferential tools for $\beta$ in a class of factor regression models in which the observed features are signed mixtures of the latent factors. The model specifications are practically desirable in a large array of applications, make the components of $Z$ interpretable, and are sufficient for parameter identifiability. Without assuming that the number of latent factors $K$ or the structure of the mixture is known in advance, we construct computationally efficient estimators of $\beta$, along with estimators of other important model parameters. We benchmark the rate of convergence for estimating $\beta$ by first establishing its $\ell_2$-norm minimax lower bound, and show that our proposed estimator $\hat{\beta}$ is minimax-rate adaptive. Our main contribution is a unified analysis of the component-wise Gaussian asymptotic distribution of $\hat{\beta}$ and, especially, the derivation of a closed-form expression for its asymptotic variance, together with consistent variance estimators. The resulting inferential tools can be used both when $K$ and $p$ are independent of the sample size $n$ and when $p$, $K$, or both vary with $n$, while allowing for $p > n$. This complements the only asymptotic normality results obtained for a particular case of the model under consideration, in the regime $K = O(1)$ and $p \to \infty$, which come without a variance estimate. As an application, we provide, within our model specifications, a statistical platform for inference in regression on latent cluster centers, thereby increasing the scope of our theoretical results. We benchmark the newly developed methodology on a recently collected data set from a study of the effectiveness of a new SIV vaccine. Our analysis enables the determination of the top latent antibody-centric mechanisms associated with the vaccine response.
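The signed-mixture structure can be simulated in a few lines. The toy sketch below assumes the simplest design with pure clusters: each feature is plus or minus one latent factor plus noise, and, unlike in the paper, the cluster labels and signs are treated as known. Averaging the sign-corrected features within each cluster gives a noisy proxy for $Z$, and regressing $Y$ on these proxies gives a naive plug-in estimate of $\beta$. Because each proxy carries noise of variance $\sigma^2/m$, in this design (independent unit-variance factors) the estimate is shrunk toward zero by a factor of about $1/(1 + \sigma^2/m)$. All names and settings are hypothetical, and this is not the authors' estimator or their variance formula.

```python
import numpy as np

def simulate_factor_regression(n, K, m, beta, noise=0.5, rng=None):
    """Each of the K latent factors drives m features with random signs (pure clusters)."""
    rng = np.random.default_rng(rng)
    p = K * m
    labels = np.repeat(np.arange(K), m)          # cluster of each feature
    signs = rng.choice([-1.0, 1.0], size=p)      # signed mixture weights
    A = np.zeros((p, K))
    A[np.arange(p), labels] = signs
    Z = rng.standard_normal((n, K))              # unobserved latent factors
    X = Z @ A.T + noise * rng.standard_normal((n, p))
    Y = Z @ np.asarray(beta, dtype=float) + noise * rng.standard_normal(n)
    return X, Y, labels, signs

def cluster_center_regression(X, Y, labels, signs):
    """Plug-in estimate: sign-corrected cluster averages as proxies for Z, then least squares."""
    K = labels.max() + 1
    Z_hat = np.column_stack([(X[:, labels == k] * signs[labels == k]).mean(axis=1)
                             for k in range(K)])
    coef, *_ = np.linalg.lstsq(Z_hat, Y, rcond=None)
    return coef

beta = np.array([1.0, -2.0, 0.5])
X, Y, labels, signs = simulate_factor_regression(n=5000, K=3, m=20, beta=beta, rng=0)
beta_hat = cluster_center_regression(X, Y, labels, signs)
```

The attenuation here is small because $m = 20$, but it illustrates why valid inference for $\beta$ has to account for the error in the estimated latent factors rather than treating them as observed.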
Citations: 5
A Cramér–Wold device for infinite divisibility of $\mathbb{Z}^d$-valued distributions
IF 1.5 Zone 2 (Mathematics) Q2 STATISTICS & PROBABILITY Pub Date: 2022-05-01 DOI: 10.3150/21-bej1386
David Berger, Alexandra H Lindner
We show that a Cramér–Wold device holds for infinite divisibility of $\mathbb{Z}^d$-valued distributions, i.e. that the distribution of a $\mathbb{Z}^d$-valued random vector $X$ is infinitely divisible if and only if the distribution of $a^T X$ is infinitely divisible for all $a \in \mathbb{R}^d$, and that this in turn is equivalent to infinite divisibility of the distribution of $a^T X$ for all $a \in \mathbb{N}_0^d$. A key tool for proving this is a Lévy–Khintchine type representation, with a signed Lévy measure, for the characteristic function of a $\mathbb{Z}^d$-valued distribution, provided the characteristic function is zero-free.
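In dimension $d = 1$ a signed Lévy measure of this kind can be computed numerically. If the characteristic function $\varphi(z) = \sum_j p_j e^{ijz}$ of an integer-valued distribution has positive real part everywhere (a stronger condition than being zero-free, assumed here so that the principal logarithm is continuous and no drift term appears), the Fourier coefficients of $\log \varphi$ give masses $\nu_k$ with $\varphi(z) = \exp\big(\sum_{k \neq 0} \nu_k (e^{ikz} - 1)\big)$. For a Bernoulli($p$) law with $p < 1/2$, expanding $\log(1 + q e^{iz})$ with $q = p/(1-p)$ gives $\nu_k = (-1)^{k+1} q^k / k$, with negative masses at even $k$, consistent with the well-known fact that a non-degenerate Bernoulli law is not infinitely divisible. The function name and grid size are hypothetical; this is a one-dimensional illustration, not the authors' construction.

```python
import numpy as np

def signed_levy_masses(pmf, max_jump=10, n_grid=1024):
    """Masses nu_k, 0 < |k| <= max_jump, with phi(z) = exp(sum_k nu_k (e^{ikz} - 1)).

    pmf[j] is P(X = j) for j = 0, 1, ..., len(pmf) - 1.
    """
    pmf = np.asarray(pmf, dtype=float)
    z = 2.0 * np.pi * np.arange(n_grid) / n_grid
    phi = np.exp(1j * np.outer(z, np.arange(len(pmf)))) @ pmf
    if not np.all(phi.real > 0):
        raise ValueError("sketch assumes Re(phi) > 0 so that the principal log is continuous")
    # np.fft.fft uses e^{-2 pi i jk/N}, so the coefficient of e^{ikz} sits at index k mod n_grid
    coeffs = np.fft.fft(np.log(phi)) / n_grid
    return {k: coeffs[k % n_grid].real for k in range(-max_jump, max_jump + 1) if k != 0}

p = 0.3
q = p / (1 - p)
nu = signed_levy_masses([1 - p, p])
```

The grid size only controls aliasing: the coefficients decay like $q^k$, so the error from folding frequencies beyond `n_grid` is negligible here.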
Citations: 8
Posterior probabilities: Nonmonotonicity, asymptotic rates, log-concavity, and Turán’s inequality
IF 1.5 Zone 2 (Mathematics) Q2 STATISTICS & PROBABILITY Pub Date: 2022-05-01 DOI: 10.3150/21-BEJ1398
S. Hart, Y. Rinott
In the standard Bayesian framework, data are assumed to be generated by a distribution parametrized by $\theta$ in a parameter space $\Theta$, over which a prior distribution $\pi$ is given. A Bayesian statistician quantifies the belief that the true parameter is $\theta_{0}$ in $\Theta$ by its posterior probability given the observed data. We investigate the behavior of the posterior belief in $\theta_{0}$ when the data are generated under some parameter $\theta_{1}$, which may or may not be the same as $\theta_{0}$. Starting from stochastic orders, specifically likelihood ratio dominance, that hold for the resulting distributions of posteriors, we consider monotonicity properties of the posterior probabilities as a function of the sample size when data arrive sequentially. While the $\theta_{0}$-posterior is monotonically increasing (i.e., it is a submartingale) when the data are generated under that same $\theta_{0}$, it need not be monotonically decreasing in general, not even in terms of its overall expectation, when the data are generated under a different $\theta_{1}$. In fact, it may go up and down many times, even in simple cases such as i.i.d. coin tosses. We obtain precise asymptotic rates when the data come from the wide class of exponential families of distributions; these rates imply in particular that the expectation of the $\theta_{0}$-posterior under $\theta_{1} \neq \theta_{0}$ is eventually strictly decreasing. Finally, we show that in a number of interesting cases this expectation is a log-concave function of the sample size, and thus unimodal. In the Bernoulli case we obtain this by developing an inequality that is related to Turán's inequality for Legendre polynomials.
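The expected posterior can be computed exactly for i.i.d. coin tosses with a finite parameter space by summing over the binomial number of heads. The sketch below assumes a prior on a finite grid of success probabilities (names and values hypothetical). Two checks follow from the martingale property of posteriors: under $\theta_0$ the expected $\theta_0$-posterior never decreases, and with only two parameter values the $\theta_0$-posterior equals one minus the $\theta_1$-posterior, a submartingale under $\theta_1$, so its expectation never increases. With three or more parameter values no such monotonicity is guaranteed under $\theta_1$.

```python
from math import comb

def posterior(index, thetas, prior, n, heads):
    """Posterior probability of thetas[index] after `heads` successes in n tosses."""
    weights = [w * t ** heads * (1 - t) ** (n - heads) for w, t in zip(prior, thetas)]
    return weights[index] / sum(weights)

def expected_posterior(index, thetas, prior, theta_true, n):
    """E[posterior of thetas[index]] when the tosses are i.i.d. Bernoulli(theta_true)."""
    return sum(comb(n, h) * theta_true ** h * (1 - theta_true) ** (n - h)
               * posterior(index, thetas, prior, n, h) for h in range(n + 1))

# theta_0 = 0.5, data generated under theta_1 = 0.7, uniform two-point prior
two_point = [expected_posterior(0, [0.5, 0.7], [0.5, 0.5], 0.7, n) for n in range(41)]
```

Replacing the two-point prior by a richer grid and plotting the sequence against `n` is a direct way to look for the up-and-down behavior the abstract describes.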
Citations: 0
Non-asymptotic properties of spectral decomposition of large Gram-type matrices and applications
IF 1.5 Zone 2 (Mathematics) Q2 STATISTICS & PROBABILITY Pub Date: 2022-05-01 DOI: 10.3150/21-bej1384
Lyuou Zhang, Wen Zhou, Haonan Wang
Gram-type matrices and their spectral decomposition are of central importance for numerous problems in statistics, applied mathematics, physics, and machine learning. In this paper, we carefully study the non-asymptotic properties of the spectral decomposition of large Gram-type matrices when the data are not necessarily independent. Specifically, we derive exponential tail bounds for the deviation between the eigenvectors of the right Gram matrix and their population counterparts, as well as a Berry-Esseen type bound for these deviations. We also obtain a non-asymptotic tail bound for the ratio between the eigenvalues of the left Gram matrix, namely the sample covariance matrix, and their population counterparts, regardless of the size of the data matrix. These non-asymptotic properties are further demonstrated in a suite of applications, including the non-asymptotic characterization of the estimated number of latent factors in factor models and related machine learning problems, the estimation and forecasting of high-dimensional time series, spectral properties of large sample covariance matrices such as perturbation bounds and inference on spectral projectors, and low-rank matrix denoising using dependent data.
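The left and right Gram matrices share their nonzero eigenvalues, and their eigenvectors are linked through the data matrix. The sketch below assumes the data matrix $X$ is $p \times n$ (variables by observations), so that $XX^T/n$ is the left Gram matrix (the uncentered sample covariance) and $X^TX/n$ the right Gram matrix; the AR(1) dependence across observations is a hypothetical choice echoing the dependent-data setting. It only checks these exact algebraic identities and does not implement any of the paper's bounds.

```python
import numpy as np

rng = np.random.default_rng(0)
p, n, rho = 5, 40, 0.6
# AR(1) dependence across the n observations (columns), stationary with unit variance
X = np.empty((p, n))
X[:, 0] = rng.standard_normal(p)
for t in range(1, n):
    X[:, t] = rho * X[:, t - 1] + np.sqrt(1 - rho ** 2) * rng.standard_normal(p)

left = X @ X.T / n     # p x p left Gram matrix (uncentered sample covariance)
right = X.T @ X / n    # n x n right Gram matrix
lam_left, U = np.linalg.eigh(left)     # eigenvalues in ascending order
lam_right, V = np.linalg.eigh(right)

# the top min(p, n) eigenvalues coincide; the remaining n - p eigenvalues of `right` are zero
top = min(p, n)
# each left eigenvector u maps to a unit right eigenvector v = X^T u / sqrt(n * lambda)
V_from_U = X.T @ U / np.sqrt(n * lam_left)
```

The identity follows from $(X^TX/n)(X^Tu) = X^T(XX^T/n)u = \lambda X^Tu$ and $\|X^Tu\|^2 = n\lambda$, so statements about one spectral decomposition translate into statements about the other.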
Citations: 0
Rates and coverage for monotone densities using projection-posterior
IF 1.5 Zone 2 (Mathematics) Q2 STATISTICS & PROBABILITY Pub Date: 2022-05-01 DOI: 10.3150/21-bej1379
Moumita Chakraborty, S. Ghosal
We consider Bayesian inference for a monotone density on the unit interval and study the resulting asymptotic properties. We consider a “projection-posterior” approach, in which we construct a prior on density functions through random histograms without imposing the monotonicity constraint, and then induce a random distribution by projecting a sample from the posterior onto the space of monotone functions. The approach retains posterior conjugacy, yielding explicit expressions that are extremely useful for studying asymptotic properties. We show that the projection-posterior contracts at the optimal $n^{-1/3}$ rate. We then construct a consistent test of the hypothesis of monotonicity based on the posterior distribution. Finally, we obtain the limiting coverage of a projection-posterior credible interval for the value of the function at an interior point. Interestingly, the limiting coverage turns out to be higher than the nominal credibility level, the opposite of the undercoverage phenomenon observed in a smoothness regime. Moreover, we show that a recalibration method using a lower credibility level gives the intended limiting coverage. We also discuss extensions of the obtained results to densities on the half-line. We conduct a simulation study to demonstrate the accuracy of the asymptotic results in finite samples.
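A minimal sketch of the projection-posterior idea, assuming a decreasing density on $[0, 1]$, a histogram prior with equal-width bins and Dirichlet($\alpha, \ldots, \alpha$) bin probabilities, and the $L_2$ projection onto nonincreasing step functions computed by the pool-adjacent-violators algorithm (PAVA). Conjugacy means each posterior draw is a Dirichlet draw with parameters counts $+\,\alpha$; projecting it keeps the total mass at one because PAVA replaces each violating block by its average. The names, the bin count and $\alpha$ are hypothetical, and the paper's prior and projection may differ in detail.

```python
import numpy as np

def pava_decreasing(y):
    """L2 projection of a sequence onto nonincreasing sequences (equal weights)."""
    blocks = []  # each block is [mean, size]
    for v in y:
        blocks.append([float(v), 1])
        # merge while the previous block is lower than the last one (a monotonicity violation)
        while len(blocks) > 1 and blocks[-2][0] < blocks[-1][0]:
            m2, s2 = blocks.pop()
            m1, s1 = blocks.pop()
            blocks.append([(m1 * s1 + m2 * s2) / (s1 + s2), s1 + s2])
    return np.concatenate([np.full(s, m) for m, s in blocks])

def projection_posterior_draws(data, n_bins, n_draws, alpha=1.0, rng=None):
    """Conjugate Dirichlet posterior draws of histogram heights, each projected by PAVA."""
    rng = np.random.default_rng(rng)
    counts, _ = np.histogram(data, bins=n_bins, range=(0.0, 1.0))
    probs = rng.dirichlet(counts + alpha, size=n_draws)  # posterior bin probabilities
    heights = probs * n_bins                             # density heights on bins of width 1/n_bins
    return np.array([pava_decreasing(h) for h in heights])

rng = np.random.default_rng(0)
data = rng.beta(1.0, 2.0, size=500)  # true decreasing density 2(1 - x) on [0, 1]
draws = projection_posterior_draws(data, n_bins=20, n_draws=1000, rng=rng)
# 95% projection-posterior credible interval for the density on the bin [0.5, 0.55)
lower, upper = np.quantile(draws[:, 10], [0.025, 0.975])
```

The sketch does not apply the recalibration to a lower credibility level discussed above, so the interval should be read as the uncalibrated projection-posterior interval.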
Citations: 4
On the measure of anchored Gaussian simplices, with applications to multivariate medians
IF 1.5 | Zone 2 (Mathematics) | Q2 STATISTICS & PROBABILITY | Pub Date: 2022-05-01 | DOI: 10.3150/21-bej1373
D. Paindaveine
We consider anchored Gaussian $\ell$-simplices in the $d$-dimensional Euclidean space, that is, simplices with one fixed vertex $y \in \mathbb{R}^d$ and the remaining vertices $X_1, \dots, X_\ell$ randomly sampled from the $d$-variate standard normal distribution. We determine the distribution of the measure of such simplices for any $d$, any $\ell$, and any anchor point $y$, which is of interest, e.g., when studying the asymptotics of U-statistics based on such simplex measures. We provide two proofs of the results. The first one is short but is not self-contained as it crucially relies on a technical result for non-central Wishart distributions. The second one is a simple and self-contained proof, that also provides some geometric insight on the results. Quite nicely, variations on this second argument reveal intriguing distributional identities on products of central and non-central chi-square distributions with Beta-distributed non-centrality parameters. We independently establish these distributional identities by making use of Mellin transforms. Beyond the aforementioned use to study the asymptotics of some U-statistics, our results do find natural applications in the context of robust location estimation, as we illustrate by considering a class of simplex-based multivariate medians that contains the celebrated spatial median and Oja median as special cases. Throughout, our results are confirmed by numerical experiments.
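To make the object concrete, here is a minimal Monte Carlo sketch. It is not the paper's method, which derives the distribution exactly. It assumes the standard formula for the $\ell$-dimensional measure of a simplex, $V = \sqrt{\det(E E^\top)}/\ell!$, where the rows of $E$ are the edge vectors $X_i - y$. The helper names `gram_determinant`, `simplex_measure` and `sample_anchored_measures` are hypothetical. As a sanity check it relies on a well-known fact for the anchor $y = 0$ with $\ell \le d$: $E E^\top$ is then Wishart $W_\ell(d, I)$, so by the Bartlett decomposition $(\ell!\,V)^2$ is a product of independent $\chi^2_d, \chi^2_{d-1}, \dots, \chi^2_{d-\ell+1}$ variables with mean $d!/(d-\ell)!$.

```python
import math
import random
from typing import List, Sequence


def gram_determinant(vectors: Sequence[Sequence[float]]) -> float:
    """Determinant of the Gram matrix G[i][j] = <v_i, v_j>, by Gaussian elimination."""
    n = len(vectors)
    g = [[sum(a * b for a, b in zip(vectors[i], vectors[j])) for j in range(n)]
         for i in range(n)]
    det = 1.0
    for col in range(n):
        # Partial pivoting: bring the largest remaining entry of this column up.
        pivot = max(range(col, n), key=lambda r: abs(g[r][col]))
        if g[pivot][col] == 0.0:
            return 0.0  # singular Gram matrix: degenerate (flat) simplex
        if pivot != col:
            g[col], g[pivot] = g[pivot], g[col]
            det = -det  # a row swap flips the sign of the determinant
        det *= g[col][col]
        for r in range(col + 1, n):
            factor = g[r][col] / g[col][col]
            for c in range(col, n):
                g[r][c] -= factor * g[col][c]
    return det


def simplex_measure(anchor: Sequence[float], points: Sequence[Sequence[float]]) -> float:
    """ell-dimensional measure of the simplex with vertices anchor, points[0], ..., points[ell-1]."""
    ell = len(points)
    edges = [[p_j - y_j for p_j, y_j in zip(p, anchor)] for p in points]
    det = max(gram_determinant(edges), 0.0)  # clip tiny negative round-off
    return math.sqrt(det) / math.factorial(ell)


def sample_anchored_measures(anchor: Sequence[float], ell: int,
                             n_samples: int, seed: int = 0) -> List[float]:
    """Monte Carlo draws of the measure of an anchored Gaussian ell-simplex in R^d."""
    rng = random.Random(seed)
    d = len(anchor)
    draws = []
    for _ in range(n_samples):
        # ell i.i.d. standard normal vertices in R^d; the anchor stays fixed.
        points = [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(ell)]
        draws.append(simplex_measure(anchor, points))
    return draws


if __name__ == "__main__":
    print(simplex_measure([0.0, 0.0], [[1.0, 0.0], [0.0, 1.0]]))  # 0.5
    draws = sample_anchored_measures([0.0, 0.0, 0.0], ell=2, n_samples=50_000, seed=1)
    print(sum((2 * v) ** 2 for v in draws) / len(draws))  # close to 3 * 2 = 6 for y = 0
```

For $d = 3$ and $\ell = 2$ the average of $(2V)^2$ should be close to $6$. For a nonzero anchor the same sampler yields an empirical distribution that one could compare against the exact non-central result derived in the paper. Plain Python is used so the sketch has no dependencies.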
Citations: 2
Empirical process of concomitants for partly categorial data and applications in statistics
IF 1.5 | Zone 2 (Mathematics) | Q2 STATISTICS & PROBABILITY | Pub Date: 2022-05-01 | DOI: 10.3150/21-bej1367
D. Gaigall, Julian Gerstenberg, Thi Thu Huyen Trinh
On the basis of independent and identically distributed bivariate random vectors, where the components are categorial and continuous variables, respectively, the related concomitants, also called induced order statistics, are considered. The main theoretical result is a functional central limit theorem for the empirical process of the concomitants in a triangular array setting. A natural application is hypothesis testing. An independence test and a two-sample test are investigated in detail. The fairly general setting enables limit results under local alternatives and for bootstrap samples. Simulation studies are conducted to compare the proposed tests with existing tests from the literature. The empirical results obtained confirm the theoretical findings.
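A short sketch of the basic object, assuming (as seems natural here, since ties among continuous values occur with probability zero) that the pairs are ordered by their continuous component. The concomitants are then simply the categorial labels read off in that order. The process computed below, $Z_n(t) = n^{-1/2}\sum_{i \le \lfloor nt \rfloor} (\mathbf{1}\{X_{[i:n]} = k\} - \hat p_k)$ for a single category $k$, is a hypothetical illustration and not necessarily the empirical process studied in the paper. The function names are likewise made up.

```python
import math
from typing import Hashable, List, Sequence, Tuple


def concomitants(pairs: Sequence[Tuple[Hashable, float]]) -> List[Hashable]:
    """Categorial labels listed in ascending order of the continuous component."""
    return [label for label, _ in sorted(pairs, key=lambda pair: pair[1])]


def concomitant_process(
    pairs: Sequence[Tuple[Hashable, float]],
    category: Hashable,
    grid: Sequence[float],
) -> List[float]:
    """Hypothetical centered partial-sum process Z_n(t) of the concomitants for one category.

    Each t in grid is assumed to lie in [0, 1].
    """
    induced = concomitants(pairs)
    n = len(induced)
    # Overall share of the category (unchanged by the reordering).
    p_hat = sum(label == category for label in induced) / n
    # partial[m] = sum over the first m concomitants of (indicator - p_hat).
    partial = [0.0]
    for label in induced:
        indicator = 1.0 if label == category else 0.0
        partial.append(partial[-1] + indicator - p_hat)
    return [partial[math.floor(n * t)] / math.sqrt(n) for t in grid]


if __name__ == "__main__":
    data = [("a", 2.0), ("b", 0.5), ("a", 1.0)]
    print(concomitants(data))  # ['b', 'a', 'a']
    print(concomitant_process(data, "a", [0.0, 0.5, 1.0]))
```

By construction $Z_n(0) = Z_n(1) = 0$. If the labels are independent of the continuous variable, the induced label sequence is exchangeable, so large values of $\max_t |Z_n(t)|$ would hint at dependence. Using that maximum as a test statistic is an illustrative assumption, not the test proposed in the paper.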
Citations: 0
Journal: Bernoulli