A challenge when dealing with survival analysis data is accounting for a cure fraction, meaning that some subjects will never experience the event of interest. Mixture cure models have been frequently used to estimate both the probability of being cured and the time to event for the susceptible subjects, usually by assuming a parametric (logistic) form of the incidence. We propose a new estimation procedure for a parametric cure rate that relies on a preliminary smooth estimator and is independent of the model assumed for the latency. In a second stage, one can assume a semiparametric model for the latency and also estimate the survival distribution of the uncured subjects. For the particular case of the logistic/Cox model, we investigate the theoretical properties of the estimators and show through simulations that presmoothing leads to more accurate results compared to the maximum likelihood estimator. To illustrate the practical use, we apply the new estimation procedure to two studies of melanoma survival data.
{"title":"A presmoothing approach for estimation in the semiparametric Cox mixture cure model","authors":"Eni Musta, V. Patilea, I. Van Keilegom","doi":"10.3150/21-bej1434","DOIUrl":"https://doi.org/10.3150/21-bej1434","url":null,"abstract":"A challenge when dealing with survival analysis data is accounting for a cure fraction, meaning that some subjects will never experience the event of interest. Mixture cure models have been frequently used to estimate both the probability of being cured and the time to event for the susceptible subjects, by usually assuming a parametric (logistic) form of the incidence. We propose a new estimation procedure for a parametric cure rate that relies on a preliminary smooth estimator and is independent of the model assumed for the latency. On a second stage one can assume a semiparametric model for the latency and estimate also the survival distribution of the uncured subject. For the particular case of the logistic/Cox model, we investigate the theoretical properties of the estimators and show through simulations that presmoothing leads to more accurate results compared to the maximum likelihood estimator. To illustrate the practical use, we apply the new estimation procedure to two studies of melanoma survival data.","PeriodicalId":55387,"journal":{"name":"Bernoulli","volume":" ","pages":""},"PeriodicalIF":1.5,"publicationDate":"2022-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"43867347","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
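For orientation, the logistic/Cox mixture cure model named in the abstract above is commonly written as follows. This is a sketch in assumed notation, not taken from the record: $T_0$ is the event time, $X$ and $Z$ are the incidence and latency covariates, $\phi$ is the probability of being uncured, and $\Lambda$ is the baseline cumulative hazard.

```latex
% Mixture cure model: population survival = cured mass + uncured mass times latency survival.
\[
  P(T_0 > t \mid X = x, Z = z) \;=\; 1 - \phi(\gamma, x) + \phi(\gamma, x)\, S_u(t \mid z),
\]
% Logistic incidence (probability of being uncured) and Cox latency (survival of the uncured):
\[
  \phi(\gamma, x) \;=\; \frac{1}{1 + \exp(-\gamma^{\top} x)},
  \qquad
  S_u(t \mid z) \;=\; \exp\!\bigl(-\Lambda(t)\, e^{\beta^{\top} z}\bigr).
\]
```

In this notation the cure probability is $1 - \phi(\gamma, x)$, and the abstract's first stage concerns $\gamma$ only, while the second stage concerns $\beta$ and $\Lambda$.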
Summary: In this paper a new additive regression technique is developed for response variables that take values in general Hilbert spaces. The proposed method is based on the idea of smooth backfitting that has been developed mainly for real-valued responses. The local polynomial smoothing device is adopted, which renders various advantages of the technique evidenced in the classical univariate kernel regression with real-valued responses. It is demonstrated that the new technique eliminates many limitations which existing methods are subject to. In contrast to the existing techniques, the proposed approach is equipped with the estimation of the derivatives as well as the regression function itself, and provides options to make the estimated regression function free from boundary effects and possess oracle properties. A comprehensive theory is presented for the proposed method, which includes the rates of convergence in various modes and the asymptotic distributions of the estimators. The efficiency of the proposed method is also demonstrated via simulation study and is illustrated through real data applications.
{"title":"Locally polynomial Hilbertian additive regression","authors":"Jeong Min Jeon, Young K. Lee, E. Mammen, B. Park","doi":"10.3150/21-bej1410","DOIUrl":"https://doi.org/10.3150/21-bej1410","url":null,"abstract":"Summary: In this paper a new additive regression technique is developed for response variables that take values in general Hilbert spaces. The proposed method is based on the idea of smooth backfitting that has been developed mainly for real-valued responses. The local polynomial smoothing device is adopted, which renders various advantages of the technique evidenced in the classical univariate kernel regression with real-valued responses. It is demonstrated that the new technique eliminates many limitations which existing methods are subject to. In contrast to the existing techniques, the proposed approach is equipped with the estimation of the derivatives as well as the regression function itself, and provides options to make the estimated regression function free from boundary effects and possess oracle properties. A comprehensive theory is presented for the proposed method, which includes the rates of convergence in various modes and the asymptotic distributions of the estimators. The efficiency of the proposed method is also demonstrated via simulation study and is illustrated through real data applications.","PeriodicalId":55387,"journal":{"name":"Bernoulli","volume":" ","pages":""},"PeriodicalIF":1.5,"publicationDate":"2022-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"45972915","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
We consider the stochastic behavior of a class of local $U$-statistics of Poisson processes (which include subgraph and simplex counts as special cases, and amount to quantifying clustering behavior) for point clouds lying in diverging halfspaces. We provide limit theorems for distributions with light and heavy tails. In particular, we prove finite-dimensional central limit theorems. In the light tail case we investigate tails that decay at least as slowly as exponential and at least as fast as Gaussian. These results also furnish as a corollary that $U$-statistics for halfspaces diverging at different angles are asymptotically independent, and that there is no asymptotic independence for heavy-tailed densities. Using state-of-the-art bounds derived from recent breakthroughs combining Stein's method and Malliavin calculus, we quantify the rate of this convergence in terms of Kolmogorov distance. We also investigate the behavior of local $U$-statistics of a Poisson process conditioned to lie in a diverging halfspace and show that the rate of convergence in the Kolmogorov distance is faster the lighter the tail of the density is.
{"title":"Central limit theorems and asymptotic independence for local U-statistics on diverging halfspaces","authors":"A. Thomas","doi":"10.3150/23-bej1583","DOIUrl":"https://doi.org/10.3150/23-bej1583","url":null,"abstract":"We consider the stochastic behavior of a class of local $U$-statistics of Poisson processes$-$which include subgraph and simplex counts as special cases, and amounts to quantifying clustering behavior$-$for point clouds lying in diverging halfspaces. We provide limit theorems for distributions with light and heavy tails. In particular, we prove finite-dimensional central limit theorems. In the light tail case we investigate tails that decay at least as slow as exponential and at least as fast as Gaussian. These results also furnish as a corollary that $U$-statistics for halfspaces diverging at different angles are asymptotically independent, and that there is no asymptotic independence for heavy-tailed densities. Using state-of-the-art bounds derived from recent breakthroughs combining Stein's method and Malliavin calculus, we quantify the rate of this convergence in terms of Kolmogorov distance. We also investigate the behavior of local $U$-statistics of a Poisson Process conditioned to lie in diverging halfspace and show how the rate of convergence in the Kolmogorov distance is faster the lighter the tail of the density is.","PeriodicalId":55387,"journal":{"name":"Bernoulli","volume":" ","pages":""},"PeriodicalIF":1.5,"publicationDate":"2022-07-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"46071450","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Regression models, in which the observed features $X \in \mathbb{R}^p$ and the response $Y \in \mathbb{R}$ depend, jointly, on a lower dimensional, unobserved, latent vector $Z \in \mathbb{R}^K$, with $K \ll p$, are popular in a large array of applications, and mainly used for predicting a response from correlated features. In contrast, methodology and theory for inference on the regression coefficient $\beta \in \mathbb{R}^K$ relating $Y$ to $Z$ are scarce, since typically the unobservable factor $Z$ is hard to interpret. Furthermore, the determination of the asymptotic variance of an estimator of $\beta$ is a long-standing problem, with solutions known only in a few particular cases. To address some of these outstanding questions, we develop inferential tools for $\beta$ in a class of factor regression models in which the observed features are signed mixtures of the latent factors. The model specifications are practically desirable in a large array of applications, render interpretability to the components of $Z$, and are sufficient for parameter identifiability. Without assuming that the number of latent factors $K$ or the structure of the mixture is known in advance, we construct computationally efficient estimators of $\beta$, along with estimators of other important model parameters. We benchmark the rate of convergence of $\beta$ by first establishing its $\ell_2$-norm minimax lower bound, and show that our proposed estimator $\widehat{\beta}$ is minimax-rate adaptive. Our main contribution is the provision of a unified analysis of the component-wise Gaussian asymptotic distribution of $\widehat{\beta}$ and, especially, the derivation of a closed form expression of its asymptotic variance, together with consistent variance estimators. The resulting inferential tools can be used when both $K$ and $p$ are independent of the sample size $n$, and also when both, or either, $p$ and $K$ vary with $n$, while allowing for $p > n$. This complements the only asymptotic normality results obtained for a particular case of the model under consideration, in the regime $K = O(1)$ and $p \to \infty$, but without a variance estimate. As an application, we provide, within our model specifications, a statistical platform for inference in regression on latent cluster centers, thereby increasing the scope of our theoretical results. We benchmark the newly developed methodology on a recently collected data set for the study of the effectiveness of a new SIV vaccine. Our analysis enables the determination of the top latent antibody-centric mechanisms associated with the vaccine response.
{"title":"Inference in latent factor regression with clusterable features","authors":"Xin Bing, F. Bunea, M. Wegkamp","doi":"10.3150/21-bej1374","DOIUrl":"https://doi.org/10.3150/21-bej1374","url":null,"abstract":"Regression models, in which the observed features X ∈ R p and the response Y ∈ R depend, jointly, on a lower dimensional, unobserved, latent vector Z ∈ R K , with K (cid:3) p , are popular in a large array of applications, and mainly used for predicting a response from correlated features. In contrast, methodology and theory for inference on the regression coefficient β ∈ R K relating Y to Z are scarce, since typically the un-observable factor Z is hard to interpret. Furthermore, the determination of the asymptotic variance of an estimator of β is a long-standing problem, with solutions known only in a few particular cases. To address some of these outstanding questions, we develop inferential tools for β in a class of factor regression models in which the observed features are signed mixtures of the latent factors. The model specifications are both practically desirable, in a large array of applications, render interpretability to the components of Z , and are sufficient for parameter identifiability. Without assuming that the number of latent factors K or the structure of the mixture is known in advance, we construct computationally efficient estimators of β , along with estimators of other important model parameters. We benchmark the rate of convergence of β by first establishing its (cid:3) 2 -norm minimax lower bound, and show that our proposed estimator (cid:2) β is minimax-rate adaptive. Our main contribution is the provision of a unified analysis of the component-wise Gaussian asymptotic distribution of (cid:2) β and, especially, the derivation of a closed form expression of its asymptotic variance, together with consistent variance estimators. 
The resulting inferential tools can be used when both K and p are independent of the sample size n , and also when both, or either, p and K vary with n , while allowing for p > n . This complements the only asymptotic normality results obtained for a particular case of the model under consideration, in the regime K = O( 1 ) and p → ∞ , but without a variance estimate. As an application, we provide, within our model specifications, a statistical platform for inference in regression on latent cluster centers, thereby increasing the scope of our theoretical results. We benchmark the newly developed methodology on a recently collected data set for the study of the effectiveness of a new SIV vaccine. Our analysis enables the determination of the top latent antibody-centric mechanisms associated with the vaccine response.","PeriodicalId":55387,"journal":{"name":"Bernoulli","volume":" ","pages":""},"PeriodicalIF":1.5,"publicationDate":"2022-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"44387123","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
We show that a Cramér–Wold device holds for infinite divisibility of $\mathbb{Z}^d$-valued distributions, i.e. that the distribution of a $\mathbb{Z}^d$-valued random vector $X$ is infinitely divisible if and only if the distribution of $a^T X$ is infinitely divisible for all $a \in \mathbb{R}^d$, and that this in turn is equivalent to infinite divisibility of the distribution of $a^T X$ for all $a \in \mathbb{N}_0^d$. A key tool for proving this is a Lévy–Khintchine type representation with a signed Lévy measure for the characteristic function of a $\mathbb{Z}^d$-valued distribution, provided the characteristic function is zero-free.
{"title":"A Cramér–Wold device for infinite divisibility of Zd-valued distributions","authors":"David Berger, Alexandra H Lindner","doi":"10.3150/21-bej1386","DOIUrl":"https://doi.org/10.3150/21-bej1386","url":null,"abstract":"We show that a Cramér–Wold device holds for infinite divisibility of Zd-valued distributions, i.e. that the distribution of a Zd-valued random vector X is infinitely divisible if and only if the distribution of aTX is infinitely divisible for all a ∈ Rd, and that this in turn is equivalent to infinite divisibility of the distribution of aTX for all a ∈ N0. A key tool for proving this is a Lévy–Khintchine type representation with a signed Lévy measure for the characteristic function of a Zd-valued distribution, provided the characteristic function is zero-free.","PeriodicalId":55387,"journal":{"name":"Bernoulli","volume":" ","pages":""},"PeriodicalIF":1.5,"publicationDate":"2022-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"47599338","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
In the standard Bayesian framework data are assumed to be generated by a distribution parametrized by $\theta$ in a parameter space $\Theta$, over which a prior distribution $\pi$ is given. A Bayesian statistician quantifies the belief that the true parameter is $\theta_{0}$ in $\Theta$ by its posterior probability given the observed data. We investigate the behavior of the posterior belief in $\theta_{0}$ when the data are generated under some parameter $\theta_{1}$, which may or may not be the same as $\theta_{0}$. Starting from stochastic orders, specifically, likelihood ratio dominance, that obtain for resulting distributions of posteriors, we consider monotonicity properties of the posterior probabilities as a function of the sample size when data arrive sequentially. While the $\theta_{0}$-posterior is monotonically increasing (i.e., it is a submartingale) when the data are generated under that same $\theta_{0}$, it need not be monotonically decreasing in general, not even in terms of its overall expectation, when the data are generated under a different $\theta_{1}$. In fact, it may keep going up and down many times, even in simple cases such as iid coin tosses. We obtain precise asymptotic rates when the data come from the wide class of exponential families of distributions; these rates imply in particular that the expectation of the $\theta_{0}$-posterior under $\theta_{1}\neq\theta_{0}$ is eventually strictly decreasing. Finally, we show that in a number of interesting cases this expectation is a log-concave function of the sample size, and thus unimodal. In the Bernoulli case we obtain this by developing an inequality that is related to Turán's inequality for Legendre polynomials.
{"title":"Posterior probabilities: Nonmonotonicity, asymptotic rates, log-concavity, and Turán’s inequality","authors":"S. Hart, Y. Rinott","doi":"10.3150/21-BEJ1398","DOIUrl":"https://doi.org/10.3150/21-BEJ1398","url":null,"abstract":"In the standard Bayesian framework data are assumed to be generated by a distribution parametrized by $theta$ in a parameter space $Theta$, over which a prior distribution $pi$ is given. A Bayesian statistician quantifies the belief that the true parameter is $theta_{0}$ in $Theta$ by its posterior probability given the observed data. We investigate the behavior of the posterior belief in $theta_{0}$ when the data are generated under some parameter $theta_{1},$ which may or may not be the same as $theta_{0}.$ Starting from stochastic orders, specifically, likelihood ratio dominance, that obtain for resulting distributions of posteriors, we consider monotonicity properties of the posterior probabilities as a function of the sample size when data arrive sequentially. While the $theta_{0}$-posterior is monotonically increasing (i.e., it is a submartingale) when the data are generated under that same $theta_{0}$, it need not be monotonically decreasing in general, not even in terms of its overall expectation, when the data are generated under a different $theta_{1}.$ In fact, it may keep going up and down many times, even in simple cases such as iid coin tosses. We obtain precise asymptotic rates when the data come from the wide class of exponential families of distributions; these rates imply in particular that the expectation of the $theta_{0}$-posterior under $theta_{1}neqtheta_{0}$ is eventually strictly decreasing. Finally, we show that in a number of interesting cases this expectation is a log-concave function of the sample size, and thus unimodal. 
In the Bernoulli case we obtain this by developing an inequality that is related to Tur'{a}n's inequality for Legendre polynomials.","PeriodicalId":55387,"journal":{"name":"Bernoulli","volume":" ","pages":""},"PeriodicalIF":1.5,"publicationDate":"2022-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"41789423","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
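The coin-toss setting mentioned in the abstract above can be explored numerically. The following is a minimal sketch under simplifying assumptions that are not in the record: the parameter space is reduced to two points, `theta0` and `theta1`, with prior mass `prior0` on `theta0`, and the function name `expected_posterior` is hypothetical. It computes the exact expectation of the posterior probability of `theta0` after `n` tosses generated under `theta_true`.

```python
from math import comb


def expected_posterior(theta0, theta1, prior0, n, theta_true):
    """Expected posterior probability of theta0 after n iid coin tosses.

    Assumes a two-point parameter space {theta0, theta1} with prior mass
    prior0 on theta0; the tosses have success probability theta_true.
    """
    total = 0.0
    for k in range(n + 1):
        # Likelihood of k heads in n tosses under each candidate parameter
        # (the binomial coefficient cancels in the posterior ratio).
        like0 = theta0**k * (1 - theta0) ** (n - k)
        like1 = theta1**k * (1 - theta1) ** (n - k)
        post0 = prior0 * like0 / (prior0 * like0 + (1 - prior0) * like1)
        # Probability of observing k heads under the data-generating parameter.
        weight = comb(n, k) * theta_true**k * (1 - theta_true) ** (n - k)
        total += weight * post0
    return total


if __name__ == "__main__":
    # Expectation of the theta0-posterior when data come from theta1.
    for n in range(0, 21, 5):
        print(n, expected_posterior(0.5, 0.7, 0.3, n, theta_true=0.7))
```

Setting `theta_true=theta0` gives a sequence that is non-decreasing in `n`, in line with the submartingale property stated in the abstract; setting `theta_true=theta1` lets one inspect the shape of the sequence for particular parameter choices.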
Gram-type matrices and their spectral decomposition are of central importance for numerous problems in statistics, applied mathematics, physics, and machine learning. In this paper, we carefully study the non-asymptotic properties of spectral decomposition of large Gram-type matrices when data are not necessarily independent. Specifically, we derive exponential tail bounds for the deviation of the eigenvectors of the right Gram matrix from their population counterparts, as well as a Berry-Esseen type bound for these deviations. We also obtain a non-asymptotic tail bound for the ratio between eigenvalues of the left Gram matrix, namely the sample covariance matrix, and their population counterparts regardless of the size of the data matrix. The documented non-asymptotic properties are further demonstrated in a suite of applications, including the non-asymptotic characterization of the estimated number of latent factors in factor models and related machine learning problems, the estimation and forecasting of high-dimensional time series, the spectral properties of large sample covariance matrices such as perturbation bounds and inference on the spectral projectors, and low-rank matrix denoising using dependent data.
{"title":"Non-asymptotic properties of spectral decomposition of large Gram-type matrices and applications","authors":"Lyuou Zhang, Wen Zhou, Haonan Wang","doi":"10.3150/21-bej1384","DOIUrl":"https://doi.org/10.3150/21-bej1384","url":null,"abstract":"Gram-type matrices and their spectral decomposition are of central importance for numerous problems in statistics, applied mathematics, physics, and machine learning. In this paper, we carefully study the non-asymptotic properties of spectral decomposition of large Gram-type matrices when data are not necessarily independent. Specifically, we derive the exponential tail bounds for the deviation between eigenvectors of the right Gram matrix to their population counterparts as well as the Berry-Esseen type bound for these deviations. We also obtain the non-asymptotic tail bound of the ratio between eigenvalues of the left Gram matrix, namely the sample covariance matrix, and their population counterparts regardless of the size of the data matrix. 
The documented non-asymptotic properties are further demonstrated in a suite of applications, including the non-asymptotic characterization of the estimated number of latent factors in factor models and relate machine learning problems, the estimation and forecasting of high-dimensional time series, the spectral properties of large sample covariance matrix such as perturbation bounds and inference on the spectral projectors, and low-rank matrix denoising using dependent data.","PeriodicalId":55387,"journal":{"name":"Bernoulli","volume":" ","pages":""},"PeriodicalIF":1.5,"publicationDate":"2022-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"42365518","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
We consider Bayesian inference for a monotone density on the unit interval and study the resulting asymptotic properties. We consider a “projection-posterior” approach, where we construct a prior on density functions through random histograms without imposing the monotonicity constraint, but induce a random distribution by projecting a sample from the posterior on the space of monotone functions. The approach allows us to retain posterior conjugacy, allowing explicit expressions extremely useful for studying asymptotic properties. We show that the projection-posterior contracts at the optimal $n^{-1/3}$-rate. We then construct a consistent test based on the posterior distribution for testing the hypothesis of monotonicity. Finally, we obtain the limiting coverage of a projection-posterior credible interval for the value of the function at an interior point. Interestingly, the limiting coverage turns out to be higher than the nominal credibility level, the opposite of the undercoverage phenomenon observed in a smoothness regime. Moreover, we show that a recalibration method using a lower credibility level gives an intended limiting coverage. We also discuss extensions of the obtained results for densities on the half-line. We conduct a simulation study to demonstrate the accuracy of the asymptotic results in finite samples.
{"title":"Rates and coverage for monotone densities using projection-posterior","authors":"Moumita Chakraborty, S. Ghosal","doi":"10.3150/21-bej1379","DOIUrl":"https://doi.org/10.3150/21-bej1379","url":null,"abstract":"We consider Bayesian inference for a monotone density on the unit interval and study the resulting asymptotic properties. We consider a “projection-posterior” approach, where we construct a prior on density functions through random histograms without imposing the monotonicity constraint, but induce a random distribution by projecting a sample from the posterior on the space of monotone functions. The approach allows us to retain posterior conjugacy, allowing explicit expressions extremely useful for studying asymptotic properties. We show that the projection-posterior contracts at the optimal n−1/3-rate. We then construct a consistent test based on the posterior distribution for testing the hypothesis of monotonicity. Finally, we obtain the limiting coverage of a projection-posterior credible interval for the value of the function at an interior point. Interestingly, the limiting coverage turns out to be higher than the nominal credibility level, the opposite of the undercoverage phenomenon observed in a smoothness regime. Moreover, we show that a recalibration method using a lower credibility level gives an intended limiting coverage. We also discuss extensions of the obtained results for densities on the half-line. 
We conduct a simulation study to demonstrate the accuracy of the asymptotic results in finite samples.","PeriodicalId":55387,"journal":{"name":"Bernoulli","volume":" ","pages":""},"PeriodicalIF":1.5,"publicationDate":"2022-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"45354898","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
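The projection-posterior idea described in the abstract above can be sketched in a few lines. This is an illustration under stated assumptions rather than the authors' procedure: the density is taken to be non-increasing, the histogram uses equal-width bins with a symmetric Dirichlet prior on the bin probabilities, and the projection is the least-squares ($L_2$) projection computed by pooling adjacent violators. The function names and the `prior_weight` parameter are hypothetical.

```python
import numpy as np


def project_nonincreasing(heights):
    """Least-squares projection of a sequence onto non-increasing sequences
    (pool-adjacent-violators algorithm with equal weights)."""
    blocks = []  # each block stores [sum of heights, number of bins]
    for h in heights:
        blocks.append([float(h), 1])
        # Merge while the previous block mean is below the current block mean,
        # since that ordering violates the non-increasing constraint.
        while (len(blocks) > 1
               and blocks[-2][0] / blocks[-2][1] < blocks[-1][0] / blocks[-1][1]):
            s, c = blocks.pop()
            blocks[-1][0] += s
            blocks[-1][1] += c
    return np.concatenate([np.full(c, s / c) for s, c in blocks])


def projection_posterior_draws(data, n_bins, n_draws, prior_weight=1.0, seed=0):
    """Draw histogram densities from a conjugate Dirichlet posterior on [0, 1],
    then project each draw onto non-increasing step functions."""
    rng = np.random.default_rng(seed)
    counts, _ = np.histogram(data, bins=n_bins, range=(0.0, 1.0))
    # The Dirichlet prior is conjugate to the multinomial bin counts.
    probs = rng.dirichlet(prior_weight + counts, size=n_draws)
    heights = probs * n_bins  # density heights on bins of width 1 / n_bins
    return np.array([project_nonincreasing(h) for h in heights])


if __name__ == "__main__":
    sample = np.random.default_rng(1).beta(1.0, 3.0, size=500)
    draws = projection_posterior_draws(sample, n_bins=10, n_draws=200)
    # Pointwise 95% band for the density height in each bin.
    print(np.quantile(draws, [0.025, 0.975], axis=0))
```

Because the unconstrained posterior stays Dirichlet, sampling is direct and the monotonicity constraint is imposed only afterwards, draw by draw. The equal-weight projection also preserves the total mass, so each projected draw is still a density.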
We consider anchored Gaussian $\ell$-simplices in the $d$-dimensional Euclidean space, that is, simplices with one fixed vertex $y \in \mathbb{R}^d$ and the remaining vertices $X_1, \ldots, X_\ell$ randomly sampled from the $d$-variate standard normal distribution. We determine the distribution of the measure of such simplices for any $d$, any $\ell$, and any anchor point $y$, which is of interest, e.g., when studying the asymptotics of U-statistics based on such simplex measures. We provide two proofs of the results. The first one is short but is not self-contained as it crucially relies on a technical result for non-central Wishart distributions. The second one is a simple and self-contained proof, that also provides some geometric insight on the results. Quite nicely, variations on this second argument reveal intriguing distributional identities on products of central and non-central chi-square distributions with Beta-distributed non-centrality parameters. We independently establish these distributional identities by making use of Mellin transforms. Beyond the aforementioned use to study the asymptotics of some U-statistics, our results do find natural applications in the context of robust location estimation, as we illustrate by considering a class of simplex-based multivariate medians that contains the celebrated spatial median and Oja median as special cases. Throughout, our results are confirmed by numerical experiments.
{"title":"On the measure of anchored Gaussian simplices, with applications to multivariate medians","authors":"D. Paindaveine","doi":"10.3150/21-bej1373","DOIUrl":"https://doi.org/10.3150/21-bej1373","url":null,"abstract":"We consider anchored Gaussian (cid:96) -simplices in the d -dimensional Euclidean space, that is, simplices with one fixed vertex y ∈ R d and the remaining vertices X 1 , . . . , X (cid:96) randomly sampled from the d -variate standard normal distribution. We determine the distribution of the measure of such simplices for any d , any (cid:96) , and any anchor point y , which is of interest, e.g., when studying the asymptotics of U-statistics based on such simplex measures. We provide two proofs of the results. The first one is short but is not self-contained as it crucially relies on a technical result for non-central Wishart distributions. The second one is a simple and self-contained proof, that also provides some geometric insight on the results. Quite nicely, variations on this second argument reveal intriguing distributional identities on products of central and non-central chi-square distributions with Beta-distributed non-centrality parameters. We independently establish these distributional identities by making use of Mellin transforms. Beyond the aforementioned use to study the asymptotics of some U-statistics, our results do find natural applications in the context of robust location estimation, as we illustrate by considering a class of simplex-based multivariate medians that contains the celebrated spatial median and Oja median as special cases. 
Throughout, our results are confirmed by numerical experiments.","PeriodicalId":55387,"journal":{"name":"Bernoulli","volume":" ","pages":""},"PeriodicalIF":1.5,"publicationDate":"2022-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"45056147","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
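A numerical experiment of the kind mentioned in the abstract above can be set up as follows. This is a minimal sketch, not the author's code: it uses the standard Gram-determinant formula for the $\ell$-dimensional measure of a simplex, $\sqrt{\det(E E^{\top})}/\ell!$, where the rows of $E$ are the edge vectors $X_i - y$. The function names are hypothetical.

```python
import math

import numpy as np


def simplex_measure(anchor, vertices):
    """ell-dimensional measure of the simplex with vertices anchor, v_1, ..., v_ell.

    `vertices` has shape (ell, d); the measure is sqrt(det(E E^T)) / ell!,
    where the rows of E are the edge vectors v_i - anchor.
    """
    anchor = np.asarray(anchor, dtype=float)
    edges = np.asarray(vertices, dtype=float) - anchor
    gram = edges @ edges.T
    ell = edges.shape[0]
    # Clip tiny negative determinants caused by floating-point rounding.
    return math.sqrt(max(np.linalg.det(gram), 0.0)) / math.factorial(ell)


def sample_anchored_gaussian_measures(anchor, ell, n_samples, seed=0):
    """Monte Carlo sample of measures of anchored Gaussian ell-simplices."""
    rng = np.random.default_rng(seed)
    d = len(anchor)
    return np.array([
        simplex_measure(anchor, rng.standard_normal((ell, d)))
        for _ in range(n_samples)
    ])


if __name__ == "__main__":
    measures = sample_anchored_gaussian_measures([1.0, 0.0, 0.0], ell=2, n_samples=10_000)
    print(measures.mean(), np.median(measures))
```

An empirical sample like `measures` can then be compared, for instance through a histogram or quantiles, against a candidate theoretical distribution.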
D. Gaigall, Julian Gerstenberg, Thi Thu Huyen Trinh
On the basis of independent and identically distributed bivariate random vectors, where the components are categorial and continuous variables, respectively, the related concomitants, also called induced order statistics, are considered. The main theoretical result is a functional central limit theorem for the empirical process of the concomitants in a triangular array setting. A natural application is hypothesis testing. An independence test and a two-sample test are investigated in detail. The fairly general setting enables limit results under local alternatives and bootstrap samples. For comparison with existing tests from the literature, simulation studies are conducted. The empirical results obtained confirm the theoretical findings.
{"title":"Empirical process of concomitants for partly categorial data and applications in statistics","authors":"D. Gaigall, Julian Gerstenberg, Thi Thu Huyen Trinh","doi":"10.3150/21-bej1367","DOIUrl":"https://doi.org/10.3150/21-bej1367","url":null,"abstract":"On the basis of independent and identically distributed bivariate random vectors, where the components are categorial and continuous variables, respectively, the related concomitants, also called induced order statistic, are considered. The main theoretical result is a functional central limit theorem for the empirical process of the concomitants in a triangular array setting. A natural application is hypothesis testing. An independence test and a two-sample test are investigated in detail. The fairly general setting enables limit results under local alternatives and bootstrap samples. For the comparison with existing tests from the literature simulation studies are conducted. The empirical results obtained confirm the theoretical findings.","PeriodicalId":55387,"journal":{"name":"Bernoulli","volume":" ","pages":""},"PeriodicalIF":1.5,"publicationDate":"2022-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"46514531","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}