In this paper, we investigate the asymptotic behavior of the spiked eigenvalues of the noncentral Fisher matrix $\mathbf{F}_p=\mathbf{C}_n(\mathbf{S}_N)^{-1}$, where $\mathbf{C}_n=(\mathbf{\Xi}+\mathbf{X})(\mathbf{\Xi}+\mathbf{X})^*/n$ is a noncentral sample covariance matrix and $\mathbf{S}_N=\mathbf{Y}\mathbf{Y}^*/N$. The matrices $\mathbf{X}$ and $\mathbf{Y}$ are two independent Gaussian arrays of respective dimensions $p\times n$ and $p\times N$, whose entries are independent and identically distributed (i.i.d.) with mean $0$ and variance $1$. When $p$, $n$, and $N$ grow to infinity proportionally, we establish a phase transition for the spiked eigenvalues of $\mathbf{F}_p$. Furthermore, we derive the central limit theorem (CLT) for the spiked eigenvalues of $\mathbf{F}_p$. As a by-product of the proofs of these results, we study the fluctuations of the spiked eigenvalues of $\mathbf{C}_n$, which is of independent interest. We also derive the limits and the CLT for the sample canonical correlation coefficients from the results on the spiked noncentral Fisher matrix, and give three consistent estimators, including estimators of the population spiked eigenvalues and of the population canonical correlation coefficients.
{"title":"Spiked eigenvalues of noncentral Fisher matrix with applications","authors":"Xiaozhuo Zhang, Zhiqiang Hou, Z. Bai, Jiang Hu","doi":"10.3150/22-bej1579","DOIUrl":"https://doi.org/10.3150/22-bej1579","url":null,"abstract":"","PeriodicalId":55387,"journal":{"name":"Bernoulli","volume":" ","pages":""},"PeriodicalIF":1.5,"publicationDate":"2021-04-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"42668925","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"Mathematics","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
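The construction of the noncentral Fisher matrix in the abstract above can be simulated directly. The following is only a minimal sketch under stated assumptions: real-valued Gaussian entries (so the conjugate transpose is a plain transpose), $N > p$ so that the second sample covariance matrix is invertible, and a hypothetical function name `noncentral_fisher_eigenvalues`. It does not implement the paper's limit theorems or estimators; it only computes the sample eigenvalues whose behavior the paper studies.

```python
import numpy as np

def noncentral_fisher_eigenvalues(p, n, N, Xi, seed=0):
    """Eigenvalues of F = C S^{-1}, sorted in decreasing order.

    Xi is a fixed p x n noncentrality matrix (chosen by the user).
    Assumes real Gaussian noise and N > p (illustrative assumptions).
    """
    rng = np.random.default_rng(seed)
    X = rng.standard_normal((p, n))      # i.i.d. N(0, 1) entries
    Y = rng.standard_normal((p, N))      # independent of X
    C = (Xi + X) @ (Xi + X).T / n        # noncentral sample covariance
    S = Y @ Y.T / N                      # central sample covariance
    # C S^{-1} is similar to the symmetric matrix L^{-1} C L^{-T},
    # where S = L L^T, so both share the same (real) eigenvalues.
    L = np.linalg.cholesky(S)
    A = np.linalg.solve(L, C)            # L^{-1} C
    M = np.linalg.solve(L, A.T)          # L^{-1} C L^{-T}
    M = (M + M.T) / 2                    # remove rounding asymmetry
    return np.linalg.eigvalsh(M)[::-1]

# Usage: a rank-one noncentrality, so that Xi Xi^T / n has one eigenvalue 50.
Xi = np.zeros((20, 200))
Xi[0, :] = np.sqrt(50.0)
eigs = noncentral_fisher_eigenvalues(20, 200, 400, Xi, seed=1)
print(eigs[:3])
```

With a strong enough noncentrality, the largest eigenvalue separates from the rest of the spectrum, which is the kind of spiked behavior the abstract refers to.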
Given $n$ samples from a population of individuals belonging to different species, what is the number $U$ of hitherto unseen species that would be observed if $\lambda n$ new samples were collected? This is an important problem in many scientific endeavors, and it has been the subject of recent works introducing non-parametric estimators of $U$ that are minimax near-optimal and consistent all the way up to $\lambda \asymp \log n$. These works do not rely on any assumption on the underlying unknown distribution $p$ of the population; therefore, while they provide a theory in its greatest generality, worst-case distributions may severely hamper the estimation of $U$ in concrete applications. In this paper, we consider the problem of strengthening the non-parametric framework for estimating $U$. Inspired by the estimation of rare probabilities in extreme value theory, and motivated by the ubiquitous power-law type distributions in many natural and social phenomena, we make use of a semi-parametric assumption of regular variation of index $\alpha \in (0,1)$ for the tail behaviour of $p$. Under this assumption, we introduce an estimator of $U$ that is simple, linear in the sampling information, computationally efficient, and scalable to massive datasets. Then, uniformly over our class of regularly varying tail distributions, we show that the proposed estimator has provable guarantees: i) it is minimax near-optimal, up to a power of $\log n$ factor; ii) it is consistent all the way up to $\log\lambda \asymp n^{\alpha/2}/\sqrt{\log n}$, and this range is the best possible. This work presents the first study on the estimation of the unseen under regularly varying tail distributions. A numerical illustration of our methodology is presented for synthetic data and real data.
{"title":"Near-optimal estimation of the unseen under regularly varying tail populations","authors":"S. Favaro, Zacharie Naulet","doi":"10.3150/23-bej1589","DOIUrl":"https://doi.org/10.3150/23-bej1589","url":null,"abstract":"","PeriodicalId":55387,"journal":{"name":"Bernoulli","volume":" ","pages":""},"PeriodicalIF":1.5,"publicationDate":"2021-04-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"44428263","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"Mathematics","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
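To make the notion of an estimator that is linear in the sampling information concrete, the sketch below implements the classical Good-Toulmin estimator, $\hat U = \sum_{i\ge 1} (-1)^{i+1}\lambda^i \Phi_i$, where $\Phi_i$ is the number of species observed exactly $i$ times. This is a standard non-parametric baseline and is not the estimator proposed in the paper above; the function name is hypothetical, and the formula is only well behaved for $\lambda \le 1$.

```python
from collections import Counter

def good_toulmin_unseen(samples, lam):
    """Classical Good-Toulmin estimate of the number of new species
    expected in lam * n further samples (a baseline, not the paper's method).
    """
    species_counts = Counter(samples)               # species -> times observed
    prevalences = Counter(species_counts.values())  # i -> Phi_i
    # Alternating-sign sum, linear in the prevalences Phi_i.
    return sum((-1) ** (i + 1) * lam ** i * phi
               for i, phi in prevalences.items())

# Usage: species "a" seen twice, "b" and "c" seen once each.
print(good_toulmin_unseen(["a", "a", "b", "c"], lam=0.5))
```

The alternating signs make the variance grow quickly once $\lambda$ exceeds 1, which is one reason the range of $\lambda$ over which an estimator stays consistent is the central question in this line of work.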
Stéphan Clémençon, Hamid Jalalzai, Stéphane Lhaut, Anne Sabourin, J. Segers
The angular measure on the unit sphere characterizes the first-order dependence structure of the components of a random vector in extreme regions and is defined in terms of standardized margins. Its statistical recovery is an important step in learning problems involving observations far away from the center. In the common situation that the components of the vector have different distributions, the rank transformation offers a convenient and robust way of standardizing data in order to build an empirical version of the angular measure based on the most extreme observations. However, the study of the sampling distribution of the resulting empirical angular measure is challenging. It is the purpose of the paper to establish finite-sample bounds for the maximal deviations between the empirical and true angular measures, uniformly over classes of Borel sets of controlled combinatorial complexity. The bounds are valid with high probability and, up to logarithmic factors, scale as the square root of the effective sample size. The bounds are applied to provide performance guarantees for two statistical learning procedures tailored to extreme regions of the input space and built upon the empirical angular measure: binary classification in extreme regions through empirical risk minimization and unsupervised anomaly detection through minimum-volume sets of the sphere.
{"title":"Concentration bounds for the empirical angular measure with statistical learning applications","authors":"Stéphan Clémençon, Hamid Jalalzai, Stéphane Lhaut, Anne Sabourin, J. Segers","doi":"10.3150/22-bej1562","DOIUrl":"https://doi.org/10.3150/22-bej1562","url":null,"abstract":"","PeriodicalId":55387,"journal":{"name":"Bernoulli","volume":" ","pages":""},"PeriodicalIF":1.5,"publicationDate":"2021-04-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"47877995","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"Mathematics","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
We construct a generalization of the Ornstein-Uhlenbeck processes on the cone of covariance matrices endowed with the Log-Euclidean and the Affine-Invariant metrics. Our development exploits the Riemannian geometric structure of symmetric positive definite matrices viewed as a differential manifold. We then provide Bayesian inference for discretely observed diffusion processes of covariance matrices based on an MCMC algorithm built with the help of a novel diffusion bridge sampler accounting for the geometric structure. Our proposed algorithm is illustrated with a real data financial application.
{"title":"Inference for partially observed Riemannian Ornstein–Uhlenbeck diffusions of covariance matrices","authors":"Mai Bui, Y. Pokern, P. Dellaportas","doi":"10.3150/22-bej1570","DOIUrl":"https://doi.org/10.3150/22-bej1570","url":null,"abstract":"","PeriodicalId":55387,"journal":{"name":"Bernoulli","volume":" ","pages":""},"PeriodicalIF":1.5,"publicationDate":"2021-04-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"49250092","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"Mathematics","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
We introduce a general approach for modeling the dynamics of multivariate time series when the data are of mixed type (binary/count/continuous). Our method is quite flexible: conditionally on past values, each coordinate at time $t$ can have a distribution compatible with a standard univariate time series model such as GARCH, ARMA, INGARCH or logistic models, whereas past values of the other coordinates play the role of exogenous covariates in the dynamics. The simultaneous dependence in the multivariate time series can be modeled with a copula. Additional exogenous covariates are also allowed in the dynamics. We first study the usual stability properties of these models and then show that the autoregressive parameters can be consistently estimated equation-by-equation using a pseudo-maximum likelihood method, leading to a fast implementation even when the number of time series is large. Moreover, we prove consistency results when a parametric copula model is fitted to the time series and, in the case of Gaussian copulas, we show that the likelihood estimator of the correlation matrix is strongly consistent. We carefully check all our assumptions for two prototypical examples: a GARCH/INGARCH model and a logistic/log-linear INGARCH model. Our results are illustrated with numerical experiments as well as two real data sets.
{"title":"Multivariate time series models for mixed data","authors":"Zinsou Max Debaly, L. Truquet","doi":"10.3150/22-bej1474","DOIUrl":"https://doi.org/10.3150/22-bej1474","url":null,"abstract":"","PeriodicalId":55387,"journal":{"name":"Bernoulli","volume":"232 3","pages":""},"PeriodicalIF":1.5,"publicationDate":"2021-04-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"41263042","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"Mathematics","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Many commonly used test statistics are based on a norm measuring the evidence against the null hypothesis. To understand how the choice of a norm affects power properties of tests in high dimensions, we study the consistency sets of $p$-norm based tests in the prototypical framework of sequence models with unrestricted parameter spaces, the null hypothesis being that all observations have zero mean. The consistency set of a test is here defined as the set of all arrays of alternatives the test is consistent against as the dimension of the parameter space diverges. We characterize the consistency sets of $p$-norm based tests and find, in particular, that the consistency against an array of alternatives cannot be determined solely in terms of the $p$-norm of the alternative. Our characterization also reveals an unexpected monotonicity result: namely that the consistency set is strictly increasing in $p \in (0, \infty)$, such that tests based on higher $p$ strictly dominate those based on lower $p$ in terms of consistency. This monotonicity allows us to construct novel tests that dominate, with respect to their consistency behavior, all $p$-norm based tests without sacrificing size.
{"title":"Consistency of p-norm based tests in high dimensions: Characterization, monotonicity, domination","authors":"A. Kock, David Preinerstorfer","doi":"10.3150/22-bej1552","DOIUrl":"https://doi.org/10.3150/22-bej1552","url":null,"abstract":"","PeriodicalId":55387,"journal":{"name":"Bernoulli","volume":" ","pages":""},"PeriodicalIF":1.5,"publicationDate":"2021-03-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"44743516","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"Mathematics","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
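A $p$-norm based test of the kind studied in the abstract above rejects the zero-mean null when $\|y\|_p$ exceeds a critical value. The sketch below is illustrative only: it assumes standard Gaussian noise in the sequence model and calibrates the critical value by Monte Carlo, neither of which is taken from the paper, and the function names are hypothetical.

```python
import numpy as np

def p_norm_statistic(y, p):
    """(sum_i |y_i|^p)^(1/p); a quasi-norm when 0 < p < 1."""
    y = np.asarray(y, dtype=float)
    return float(np.sum(np.abs(y) ** p) ** (1.0 / p))

def monte_carlo_critical_value(d, p, alpha=0.05, reps=2000, seed=0):
    """Approximate (1 - alpha) null quantile of the statistic, assuming
    y ~ N(0, I_d) under the null (an illustrative assumption)."""
    rng = np.random.default_rng(seed)
    null_stats = [p_norm_statistic(rng.standard_normal(d), p)
                  for _ in range(reps)]
    return float(np.quantile(null_stats, 1.0 - alpha))

# Usage: a single large coordinate in dimension d = 100.
d = 100
y = np.zeros(d)
y[0] = 20.0
for p in (1.0, 2.0, 4.0):
    crit = monte_carlo_critical_value(d, p)
    print(p, p_norm_statistic(y, p) > crit)
```

Running the same alternative through several values of $p$ is a simple way to see empirically that the choice of norm changes which alternatives are detected.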
It is known that the membership in a given reproducing kernel Hilbert space (RKHS) of the samples of a Gaussian process $X$ is controlled by a certain nuclear dominance condition. However, it is less clear how to identify a "small" set of functions (not necessarily a vector space) that contains the samples. This article presents a general approach for identifying such sets. We use scaled RKHSs, which can be viewed as a generalisation of Hilbert scales, to define the sample support set as the largest set which is contained in every element of full measure under the law of $X$ in the $\sigma$-algebra induced by the collection of scaled RKHSs. This potentially non-measurable set is then shown to consist of those functions that can be expanded in terms of an orthonormal basis of the RKHS of the covariance kernel of $X$ and have their squared basis coefficients bounded away from zero and infinity, a result suggested by the Karhunen-Loève theorem.
{"title":"Small sample spaces for Gaussian processes","authors":"T. Karvonen","doi":"10.3150/22-bej1483","DOIUrl":"https://doi.org/10.3150/22-bej1483","url":null,"abstract":"","PeriodicalId":55387,"journal":{"name":"Bernoulli","volume":" ","pages":""},"PeriodicalIF":1.5,"publicationDate":"2021-03-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"42689494","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"Mathematics","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Sufficient dimension reduction and instrument search for data with nonignorable nonresponse","authors":"Puying Zhao, Lei Wang, Junchao Shao","doi":"10.3150/20-BEJ1260","DOIUrl":"https://doi.org/10.3150/20-BEJ1260","url":null,"abstract":"","PeriodicalId":55387,"journal":{"name":"Bernoulli","volume":" ","pages":""},"PeriodicalIF":1.5,"publicationDate":"2021-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"46437883","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"Mathematics","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
We calculate finite sample and asymptotic distributions for the largest censored and uncensored survival times, and some related statistics, from a sample of survival data generated according to an i.i.d. censoring model. These statistics are important for assessing whether there is sufficient follow-up in the sample to be confident of the presence of immune or cured individuals in the population. A key structural result obtained is that, conditional on the value of the largest uncensored survival time, and knowing the number of censored observations exceeding this time, the sample partitions into two independent subsamples, each subsample having the distribution of an i.i.d. sample of censored survival times, of reduced size, from truncated random variables. This result provides valuable insight into the construction of censored survival data, and facilitates the calculation of explicit finite sample formulae. We illustrate this with the distributions of statistics useful for testing for sufficient follow-up in a sample, and apply extreme value methods to derive asymptotic distributions for some of them. MSC 2010 subject classifications: MSC2000 Subject Classifications: Primary 62N01, 62N02, 62N03, 62E10, 62E15, 62E20, 62G05; secondary 62F03, 62F05, 62F12, 62G32.
{"title":"Splitting the sample at the largest uncensored observation","authors":"R. Maller, S. Resnick, S. Shemehsavar","doi":"10.3150/21-bej1417","DOIUrl":"https://doi.org/10.3150/21-bej1417","url":null,"abstract":"","PeriodicalId":55387,"journal":{"name":"Bernoulli","volume":" ","pages":""},"PeriodicalIF":1.5,"publicationDate":"2021-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"46551907","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"Mathematics","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Consider $n$ nodes $\{X_i\}_{1\le i\le n}$ independently distributed in the unit square $S$, each according to a distribution $f$. Nodes $X_i$ and $X_j$ are joined by an edge if the Euclidean distance $d(X_i,X_j)$ is less than the adjacency distance $r_n$, and the resulting random graph $G_n$ is called a random geometric graph (RGG). We now assign a location-dependent weight to each edge of $G_n$ and define $MST_n$ to be the sum of the weights of the minimum spanning trees of all components of $G_n$. For values of $r_n$ above the connectivity regime, we obtain upper and lower bound deviation estimates for $MST_n$ and $L^2$-convergence of $MST_n$, appropriately scaled and centred.
{"title":"Minimum spanning trees of random geometric graphs with location dependent weights","authors":"Ghurumuruhan Ganesan","doi":"10.3150/20-BEJ1318","DOIUrl":"https://doi.org/10.3150/20-BEJ1318","url":null,"abstract":"","PeriodicalId":55387,"journal":{"name":"Bernoulli","volume":"27 1","pages":"2473-2493"},"PeriodicalIF":1.5,"publicationDate":"2021-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"48226045","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"Mathematics","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
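The quantity described in the abstract above, the total weight of the minimum spanning trees of all components of a random geometric graph, can be computed with Kruskal's algorithm, which yields a minimum spanning forest when the graph is disconnected. The sketch below is an illustration under assumptions: the abstract does not specify the weight function, so `weight_fn` is a hypothetical user-supplied function of the two endpoint locations, and the brute-force pairwise edge search is only suitable for small $n$.

```python
import numpy as np

def mst_weight_rgg(points, r, weight_fn):
    """Sum of MST weights over all components of the graph that joins
    points at Euclidean distance less than r (Kruskal with union-find)."""
    points = np.asarray(points, dtype=float)
    n = len(points)
    edges = []
    for i in range(n):
        for j in range(i + 1, n):
            if np.linalg.norm(points[i] - points[j]) < r:
                edges.append((weight_fn(points[i], points[j]), i, j))
    edges.sort(key=lambda e: e[0])       # process lightest edges first

    parent = list(range(n))

    def find(x):
        # Root of x's component, with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    total = 0.0
    for w, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:                     # edge joins two components
            parent[ri] = rj
            total += w
    return total

# Usage: uniform nodes in the unit square; the weight below (distance
# scaled by the midpoint's first coordinate) is purely hypothetical.
rng = np.random.default_rng(0)
pts = rng.random((200, 2))
w = lambda x, y: np.linalg.norm(x - y) * (1.0 + (x[0] + y[0]) / 2.0)
print(mst_weight_rgg(pts, r=0.15, weight_fn=w))
```

Because isolated nodes and small components simply contribute no edges or fewer edges, the same routine covers both connected and disconnected regimes of $r_n$.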