Bernoulli: Latest Articles


Spiked eigenvalues of noncentral Fisher matrix with applications

IF 1.5, Zone 2 (Mathematics), Q2 STATISTICS & PROBABILITY

Bernoulli

Pub Date : 2021-04-10 DOI: 10.3150/22-bej1579

Xiaozhuo Zhang, Zhiqiang Hou, Z. Bai, Jiang Hu

In this paper, we investigate the asymptotic behavior of the spiked eigenvalues of the noncentral Fisher matrix $\mathbf{F}_p=\mathbf{C}_n(\mathbf{S}_N)^{-1}$, where $\mathbf{C}_n=(\mathbf{\Xi}+\mathbf{X})(\mathbf{\Xi}+\mathbf{X})^*/n$ is a noncentral sample covariance matrix and $\mathbf{S}_N=\mathbf{Y}\mathbf{Y}^*/N$. The matrices $\mathbf{X}$ and $\mathbf{Y}$ are two independent Gaussian arrays of respective sizes $p\times n$ and $p\times N$, whose entries are independent and identically distributed (i.i.d.) with mean $0$ and variance $1$. When $p$, $n$, and $N$ grow to infinity proportionally, we establish a phase transition for the spiked eigenvalues of $\mathbf{F}_p$. Furthermore, we derive a central limit theorem (CLT) for the spiked eigenvalues of $\mathbf{F}_p$. As an auxiliary step in the proofs of these results, we study the fluctuations of the spiked eigenvalues of $\mathbf{C}_n$, which are of independent interest. In addition, we use the results on the spiked noncentral Fisher matrix to derive the limits and a CLT for the sample canonical correlation coefficients, and we give three consistent estimators, including estimators of the population spiked eigenvalues and of the population canonical correlation coefficients.
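The matrices defined in this abstract can be built numerically in a few lines. The sketch below is only an illustration under stated assumptions: it uses real Gaussian entries (so the conjugate transpose becomes a transpose), and the dimensions, the seed, and the particular rank-one choice of $\mathbf{\Xi}$ are hypothetical values chosen here, not taken from the paper.

```python
import numpy as np

def noncentral_fisher_eigs(p, n, N, Xi, rng):
    """Eigenvalues of F_p = C_n S_N^{-1}, sorted in decreasing order.

    C_n = (Xi + X)(Xi + X)^T / n and S_N = Y Y^T / N, where X (p x n) and
    Y (p x N) have i.i.d. N(0, 1) entries.
    """
    X = rng.standard_normal((p, n))
    Y = rng.standard_normal((p, N))
    C = (Xi + X) @ (Xi + X).T / n
    S = Y @ Y.T / N
    # F_p is not symmetric, but it shares its eigenvalues with the symmetric
    # matrix L^{-1} C L^{-T}, where S = L L^T is a Cholesky factorization.
    L = np.linalg.cholesky(S)
    A = np.linalg.solve(L, C)        # L^{-1} C
    M = np.linalg.solve(L, A.T)      # L^{-1} (L^{-1} C)^T = L^{-1} C L^{-T}
    return np.sort(np.linalg.eigvalsh((M + M.T) / 2))[::-1]

# Hypothetical setup: Xi Xi^T / n has a single nonzero eigenvalue equal to 25.
rng = np.random.default_rng(0)
p, n, N = 20, 200, 400
Xi = np.zeros((p, n))
Xi[0, :] = 5.0
eigs = noncentral_fisher_eigs(p, n, N, Xi, rng)
print(eigs[:3])  # the leading eigenvalue should separate from the rest
```

Working with the symmetric matrix $L^{-1}\mathbf{C}_n L^{-\top}$ rather than $\mathbf{C}_n\mathbf{S}_N^{-1}$ directly keeps the computed eigenvalues real and avoids forming an explicit inverse.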


Citations: 4

Near-optimal estimation of the unseen under regularly varying tail populations

IF 1.5, Zone 2 (Mathematics), Q2 STATISTICS & PROBABILITY

Bernoulli

Pub Date : 2021-04-07 DOI: 10.3150/23-bej1589

S. Favaro, Zacharie Naulet

Given $n$ samples from a population of individuals belonging to different species, what is the number $U$ of hitherto unseen species that would be observed if $\lambda n$ new samples were collected? This is an important problem in many scientific endeavors, and it has been the subject of recent works introducing non-parametric estimators of $U$ that are minimax near-optimal and consistent all the way up to $\lambda \asymp \log n$. These works do not rely on any assumption on the underlying unknown distribution $p$ of the population; therefore, while they provide a theory in its greatest generality, worst-case distributions may severely hamper the estimation of $U$ in concrete applications. In this paper, we consider the problem of strengthening the non-parametric framework for estimating $U$. Inspired by the estimation of rare probabilities in extreme value theory, and motivated by the ubiquitous power-law type distributions in many natural and social phenomena, we make use of a semi-parametric assumption of regular variation of index $\alpha \in (0,1)$ for the tail behaviour of $p$. Under this assumption, we introduce an estimator of $U$ that is simple, linear in the sampling information, computationally efficient, and scalable to massive datasets. Then, uniformly over our class of regularly varying tail distributions, we show that the proposed estimator has provable guarantees: i) it is minimax near-optimal, up to a power of $\log n$ factor; ii) it is consistent all the way up to $\log\lambda \asymp n^{\alpha/2}/\sqrt{\log n}$, and this range is the best possible. This work presents the first study on the estimation of the unseen under regularly varying tail distributions. A numerical illustration of our methodology is presented for synthetic data and real data.
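To make the quantity $U$ concrete, the sketch below implements the classical Good-Toulmin estimator, a standard non-parametric baseline for the unseen-species problem. It is not the estimator proposed in this paper, and it is only reliable for $\lambda \le 1$; the function name and the toy sample are illustrative choices.

```python
from collections import Counter

def good_toulmin(sample, lam):
    """Classical Good-Toulmin estimate of the number of new species
    expected in lam * n further samples (reliable only for lam <= 1).

    U_hat = - sum_{i >= 1} (-lam)^i * Phi_i,
    where Phi_i is the number of species observed exactly i times.
    """
    species_counts = Counter(sample)                # species -> times seen
    prevalences = Counter(species_counts.values())  # i -> Phi_i
    return -sum((-lam) ** i * phi_i for i, phi_i in prevalences.items())

# Toy sample: 'a' is seen twice, 'b' and 'c' once each (Phi_1 = 2, Phi_2 = 1).
print(good_toulmin(list("aabc"), 1.0))   # 1.0
print(good_toulmin(list("aabc"), 0.5))   # 0.75
```

Like the estimator described in the abstract, this baseline is linear in the prevalences $\Phi_i$, which is what makes such estimators cheap to compute on large datasets.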


Citations: 8

Concentration bounds for the empirical angular measure with statistical learning applications

IF 1.5, Zone 2 (Mathematics), Q2 STATISTICS & PROBABILITY

Bernoulli

Pub Date : 2021-04-07 DOI: 10.3150/22-bej1562

Stéphan Clémençon, Hamid Jalalzai, Stéphane Lhaut, Anne Sabourin, J. Segers

The angular measure on the unit sphere characterizes the first-order dependence structure of the components of a random vector in extreme regions and is defined in terms of standardized margins. Its statistical recovery is an important step in learning problems involving observations far away from the center. In the common situation that the components of the vector have different distributions, the rank transformation offers a convenient and robust way of standardizing data in order to build an empirical version of the angular measure based on the most extreme observations. However, the study of the sampling distribution of the resulting empirical angular measure is challenging. It is the purpose of the paper to establish finite-sample bounds for the maximal deviations between the empirical and true angular measures, uniformly over classes of Borel sets of controlled combinatorial complexity. The bounds are valid with high probability and, up to logarithmic factors, scale as the square root of the effective sample size. The bounds are applied to provide performance guarantees for two statistical learning procedures tailored to extreme regions of the input space and built upon the empirical angular measure: binary classification in extreme regions through empirical risk minimization and unsupervised anomaly detection through minimum-volume sets of the sphere.
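The rank-based construction described above can be sketched in plain Python. This is a simplified reading, not the paper's exact definition: the $n+1$ denominator in the rank transform, the sup-norm, the selection of the top $k$ points by norm, and the normalization to a fraction are all assumptions made here for illustration.

```python
def rank_transform(data):
    """Map each column to approximately unit-Pareto margins via ranks.

    data: list of n rows, each a list of d floats (ties are not handled).
    Returns rows v with v[j] = 1 / (1 - rank_ij / (n + 1)).
    """
    n, d = len(data), len(data[0])
    out = [[0.0] * d for _ in range(n)]
    for j in range(d):
        order = sorted(range(n), key=lambda i: data[i][j])
        for rank, i in enumerate(order, start=1):
            out[i][j] = 1.0 / (1.0 - rank / (n + 1))
    return out

def empirical_angles(data, k):
    """Angles (points on the sup-norm unit sphere) of the k rows whose
    rank-transformed versions have the largest sup-norm."""
    v = rank_transform(data)
    extreme = sorted(v, key=max, reverse=True)[:k]
    return [[x / max(row) for x in row] for row in extreme]

def angular_measure(angles, region):
    """Fraction of extreme angles that fall in a set given as a predicate."""
    return sum(1 for a in angles if region(a)) / len(angles)

data = [[1.0, 10.0], [2.0, 20.0], [3.0, 5.0], [4.0, 1.0]]
angles = empirical_angles(data, k=2)
print(angular_measure(angles, lambda a: a[0] > 0.5))  # 0.5
```

Because only ranks enter the transform, the result is unchanged by any increasing transformation of an individual column, which is the robustness property the abstract attributes to rank standardization.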


Citations: 11

Inference for partially observed Riemannian Ornstein–Uhlenbeck diffusions of covariance matrices

IF 1.5, Zone 2 (Mathematics), Q2 STATISTICS & PROBABILITY

Bernoulli

Pub Date : 2021-04-07 DOI: 10.3150/22-bej1570

Mai Bui, Y. Pokern, P. Dellaportas

We construct a generalization of the Ornstein-Uhlenbeck processes on the cone of covariance matrices endowed with the Log-Euclidean and the Affine-Invariant metrics. Our development exploits the Riemannian geometric structure of symmetric positive definite matrices viewed as a differential manifold. We then provide Bayesian inference for discretely observed diffusion processes of covariance matrices based on an MCMC algorithm built with the help of a novel diffusion bridge sampler accounting for the geometric structure. Our proposed algorithm is illustrated with a real data financial application.


Citations: 6

Multivariate time series models for mixed data

IF 1.5, Zone 2 (Mathematics), Q2 STATISTICS & PROBABILITY

Bernoulli

Pub Date : 2021-04-02 DOI: 10.3150/22-bej1474

Zinsou Max Debaly, L. Truquet

We introduce a general approach for modeling the dynamics of multivariate time series when the data are of mixed type (binary/count/continuous). Our method is quite flexible: conditionally on past values, each coordinate at time $t$ can have a distribution compatible with a standard univariate time series model such as GARCH, ARMA, INGARCH or logistic models, whereas past values of the other coordinates play the role of exogenous covariates in the dynamics. The simultaneous dependence in the multivariate time series can be modeled with a copula. Additional exogenous covariates are also allowed in the dynamics. We first study the usual stability properties of these models and then show that autoregressive parameters can be consistently estimated equation-by-equation using a pseudo-maximum likelihood method, leading to a fast implementation even when the number of time series is large. Moreover, we prove consistency results when a parametric copula model is fitted to the time series, and in the case of Gaussian copulas, we show that the likelihood estimator of the correlation matrix is strongly consistent. We carefully check all our assumptions for two prototypical examples: a GARCH/INGARCH model and a logistic/log-linear INGARCH model. Our results are illustrated with numerical experiments as well as two real data sets.
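A toy simulation can illustrate the idea of coordinates of different types driving each other through lagged values. The sketch below pairs a count coordinate following a log-linear INGARCH-type recursion with a binary coordinate following a logistic autoregression. It is a hedged illustration only: every coefficient is an arbitrary choice, the two coordinates are drawn independently given the past (no copula), and nothing here reproduces the paper's model specification or estimation procedure.

```python
import math
import random

def poisson(lam, rng):
    """Knuth's multiplication method; adequate for moderate lam."""
    threshold, k, prod = math.exp(-lam), 0, rng.random()
    while prod > threshold:
        k += 1
        prod *= rng.random()
    return k

def simulate_mixed(T, seed=0):
    """Simulate T pairs (count Y_t, binary Z_t) with cross-lagged dependence.

    Count coordinate (log-linear INGARCH-type recursion):
        nu_t = 0.2 + 0.3 * nu_{t-1} + 0.3 * log(1 + Y_{t-1}) + 0.4 * Z_{t-1}
        Y_t | past ~ Poisson(exp(nu_t))
    Binary coordinate (logistic autoregression):
        logit P(Z_t = 1 | past) = -0.5 + 0.8 * Z_{t-1} + 0.2 * log(1 + Y_{t-1})
    """
    rng = random.Random(seed)
    nu, y, z = 0.0, 0, 0
    path = []
    for _ in range(T):
        log_y = math.log1p(y)
        nu = 0.2 + 0.3 * nu + 0.3 * log_y + 0.4 * z
        eta = -0.5 + 0.8 * z + 0.2 * log_y
        # Given the past, the coordinates are drawn independently here;
        # a copula would be needed to model simultaneous dependence.
        y = poisson(math.exp(nu), rng)
        z = 1 if rng.random() < 1.0 / (1.0 + math.exp(-eta)) else 0
        path.append((y, z))
    return path

print(simulate_mixed(10))
```

Each recursion uses only its own past and the lagged value of the other coordinate, which mirrors the structure that makes equation-by-equation estimation possible.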


Citations: 8

Consistency of p-norm based tests in high dimensions: Characterization, monotonicity, domination

IF 1.5, Zone 2 (Mathematics), Q2 STATISTICS & PROBABILITY

Bernoulli

Pub Date : 2021-03-20 DOI: 10.3150/22-bej1552

A. Kock, David Preinerstorfer

Many commonly used test statistics are based on a norm measuring the evidence against the null hypothesis. To understand how the choice of a norm affects power properties of tests in high dimensions, we study the consistency sets of $p$-norm based tests in the prototypical framework of sequence models with unrestricted parameter spaces, the null hypothesis being that all observations have zero mean. The consistency set of a test is here defined as the set of all arrays of alternatives the test is consistent against as the dimension of the parameter space diverges. We characterize the consistency sets of $p$-norm based tests and find, in particular, that the consistency against an array of alternatives cannot be determined solely in terms of the $p$-norm of the alternative. Our characterization also reveals an unexpected monotonicity result: namely that the consistency set is strictly increasing in $p \in (0, \infty)$, such that tests based on higher $p$ strictly dominate those based on lower $p$ in terms of consistency. This monotonicity allows us to construct novel tests that dominate, with respect to their consistency behavior, all $p$-norm based tests without sacrificing size.
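A $p$-norm based test of the kind discussed here can be sketched as follows. The sketch assumes a Gaussian sequence model $y_i = \theta_i + \varepsilon_i$ with standard normal noise and calibrates the critical value by Monte Carlo; both are assumptions made for illustration, and the function names are hypothetical. It does not implement the dominating tests constructed in the paper.

```python
import random

def p_norm(y, p):
    """(sum |y_i|^p)^(1/p); for 0 < p < 1 this is only a quasi-norm."""
    return sum(abs(v) ** p for v in y) ** (1.0 / p)

def critical_value(d, p, alpha=0.05, reps=2000, seed=0):
    """Monte Carlo (1 - alpha)-quantile of ||eps||_p for eps ~ N(0, I_d)."""
    rng = random.Random(seed)
    draws = sorted(p_norm([rng.gauss(0, 1) for _ in range(d)], p)
                   for _ in range(reps))
    return draws[min(reps - 1, int((1 - alpha) * reps))]

def p_norm_test(y, p, alpha=0.05):
    """Reject H0: theta = 0 when ||y||_p exceeds the null critical value."""
    return p_norm(y, p) > critical_value(len(y), p, alpha)

# A dense, strong alternative in dimension 20 is rejected for p = 2.
print(p_norm_test([10.0] * 20, p=2))   # True
```

Running the same alternative through several values of $p$ is a simple way to explore how the choice of norm changes which alternatives are detected.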


Citations: 2

Small sample spaces for Gaussian processes

IF 1.5, Zone 2 (Mathematics), Q2 STATISTICS & PROBABILITY

Bernoulli

Pub Date : 2021-03-04 DOI: 10.3150/22-bej1483

T. Karvonen

It is known that the membership in a given reproducing kernel Hilbert space (RKHS) of the samples of a Gaussian process $X$ is controlled by a certain nuclear dominance condition. However, it is less clear how to identify a "small" set of functions (not necessarily a vector space) that contains the samples. This article presents a general approach for identifying such sets. We use scaled RKHSs, which can be viewed as a generalisation of Hilbert scales, to define the sample support set as the largest set which is contained in every element of full measure under the law of $X$ in the $\sigma$-algebra induced by the collection of scaled RKHSs. This potentially non-measurable set is then shown to consist of those functions that can be expanded in terms of an orthonormal basis of the RKHS of the covariance kernel of $X$ and have their squared basis coefficients bounded away from zero and infinity, a result suggested by the Karhunen-Loève theorem.


Citations: 10

Sufficient dimension reduction and instrument search for data with nonignorable nonresponse

IF 1.5, Zone 2 (Mathematics), Q2 STATISTICS & PROBABILITY

Bernoulli

Pub Date : 2021-03-01 DOI: 10.3150/20-BEJ1260

Puying Zhao, Lei Wang, Junchao Shao

Citations: 5

Splitting the sample at the largest uncensored observation

IF 1.5, Zone 2 (Mathematics), Q2 STATISTICS & PROBABILITY

Bernoulli

Pub Date : 2021-03-01 DOI: 10.3150/21-bej1417

R. Maller, S. Resnick, S. Shemehsavar

We calculate finite sample and asymptotic distributions for the largest censored and uncensored survival times, and some related statistics, from a sample of survival data generated according to an iid censoring model. These statistics are important for assessing whether there is sufficient follow-up in the sample to be confident of the presence of immune or cured individuals in the population. A key structural result obtained is that, conditional on the value of the largest uncensored survival time, and knowing the number of censored observations exceeding this time, the sample partitions into two independent subsamples, each subsample having the distribution of an iid sample of censored survival times, of reduced size, from truncated random variables. This result provides valuable insight into the construction of censored survival data, and facilitates the calculation of explicit finite sample formulae. We illustrate this for distributions of statistics useful for testing for sufficient follow-up in a sample, and apply extreme value methods to derive asymptotic distributions for some of those. MSC 2010 subject classifications: MSC2000 Subject Classifications: Primary 62N01, 62N02, 62N03, 62E10, 62E15, 62E20, G2G05; secondary 62F03, 62F05, 62F12, 62G32.
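The quantities named in this abstract are easy to compute from a censored sample. The sketch below finds the largest uncensored survival time, counts the censored observations exceeding it, and splits the sample at that point. The data layout (pairs of time and an uncensored flag) and the function name are assumptions made here; the split is a literal reading of the abstract and does not reproduce the paper's distributional results.

```python
def split_at_largest_uncensored(sample):
    """sample: list of (time, uncensored) pairs, with uncensored True/False.

    Returns (m, n_above, below, above), where m is the largest uncensored
    time, n_above is the number of censored times exceeding m, below holds
    the observations strictly before m, and above the censored ones after m.
    """
    uncensored_times = [t for t, uncensored in sample if uncensored]
    if not uncensored_times:
        raise ValueError("sample contains no uncensored observation")
    m = max(uncensored_times)
    below = [(t, u) for t, u in sample if t < m]
    # Any observation beyond m is censored by definition of m.
    above = [(t, u) for t, u in sample if t > m and not u]
    return m, len(above), below, above

sample = [(1.0, True), (2.5, False), (3.0, True), (4.0, False), (5.5, False)]
m, n_above, below, above = split_at_largest_uncensored(sample)
print(m, n_above)   # 3.0 2
```

A large count of censored observations beyond the largest uncensored time is the kind of evidence that sufficient follow-up tests build on.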


Citations: 3

Minimum spanning trees of random geometric graphs with location dependent weights

IF 1.5, Zone 2 (Mathematics), Q2 STATISTICS & PROBABILITY

Bernoulli

Pub Date : 2021-03-01 DOI: 10.3150/20-BEJ1318

Ghurumuruhan Ganesan

Consider $n$ nodes $\{X_i\}_{1\le i\le n}$ independently distributed in the unit square $S$, each according to a distribution $f$. Nodes $X_i$ and $X_j$ are joined by an edge if the Euclidean distance $d(X_i,X_j)$ is less than the adjacency distance $r_n$, and the resulting random graph $G_n$ is called a random geometric graph (RGG). We now assign a location dependent weight to each edge of $G_n$ and define $\mathrm{MST}_n$ to be the sum of the weights of the minimum spanning trees of all components of $G_n$. For values of $r_n$ above the connectivity regime, we obtain upper and lower bound deviation estimates for $\mathrm{MST}_n$ and $L^2$-convergence of $\mathrm{MST}_n$ appropriately scaled and centred.
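The construction above can be sketched with Kruskal's algorithm, which yields the minimum spanning forest and hence the sum over all components. The uniform node distribution and the particular weight function `example_weight` are hypothetical choices made for illustration; the abstract does not specify the form of the location dependent weights.

```python
import math
import random

def mst_weight(points, r, weight):
    """Total weight of the minimum spanning forest of the geometric graph
    joining points at Euclidean distance less than r (Kruskal's algorithm)."""
    n = len(points)
    edges = []
    for i in range(n):
        for j in range(i + 1, n):
            (x1, y1), (x2, y2) = points[i], points[j]
            if math.hypot(x1 - x2, y1 - y2) < r:
                edges.append((weight(points[i], points[j]), i, j))
    edges.sort()

    parent = list(range(n))  # union-find forest over node indices

    def find(a):
        while parent[a] != a:
            parent[a] = parent[parent[a]]  # path halving
            a = parent[a]
        return a

    total = 0.0
    for w, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:          # the edge joins two different components
            parent[ri] = rj
            total += w
    return total

def example_weight(u, v):
    """Hypothetical location dependent weight: Euclidean length scaled by
    a factor depending on where the midpoint of the edge lies."""
    mid_x = (u[0] + v[0]) / 2
    return math.hypot(u[0] - v[0], u[1] - v[1]) * (1.0 + mid_x)

rng = random.Random(0)
nodes = [(rng.random(), rng.random()) for _ in range(200)]
print(mst_weight(nodes, r=0.15, weight=example_weight))
```

The pairwise edge construction costs $O(n^2)$, which is fine for a small illustration; a spatial grid would be the usual choice for larger $n$.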


Citations: 2
