Annals of Statistics: Latest Articles

Single index Fréchet regression
Zone 1 (Mathematics) | Q1 STATISTICS & PROBABILITY | Pub Date: 2023-08-01 | DOI: 10.1214/23-aos2307
Satarupa Bhattacharjee, Hans-Georg Müller
Single index models provide an effective dimension reduction tool in regression, especially for high-dimensional data, by projecting a general multivariate predictor onto a direction vector. We propose a novel single-index model for regression models where metric space-valued random object responses are coupled with multivariate Euclidean predictors. The responses in this regression model include complex, non-Euclidean data, including covariance matrices, graph Laplacians of networks and univariate probability distribution functions, among other complex objects that lie in abstract metric spaces. While Fréchet regression has proved useful for modeling the conditional mean of such random objects given multivariate Euclidean vectors, it does not provide for regression parameters such as slopes or intercepts, since the metric space-valued responses are not amenable to linear operations. As a consequence, distributional results for Fréchet regression have been elusive. We show here that for the case of multivariate Euclidean predictors, the parameters that define a single index and projection vector can be used to substitute for the inherent absence of parameters in Fréchet regression. Specifically, we derive the asymptotic distribution of suitable estimates of these parameters, which then can be utilized to test linear hypotheses for the parameters, subject to an identifiability condition. Consistent estimation of the link function of the single index Fréchet regression model is obtained through local linear Fréchet regression. We demonstrate the finite sample performance of estimation and inference for the proposed single index Fréchet regression model through simulation studies, including the special cases where responses are probability distributions and graph adjacency matrices. The method is illustrated for resting-state functional Magnetic Resonance Imaging (fMRI) data from the ADNI study.
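To make the single-index idea concrete for one of the response types mentioned above (univariate distributions), the sketch below computes a kernel-weighted Fréchet mean at the index value θᵀx₀. It assumes the 2-Wasserstein metric, under which a weighted Fréchet mean of distributions is the distribution whose quantile function is the weighted average of the quantile functions. It also uses local constant (Nadaraya-Watson) weighting instead of the local linear Fréchet regression described in the abstract. The function names, the Gaussian kernel, and the bandwidth h are illustrative assumptions.

```python
import math


def gaussian_kernel(u):
    """Unnormalized Gaussian kernel; the normalization cancels in the weights."""
    return math.exp(-0.5 * u * u)


def single_index_wasserstein_mean(X, Q, theta, x0, h):
    """Kernel-weighted Frechet mean of 1-D distributions at the index value theta'x0.

    X:     list of predictor vectors (lists of floats)
    Q:     list of quantile functions, each evaluated on the same grid of levels
    theta: direction vector (rescaled here to unit length)
    x0:    predictor value at which to predict
    h:     bandwidth (hypothetical tuning parameter)
    """
    norm = math.sqrt(sum(t * t for t in theta))
    theta = [t / norm for t in theta]
    z0 = sum(t * x for t, x in zip(theta, x0))
    # The weights depend on the predictors only through the projected index theta'x.
    weights = [gaussian_kernel((sum(t * xi for t, xi in zip(theta, x)) - z0) / h) for x in X]
    total = sum(weights)
    # Under the 2-Wasserstein metric, the weighted Frechet mean is the weighted average
    # of the quantile functions. Nonnegative weights keep the result nondecreasing,
    # so it is again a valid quantile function.
    return [sum(w * q[j] for w, q in zip(weights, Q)) / total for j in range(len(Q[0]))]
```

The direction θ is taken as given here. Estimating it is the step whose asymptotic distribution the abstract refers to, for instance by minimizing a prediction error over unit vectors. That estimation criterion is one natural choice assumed for illustration, not taken from the paper.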
Citations: 6
Optimal change-point detection and localization
Zone 1 (Mathematics) | Q1 STATISTICS & PROBABILITY | Pub Date: 2023-08-01 | DOI: 10.1214/23-aos2297
Nicolas Verzelen, Magalie Fromont, Matthieu Lerasle, Patricia Reynaud-Bouret
Given a time series $Y \in \mathbb{R}^n$ with a piecewise constant mean and independent components, the twin problems of change-point detection and change-point localization amount, respectively, to detecting whether there are times at which the mean changes and to estimating the positions of those change-points. In this work, we tightly characterize optimal rates for both problems and uncover the phase transition phenomenon from a global testing problem to a local estimation problem. Introducing a suitable definition of the energy of a change-point, we first establish in the single change-point setting that the optimal detection threshold is $\sqrt{2\log\log(n)}$. When the energy is just above the detection threshold, the problem of localizing the change-point becomes purely parametric: it depends only on the difference in means and no longer on the position of the change-point. Interestingly, for most change-point positions, including all those away from the endpoints of the time series, it is possible to detect and localize them at a much smaller energy level. In the multiple change-point setting, we establish the energy detection threshold and show similarly that the optimal localization error of a specific change-point becomes purely parametric. Along the way, tight minimax rates for the Hausdorff and $\ell_1$ estimation losses of the vector of all change-point positions are also established. Two procedures achieving these optimal rates are introduced. The first one is a least-squares estimator with a new multiscale penalty that favours well-spread change-points. The second one is a two-step multiscale post-processing procedure whose computational complexity can be as low as $O(n\log(n))$. Notably, these two procedures accommodate the presence of possibly many low-energy, and therefore undetectable, change-points and are still able to detect and localize high-energy change-points in the presence of those nuisance parameters.
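For intuition about the single change-point setting, the sketch below computes the classical standardized CUSUM statistic, assuming unit noise variance, and locates the change at its maximizer. For a single change-point this maximizer coincides with the least-squares estimate, because the drop in residual sum of squares from splitting after t observations equals the squared CUSUM value. It is not either of the authors' procedures (the multiscale penalized estimator or the post-processing step). Turning the maximum into a test would also require calibrated critical values, which this sketch does not provide.

```python
import math


def cusum_statistics(y):
    """Standardized CUSUM values C_t for t = 1, ..., n-1 (unit noise variance assumed).

    C_t = sqrt(t (n - t) / n) * (mean(y[:t]) - mean(y[t:])).
    """
    n = len(y)
    total = sum(y)
    prefix = 0.0
    stats = []
    for t in range(1, n):
        prefix += y[t - 1]
        left_mean = prefix / t
        right_mean = (total - prefix) / (n - t)
        stats.append(math.sqrt(t * (n - t) / n) * (left_mean - right_mean))
    return stats


def locate_single_change(y):
    """Return (t_hat, |C_t_hat|): the estimated change occurs after the first t_hat observations."""
    stats = cusum_statistics(y)
    best = max(range(len(stats)), key=lambda i: abs(stats[i]))
    return best + 1, abs(stats[best])
```

For a noiseless step of height 3 after 50 of 100 observations, `locate_single_change([0.0] * 50 + [3.0] * 50)` returns `(50, 15.0)`.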
Citations: 31
Bootstrapping persistent Betti numbers and other stabilizing statistics
Zone 1 (Mathematics) | Q1 STATISTICS & PROBABILITY | Pub Date: 2023-08-01 | DOI: 10.1214/23-aos2277
Benjamin Roycraft, Johannes Krebs, Wolfgang Polonik
We investigate multivariate bootstrap procedures for general stabilizing statistics, with specific application to topological data analysis. The work relates to other general results in the area of stabilizing statistics, including central limit theorems for geometric and topological functionals of Poisson and binomial processes in the critical regime, where limit theorems prove difficult to use in practice, motivating the use of a bootstrap approach. A smoothed bootstrap procedure is shown to give consistent estimation in these settings. Specific statistics considered include the persistent Betti numbers of Čech and Vietoris–Rips complexes over point sets in $\mathbb{R}^d$, along with Euler characteristics, and the total edge length of the k-nearest neighbor graph. Special emphasis is given to weakening the necessary conditions needed to establish bootstrap consistency. In particular, the assumption of a continuous underlying density is not required. Numerical studies illustrate the performance of the proposed method.
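As a rough illustration of a smoothed bootstrap for one of the statistics listed above, the sketch below resamples the points with replacement, perturbs every coordinate with Gaussian noise of bandwidth h (equivalently, it samples from a Gaussian kernel density estimate), and recomputes the total k-nearest-neighbor edge length. It assumes the edge length is the sum over points of the distances to their k nearest neighbors, so a mutual pair is counted twice. The bandwidth, the number of replicates B, and the function names are hypothetical, and the paper's exact definitions and conditions may differ. One motivation for smoothing: a plain bootstrap sample contains duplicated points, which create zero-length nearest-neighbor edges.

```python
import math
import random


def knn_total_edge_length(points, k):
    """Sum over points of the distances to their k nearest neighbors (mutual pairs counted twice)."""
    total = 0.0
    for i, p in enumerate(points):
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        total += sum(dists[:k])
    return total


def smoothed_bootstrap(points, k, h, B, seed=0):
    """Bootstrap replicates of the k-NN total edge length under a Gaussian-smoothed resample.

    Each replicate draws len(points) points with replacement and adds N(0, h^2) noise
    to every coordinate, i.e. it samples from a Gaussian kernel density estimate.
    """
    rng = random.Random(seed)
    n = len(points)
    replicates = []
    for _ in range(B):
        sample = [tuple(c + rng.gauss(0.0, h) for c in rng.choice(points)) for _ in range(n)]
        replicates.append(knn_total_edge_length(sample, k))
    return replicates
```

The sorted replicates can then be used, for example, to read off a percentile interval for the statistic.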
Citations: 3
On lower bounds for the bias-variance trade-off
Zone 1 (Mathematics) | Q1 STATISTICS & PROBABILITY | Pub Date: 2023-08-01 | DOI: 10.1214/23-aos2279
Alexis Derumigny, Johannes Schmidt-Hieber
It is a common phenomenon that for high-dimensional and nonparametric statistical models, rate-optimal estimators balance squared bias and variance. Although this balancing is widely observed, little is known about whether methods exist that could avoid the trade-off between bias and variance. We propose a general strategy to obtain lower bounds on the variance of any estimator with bias smaller than a prespecified bound. This shows to what extent the bias-variance trade-off is unavoidable and allows one to quantify the loss of performance for methods that do not obey it. The approach is based on a number of abstract lower bounds for the variance involving the change of expectation with respect to different probability measures as well as information measures such as the Kullback-Leibler or chi-square divergence. Some of these inequalities rely on a new concept of information matrices. In the second part of the article, the abstract lower bounds are applied to several statistical models, including the Gaussian white noise model, a boundary estimation problem, the Gaussian sequence model and the high-dimensional linear regression model. For these specific statistical applications, different types of bias-variance trade-offs occur that vary considerably in their strength. For the trade-off between integrated squared bias and integrated variance in the Gaussian white noise model, we propose to combine the general strategy for lower bounds with a reduction technique. This allows us to reduce the original problem to a lower bound on the bias-variance trade-off for estimators with additional symmetry properties in a simpler statistical model. To highlight possible extensions of the proposed framework, we moreover briefly discuss the trade-off between bias and mean absolute deviation.
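A classical special case shows how a change of expectation between two distributions, combined with the chi-square divergence, bounds the variance of an estimator from below. The derivation below is the Hammersley-Chapman-Robbins argument (Cauchy-Schwarz under $P$), included only as background; the paper's abstract bounds and information matrices are more general. Here $P$ and $Q$ are any two distributions with $Q$ absolutely continuous with respect to $P$, $T$ is any estimator with finite variance under $P$, and $B$ is an assumed bound on the absolute bias.

```latex
\bigl|\mathbb{E}_Q[T]-\mathbb{E}_P[T]\bigr|
  = \Bigl|\mathbb{E}_P\Bigl[\bigl(T-\mathbb{E}_P[T]\bigr)\Bigl(\frac{dQ}{dP}-1\Bigr)\Bigr]\Bigr|
  \le \sqrt{\operatorname{Var}_P(T)}\;\sqrt{\chi^2(Q,P)},
\qquad
\chi^2(Q,P) = \mathbb{E}_P\Bigl[\Bigl(\frac{dQ}{dP}-1\Bigr)^2\Bigr].

% Take P = P_\theta, Q = P_{\theta'} and assume |\mathbb{E}_\vartheta[T]-\vartheta| \le B for \vartheta \in \{\theta, \theta'\}.
% Then |\mathbb{E}_{\theta'}[T]-\mathbb{E}_\theta[T]| \ge |\theta'-\theta| - 2B, hence
\operatorname{Var}_\theta(T) \;\ge\; \frac{\bigl(|\theta'-\theta|-2B\bigr)_+^2}{\chi^2(P_{\theta'},P_\theta)},
\qquad
\chi^2\bigl(\mathcal{N}(\theta',\sigma^2),\mathcal{N}(\theta,\sigma^2)\bigr)=e^{(\theta'-\theta)^2/\sigma^2}-1.
```

For one Gaussian observation with $B = 0$, letting $\theta' \to \theta$ gives $\operatorname{Var}_\theta(T) \ge \sigma^2$, the Cramér-Rao bound. Allowing $B > 0$ weakens the bound, which is a simple instance of a bias-variance trade-off.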
Citations: 5
Off-policy evaluation in partially observed Markov decision processes under sequential ignorability
Zone 1 (Mathematics) | Q1 STATISTICS & PROBABILITY | Pub Date: 2023-08-01 | DOI: 10.1214/23-aos2287
Yuchen Hu, Stefan Wager
We consider off-policy evaluation of dynamic treatment rules under sequential ignorability, given an assumption that the underlying system can be modeled as a partially observed Markov decision process (POMDP). We propose an estimator, partial history importance weighting, and show that it can consistently estimate the stationary mean rewards of a target policy, given long enough draws from the behavior policy. We provide an upper bound on its error that decays polynomially in the number of observations (i.e., the number of trajectories times their length) with an exponent that depends on the overlap of the target and behavior policies as well as the mixing time of the underlying system. Furthermore, we show that this rate of convergence is minimax, given only our assumptions on mixing and overlap. Our results establish that off-policy evaluation in POMDPs is strictly harder than off-policy evaluation in (fully observed) Markov decision processes but strictly easier than model-free off-policy evaluation.
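The sketch below illustrates the general idea of weighting rewards by importance ratios computed over only a recent window of the history, using one logged trajectory and self-normalization. It is a simplified illustration, not the authors' partial history importance weighting estimator. The window length, the self-normalization, and the function names are hypothetical choices. It also hints at why overlap matters: each weight is a product of `window` ratios, so its variability can grow quickly with the window when the target and behavior policies overlap poorly.

```python
def windowed_importance_value(trajectory, target_prob, behavior_prob, window):
    """Self-normalized estimate of the target policy's average reward.

    trajectory:    list of (observation, action, reward) tuples logged under the behavior policy
    target_prob:   function (action, observation) -> probability under the target policy
    behavior_prob: function (action, observation) -> probability under the behavior policy
    window:        number of most recent steps whose ratios enter each weight (hypothetical)
    """
    ratios = [target_prob(a, o) / behavior_prob(a, o) for o, a, _ in trajectory]
    num = 0.0
    den = 0.0
    for t in range(window - 1, len(trajectory)):
        # Weight for the reward at time t: product of ratios over steps t-window+1, ..., t.
        w = 1.0
        for s in range(t - window + 1, t + 1):
            w *= ratios[s]
        num += w * trajectory[t][2]
        den += w
    if den == 0.0:
        raise ValueError("all weights are zero: the data do not cover the target policy")
    return num / den
```

With a uniform behavior policy over two actions and a target policy that always picks action 0, a window of 1 reduces to averaging the rewards logged under action 0.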
Citations: 11
Allergic rhinitis facts from an Irish pediatric population.
Zone 1 (Mathematics) | Q1 STATISTICS & PROBABILITY | Pub Date: 2023-06-19 | eCollection Date: 2023-12-01 | DOI: 10.1002/wjo2.105
Andreea Nae, Colleen B Heffernan, Michael Colreavy

Objective: To assess the main allergens in the pediatric population of the largest urban area in the country.

Methods: Clinical letters of patients referred with possible allergic rhinitis (AR) were retrospectively reviewed over the past 5 years.

Results: Five hundred and fifty-five patients were included. Males were affected by AR twice as often as females and had high allergen titers. House dust mites (44.7%) and grass pollen (29%) were the main allergens in our area, with 48% of patients sensitized to both allergens. Half of the patients had the diagnosis of AR confirmed by positive allergen-specific tests. For the other half, the diagnosis was based on a clinical assessment performed by a pediatric otolaryngologist.

Conclusions: In half of the children with suspected AR, environmental allergen sensitivity was confirmed by testing, and a large number received a clinical diagnosis of AR after an otolaryngology consultation. Our findings can help clinicians initiate AR treatment that takes into account the most problematic allergens in the area.

Citations: 0
Extreme value inference for heterogeneous power law data
Zone 1 (Mathematics) | Q1 STATISTICS & PROBABILITY | Pub Date: 2023-06-01 | DOI: 10.1214/23-aos2294
John H.J. Einmahl, Yi He
We extend extreme value statistics to independent data with possibly very different distributions. In particular, we present novel asymptotic normality results for the Hill estimator, which now estimates the extreme value index of the average distribution. Due to the heterogeneity, the asymptotic variance can be substantially smaller than that in the i.i.d. case. As a special case, we consider a heterogeneous scales model where the asymptotic variance can be calculated explicitly. The primary tool for the proofs is the functional central limit theorem for a weighted tail empirical process. We also present asymptotic normality results for the extreme quantile estimator. A simulation study shows the good finite-sample behavior of our limit theorems. We also present applications to assess the tail heaviness of earthquake energies and of cross-sectional stock market losses.
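For reference, the Hill estimator in its standard form averages the log-excesses of the $k$ largest observations over the $(k+1)$-th largest. The sketch below applies it to a simulated heterogeneous-scales sample, in which independent observations share one Pareto tail index $\alpha$ but have different scales. In that case the average distribution has extreme value index $1/\alpha$, so the estimate should be close to it. The sample size, the scales, $k$, and the helper names are illustrative choices, not the paper's simulation design.

```python
import math
import random


def hill_estimator(data, k):
    """Hill estimate of the extreme value index from the k largest (positive) observations."""
    x = sorted(data, reverse=True)
    if not 0 < k < len(x):
        raise ValueError("k must satisfy 0 < k < n")
    threshold = x[k]  # the (k+1)-th largest observation
    return sum(math.log(x[i]) - math.log(threshold) for i in range(k)) / k


def heterogeneous_pareto_sample(n, alpha, scales, seed=0):
    """Independent draws X_i = c_i * Z_i with Z_i ~ Pareto(alpha), cycling through the given scales."""
    rng = random.Random(seed)
    return [scales[i % len(scales)] * rng.paretovariate(alpha) for i in range(n)]


# Two scales, common alpha = 2, so the true extreme value index is 0.5.
sample = heterogeneous_pareto_sample(20000, alpha=2.0, scales=[1.0, 3.0])
gamma_hat = hill_estimator(sample, k=2000)
```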
Citations: 0
Inference on the maximal rank of time-varying covariance matrices using high-frequency data
Zone 1 (Mathematics) | Q1 STATISTICS & PROBABILITY | Pub Date: 2023-04-01 | DOI: 10.1214/23-aos2273
Markus Reiss, Lars Winkelmann
We study the rank of the instantaneous or spot covariance matrix $\Sigma_X(t)$ of a multidimensional process $X(t)$. Given high-frequency observations $X(i/n)$, $i=0,\dots,n$, we test the null hypothesis $\operatorname{rank}(\Sigma_X(t))\le r$ for all $t$ against local alternatives where the average $(r+1)$st eigenvalue is larger than some signal detection rate $v_n$. A major problem is that the inherent averaging in local covariance statistics produces a bias that distorts the rank statistics. We show that the bias depends on the regularity and spectral gap of $\Sigma_X(t)$. We establish explicit matrix perturbation and concentration results that provide nonasymptotic uniform critical values and optimal signal detection rates $v_n$. This leads to a rank estimation method via sequential testing. For a class of stochastic volatility models, we determine data-driven critical values via normed $p$-variations of estimated local covariance matrices. The methods are illustrated by simulations and an application to high-frequency data of U.S. government bonds.
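To see what local covariance statistics look like in the simplest case, the sketch below splits the high-frequency increments of a two-dimensional process into non-overlapping blocks. It rescales each block's realized covariance by $n$ divided by the block size to estimate the spot covariance, then reports the smallest eigenvalue, which is near zero on blocks where the rank is at most 1. The block size, the simulated processes, and the function names are assumptions for illustration; the paper's test statistics, critical values, and rates $v_n$ are not reproduced. Block averaging is also where the bias mentioned above enters. If the spot covariance has rank 1 but its column space rotates within a block, the block average can have rank 2.

```python
import math
import random


def block_spot_covariances(increments, n, block_size):
    """Spot-covariance estimates from non-overlapping blocks of 2-D increments.

    increments: list of (dX1, dX2) with dX = X((i+1)/n) - X(i/n)
    Returns a list of (a, b, c) for the symmetric matrix [[a, b], [b, c]].
    """
    scale = n / block_size  # divide the realized covariance by the block length block_size / n
    covs = []
    for start in range(0, len(increments) - block_size + 1, block_size):
        block = increments[start:start + block_size]
        a = scale * sum(d1 * d1 for d1, _ in block)
        b = scale * sum(d1 * d2 for d1, d2 in block)
        c = scale * sum(d2 * d2 for _, d2 in block)
        covs.append((a, b, c))
    return covs


def smallest_eigenvalue(a, b, c):
    """Smaller eigenvalue of the symmetric 2x2 matrix [[a, b], [b, c]]."""
    return (a + c) / 2 - math.hypot((a - c) / 2, b)


rng = random.Random(1)
n = 1000
dW = [rng.gauss(0.0, 1.0 / math.sqrt(n)) for _ in range(n)]
dB = [rng.gauss(0.0, 1.0 / math.sqrt(n)) for _ in range(n)]
rank_one = [(w, 2.0 * w) for w in dW]  # X2 = 2 * X1, so the spot covariance has rank 1
full_rank = list(zip(dW, dB))          # independent components, spot covariance = identity
```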
Citations: 0
Optimally tackling covariate shift in RKHS-based nonparametric regression
Zone 1 (Mathematics) Q1 STATISTICS & PROBABILITY Pub Date : 2023-04-01 DOI: 10.1214/23-aos2268
Cong Ma, Reese Pathak, Martin J. Wainwright
We study the covariate shift problem in the context of nonparametric regression over a reproducing kernel Hilbert space (RKHS). We focus on two natural families of covariate shift problems defined using the likelihood ratios between the source and target distributions. When the likelihood ratios are uniformly bounded, we prove that the kernel ridge regression (KRR) estimator with a carefully chosen regularization parameter is minimax rate-optimal (up to a log factor) for a large family of RKHSs with regular kernel eigenvalues. Interestingly, KRR requires no knowledge of the likelihood ratio beyond an upper bound on it. In striking contrast to the standard statistical setting without covariate shift, we also demonstrate that a naïve estimator, which minimizes the empirical risk over the function class, is strictly suboptimal under covariate shift as compared to KRR. We then address the larger class of covariate shift problems where the likelihood ratio is possibly unbounded yet has a finite second moment. Here, we propose a reweighted KRR estimator that weights samples based on a careful truncation of the likelihood ratios. Again, we are able to show that this estimator is minimax optimal, up to logarithmic factors.
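A minimal sketch of plain and reweighted kernel ridge regression, assuming a Gaussian kernel and that the likelihood ratio q(x)/p(x) between target and source is known at the source samples. The kernel, bandwidth, regularization parameter lam and truncation level tau are illustrative assumptions, not the paper's tuned choices. With weights=None the fit is ordinary KRR (the bounded-ratio case); passing truncated likelihood ratios as weights mirrors the reweighting idea for unbounded ratios. The weighted objective (1/n) Σ w_i (y_i − f(x_i))² + lam·‖f‖² reduces, via the representer theorem, to the linear system (D K + n·lam·I) α = D y with D = diag(w).

```python
import numpy as np

def rbf_kernel(A, B, bandwidth=1.0):
    """Gaussian (RBF) kernel matrix between rows of A and rows of B."""
    sq = (A**2).sum(1)[:, None] + (B**2).sum(1)[None, :] - 2.0 * A @ B.T
    return np.exp(-np.maximum(sq, 0.0) / (2.0 * bandwidth**2))

def truncated_weights(likelihood_ratio, tau):
    """Clip likelihood ratios q(x)/p(x) at level tau (hypothetical tuning parameter)."""
    return np.minimum(likelihood_ratio, tau)

def weighted_krr_fit(X, y, lam, weights=None, bandwidth=1.0):
    """Minimize (1/n) sum_i w_i (y_i - f(x_i))^2 + lam * ||f||_H^2 over the RKHS.

    With f = sum_j alpha_j k(., x_j), a solution satisfies
    (D K + n * lam * I) alpha = D y, where D = diag(w); w = 1 gives plain KRR.
    """
    n = X.shape[0]
    K = rbf_kernel(X, X, bandwidth)
    w = np.ones(n) if weights is None else np.asarray(weights, dtype=float)
    return np.linalg.solve(w[:, None] * K + n * lam * np.eye(n), w * y)

def krr_predict(alpha, X_train, X_new, bandwidth=1.0):
    """Evaluate f(x) = sum_j alpha_j k(x, x_j) at the rows of X_new."""
    return rbf_kernel(X_new, X_train, bandwidth) @ alpha
```

Solving the linear system directly, rather than inverting K, keeps the computation well posed even when the kernel matrix is nearly singular.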
Citations: 0
On high-dimensional Poisson models with measurement error: Hypothesis testing for nonlinear nonconvex optimization
IF 4.5 Zone 1 (Mathematics) Q1 STATISTICS & PROBABILITY Pub Date : 2023-02-01 DOI: 10.1214/22-aos2248
Fei Jiang, Yeqing Zhou, Jianxuan Liu, Yanyuan Ma

We study estimation and testing in the Poisson regression model with noisy high-dimensional covariates, which has wide applications in analyzing noisy big data. Correcting for the estimation bias caused by the covariate noise leads to a non-convex target function to be minimized. Handling the high dimensionality further leads us to augment the target function with an amenable penalty term. We propose to estimate the regression parameter by minimizing the penalized target function. We derive the L1 and L2 convergence rates of the estimator and prove variable selection consistency. We further establish the asymptotic normality of any subset of the parameters, where the subset can have infinitely many components as long as its cardinality grows sufficiently slowly. We develop Wald and score tests based on the asymptotic normality of the estimator, which permit testing of linear functions of the members of the subset. We examine the finite-sample performance of the proposed tests through extensive simulations. Finally, the proposed method is successfully applied to the Alzheimer's Disease Neuroimaging Initiative study, which originally motivated this work.
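Why bias correction produces a non-convex target can be seen in a standard corrected-likelihood construction for additive Gaussian measurement error, sketched below under stated assumptions: W = X + U with U ~ N(0, Σ) independent of (X, y) and Σ known, and an L1 (lasso) penalty standing in for a general amenable penalty. Whether the paper uses exactly this target function is an assumption; the sketch only computes the objective and its gradient, which any first-order solver would need.

```python
import numpy as np

def corrected_poisson_loss(beta, W, y, Sigma):
    """Measurement-error-corrected Poisson loss (log y! constant dropped).

    Assumes W = X + U with U ~ N(0, Sigma) independent of (X, y), Sigma known
    and symmetric. Since E[exp(beta'W - beta'Sigma beta / 2) | X] = exp(beta'X)
    and E[y * beta'W | X, y] = y * beta'X, each term is conditionally unbiased
    for the error-free loss exp(beta'X) - y * beta'X. The quadratic term inside
    the exponential makes the loss non-convex in beta.
    """
    eta = W @ beta - 0.5 * beta @ Sigma @ beta
    return np.mean(np.exp(eta) - y * (W @ beta))

def corrected_poisson_grad(beta, W, y, Sigma):
    """Gradient of corrected_poisson_loss with respect to beta."""
    eta = W @ beta - 0.5 * beta @ Sigma @ beta
    # d eta_i / d beta = W_i - Sigma beta (Sigma symmetric)
    return (np.exp(eta)[:, None] * (W - Sigma @ beta)).mean(axis=0) - (y[:, None] * W).mean(axis=0)

def penalized_objective(beta, W, y, Sigma, lam):
    """Corrected loss plus an L1 penalty (an illustrative choice of amenable penalty)."""
    return corrected_poisson_loss(beta, W, y, Sigma) + lam * np.abs(beta).sum()
```

Because exp(β'W − β'Σβ/2) vanishes as ‖β‖ grows in directions where Σ is positive definite while the linear term −yβ'W does not, this corrected loss can be unbounded below; such objectives are therefore typically minimized over a bounded set or from a careful initialization.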

Citations: 0
Journal: Annals of Statistics