Journal of Statistical Planning and Inference: Latest Articles


Zero-inflated multivariate tobit regression modeling

IF 0.8 | Zone 4 (Mathematics) | Q3 Statistics & Probability

Journal of Statistical Planning and Inference

Pub Date: 2024-09-03 DOI: 10.1016/j.jspi.2024.106229

Becky Tang, Henry A. Frye, John A. Silander Jr., Alan E. Gelfand

A frequent challenge encountered in real-world applications is data having a high proportion of zeros. Focusing on ecological abundance data, much attention has been given to zero-inflated count data. Models for non-negative continuous abundance data with an excess of zeros are rarely discussed. Work presented here considers the creation of a point mass at zero through a left-censoring approach or through a hurdle approach. We incorporate both mechanisms to capture the analog of zero-inflation for count data. Additionally, primary attention has been given to univariate zero-inflated modeling (e.g., single species), whereas data often arise jointly (e.g., a collection of species). With multivariate abundance data, a key issue is to capture dependence among the species at a site, both in terms of positive abundance as well as absence. Therefore, our contribution is a model for multivariate zero-inflated continuous data that are non-negative. Working in a Bayesian framework, we discuss the issue of separating the two sources of zeros and offer model comparison metrics for multivariate zero-inflated data. In an application, we model the total biomass for five tree species obtained from plots established in the Forest Inventory Analysis database in the Northeast region of the United States.
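The two zero-generating mechanisms named in the abstract can be illustrated with a small univariate simulation. This is only a sketch under stated assumptions, not the authors' multivariate Bayesian model: the latent variable is taken to be normal, the hurdle is a single Bernoulli draw with a hypothetical probability `p_hurdle`, and all names are invented for illustration.

```python
import numpy as np

def simulate_zero_inflated_tobit(n, mu, sigma, p_hurdle, seed=0):
    """Draw n non-negative values with two sources of zeros.

    p_hurdle is an assumed probability of a structural zero (hurdle);
    the remaining draws come from a normal latent variable that is
    left-censored at zero (tobit).
    """
    rng = np.random.default_rng(seed)
    latent = rng.normal(mu, sigma, size=n)       # latent abundance
    censored = np.maximum(latent, 0.0)           # tobit: left-censor at 0
    structural_zero = rng.random(n) < p_hurdle   # hurdle: forced absence
    y = np.where(structural_zero, 0.0, censored)
    return y, structural_zero, latent

y, structural_zero, latent = simulate_zero_inflated_tobit(
    n=10_000, mu=0.5, sigma=1.0, p_hurdle=0.3
)
zeros_from_hurdle = structural_zero.mean()
zeros_from_censoring = ((~structural_zero) & (latent <= 0)).mean()
print(f"total zeros: {np.mean(y == 0):.3f}")
print(f"  hurdle: {zeros_from_hurdle:.3f}, censoring only: {zeros_from_censoring:.3f}")
```

In the simulation the source of each zero is known, but in observed data a zero carries no label, which is the separation issue the abstract raises.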

Volume 236, Article 106229

Citations: 0

Convergent stochastic algorithm for estimation in general multivariate correlated frailty models using integrated partial likelihood

Pub Date: 2024-08-31 DOI: 10.1016/j.jspi.2024.106231

Ajmal Oodally, Luc Duchateau, Estelle Kuhn

The Cox model with unspecified baseline hazard is often used to model survival data. In the case of correlated event times, this model can be extended by introducing random effects, also called frailty terms, leading to the frailty model. Few methods have been put forward to estimate parameters of such frailty models, and they often consider only a particular distribution for the frailty terms and specific correlation structures. In this paper, a new efficient method is introduced to perform parameter estimation by maximizing the integrated partial likelihood. The proposed stochastic estimation procedure can deal with frailty models with a broad choice of distributions for the frailty terms and with any kind of correlation structure between the frailty components, also allowing random interaction terms between the covariates and the frailty components. The almost sure convergence of the stochastic estimation algorithm towards a critical point of the integrated partial likelihood is proved. Numerical convergence properties are evaluated through simulation studies and comparison with existing methods is performed. In particular, the robustness of the proposed method with respect to different parametric baseline hazards and misspecified frailty distributions is demonstrated through simulation. Finally, the method is applied to a mastitis and a bladder cancer dataset.

Volume 236, Article 106231

Citations: 0

Effect of dimensionality on convergence rates of kernel ridge regression estimator

Pub Date: 2024-08-26 DOI: 10.1016/j.jspi.2024.106228

Kwan-Young Bak, Woojoo Lee

Despite the curse of dimensionality, kernel ridge regression often exhibits good performance in practical applications, even when the dimension is moderately large. However, it has been shown that kernel ridge regression cannot be free from the curse of dimensionality. Until now, the literature on kernel ridge regression has suggested that the gap between theory and practice in relation to dimensionality has not narrowed. In this study, we first investigate when the influence of dimensionality does not significantly affect the convergence rate of the kernel ridge regression. Specifically, we study the convergence rate of $L_{2}$ and $L_{\infty}$ risks for the kernel ridge estimator, with a focus on reproducing kernel Hilbert space (RKHS) generated by a product kernel. We show that the univariate optimal convergence rate up to a logarithmic factor in $L_{2}$ and $L_{\infty}$ risks can be achieved by controlling the size of the RKHS. The result of a numerical study confirms our theoretical findings.
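For readers unfamiliar with the estimator, a kernel ridge fit with a product kernel can be sketched in a few lines of NumPy. The sketch assumes a Gaussian kernel in each coordinate, a fixed bandwidth and a fixed penalty `lam`. These choices, the test function and all function names are illustrative assumptions, and the sketch does not reproduce the RKHS-size control studied in the paper.

```python
import numpy as np

def product_gaussian_kernel(A, B, bandwidth=0.3):
    """Product over coordinates of univariate Gaussian kernels."""
    diff = A[:, None, :] - B[None, :, :]                # shape (n, m, d)
    per_coord = np.exp(-diff**2 / (2 * bandwidth**2))   # univariate kernels
    return per_coord.prod(axis=2)                       # shape (n, m)

def krr_fit(X, y, lam, bandwidth=0.3):
    """Solve (K + n*lam*I) alpha = y for the kernel ridge coefficients."""
    n = X.shape[0]
    K = product_gaussian_kernel(X, X, bandwidth)
    return np.linalg.solve(K + n * lam * np.eye(n), y)

def krr_predict(X_train, alpha, X_new, bandwidth=0.3):
    """Evaluate the fitted function at new points."""
    return product_gaussian_kernel(X_new, X_train, bandwidth) @ alpha

def target(Z):
    """Hypothetical smooth regression function used only for this demo."""
    return np.sin(2 * np.pi * Z[:, 0]) + Z[:, 1] * Z[:, 2]

rng = np.random.default_rng(0)
X = rng.uniform(0, 1, size=(200, 3))
y = target(X) + 0.1 * rng.normal(size=200)

alpha = krr_fit(X, y, lam=1e-3)
X_test = rng.uniform(0, 1, size=(500, 3))
mse = np.mean((krr_predict(X, alpha, X_test) - target(X_test)) ** 2)
print(f"test MSE against the noiseless target: {mse:.4f}")
```

Rerunning the demo with larger `n` and `d` gives an informal empirical look at how the error changes with dimension, but it says nothing about the rates proved in the paper.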

Volume 236, Article 106228

Citations: 0

Bayes oracle property of multiple tests of multivariate normal means under sparsity

Pub Date: 2024-08-22 DOI: 10.1016/j.jspi.2024.106227

Zikun Qin, Malay Ghosh

The paper considers a multiple testing problem of multivariate normal means under sparsity. First, the Bayes risk of the multivariate Bayes oracle is derived. Then, a hierarchical Bayesian approach is taken with global–local shrinkage priors, where the global parameter is either treated as a tuning parameter or is given a specific prior. The method is shown to attain an asymptotic Bayes optimal under sparsity (ABOS) property. Finally, an empirical Bayes procedure is proposed which involves estimation of the global shrinkage parameter. The approach is also shown to lead to the ABOS property.

Citations: 0

Assessing heterogeneity in treatment initiation guidelines in longitudinal randomized controlled trials

Pub Date: 2024-08-12 DOI: 10.1016/j.jspi.2024.106226

Hyunkeun Ryan Cho, Seonjin Kim

Treatment initiation guidelines are essential in healthcare, dictating when patients begin therapy. These guidelines are typically assessed through randomized controlled trials (RCTs) to measure their average effect on a population. However, this method may not fully account for patient heterogeneity. We introduce a refined analysis methodology that accounts for diverse times to treatment initiation (TTI) arising from these guidelines. We offer a more detailed perspective on the guidelines’ impact by analyzing homogeneous subpopulations based on their TTI. We develop a longitudinal regression model with smooth time functions to capture dynamic changes in average guideline effects on subpopulations (AGES). A unique weighting mechanism creates pseudo-subpopulations from RCT data, enabling consistent and precise estimation of smooth functions. The efficacy of our approach is validated through theoretical and numerical studies, underscoring its capacity to provide insightful statistical inferences. We exemplify the utility of our methodology by applying it to an RCT of the World Health Organization (WHO) guideline for adults with HIV. This analysis promises to enhance the evaluation of treatment initiation guidelines, leading to more personalized and efficient patient care.

Volume 235, Article 106226

Citations: 0

A graph decomposition-based approach for the graph-fused lasso

Pub Date: 2024-08-10 DOI: 10.1016/j.jspi.2024.106221

Feng Yu, Archer Yi Yang, Teng Zhang

We propose a new algorithm for solving the graph-fused lasso (GFL), a regularized model that operates under the assumption that the signal tends to be locally constant over a predefined graph structure. The proposed method applies a novel decomposition of the objective function for the alternating direction method of multipliers (ADMM) algorithm. While ADMM has been widely used in fused lasso problems, existing works such as the network lasso decompose the objective function into the loss function component and the total variation penalty component. In contrast, based on the graph matching technique in graph theory, we propose a new method of decomposition that separates the objective function into two components, where one component is the loss function plus part of the total variation penalty, and the other component is the remaining total variation penalty. We develop an exact convergence rate of the proposed algorithm by developing a general theory on the local convergence of ADMM. Compared with the network lasso algorithm, our algorithm has a faster exact linear convergence rate (although in the same order as for the network lasso). It also enjoys a smaller computational cost per iteration, thus converges overall faster in most numerical examples.
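The decomposition described in the abstract can be made concrete on a toy graph. The sketch below assumes a squared-error loss and uses a simple greedy matching. Both are illustrative assumptions, the helper names are hypothetical, and it does not implement the authors' ADMM updates. It only shows how the objective splits into "loss plus matched-edge penalty" and "remaining penalty".

```python
def greedy_matching(edges):
    """Split edges into a matching (no shared vertices) and the remainder."""
    used, matching, rest = set(), [], []
    for i, j in edges:
        if i in used or j in used:
            rest.append((i, j))
        else:
            matching.append((i, j))
            used.update((i, j))
    return matching, rest

def tv_penalty(beta, edges):
    """Total variation of beta over the given edges."""
    return sum(abs(beta[i] - beta[j]) for i, j in edges)

def squared_loss(y, beta):
    """Assumed loss: half the sum of squared residuals."""
    return 0.5 * sum((yi - bi) ** 2 for yi, bi in zip(y, beta))

def gfl_objective(y, beta, edges, lam):
    """Loss plus lam times the graph total variation."""
    return squared_loss(y, beta) + lam * tv_penalty(beta, edges)

# 2x3 grid graph, nodes numbered row by row.
edges = [(0, 1), (1, 2), (3, 4), (4, 5), (0, 3), (1, 4), (2, 5)]
y = [1.0, 1.1, 3.0, 0.9, 1.2, 3.1]
beta = [1.0, 1.0, 3.0, 1.0, 1.0, 3.0]
lam = 0.5

matching, rest = greedy_matching(edges)
part_one = squared_loss(y, beta) + lam * tv_penalty(beta, matching)
part_two = lam * tv_penalty(beta, rest)
total = gfl_objective(y, beta, edges, lam)
print(matching, rest)
print(part_one + part_two, total)
```

Because matched edges share no vertices, the first component separates into independent problems in at most two variables each when the loss is separable across nodes, as the squared-error loss assumed here is.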

Volume 235, Article 106221

Citations: 0

Exponential consistency of M-estimators in generalized linear mixed models

Pub Date: 2024-08-08 DOI: 10.1016/j.jspi.2024.106222

Andrea Bratsberg, Magne Thoresen, Abhik Ghosh

Generalized linear mixed models are powerful tools for analyzing clustered data, where the unknown parameters are classically (and most commonly) estimated by the maximum likelihood and restricted maximum likelihood procedures. However, since the likelihood-based procedures are known to be highly sensitive to outliers, M-estimators have become popular as a means to obtain robust estimates under possible data contamination. In this paper, we prove that for sufficiently smooth general loss functions defining the M-estimators in generalized linear mixed models, the tail probability of the deviation between the estimated and the true regression coefficients has an exponential bound. This implies an exponential rate of consistency of these M-estimators under appropriate assumptions, generalizing the existing exponential consistency results from univariate to multivariate responses. We have illustrated this theoretical result further for the special examples of the maximum likelihood estimator and the robust minimum density power divergence estimator, a popular example of model-based M-estimators, in the settings of linear and logistic mixed models, comparing it with the empirical rate of convergence through simulation studies.

Volume 235, Article 106222

PDF: https://www.sciencedirect.com/science/article/pii/S037837582400079X/pdfft?md5=852e7e6dbe375fd6c8f548a7fe669070&pid=1-s2.0-S037837582400079X-main.pdf

Citations: 0

A criterion for estimating the largest linear homoscedastic zone in Gaussian data

Pub Date: 2024-08-06 DOI: 10.1016/j.jspi.2024.106223

Jean-Marc Bardet

A criterion is constructed to identify the largest homoscedastic region in a Gaussian dataset. This can be reduced to a one-sided non-parametric break detection, knowing that up to a certain index the output is governed by a linear homoscedastic model, while after this index it is different (e.g., a different model, different variables, or different volatility). We show the convergence of the estimator of this index, with asymptotic concentration inequalities that can be exponential. A criterion and convergence results are derived when the linear homoscedastic zone is bounded by two breaks on both sides. Additionally, a criterion for choosing between zero, one, or two breaks is proposed. Monte Carlo experiments also confirm its very good numerical performance.

Citations: 0

Statistical inference from partially nominated sets: An application to estimating the prevalence of osteoporosis among adult women

Pub Date: 2024-07-26 DOI: 10.1016/j.jspi.2024.106214

Zeinab Akbari Ghamsari, Ehsan Zamanzade, Majid Asadi

This paper focuses on drawing statistical inference based on a novel variant of maxima or minima nomination sampling (NS) designs. These sampling designs are useful for obtaining more representative sample units from the tails of the population distribution using the available auxiliary ranking information. However, one common difficulty in performing NS in practice is that the researcher cannot obtain a nominated sample unless he/she uniquely determines the sample unit with the highest or the lowest rank in each set. To overcome this problem, a variant of NS, which is called partial nomination sampling, is proposed, in which the researcher is allowed to declare that two or more units are tied in the ranks whenever he/she cannot find the sample unit with the highest or the lowest rank. Based on this sampling design, two asymptotically unbiased estimators are developed for the cumulative distribution function, which is obtained using maximum likelihood and moment-based approaches, and their asymptotic normalities are proved. Several numerical studies have shown that the proposed estimators have higher relative efficiencies than their counterparts in simple random sampling in analyzing either the upper or the lower tail of the parent distribution. The procedures that we developed are then implemented on a real dataset from the Third National Health and Nutrition Examination Survey (NHANES III) to estimate the prevalence of osteoporosis among adult women aged 50 and over. It is shown that in certain circumstances, the techniques that we have developed require only one-third of the sample size needed in SRS to achieve the desired precision. This results in a considerable reduction in time and cost compared to the standard SRS method.
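As background for the design being extended, standard maxima nomination sampling can be sketched as follows. The sketch assumes perfect ranking (units are ranked by their true values rather than by auxiliary information) and no ties, so it does not cover the partial nomination variant proposed in the paper. The function names and the standard normal population are illustrative assumptions. The estimator relies on the fact that the maximum of k independent draws with CDF F has CDF F^k.

```python
import numpy as np

def maxima_nomination_sample(draw, n_sets, set_size, rng):
    """Keep only the largest unit from each of n_sets sets of set_size units."""
    sets = draw(rng, (n_sets, set_size))
    return sets.max(axis=1)

def cdf_from_maxima(maxima, t, set_size):
    """Moment-type estimate of F(t), using P(max <= t) = F(t)**set_size."""
    g_hat = np.mean(maxima <= t)          # empirical CDF of the maxima
    return g_hat ** (1.0 / set_size)

def draw_standard_normal(rng, shape):
    """Hypothetical population used only for this demo."""
    return rng.normal(size=shape)

rng = np.random.default_rng(1)
maxima = maxima_nomination_sample(draw_standard_normal, n_sets=2000, set_size=3, rng=rng)
estimate = cdf_from_maxima(maxima, t=1.0, set_size=3)
# For reference, the standard normal CDF at 1.0 is about 0.841.
print(f"estimated F(1.0) = {estimate:.3f}")
```

Only one unit per set is measured, so the 2000 retained values concentrate in the upper tail of the population, which is the region where the abstract reports efficiency gains over simple random sampling.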

Volume 235, Article 106214

Citations: 0

Stable convergence of conditional least squares estimators for supercritical continuous state and continuous time branching processes with immigration

Pub Date: 2024-07-22 DOI: 10.1016/j.jspi.2024.106213

Mátyás Barczy

We prove stable convergence of conditional least squares estimators of drift parameters for supercritical continuous state and continuous time branching processes with immigration based on discrete time observations.

Citations: 0
