
Journal of Statistical Planning and Inference: Latest Articles

Effect of dimensionality on convergence rates of kernel ridge regression estimator
IF 0.9 Zone 4 Mathematics Q3 STATISTICS & PROBABILITY Pub Date: 2024-08-26 DOI: 10.1016/j.jspi.2024.106228
Kwan-Young Bak, Woojoo Lee
Despite the curse of dimensionality, kernel ridge regression often exhibits good performance in practical applications, even when the dimension is moderately large. However, it has been shown that kernel ridge regression cannot be free from the curse of dimensionality. Until now, the literature on kernel ridge regression has suggested that the gap between theory and practice in relation to dimensionality has not narrowed. In this study, we first investigate when the influence of dimensionality does not significantly affect the convergence rate of the kernel ridge regression. Specifically, we study the convergence rate of and risks for the kernel ridge estimator, with a focus on reproducing kernel Hilbert space (RKHS) generated by a product kernel. We show that the univariate optimal convergence rate up to a logarithmic factor in and risks can be achieved by controlling the size of the RKHS. The result of a numerical study confirms our theoretical findings.
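
The setting in this abstract, kernel ridge regression over an RKHS generated by a product kernel, can be illustrated with a minimal sketch. The Gaussian factor kernel, the `bandwidth` value, and the `n * lam` scaling of the ridge penalty are assumptions chosen for illustration; the abstract does not specify the authors' kernel or tuning.

```python
import numpy as np

def product_gaussian_kernel(X, Z, bandwidth=0.5):
    """Product kernel: one univariate Gaussian factor per coordinate (assumed form)."""
    K = np.ones((X.shape[0], Z.shape[0]))
    for j in range(X.shape[1]):
        diff = X[:, j][:, None] - Z[:, j][None, :]
        K *= np.exp(-diff**2 / (2.0 * bandwidth**2))
    return K

def krr_fit(X, y, lam, bandwidth=0.5):
    """Solve (K + n * lam * I) alpha = y for the kernel ridge coefficients."""
    n = X.shape[0]
    K = product_gaussian_kernel(X, X, bandwidth)
    return np.linalg.solve(K + n * lam * np.eye(n), y)

def krr_predict(X_train, alpha, X_new, bandwidth=0.5):
    """Evaluate the fitted function at new points."""
    return product_gaussian_kernel(X_new, X_train, bandwidth) @ alpha
```

With Gaussian factors the product kernel coincides with the usual isotropic Gaussian kernel; other univariate factors would give a genuinely different product structure.
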
Citations: 0
Bayes oracle property of multiple tests of multivariate normal means under sparsity
IF 0.8 Zone 4 Mathematics Q3 STATISTICS & PROBABILITY Pub Date: 2024-08-22 DOI: 10.1016/j.jspi.2024.106227
Zikun Qin, Malay Ghosh

The paper considers a multiple testing problem of multivariate normal means under sparsity. First, the Bayes risk of the multivariate Bayes oracle is derived. Then, a hierarchical Bayesian approach is taken with global–local shrinkage priors, where the global parameter is either treated as a tuning parameter or is given a specific prior. The method is shown to attain an asymptotic Bayes optimal under sparsity (ABOS) property. Finally, an empirical Bayes procedure is proposed which involves estimation of the global shrinkage parameter. The approach is also shown to lead to the ABOS property.

Citations: 0
Assessing heterogeneity in treatment initiation guidelines in longitudinal randomized controlled trials
IF 0.8 Zone 4 Mathematics Q3 STATISTICS & PROBABILITY Pub Date: 2024-08-12 DOI: 10.1016/j.jspi.2024.106226
Hyunkeun Ryan Cho, Seonjin Kim

Treatment initiation guidelines are essential in healthcare, dictating when patients begin therapy. These guidelines are typically assessed through randomized controlled trials (RCTs) to measure their average effect on a population. However, this method may not fully account for patient heterogeneity. We introduce a refined analysis methodology that accounts for diverse times to treatment initiation (TTI) arising from these guidelines. We offer a more detailed perspective on the guidelines’ impact by analyzing homogeneous subpopulations based on their TTI. We develop a longitudinal regression model with smooth time functions to capture dynamic changes in average guideline effects on subpopulations (AGES). A unique weighting mechanism creates pseudo-subpopulations from RCT data, enabling consistent and precise estimation of smooth functions. The efficacy of our approach is validated through theoretical and numerical studies, underscoring its capacity to provide insightful statistical inferences. We exemplify the utility of our methodology by applying it to an RCT of the World Health Organization (WHO) guideline for adults with HIV. This analysis promises to enhance the evaluation of treatment initiation guidelines, leading to more personalized and efficient patient care.

Citations: 0
A graph decomposition-based approach for the graph-fused lasso
IF 0.8 Zone 4 Mathematics Q3 STATISTICS & PROBABILITY Pub Date: 2024-08-10 DOI: 10.1016/j.jspi.2024.106221
Feng Yu, Archer Yi Yang, Teng Zhang

We propose a new algorithm for solving the graph-fused lasso (GFL), a regularized model that operates under the assumption that the signal tends to be locally constant over a predefined graph structure. The proposed method applies a novel decomposition of the objective function for the alternating direction method of multipliers (ADMM) algorithm. While ADMM has been widely used in fused lasso problems, existing works such as the network lasso decompose the objective function into the loss function component and the total variation penalty component. In contrast, based on the graph matching technique in graph theory, we propose a new method of decomposition that separates the objective function into two components, where one component is the loss function plus part of the total variation penalty, and the other component is the remaining total variation penalty. We derive the exact convergence rate of the proposed algorithm by developing a general theory on the local convergence of ADMM. Compared with the network lasso algorithm, our algorithm has a faster exact linear convergence rate (although of the same order as that of the network lasso). It also has a smaller computational cost per iteration and thus converges faster overall in most numerical examples.
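
The two-component split described in this abstract can be sketched as follows. This is only an illustration of the idea, not the authors' algorithm: the squared-error loss, the greedy matching rule, and the function names are all assumptions, and no ADMM iteration is implemented.

```python
import numpy as np

def gfl_objective(beta, y, edges, lam):
    """Graph-fused lasso objective with an assumed squared-error loss."""
    loss = 0.5 * np.sum((y - beta) ** 2)
    tv = sum(abs(beta[i] - beta[j]) for i, j in edges)
    return loss + lam * tv

def greedy_matching(edges):
    """Hypothetical helper: pick edges sharing no vertex; return (matched, rest)."""
    used, matched, rest = set(), [], []
    for i, j in edges:
        if i not in used and j not in used:
            matched.append((i, j))
            used.update((i, j))
        else:
            rest.append((i, j))
    return matched, rest

def split_objective(beta, y, edges, lam):
    """Component 1: loss + penalty on matched edges. Component 2: remaining penalty."""
    matched, rest = greedy_matching(edges)
    part1 = 0.5 * np.sum((y - beta) ** 2) + lam * sum(abs(beta[i] - beta[j]) for i, j in matched)
    part2 = lam * sum(abs(beta[i] - beta[j]) for i, j in rest)
    return part1, part2
```

Because matched edges share no vertices, the first component separates into independent problems in at most two variables each, which is what makes a split of this kind attractive for an ADMM update.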

Citations: 0
Exponential consistency of M-estimators in generalized linear mixed models
IF 0.8 Zone 4 Mathematics Q3 STATISTICS & PROBABILITY Pub Date: 2024-08-08 DOI: 10.1016/j.jspi.2024.106222
Andrea Bratsberg, Magne Thoresen, Abhik Ghosh

Generalized linear mixed models are powerful tools for analyzing clustered data, where the unknown parameters are classically (and most commonly) estimated by the maximum likelihood and restricted maximum likelihood procedures. However, since the likelihood-based procedures are known to be highly sensitive to outliers, M-estimators have become popular as a means to obtain robust estimates under possible data contamination. In this paper, we prove that for sufficiently smooth general loss functions defining the M-estimators in generalized linear mixed models, the tail probability of the deviation between the estimated and the true regression coefficients has an exponential bound. This implies an exponential rate of consistency of these M-estimators under appropriate assumptions, generalizing the existing exponential consistency results from univariate to multivariate responses. We have illustrated this theoretical result further for the special examples of the maximum likelihood estimator and the robust minimum density power divergence estimator, a popular example of model-based M-estimators, in the settings of linear and logistic mixed models, comparing it with the empirical rate of convergence through simulation studies.

Citations: 0
A criterion for estimating the largest linear homoscedastic zone in Gaussian data
IF 0.8 Zone 4 Mathematics Q3 STATISTICS & PROBABILITY Pub Date: 2024-08-06 DOI: 10.1016/j.jspi.2024.106223
Jean-Marc Bardet

A criterion is constructed to identify the largest homoscedastic region in a Gaussian dataset. This can be reduced to a one-sided non-parametric break detection, knowing that up to a certain index the output is governed by a linear homoscedastic model, while after this index it is different (e.g. a different model, different variables, or different volatility). We show the convergence of the estimator of this index, with asymptotic concentration inequalities that can be exponential. A criterion and convergence results are derived when the linear homoscedastic zone is bounded by two breaks on both sides. Additionally, a criterion for choosing between zero, one, or two breaks is proposed. Monte Carlo experiments also confirm its very good numerical performance.

Citations: 0
Statistical inference from partially nominated sets: An application to estimating the prevalence of osteoporosis among adult women
IF 0.8 Zone 4 Mathematics Q3 STATISTICS & PROBABILITY Pub Date: 2024-07-26 DOI: 10.1016/j.jspi.2024.106214
Zeinab Akbari Ghamsari, Ehsan Zamanzade, Majid Asadi

This paper focuses on drawing statistical inference based on a novel variant of maxima or minima nomination sampling (NS) designs. These sampling designs are useful for obtaining more representative sample units from the tails of the population distribution using the available auxiliary ranking information. However, one common difficulty in performing NS in practice is that the researcher cannot obtain a nominated sample unless he/she uniquely determines the sample unit with the highest or the lowest rank in each set. To overcome this problem, a variant of NS, which is called partial nomination sampling, is proposed, in which the researcher is allowed to declare that two or more units are tied in the ranks whenever he/she cannot find the sample unit with the highest or the lowest rank. Based on this sampling design, two asymptotically unbiased estimators are developed for the cumulative distribution function, which are obtained using maximum likelihood and moment-based approaches, and their asymptotic normality is proved. Several numerical studies have shown that the proposed estimators have higher relative efficiencies than their counterparts in simple random sampling (SRS) in analyzing either the upper or the lower tail of the parent distribution. The procedures that we developed are then implemented on a real dataset from the Third National Health and Nutrition Examination Survey (NHANES III) to estimate the prevalence of osteoporosis among adult women aged 50 and over. It is shown that in certain circumstances, the techniques that we have developed require only one-third of the sample size needed in SRS to achieve the desired precision. This results in a considerable reduction in time and cost compared to the standard SRS method.

Citations: 0
Stable convergence of conditional least squares estimators for supercritical continuous state and continuous time branching processes with immigration
IF 0.8 Zone 4 Mathematics Q3 STATISTICS & PROBABILITY Pub Date: 2024-07-22 DOI: 10.1016/j.jspi.2024.106213
Mátyás Barczy

We prove stable convergence of conditional least squares estimators of drift parameters for supercritical continuous state and continuous time branching processes with immigration based on discrete time observations.

Citations: 0
Some clustering-based change-point detection methods applicable to high dimension, low sample size data
IF 0.8 Zone 4 Mathematics Q3 STATISTICS & PROBABILITY Pub Date: 2024-07-16 DOI: 10.1016/j.jspi.2024.106212
Trisha Dawn, Angshuman Roy, Alokesh Manna, Anil K. Ghosh

Detection of change-points in a sequence of high dimensional observations is a challenging problem, and this becomes even more challenging when the sample size (i.e., the sequence length) is small. In this article, we propose some change-point detection methods based on clustering, which can be conveniently used in such high dimension, low sample size situations. First, we consider the single change-point problem. Using k-means clustering based on suitable dissimilarity measures, we propose some methods for testing the existence of a change-point and estimating its location. The high dimensional behavior of these proposed methods is investigated under appropriate regularity conditions. Next, we extend our methods for detection of multiple change-points. We carry out extensive numerical studies and analyze a real data set to compare the performance of our proposed methods with some state-of-the-art methods.
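
The single change-point idea can be sketched as a two-cluster problem in which the clusters are forced to be contiguous in time. This is a simplified stand-in, not the authors' procedure: it uses plain squared Euclidean distance rather than the dissimilarity measures the abstract refers to, and it only estimates a location without testing for existence. The function name is hypothetical.

```python
import numpy as np

def estimate_change_point(X):
    """Return t minimizing the within-cluster sum of squares of {X[:t], X[t:]}.

    X has shape (n, d); rows are observations in time order.
    """
    n = X.shape[0]
    best_t, best_cost = None, np.inf
    for t in range(1, n):
        left, right = X[:t], X[t:]
        # 2-means cost when the two clusters are the segments before and after t
        cost = ((left - left.mean(axis=0)) ** 2).sum() + ((right - right.mean(axis=0)) ** 2).sum()
        if cost < best_cost:
            best_t, best_cost = t, cost
    return best_t
```

The returned value `t` means the first `t` observations form the segment before the estimated change.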

Citations: 0
Regression to the mean for overdispersed count data
IF 0.8 Zone 4 Mathematics Q3 STATISTICS & PROBABILITY Pub Date: 2024-07-05 DOI: 10.1016/j.jspi.2024.106211
Kiran Iftikhar, Manzoor Khan, Jake Olivier

In repeated measurements, regression to the mean (RTM) is a tendency of subjects with observed extreme values to move closer to the mean when measured a second time. Not accounting for RTM could lead to incorrect decisions such as when observed natural variation is incorrectly attributed to the effect of a treatment/intervention. A strategy for addressing RTM is to decompose the total effect, the expected difference in paired random variables conditional on the first being in the tail of its distribution, into regression to the mean and unbiased treatment effects. The unbiased treatment effect can then be estimated by subtraction. Formulae are available in the literature to quantify RTM for Poisson distributed data which are constrained by mean–variance equivalence, although there are many real life examples of overdispersed count data that are not well approximated by the Poisson. The negative binomial can be considered an explicit overdispersed Poisson process where the Poisson intensity is chosen from a gamma distribution. In this study, the truncated bivariate negative binomial distribution is used to decompose the total effect formulae into RTM and treatment effects. Maximum likelihood estimators (MLE) and method of moments estimators are developed for the total, RTM, and treatment effects. A simulation study is carried out to investigate the properties of the estimators and compare them with those developed under the assumption of the Poisson process. Data on the incidence of dengue cases reported from 2007 to 2017 are used to estimate the total, RTM, and treatment effects.
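
The RTM effect described in this abstract can be made concrete with a small simulation. It is a sketch under stated assumptions, not the authors' estimators: paired counts share a gamma-distributed Poisson intensity (one standard way to obtain correlated negative binomial counts), there is no treatment, and the `shape`, `scale`, and `cutoff` values are arbitrary.

```python
import numpy as np

def simulate_rtm(n=200_000, shape=2.0, scale=5.0, cutoff=25, seed=0):
    """Simulate paired overdispersed counts with no treatment effect.

    Returns (mean of x1 - x2 among subjects with x1 >= cutoff,
             mean of x1 - x2 over everyone).
    """
    rng = np.random.default_rng(seed)
    lam = rng.gamma(shape, scale, size=n)  # subject-level Poisson intensity
    x1 = rng.poisson(lam)                  # first measurement
    x2 = rng.poisson(lam)                  # second measurement, same intensity
    selected = x1 >= cutoff                # condition on the upper tail
    return np.mean(x1[selected] - x2[selected]), np.mean(x1 - x2)
```

Since no treatment is applied here, the conditional difference among the selected subjects is pure regression to the mean, while the unconditional difference stays close to zero.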

Citations: 0