
Journal of Statistical Planning and Inference: Latest Articles

Tuning differential evolution algorithm for constructing uniform projection designs
IF 0.8 | CAS Zone 4, Mathematics | Q3 STATISTICS & PROBABILITY | Pub Date: 2025-10-21 | DOI: 10.1016/j.jspi.2025.106356 | Volume 242, Article 106356
Samuel Onyambu, Hongquan Xu
Space-filling designs are extensively used in computer experiments to analyze complex systems. Among these, uniform projection designs stand out for their desirable low-dimensional projection properties and their robust performance under other criteria. However, no efficient algorithm currently exists for generating such designs. This study explores the construction of uniform projection designs using a differential evolution (DE) algorithm. DE, an evolutionary algorithm, is known for its simplicity, robustness, and effectiveness in solving complex optimization problems, though its performance is highly sensitive to several hyperparameters. Our goal is to investigate the structure of the hyperparameter space, evaluate the contribution of each hyperparameter, and provide guidelines for optimal hyperparameter settings across various scenarios. To this end, we conduct a comprehensive comparison of different experimental designs and surrogate models.
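For readers new to DE, the sketch below shows the classic DE/rand/1/bin scheme and the three hyperparameters most commonly tuned: population size, scale factor `F`, and crossover rate `CR`. It minimizes a generic continuous test function and is not the authors' implementation; uniform projection designs are discrete (permutation-based), so applying DE to them would need problem-specific mutation and crossover operators that the abstract does not describe. Function names and default values are illustrative assumptions.

```python
import random


def differential_evolution(f, bounds, pop_size=20, F=0.8, CR=0.9,
                           generations=200, seed=0):
    """Minimal DE/rand/1/bin minimizer (illustrative defaults, not tuned values).

    f      -- objective taking a list of floats
    bounds -- list of (low, high) pairs, one per coordinate
    """
    rng = random.Random(seed)
    dim = len(bounds)
    pop = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(pop_size)]
    fit = [f(x) for x in pop]

    for _ in range(generations):
        for i in range(pop_size):
            # Mutation: three distinct members, all different from the target i.
            a, b, c = rng.sample([j for j in range(pop_size) if j != i], 3)
            # Binomial crossover; coordinate j_rand always comes from the mutant.
            j_rand = rng.randrange(dim)
            trial = []
            for k, (lo, hi) in enumerate(bounds):
                if k == j_rand or rng.random() < CR:
                    v = pop[a][k] + F * (pop[b][k] - pop[c][k])
                    v = min(max(v, lo), hi)  # clip back into the search box
                else:
                    v = pop[i][k]
                trial.append(v)
            # Greedy selection: keep the trial vector if it is no worse.
            f_trial = f(trial)
            if f_trial <= fit[i]:
                pop[i], fit[i] = trial, f_trial

    best = min(range(pop_size), key=fit.__getitem__)
    return pop[best], fit[best]


# Example: the 3-dimensional sphere function, minimized at the origin.
x_best, f_best = differential_evolution(
    lambda x: sum(t * t for t in x), [(-5.0, 5.0)] * 3)
```

Rerunning with different `pop_size`, `F`, and `CR` values on the same objective is a quick way to see the hyperparameter sensitivity the abstract refers to.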
Citations: 0
Variable selection in high-dimensional varying coefficient panel data models with fixed effects
IF 0.8 | CAS Zone 4, Mathematics | Q3 STATISTICS & PROBABILITY | Pub Date: 2025-09-30 | DOI: 10.1016/j.jspi.2025.106355 | Volume 242, Article 106355
Yiping Yang, Peixin Zhao
To address the challenges of variable selection in panel data models with fixed effects and varying coefficients, we introduce a novel method that combines basis function approximations with group nonconcave penalty functions. A forward orthogonal deviation transformation eliminates the fixed effects, which allows us to select significant variables and estimate the non-zero coefficient functions. Under certain regularity conditions, we show that our method consistently identifies the true model structure and that the resulting estimators possess the oracle property. For computational efficiency, we develop a group gradient descent algorithm that incorporates a transformation of the penalty terms. Simulation studies reveal that nonconvex penalties (SCAD/MCP) outperform the Lasso across various performance metrics. Furthermore, compared with existing methods, our approach significantly reduces false positives (FPs). To demonstrate the practical applicability and effectiveness of our method, we present an analysis of a real dataset.
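Two ingredients named above can be sketched in a few lines. The forward orthogonal deviation (FOD) transform uses the standard Arellano-Bover form y*_t = c_t (y_t - mean(y_{t+1}, ..., y_T)) with c_t = sqrt((T - t)/(T - t + 1)), so any additive unit-specific constant cancels. The penalties are the textbook SCAD (a = 3.7) and MCP (gamma = 3) applied to each group's Euclidean norm. How the authors scale groups, choose the basis, or set tuning parameters is not stated in the abstract; the function names and the group layout are hypothetical.

```python
import math


def forward_orthogonal_deviation(y):
    """FOD transform of one unit's series y_1..y_T (returns T - 1 values).

    Each value subtracts the mean of all future observations and rescales,
    so a unit-specific constant (fixed effect) cancels out.
    """
    T = len(y)
    out = []
    for t in range(T - 1):          # 0-based t corresponds to period t + 1
        remaining = T - t - 1       # number of future observations
        future_mean = sum(y[t + 1:]) / remaining
        scale = math.sqrt(remaining / (remaining + 1))
        out.append(scale * (y[t] - future_mean))
    return out


def scad(theta, lam, a=3.7):
    """SCAD penalty of Fan and Li (2001), evaluated at |theta|."""
    t = abs(theta)
    if t <= lam:
        return lam * t
    if t <= a * lam:
        return (2 * a * lam * t - t * t - lam * lam) / (2 * (a - 1))
    return lam * lam * (a + 1) / 2


def mcp(theta, lam, gamma=3.0):
    """Minimax concave penalty of Zhang (2010), evaluated at |theta|."""
    t = abs(theta)
    if t <= gamma * lam:
        return lam * t - t * t / (2 * gamma)
    return gamma * lam * lam / 2


def group_penalty(groups, lam, penalty=scad):
    """Sum of penalty(||beta_g||_2) over groups; here each group is assumed to
    hold the basis coefficients of one varying-coefficient function."""
    return sum(penalty(math.sqrt(sum(b * b for b in g)), lam) for g in groups)
```

Because both penalties are flat beyond a threshold, large groups are not shrunk further, which is the usual argument for their advantage over the Lasso.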
Citations: 0
Causal inference in early phase clinical trials: Variance decomposition and order of patient inclusion
IF 0.8 | CAS Zone 4, Mathematics | Q3 STATISTICS & PROBABILITY | Pub Date: 2025-09-29 | DOI: 10.1016/j.jspi.2025.106352 | Volume 242, Article 106352
Matthieu Clertant, Meliha Akouba, Alexia Iasonos, John O’Quigley
Causal inference tools, in particular variance decomposition, hierarchical data structures and counterfactuals, are applied to study the methodology of dose-finding studies in oncology. A detailed variance decomposition brings the relative performance of different designs into much sharper focus. We develop and present new results on the role played by the order in which patients are included in a sequential dose-finding study. These results make clear why earlier authors could easily be misled into concluding that different designs perform similarly. This is not the case, and we show how to avoid that mistake. We support our findings with both theoretical and numerical studies.
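A toy illustration of the idea, under loudly stated assumptions: each patient carries a latent tolerance u (a counterfactual device that fixes the toxicity outcome at every dose in advance), the dose-finding rule is a hypothetical up-and-down scheme rather than CRM or any design from the paper, and the toxicity probabilities are invented. For each simulated cohort the same patients are replayed in many random orders, and the law of total variance splits the variance of the recommended level into a "who enrolled" part and an "in what order" part.

```python
import random
from statistics import mean, pvariance

# Hypothetical true toxicity probabilities for five dose levels.
TOX_PROBS = [0.05, 0.12, 0.25, 0.40, 0.55]


def run_trial(tolerances):
    """Toy up-and-down rule (a stand-in, not a design from the paper).

    A patient with latent tolerance u has a toxicity at level d iff
    u < TOX_PROBS[d]. Returns the most frequently assigned level
    (ties go to the lower level).
    """
    level = 0
    counts = [0] * len(TOX_PROBS)
    for u in tolerances:
        counts[level] += 1
        if u < TOX_PROBS[level]:
            level = max(level - 1, 0)                   # de-escalate after a toxicity
        else:
            level = min(level + 1, len(TOX_PROBS) - 1)  # escalate otherwise
    return max(range(len(counts)), key=lambda d: (counts[d], -d))


def decompose(n_cohorts=200, n_orders=50, n_patients=20, seed=1):
    """Split Var(recommended level) into a part due to which patients enrol
    and a part due to the order in which those same patients arrive."""
    rng = random.Random(seed)
    per_cohort = []
    for _ in range(n_cohorts):
        cohort = [rng.random() for _ in range(n_patients)]
        outcomes = []
        for _ in range(n_orders):
            order = cohort[:]
            rng.shuffle(order)
            outcomes.append(run_trial(order))
        per_cohort.append(outcomes)
    total = pvariance([y for ys in per_cohort for y in ys])
    between = pvariance([mean(ys) for ys in per_cohort])  # cohort composition
    within = mean(pvariance(ys) for ys in per_cohort)     # order of inclusion
    return total, between, within
```

With equal numbers of orderings per cohort and population variances, `total` equals `between + within` exactly, so a nonzero `within` term measures how much of a design's variability comes purely from the order of inclusion.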
Citations: 0
The k-sample Behrens-Fisher problem for high-dimensional data with model free assumption
IF 0.8 | CAS Zone 4, Mathematics | Q3 STATISTICS & PROBABILITY | Pub Date: 2025-09-27 | DOI: 10.1016/j.jspi.2025.106354 | Volume 242, Article 106354
Yanbo Pei, Xiaoxiao Ren, Baoxue Zhang
The problem of testing the equality of k mean vectors when the covariance matrices differ, known as the k-sample Behrens-Fisher (BF) problem, is a significant issue in statistics. Hu and Bai (2017) proposed a test statistic under a factor-like model structure assumption and established its asymptotic normality. Building on this work, we further explore the asymptotic properties of this test statistic. We prove that, under a model-free assumption, its asymptotic null distribution is a chi-square-type mixture, and we establish its asymptotic power under a full alternative hypothesis. Moreover, we show that, under the model-free assumption, the asymptotic null distribution of the test statistic is either normal or a weighted sum of normal and chi-square random variables, depending on the convergence rate of the eigenvalues of the covariance matrix. To address practical challenges in high-dimensional data, we propose a new weighted bootstrap procedure that is simple to implement. Simulation studies demonstrate that the proposed test procedure outperforms existing methods in terms of size control under various settings. Furthermore, real data applications illustrate the applicability of our test procedure to a variety of high-dimensional data analysis problems.
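A chi-square-type mixture limit of the kind described above has the generic form sum_r w_r (Z_r^2 - 1), with Z_r iid standard normal and weights w_r tied to covariance eigenvalues. The sketch below is an illustration of that generic form only, not the authors' statistic or their weighted bootstrap: it computes the exact skewness 2*sqrt(2) * sum(w^3) / (sum(w^2))^(3/2) and a Monte Carlo critical value. It shows why eigenvalue behaviour matters: many comparable weights push the skewness toward 0 (normal limit), while a few dominant weights keep the null visibly non-normal. Function names are hypothetical.

```python
import math
import random


def mixture_skewness(weights):
    """Exact skewness of sum_r w_r * (Z_r**2 - 1) with Z_r iid N(0, 1).

    Uses the cumulants kappa_2 = 2 * sum(w**2) and kappa_3 = 8 * sum(w**3).
    """
    s2 = sum(w * w for w in weights)
    s3 = sum(w ** 3 for w in weights)
    return 2 * math.sqrt(2) * s3 / s2 ** 1.5


def mixture_quantile(weights, q=0.95, n_draws=20000, seed=0):
    """Monte Carlo q-quantile of the mixture standardized to unit variance."""
    rng = random.Random(seed)
    scale = math.sqrt(2 * sum(w * w for w in weights))
    draws = sorted(
        sum(w * (rng.gauss(0.0, 1.0) ** 2 - 1) for w in weights) / scale
        for _ in range(n_draws)
    )
    return draws[int(q * n_draws)]
```

With a single weight the skewness is 2*sqrt(2) (about 2.83); with 100 equal weights it drops to about 0.28, and the 95% critical value moves toward the normal value 1.645.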
Citations: 0
Joint distribution of numbers of occurrences of countably many runs of specified lengths in a sequence of discrete random variables
IF 0.8 | CAS Zone 4, Mathematics | Q3 STATISTICS & PROBABILITY | Pub Date: 2025-09-25 | DOI: 10.1016/j.jspi.2025.106353 | Volume 242, Article 106353
Kiyoshi Inoue
In this paper, we study, via generating functions, the joint distribution of the numbers of occurrences of countably many runs of several lengths in a sequence of independent and identically distributed nonnegative integer-valued random variables. We propose a generalization of the potential partition polynomials, which provides effective computational tools for deriving the probability functions. Waiting-time problems associated with infinitely many runs are investigated, and formulae for evaluating the generating functions are given. The results presented here provide a broad framework for developing the multivariate distribution theory of runs. Finally, we discuss several applications and numerical examples to show how our theoretical results apply to the study of runs and to parameter estimation problems.
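As a concrete (and much simpler) special case, the sketch below computes an exact joint distribution of run counts for several lengths at once by tracking the Markov chain that generating-function methods encode. Assumptions: binary iid Bernoulli(p) trials rather than general nonnegative integer values, finitely many lengths, a finite sequence length n, and non-overlapping (Feller) counting. It does not reproduce the paper's potential partition polynomials; the function name is hypothetical.

```python
from collections import defaultdict


def joint_run_pmf(n, ks, p):
    """Exact joint pmf of (N_k for k in ks) in n iid Bernoulli(p) trials, where
    N_k counts non-overlapping success runs of length k (Feller counting:
    a success streak of length L contributes floor(L / k) runs)."""
    # State: (length of the current success streak, tuple of counts so far).
    dist = {(0, (0,) * len(ks)): 1.0}
    for _ in range(n):
        nxt = defaultdict(float)
        for (streak, counts), prob in dist.items():
            nxt[(0, counts)] += prob * (1 - p)     # a failure ends the streak
            s = streak + 1                          # a success extends it
            new_counts = tuple(c + (s % k == 0) for c, k in zip(counts, ks))
            nxt[(s, new_counts)] += prob * p
        dist = nxt
    pmf = defaultdict(float)
    for (_, counts), prob in dist.items():
        pmf[counts] += prob                         # marginalize over the streak
    return dict(pmf)
```

For example, `joint_run_pmf(4, (1, 2), 0.5)` assigns probability 3/16 to the outcome (2 successes, one 2-run), coming from the sequences 1100, 0110 and 0011.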
Citations: 0
Orthogonal Latin hypercube designs with hidden low-dimensional projection
IF 0.8 | CAS Zone 4, Mathematics | Q3 STATISTICS & PROBABILITY | Pub Date: 2025-09-24 | DOI: 10.1016/j.jspi.2025.106349 | Volume 242, Article 106349
Tian-fang Zhang, Yue-ru Yan, Fasheng Sun
Orthogonal Latin hypercube designs are widely used in computer experiments because of their attractive properties. In this article, we develop a new grouping method to construct such designs. Compared with existing results, the newly constructed designs can accommodate more factors for the same run size, which makes them more cost-effective. Moreover, the resulting designs possess not only orthogonality but also appealing space-filling properties in low dimensions, which makes them well suited to computer experiments.
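The two defining properties, Latin hypercube structure and column orthogonality, are easy to verify mechanically. The checker below is a generic sketch; the 5-run, 2-factor design is a small hand-checked example with centered levels, not a design produced by the authors' grouping method, and the names are hypothetical.

```python
from itertools import combinations


def is_latin_hypercube(design):
    """True if every column is a permutation of the same n distinct levels."""
    n = len(design)
    levels = sorted(row[0] for row in design)
    if len(set(levels)) != n:
        return False
    return all(sorted(col) == levels for col in zip(*design))


def column_correlations(design):
    """Pearson correlation for every pair of columns, keyed by (i, j)."""
    columns = list(zip(*design))

    def corr(u, v):
        mu, mv = sum(u) / len(u), sum(v) / len(v)
        num = sum((a - mu) * (b - mv) for a, b in zip(u, v))
        den = (sum((a - mu) ** 2 for a in u) * sum((b - mv) ** 2 for b in v)) ** 0.5
        return num / den

    return {(i, j): corr(columns[i], columns[j])
            for i, j in combinations(range(len(columns)), 2)}


def is_orthogonal(design, tol=1e-12):
    """True if all pairwise column correlations are zero (within tol)."""
    return all(abs(r) <= tol for r in column_correlations(design).values())


# Hand-checked 5-run, 2-factor orthogonal Latin hypercube (levels -2..2).
OLH_5x2 = [[-2, 1], [-1, -2], [0, 0], [1, 2], [2, -1]]
```

A construction method can then be sanity-checked by running both predicates on every design it outputs.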
Citations: 0
Robust and computationally efficient gradient-based estimation
IF 0.8 | CAS Zone 4, Mathematics | Q3 STATISTICS & PROBABILITY | Pub Date: 2025-09-22 | DOI: 10.1016/j.jspi.2025.106351 | Volume 242, Article 106351
Yibo Yan, Xiaozhou Wang, Riquan Zhang
In this paper, we propose a class of estimators based on robust and computationally efficient gradient estimation for both low- and high-dimensional risk minimization. The gradient estimates are constructed from a series of newly proposed univariate mean estimators that are both robust and efficient. Our estimators are obtained iteratively by a variant of gradient descent in which the update direction is given by this robust and computationally efficient gradient. The estimators have explicit expressions, can be computed with simple arithmetic operations, and are robust to arbitrary outliers in common statistical models. Theoretically, we establish the convergence of the algorithms and derive non-asymptotic error bounds for these iterative estimators. Specifically, we apply our methods to linear and logistic regression models, obtaining robust parameter estimates and corresponding excess risk bounds. Unlike previous work, our theoretical results rely on a magnitude function of the outliers, which captures how far they deviate from the inliers. Finally, we present extensive simulation experiments on both low- and high-dimensional linear models that demonstrate the superior performance of our estimators compared with several baseline methods.
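The general pattern (gradient descent where each gradient coordinate is a robust univariate mean of per-observation gradients) can be sketched as follows. The abstract does not specify the authors' new mean estimators, so this sketch swaps in the well-known median-of-means estimator as a stand-in; the block count, step size, and function names are illustrative assumptions for least-squares linear regression.

```python
import random
from statistics import median


def median_of_means(values, n_blocks):
    """Robust univariate mean (stand-in): average within equal-sized blocks,
    then take the median of the block averages. Leftover values are dropped."""
    size = len(values) // n_blocks
    block_means = [sum(values[b * size:(b + 1) * size]) / size
                   for b in range(n_blocks)]
    return median(block_means)


def robust_gradient_descent(X, y, n_blocks=15, step=0.1, iters=300, seed=0):
    """Least-squares gradient descent in which each coordinate of the average
    gradient is replaced by its median-of-means estimate."""
    rng = random.Random(seed)
    order = list(range(len(y)))
    rng.shuffle(order)  # random block assignment, fixed across iterations
    X = [X[i] for i in order]
    y = [y[i] for i in order]
    beta = [0.0] * len(X[0])
    for _ in range(iters):
        resid = [sum(b * x for b, x in zip(beta, row)) - yi
                 for row, yi in zip(X, y)]
        grad = [median_of_means([r * row[j] for r, row in zip(resid, X)], n_blocks)
                for j in range(len(beta))]
        beta = [b - step * g for b, g in zip(beta, grad)]
    return beta
```

As long as fewer than half of the blocks contain outliers, the median ignores the corrupted block averages, so a handful of gross outliers barely moves the estimate, whereas they would shift an ordinary least-squares fit.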
Citations: 0
Assessing goodness-of-fit for sparse categories using Rényi divergence
IF 0.8 | CAS Zone 4, Mathematics | Q3 STATISTICS & PROBABILITY | Pub Date: 2025-09-19 | DOI: 10.1016/j.jspi.2025.106350 | Volume 242, Article 106350
Raul Matsushita, Gabriel Gomes, Regina Da Fonseca, Eduardo Nakano, Roberto Vila
We present the Rényi divergence as a statistic for assessing goodness-of-fit in sparse frequency tables, where small expected counts can undermine the reliability of the traditional chi-square test. The Rényi divergence with index in (0,1) is a natural choice because it avoids the division problems caused by small frequencies. Our main result shows that the Rényi statistic asymptotically follows a chi-square distribution. Through theoretical insights and Monte Carlo simulations, we evaluate the performance of the Rényi statistic across various values of the divergence index. We find that smaller index values improve the agreement between the Rényi statistic and the chi-square distribution and enhance its performance with sparse data. Additionally, under these conditions the Rényi statistic has good power for detecting departures from the null hypothesis. To illustrate its practical applicability, we present two real-world data analyses, highlighting the robustness of the Rényi divergence in scenarios involving sparse categories.
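A minimal sketch of a Rényi goodness-of-fit statistic, with the normalization as an explicit assumption. A second-order expansion gives D_alpha(P || Q) ≈ (alpha / 2) * chi2(P, Q), so T_alpha = (2n / alpha) * D_alpha(p_hat || p0) agrees with Pearson's X^2 to second order and, for a fixed number of categories K, shares its chi-square(K - 1) limit. The abstract does not state the authors' exact scaling or divergence direction; the names below are hypothetical.

```python
import math


def renyi_divergence(p, q, alpha):
    """D_alpha(p || q) = log(sum p_i**alpha * q_i**(1 - alpha)) / (alpha - 1),
    for 0 < alpha < 1. Zero entries of p contribute 0 ** alpha = 0, so no
    division by an observed frequency ever occurs."""
    s = sum(pi ** alpha * qi ** (1 - alpha) for pi, qi in zip(p, q))
    return math.log(s) / (alpha - 1)


def renyi_statistic(counts, probs, alpha=0.5):
    """Assumed normalization (2n / alpha) * D_alpha(p_hat || p0)."""
    n = sum(counts)
    p_hat = [c / n for c in counts]
    return 2 * n / alpha * renyi_divergence(p_hat, probs, alpha)


def pearson_statistic(counts, probs):
    """Pearson's X^2, which divides by the expected counts n * q_i."""
    n = sum(counts)
    return sum((c - n * q) ** 2 / (n * q) for c, q in zip(counts, probs))
```

For counts [1010, 990, 1000, 1000] under a uniform null, both statistics are about 0.2; the two diverge mainly when some expected counts are small, which is the regime the paper targets.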
Citations: 0
Consistent community detection approach in the nonparametric weighted stochastic blockmodel with unspecified number of communities
IF 0.8 | CAS Zone 4, Mathematics | Q3 STATISTICS & PROBABILITY | Pub Date: 2025-09-13 | DOI: 10.1016/j.jspi.2025.106339 | Volume 242, Article 106339
Fei Ye, Jingsong Xiao, Weidong Ma, Yulai Miao, Ying Yang
The stochastic blockmodel (SBM) is a widely used model for representing graphs. Numerous approaches have been applied to the SBM to detect latent community structures in graphs, typically using two types of consistency (strong and weak) to evaluate their performance. Most of these methods have been studied and shown to be consistent under the SBM framework. However, consistency under the weighted SBM, an important extension of the SBM, has been largely overlooked. Moreover, few approaches can detect communities when the number of communities is unknown. In this paper, we propose a nonparametric method for community detection under the assortative, nonparametric weighted SBM with an unknown number of communities, and we establish its consistency. We introduce a novel concept, “consistency in relationship”, as a more practical criterion for assessing the performance of community detection algorithms. Because the optimization problem in our approach becomes intractable for large sample sizes, we propose an efficient algorithm to approximate its solution. Simulations demonstrate that our community detection method is both efficient and robust, particularly for unbalanced networks. We illustrate the effectiveness of our approach on three real-world networks.
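When the number of communities is unknown, the usual label-permutation error rate breaks down whenever the estimated and true numbers of communities differ. The sketch contrasts that rate with a pair-counting criterion (the Rand index), which compares "same community / different community" relationships between node pairs and works for any two partitions. The abstract does not define "consistency in relationship", so this is a generic illustration, not the authors' criterion; the names are hypothetical.

```python
from itertools import combinations, permutations


def misclassification_rate(truth, estimate):
    """Fraction of misclassified nodes, minimized over relabelings.
    Only defined when both partitions use the same number of communities."""
    true_labels = sorted(set(truth))
    est_labels = sorted(set(estimate))
    if len(true_labels) != len(est_labels):
        raise ValueError("partitions have different numbers of communities")
    n = len(truth)
    best = n
    for perm in permutations(true_labels):
        relabel = dict(zip(est_labels, perm))
        errors = sum(relabel[e] != t for e, t in zip(estimate, truth))
        best = min(best, errors)
    return best / n


def rand_index(truth, estimate):
    """Share of node pairs on which the two partitions agree
    (both 'same community' or both 'different communities')."""
    pairs = list(combinations(range(len(truth)), 2))
    agree = sum((truth[i] == truth[j]) == (estimate[i] == estimate[j])
                for i, j in pairs)
    return agree / len(pairs)
```

For a true 3-community partition `[0, 0, 1, 1, 2, 2]`, an estimate that merges two communities into `[0, 0, 0, 0, 1, 1]` has no defined misclassification rate, but its Rand index is 11/15.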
Citations: 0
D-criterion based optimal subsampling in Poisson regression with one covariate
IF 0.8 | CAS Zone 4, Mathematics | Q3 STATISTICS & PROBABILITY | Pub Date: 2025-09-12 | DOI: 10.1016/j.jspi.2025.106340 | Volume 242, Article 106340
Torsten Glemser, Rainer Schwabe
The goal of subsampling is to select an informative subset of all observations when using the full data for statistical analysis is not viable. We construct locally D-optimal subsampling designs under a Poisson regression model with log link and a single covariate. We establish a representation of the support of locally D-optimal subsampling designs. We also derive results for scale-location transformations of the covariate, which require a simultaneous transformation of the regression parameter. The performance of the methods is demonstrated through illustrative examples. To show the advantage of the optimal subsampling designs, we examine the efficiency of uniform random subsampling as well as of two heuristic designs. Further, we study the efficiency of locally D-optimal subsampling designs when the parameter is misspecified.
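For Poisson regression with log link and one covariate, the information of a subsample S is M(S) = sum over i in S of lam_i * (1, x_i)^T (1, x_i), with lam_i = exp(beta0 + beta1 * x_i). Because M depends on the unknown parameter, the D-criterion det M is evaluated at a guessed value ("locally" optimal). The sketch below uses a simple single-point exchange heuristic that starts from a uniform random subsample; it is a hypothetical illustration, not the authors' analytic characterization of the optimal support, and it only guarantees improvement over its own starting subsample. D-efficiency for this two-parameter model is (det M(S) / det M(S_ref))^(1/2).

```python
import math
import random


def point_info(x, beta0, beta1):
    """Entries (a, b, c) of one point's information [[a, b], [b, c]]
    in Poisson regression with log link: weight lam = exp(beta0 + beta1 * x)."""
    lam = math.exp(beta0 + beta1 * x)
    return lam, lam * x, lam * x * x


def det2(s):
    """Determinant of the symmetric 2x2 matrix [[a, b], [b, c]]."""
    a, b, c = s
    return a * c - b * b


def info_sums(xs, beta0, beta1):
    """Information matrix entries (a, b, c) summed over the covariate values xs."""
    sums = [0.0, 0.0, 0.0]
    for x in xs:
        for m, v in enumerate(point_info(x, beta0, beta1)):
            sums[m] += v
    return sums


def exchange_subsample(data, k, beta0, beta1, seed=0):
    """Start from a uniform random subsample of size k and apply single-point
    swaps while they increase det M (local search, not a global optimum)."""
    rng = random.Random(seed)
    chosen = set(rng.sample(range(len(data)), k))
    info = [point_info(x, beta0, beta1) for x in data]
    sums = [sum(info[i][m] for i in chosen) for m in range(3)]
    improved = True
    while improved:
        improved = False
        for i in list(chosen):
            for j in range(len(data)):
                if j in chosen:
                    continue
                trial = [sums[m] - info[i][m] + info[j][m] for m in range(3)]
                if det2(trial) > det2(sums) * (1 + 1e-12):
                    chosen.remove(i)
                    chosen.add(j)
                    sums = trial
                    improved = True
                    break
    return sorted(chosen)


def d_efficiency(xs, xs_ref, beta0, beta1):
    """(det M(xs) / det M(xs_ref)) ** (1/2) for the two-parameter model."""
    return math.sqrt(det2(info_sums(xs, beta0, beta1))
                     / det2(info_sums(xs_ref, beta0, beta1)))
```

Comparing the starting random subsample with the exchanged one via `d_efficiency` gives a rough sense of how much a uniform random subsample loses; a misspecified `beta0, beta1` can be plugged in to mimic the robustness study in the abstract.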
Citations: 0
Journal
Journal of Statistical Planning and Inference