Journal of Statistical Planning and Inference: Latest Articles

The k-sample Behrens-Fisher problem for high-dimensional data with model free assumption
IF 0.8 Zone 4 (Mathematics) Q3 STATISTICS & PROBABILITY Pub Date: 2025-09-27 DOI: 10.1016/j.jspi.2025.106354
Yanbo Pei, Xiaoxiao Ren, Baoxue Zhang
The problem of testing the equality of k mean vectors when the covariance matrices differ, known as the k-sample Behrens-Fisher (BF) problem, is a significant issue in statistics. Hu and Bai (2017) proposed a test statistic under a factor-like model structure assumption and established its normal limit. Building on this work, we further explore the asymptotic properties of the test statistic. We prove that, under a model-free assumption, the asymptotic null distribution of the test statistic is a Chi-square-type mixture distribution, and we establish its asymptotic power under a full alternative hypothesis. Moreover, we show that the asymptotic null distribution of the test statistic is either normal or a weighted sum of normal and Chi-square random variables, depending on the convergence rate of the eigenvalues of the covariance matrix under the model-free assumption. To address practical challenges in high-dimensional data, we propose a new weighted bootstrap procedure that is simple to implement. Simulation studies demonstrate that our proposed test procedure outperforms existing methods in terms of size control under various settings. Furthermore, real data applications illustrate the applicability of our test procedure to a variety of high-dimensional data analysis problems.
Volume 242, Article 106354
Citations: 0
Joint distribution of numbers of occurrences of countably many runs of specified lengths in a sequence of discrete random variables
IF 0.8 Zone 4 (Mathematics) Q3 STATISTICS & PROBABILITY Pub Date: 2025-09-25 DOI: 10.1016/j.jspi.2025.106353
Kiyoshi Inoue
In this paper, we use generating functions to study the joint distribution of the numbers of occurrences of countably many runs of several lengths in a sequence of nonnegative integer-valued independent and identically distributed random variables. We propose a generalization of the potential partition polynomials, which gives effective computational tools for the derivation of probability functions. The waiting time problems associated with infinitely many runs are investigated and formulae for the evaluation of the generating functions are given. The results presented here provide a wide framework for developing the multivariate distribution theory of runs. Finally, we discuss several applications and numerical examples to show how our theoretical results are applied to the investigation of runs, as well as parameter estimation problems.
Volume 242, Article 106353
Citations: 0
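The quantity studied in the abstract above, the number of runs of each specified length, can be illustrated empirically. The following is a minimal sketch, not the paper's generating-function method: it counts maximal runs of exactly length k, which is only one of several run-counting schemes, and the abstract does not say which scheme the paper uses. The function names are hypothetical.

```python
from itertools import groupby

def run_lengths(seq, symbol):
    """Lengths of the maximal runs of `symbol` in `seq`, in order of appearance."""
    return [sum(1 for _ in grp) for val, grp in groupby(seq) if val == symbol]

def count_runs_of_length(seq, symbol, k):
    """Number of maximal runs of `symbol` whose length is exactly k (assumed scheme)."""
    return sum(1 for n in run_lengths(seq, symbol) if n == k)

# Toy sequence: the runs of 1s have lengths 2, 3 and 1.
seq = [1, 1, 0, 1, 1, 1, 0, 0, 1]
counts = {k: count_runs_of_length(seq, 1, k) for k in (1, 2, 3)}
```

The dictionary `counts` is one realisation of the random vector whose joint distribution the paper derives; repeating the count over many simulated sequences would give an empirical check on such a distribution.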
Orthogonal Latin hypercube designs with hidden low-dimensional projection
IF 0.8 Zone 4 (Mathematics) Q3 STATISTICS & PROBABILITY Pub Date: 2025-09-24 DOI: 10.1016/j.jspi.2025.106349
Tian-fang Zhang, Yue-ru Yan, Fasheng Sun
Orthogonal Latin hypercube designs are widely used in computer experiments because of their attractive properties. In this article, we develop a new grouping method to construct such designs. Compared to the existing results, the newly constructed designs can accommodate more factors with the same run size, which means they are more cost-effective. Moreover, the resulting designs possess not only orthogonality, but also appealing space-filling properties in low dimensions, which make them very suitable for computer experiments.
Volume 242, Article 106349
Citations: 0
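The two defining properties named in the abstract above can be checked mechanically. The sketch below is illustrative only and does not reproduce the paper's grouping construction; it assumes centered levels, so that a zero inner product between columns means zero correlation. The helper names and the 4-run example are hypothetical.

```python
from itertools import combinations

def is_latin_hypercube(design, levels):
    """True if every column of `design` is a permutation of `levels`."""
    target = sorted(levels)
    return all(sorted(col) == target for col in zip(*design))

def is_column_orthogonal(design):
    """True if every pair of distinct columns has zero inner product."""
    cols = list(zip(*design))
    return all(sum(a * b for a, b in zip(u, v)) == 0
               for u, v in combinations(cols, 2))

# A 4-run, 2-factor example with centered levels (rows are runs).
levels = [-3, -1, 1, 3]
design = [(-3, -1), (-1, 3), (1, -3), (3, 1)]
ok = is_latin_hypercube(design, levels) and is_column_orthogonal(design)
```

A design whose two columns are identical would still pass the Latin hypercube check but fail the orthogonality check, which is why the two properties are tested separately.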
Robust and computationally efficient gradient-based estimation
IF 0.8 Zone 4 (Mathematics) Q3 STATISTICS & PROBABILITY Pub Date: 2025-09-22 DOI: 10.1016/j.jspi.2025.106351
Yibo Yan, Xiaozhou Wang, Riquan Zhang
In this paper, we propose a class of estimators based on robust and computationally efficient gradient estimation for both low- and high-dimensional risk minimization frameworks. The gradient estimation in this work is constructed using a series of newly proposed univariate robust and efficient mean estimators. Our proposed estimators are obtained iteratively using a variant of the gradient descent method, where the update direction is determined by a robust and computationally efficient gradient. These estimators not only have explicit expressions and can be obtained through arithmetic operations but are also robust to arbitrary outliers in common statistical models. Theoretically, we establish the convergence of the algorithms and derive non-asymptotic error bounds for these iterative estimators. Specifically, we apply our methods to linear and logistic regression models, achieving robust parameter estimates and corresponding excess risk bounds. Unlike previous work, our theoretical results rely on a magnitude function of the outliers, which captures the extent of their deviation from the inliers. Finally, we present extensive simulation experiments on both low- and high-dimensional linear models to demonstrate the superior performance of our proposed estimators compared to several baseline methods.
Volume 242, Article 106351
Citations: 0
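The general idea in the abstract above, gradient descent whose update direction is a robust estimate of the mean gradient, can be sketched as follows. The abstract does not specify the paper's univariate mean estimators, so this sketch swaps in the classical median-of-means estimator as a stand-in. The one-parameter regression, the contiguous blocks (chosen for determinism; random blocks are more usual), and all names are assumptions.

```python
from statistics import mean, median

def median_of_means(values, n_blocks):
    """Average contiguous blocks of `values`, then take the median of the averages.

    Trailing values that do not fill a whole block are ignored.
    """
    size = len(values) // n_blocks
    blocks = [values[i * size:(i + 1) * size] for i in range(n_blocks)]
    return median(mean(b) for b in blocks)

def robust_slope(xs, ys, n_blocks=3, lr=0.01, n_iter=200):
    """Gradient descent for y ~ b * x under squared loss with a robust gradient."""
    b = 0.0
    for _ in range(n_iter):
        # Per-sample gradients of 0.5 * (b * x - y) ** 2 with respect to b.
        grads = [(b * x - y) * x for x, y in zip(xs, ys)]
        b -= lr * median_of_means(grads, n_blocks)
    return b

xs = list(range(1, 13))
ys = [2.0 * x for x in xs]
ys[-1] = 1000.0  # one gross outlier in an otherwise exact y = 2x relationship

b_robust = robust_slope(xs, ys)
b_ols = sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)
```

On this toy data the robust iteration settles near the true slope of 2, while the least-squares slope `b_ols` is pulled far away by the single outlier.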
Assessing goodness-of-fit for sparse categories using Rényi divergence
IF 0.8 Zone 4 (Mathematics) Q3 STATISTICS & PROBABILITY Pub Date: 2025-09-19 DOI: 10.1016/j.jspi.2025.106350
Raul Matsushita, Gabriel Gomes, Regina Da Fonseca, Eduardo Nakano, Roberto Vila
We present the Rényi divergence as a statistic for assessing goodness-of-fit in sparse frequency tables, where small expected counts can undermine the reliability of the traditional chi-square test. The Rényi divergence with index in (0,1) is a natural choice because it circumvents the division-related issues caused by small frequencies. Our main result demonstrates that the Rényi statistic asymptotically follows a chi-square distribution. Through theoretical insights and Monte Carlo simulations, we evaluate the performance of the Rényi statistic across various values of the divergence index. We find that smaller index values improve the alignment of the Rényi statistic with the chi-square distribution and enhance its performance in sparse data settings. Additionally, the Rényi statistic exhibits good power properties in detecting deviations from the null hypothesis under these conditions. To illustrate its practical applicability, we present two real-world data analyses, highlighting the robustness of the Rényi divergence in scenarios involving sparse categories.
Volume 242, Article 106350
Citations: 0
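The point made in the abstract above, that an index in (0,1) avoids dividing by small frequencies, is visible in the standard definition of the Rényi divergence for discrete distributions, D_α(p‖q) = (α − 1)⁻¹ log Σ pᵢ^α qᵢ^(1−α). The sketch below computes only this divergence; the scaling that turns it into the paper's test statistic, and the order in which observed and hypothesised distributions enter, are not given in the abstract and are assumptions here.

```python
import math

def renyi_divergence(p, q, alpha):
    """Renyi divergence D_alpha(p || q) for discrete distributions, 0 < alpha < 1."""
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must lie in (0, 1)")
    # Both exponents are positive, so no cell probability is ever divided by.
    s = sum(pi ** alpha * qi ** (1.0 - alpha) for pi, qi in zip(p, q))
    return math.log(s) / (alpha - 1.0)

counts = [7, 2, 1, 0]            # observed counts, including one empty cell
n = sum(counts)
p_hat = [c / n for c in counts]  # observed proportions
q0 = [0.7, 0.2, 0.09, 0.01]      # hypothesised probabilities, one very small
d = renyi_divergence(p_hat, q0, alpha=0.5)
```

By contrast, the Pearson chi-square statistic divides each squared deviation by the expected count, which is where cells such as the last one cause trouble.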
Consistent community detection approach in the nonparametric weighted stochastic blockmodel with unspecified number of communities
IF 0.8 Zone 4 (Mathematics) Q3 STATISTICS & PROBABILITY Pub Date: 2025-09-13 DOI: 10.1016/j.jspi.2025.106339
Fei Ye, Jingsong Xiao, Weidong Ma, Yulai Miao, Ying Yang
The stochastic blockmodel (SBM) is a widely used model for representing graphs. Numerous approaches have been applied to the SBM to detect latent community structures in graphs, typically using two types of consistency (strong and weak) to evaluate their performance. Most of these methods have been studied and shown to be consistent under the SBM framework. However, the consistency of the weighted SBM, an important extension of the SBM, has been largely overlooked. Moreover, few approaches are capable of detecting communities when the number of communities is unknown. In this paper, we propose a nonparametric method for effective community detection under the assortative, nonparametric weighted SBM with an unknown number of communities, and we establish the consistency of our approach. We introduce a novel concept, “consistency in relationship”, as a more practical criterion to assess the performance of community detection algorithms. Since solving the optimization problem in our approach becomes intractable for large sample sizes, we propose an efficient algorithm to approximate it. Simulations demonstrate that our community detection method is both efficient and robust, particularly for unbalanced networks. We illustrate the effectiveness of our approach on three real-world networks.
Volume 242, Article 106339
Citations: 0
D-criterion based optimal subsampling in Poisson regression with one covariate
IF 0.8 Zone 4 (Mathematics) Q3 STATISTICS & PROBABILITY Pub Date: 2025-09-12 DOI: 10.1016/j.jspi.2025.106340
Torsten Glemser, Rainer Schwabe
The goal of subsampling is to select an informative subset of all observations when using the full data for statistical analysis is not viable. We construct locally D-optimal subsampling designs under a Poisson regression model with a log link in one covariate. A representation of the support of locally D-optimal subsampling designs is established. We make statements on scale-location transformations of the covariate that require a simultaneous transformation of the regression parameter. The performance of the methods is demonstrated by illustrative examples. To show the advantage of the optimal subsampling designs, we examine the efficiency of uniform random subsampling as well as of two heuristic designs. Further, the efficiency of locally D-optimal subsampling designs is studied when the parameter is misspecified.
Volume 242, Article 106340
Citations: 0
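The D-criterion referred to in the abstract above can be made concrete for a Poisson model with log link and one covariate, where the information matrix of a subsample is Σ exp(β₀ + β₁xᵢ)·(1, xᵢ)ᵀ(1, xᵢ). The sketch below only evaluates the criterion for a given subsample at an assumed parameter value (hence "locally"); it does not construct the optimal design, and the function names are hypothetical.

```python
import math

def information_matrix(xs, beta0, beta1):
    """Fisher information for (beta0, beta1) from Poisson responses with log link.

    Returns the three distinct entries (m00, m01, m11) of the symmetric 2x2 matrix.
    """
    m00 = m01 = m11 = 0.0
    for x in xs:
        lam = math.exp(beta0 + beta1 * x)  # Poisson mean (and variance) at x
        m00 += lam
        m01 += lam * x
        m11 += lam * x * x
    return m00, m01, m11

def d_criterion(xs, beta0, beta1):
    """Determinant of the information matrix; larger is better for D-optimality."""
    m00, m01, m11 = information_matrix(xs, beta0, beta1)
    return m00 * m11 - m01 * m01

# Two candidate subsamples of size 2, compared at the assumed value beta = (0, 0).
spread = d_criterion([0.0, 1.0], 0.0, 0.0)
clustered = d_criterion([0.0, 0.1], 0.0, 0.0)
```

Here the spread-out subsample carries more information about the slope than the clustered one; because the weights depend on β, the ranking of subsamples can change when the assumed parameter value changes.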
Structured regularization covariance estimation in tensor-valued data analysis
IF 0.8 Zone 4 (Mathematics) Q3 STATISTICS & PROBABILITY Pub Date: 2025-09-06 DOI: 10.1016/j.jspi.2025.106337
Jiangyan Wang, Yang Ren, Jinguan Lin
Covariance estimation poses a crucial challenge in high-dimensional data analysis, especially when traditional estimators (e.g., the sample covariance) are inaccurate, as they tend to be with small sample sizes. A promising solution is to exploit inherent data structures such as low-rankness, sparsity, or smoothness. For tensor data (multi-dimensional arrays), structured regularization aids in dimensionality reduction. This paper introduces novel regularization methods for tensor covariance estimation, specifically applying banded and tapering structures to the covariance matrix. We use Kronecker Product Canonical Polyadic (KPCP) decomposition to approximate large matrices via the Kronecker product of smaller matrices. A split resampling scheme is employed to select parameters for the KPCP decomposition from noisy data. This leads to two methods: KPCP-TB-R (Triply Banded-Resampling) and KPCP-TT-R (Triply Tapering-Resampling). Additionally, sparse (thresholding) and multi-structured regularization approaches are introduced for comparison. The effectiveness and robustness of the proposed methods are validated through extensive simulations, and the methods are applied to monthly export trade volume data.
Volume 242, Article 106337
Citations: 0
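The banded and tapering structures mentioned in the abstract above are easiest to see on an ordinary covariance matrix. The sketch below applies them to a single small matrix; it does not reproduce the KPCP decomposition or the "triply" variants, and the trapezoidal taper weights are one common choice assumed here rather than taken from the paper.

```python
def band(cov, k):
    """Keep entries with |i - j| <= k and set the rest to zero."""
    p = len(cov)
    return [[cov[i][j] if abs(i - j) <= k else 0.0 for j in range(p)]
            for i in range(p)]

def taper_weight(lag, k):
    """Trapezoidal weight: 1 up to lag k/2, then linear decay to 0 at lag k."""
    if lag <= k / 2:
        return 1.0
    if lag < k:
        return 2.0 - 2.0 * lag / k
    return 0.0

def taper(cov, k):
    """Multiply each entry by a weight that shrinks with distance from the diagonal."""
    p = len(cov)
    return [[cov[i][j] * taper_weight(abs(i - j), k) for j in range(p)]
            for i in range(p)]

cov = [[1.0, 0.5, 0.4, 0.3],
       [0.5, 1.0, 0.5, 0.4],
       [0.4, 0.5, 1.0, 0.5],
       [0.3, 0.4, 0.5, 1.0]]
banded = band(cov, 1)     # only the main diagonal and first off-diagonals survive
tapered = taper(cov, 4)   # lags 0 to 2 are kept, lag 3 is halved
```

Banding cuts off abruptly at bandwidth k, whereas tapering shrinks distant entries gradually; in both cases the bandwidth is a tuning parameter, which is the role the abstract assigns to the split resampling scheme.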
Inference for trend functions in partially linear models
IF 0.8 Zone 4 (Mathematics) Q3 STATISTICS & PROBABILITY Pub Date: 2025-09-02 DOI: 10.1016/j.jspi.2025.106338
Sijie Zheng, Xiaojun Song
A nonparametric test is developed to determine whether the trend of a partially linear model (PLM) with dependent errors and locally stationary regressors follows a specific parametric form. The test is asymptotically normal under the null hypothesis of correct trend specification and is consistent against various alternatives that deviate from the null hypothesis. The testing power against two classes of local alternatives approaching the null at different rates is derived, along with the asymptotic distribution of the test under fixed alternatives. We also propose a wild bootstrap procedure to better approximate the finite sample null distribution of the test. Statistical inference is performed on the trend specification in the Phillips curve and ozone concentration.
Volume 242, Article 106338
Citations: 0
Evaluation of diagnostic biomarkers: A comparative analysis by area under the receiver operating characteristic curve
IF 0.8 Zone 4 (Mathematics) Q3 STATISTICS & PROBABILITY Pub Date: 2025-08-27 DOI: 10.1016/j.jspi.2025.106336
Pengfei Liu, Kai Lou, Yangchun Zhang, Peng Zhao, Wang Zhou
In recent years, a substantial number of biomarkers have surfaced to facilitate the prompt diagnosis and intervention of chronic kidney disease. However, the lack of a reliable approach to compare biomarker efficacy poses a significant challenge in clinical practice and biomedical research. The inability to accurately assess biomarkers’ performance limits their utility in disease diagnosis. In this article, we study the efficiency of different diagnostic markers by comparing the areas under the receiver operating characteristic curves of markers, which are estimated via the Wilcoxon–Mann–Whitney statistics. Furthermore, the precision of interval estimation is enhanced by applying the Edgeworth expansion and bootstrap approximation to the statistics. By performing numerical simulations, we have demonstrated that our improved methods exhibit superior accuracy in constructing confidence intervals when compared to the traditional normal approximation method.
Citations: 0