
Journal of Statistical Planning and Inference: Latest Articles

Outcome dependent subsampling divide and conquer in generalized linear models for massive data
IF 0.8 Zone 4 (Mathematics) Q3 STATISTICS & PROBABILITY Pub Date: 2024-12-04 DOI: 10.1016/j.jspi.2024.106253
Jie Yin, Jieli Ding, Changming Yang
To overcome the constraints that limited computing power imposes on processing massive datasets, we propose an outcome-dependent subsampling divide-and-conquer strategy. The proposed strategy can process data on multiple blocks in parallel and concentrate the computing resources of each block on the regions with the most information. We develop a distributed statistical inference method and propose a computationally efficient algorithm for generalized linear models with massive data. The proposed method only needs to preserve some summary statistics from each data block and then uses them to construct the proposed estimator directly. The asymptotic properties of the proposed method are established. Simulation studies and a real data analysis are conducted to illustrate the merits of the proposed method.
Volume 237, Article 106253
Citations: 0
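The abstract above notes that only summary statistics need to be kept from each data block. The sketch below illustrates that idea in the simplest possible setting, ordinary least squares with one covariate, and is not the authors' outcome-dependent subsampling estimator for generalized linear models. The names `BlockSummary`, `summarize`, and `combine`, and the toy data, are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class BlockSummary:
    """Sufficient statistics for simple linear regression on one data block."""
    n: int
    sx: float
    sy: float
    sxx: float
    sxy: float


def summarize(xs, ys):
    """Reduce one block to five numbers; the raw rows can then be discarded."""
    return BlockSummary(
        n=len(xs),
        sx=sum(xs),
        sy=sum(ys),
        sxx=sum(x * x for x in xs),
        sxy=sum(x * y for x, y in zip(xs, ys)),
    )


def combine(summaries):
    """Build the pooled least-squares fit directly from the block summaries."""
    n = sum(s.n for s in summaries)
    sx = sum(s.sx for s in summaries)
    sy = sum(s.sy for s in summaries)
    sxx = sum(s.sxx for s in summaries)
    sxy = sum(s.sxy for s in summaries)
    slope = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    intercept = (sy - slope * sx) / n
    return intercept, slope


# Hypothetical toy data split into two blocks; y = 1 + 2x exactly.
block_a = ([0.0, 1.0, 2.0], [1.0, 3.0, 5.0])
block_b = ([3.0, 4.0], [7.0, 9.0])
intercept, slope = combine([summarize(*block_a), summarize(*block_b)])
print(intercept, slope)
```

Because the five sums are additive across blocks, the combined fit coincides with the fit on the pooled data, and each block can be summarized in parallel.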
Nonparametric estimators of inequality curves and inequality measures
IF 0.8 Zone 4 (Mathematics) Q3 STATISTICS & PROBABILITY Pub Date: 2024-11-28 DOI: 10.1016/j.jspi.2024.106251
Alicja Jokiel-Rokita, Sylwester Piątek
Classical inequality curves and inequality measures are defined for distributions with finite mean value. Moreover, their empirical counterparts are not resistant to outliers. For these reasons, quantile versions of known inequality curves such as the Lorenz, Bonferroni, Zenga and D curves, and quantile versions of inequality measures such as the Gini, Bonferroni, Zenga and D indices have been proposed in the literature. We propose various nonparametric estimators of quantile versions of inequality curves and inequality measures, prove their consistency, and compare their accuracy in a simulation study. We also give examples of the use of quantile versions of inequality measures in real data analysis.
Volume 237, Article 106251
Citations: 0
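To make the outlier sensitivity mentioned in the abstract above concrete, here is a minimal sketch of the classical empirical Gini index (the mean absolute difference divided by twice the mean). It does not implement the quantile versions studied in the paper; the function name `gini` and the sample values are hypothetical.

```python
def gini(values):
    """Classical empirical Gini index: mean absolute difference over twice the mean."""
    n = len(values)
    mean = sum(values) / n
    if mean == 0:
        raise ValueError("Gini index is undefined when the sample mean is zero")
    total_abs_diff = sum(abs(a - b) for a in values for b in values)
    return total_abs_diff / (2 * n * n * mean)


incomes = [10.0, 12.0, 11.0, 13.0, 9.0]   # hypothetical sample
with_outlier = incomes + [1000.0]          # same sample plus one extreme value
print(gini(incomes), gini(with_outlier))
```

A single extreme observation moves the index from near zero to a large value, which is the kind of behavior that motivates quantile-based alternatives.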
Estimation and group-feature selection in sparse mixture-of-experts with diverging number of parameters
IF 0.8 Zone 4 (Mathematics) Q3 STATISTICS & PROBABILITY Pub Date: 2024-11-19 DOI: 10.1016/j.jspi.2024.106250
Abbas Khalili, Archer Yi Yang, Xiaonan Da
Mixture-of-experts models provide flexible statistical models for a wide range of regression (supervised learning) problems. In many modern applications a large number of covariates (features) are available, yet only a small subset of them is useful in explaining a response variable of interest. This calls for a feature selection device. In this paper, we present new group-feature selection and estimation methods for sparse mixture-of-experts models when the number of features can be nearly comparable to the sample size. We prove the consistency of the methods in both parameter estimation and feature selection. We implement the methods using a modified EM algorithm combined with a proximal gradient method, which results in a convenient closed-form parameter update in the M-step of the algorithm. We examine the finite-sample performance of the methods through simulations, and demonstrate their application in a real data example on exploring relationships in body measurements.
Volume 237, Article 106250
Citations: 0
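The abstract above mentions a proximal gradient method with a closed-form update. As an illustration of why such updates can be closed-form, the sketch below shows the group soft-thresholding operator, which is the proximal operator of a group-lasso-type penalty. That the paper uses exactly this penalty is an assumption; the function names and arguments are hypothetical.

```python
import math


def group_soft_threshold(beta_group, threshold):
    """Proximal operator of threshold * ||beta||_2 for one coefficient group."""
    norm = math.sqrt(sum(b * b for b in beta_group))
    if norm <= threshold:
        # The whole group is set to zero, i.e. the group is deselected.
        return [0.0] * len(beta_group)
    scale = 1.0 - threshold / norm
    return [scale * b for b in beta_group]


def proximal_gradient_step(beta_groups, grad_groups, step, penalty):
    """One update: gradient step on the smooth part, then group-wise shrinkage."""
    updated = []
    for beta, grad in zip(beta_groups, grad_groups):
        moved = [b - step * g for b, g in zip(beta, grad)]
        updated.append(group_soft_threshold(moved, step * penalty))
    return updated


print(group_soft_threshold([3.0, 4.0], 1.0))
print(group_soft_threshold([3.0, 4.0], 10.0))
```

Groups whose norm falls below the threshold are zeroed out together, which is how a penalty of this type selects or drops a feature across all of its associated coefficients at once.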
Modeling and testing for endpoint-inflated count time series with bounded support
IF 0.8 Zone 4 (Mathematics) Q3 STATISTICS & PROBABILITY Pub Date: 2024-11-15 DOI: 10.1016/j.jspi.2024.106248
Yao Kang, Xiaojing Fan, Jie Zhang, Ying Tang
Count time series with bounded support frequently exhibit binomial overdispersion, zero inflation and right-endpoint inflation in practical scenarios. Numerous models have been proposed for the analysis of bounded count time series with binomial overdispersion and zero inflation, yet right-endpoint inflation has received comparatively less attention. To better capture these features, this article introduces three versions of extended first-order binomial autoregressive (BAR(1)) models with endpoint inflation. The stochastic properties of the new models are investigated, and model parameters are estimated by the conditional maximum likelihood and quasi-maximum likelihood methods. A binomial right-endpoint inflation index is also constructed and used to test whether a data set exhibits endpoint inflation relative to a BAR(1) process. Finally, the proposed models are applied to two real data examples. First, we illustrate the usefulness of the proposed models through an application to voting data on supporting interest rate changes during consecutive monthly meetings of the Monetary Policy Council at the National Bank of Poland. Then, we apply the proposed models to the number of police stations that received at least one drunk driving report per month. The results of the two real data examples indicate that the new models have significant advantages in fitting performance for bounded count time series with endpoint inflation.
Volume 237, Article 106248
Citations: 0
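For readers unfamiliar with the baseline model named in the abstract above, the following sketch simulates a standard BAR(1) process built from binomial thinning, X_t = alpha∘X_{t-1} + beta∘(n - X_{t-1}). It covers only the classical model, not the three endpoint-inflated extensions the paper proposes, and the parameter values, seed, and function names are hypothetical.

```python
import random


def binomial_draw(rng, trials, prob):
    """Draw from Binomial(trials, prob) as a sum of Bernoulli variables."""
    return sum(1 for _ in range(trials) if rng.random() < prob)


def simulate_bar1(n_upper, alpha, beta, length, seed=0):
    """Simulate X_t = alpha∘X_{t-1} + beta∘(n_upper - X_{t-1}) with binomial thinning."""
    rng = random.Random(seed)
    # Start from the stationary Binomial(n_upper, pi) law, pi = beta / (1 - alpha + beta).
    x = binomial_draw(rng, n_upper, beta / (1.0 - alpha + beta))
    path = [x]
    for _ in range(length - 1):
        survivors = binomial_draw(rng, x, alpha)           # units that stay "on"
        entrants = binomial_draw(rng, n_upper - x, beta)   # units that switch "on"
        x = survivors + entrants
        path.append(x)
    return path


path = simulate_bar1(n_upper=10, alpha=0.7, beta=0.2, length=200)
share_at_zero = path.count(0) / len(path)
share_at_upper = path.count(10) / len(path)
print(share_at_zero, share_at_upper)
```

Comparing the observed shares at 0 and at the upper bound with what a fitted BAR(1) model implies is one informal way to see whether endpoint inflation might be present.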
Semi-parametric empirical likelihood inference on quantile difference between two samples with length-biased and right-censored data
IF 0.8 Zone 4 (Mathematics) Q3 STATISTICS & PROBABILITY Pub Date: 2024-11-14 DOI: 10.1016/j.jspi.2024.106249
Li Xun, Xin Guan, Yong Zhou
Exploring quantile differences between two populations at various probability levels offers valuable insights into their distinctions, which are essential for practical applications such as assessing treatment effects. However, estimating these differences can be challenging due to the complex data often encountered in clinical trials. This paper assumes that right-censored data and length-biased right-censored data originate from two populations of interest. We propose an adjusted smoothed empirical likelihood (EL) method for inferring quantile differences and establish the asymptotic properties of the proposed estimators. Under mild conditions, we demonstrate that the adjusted log-EL ratio statistics asymptotically follow the standard chi-squared distribution. We construct confidence intervals for the quantile differences using both normal and chi-squared approximations and develop a likelihood ratio test for these differences. The performance of our proposed methods is illustrated through simulation studies. Finally, we present a case study utilizing Oscar award nomination data to demonstrate the application of our method.
Volume 237, Article 106249
Citations: 0
Sieve estimation of the accelerated mean model based on panel count data
IF 0.8 Zone 4 (Mathematics) Q3 STATISTICS & PROBABILITY Pub Date: 2024-11-12 DOI: 10.1016/j.jspi.2024.106247
Xiaoyang Li, Zhi-Sheng Ye, Xingqiu Zhao
Panel count data are gathered when subjects are examined at discrete times during a study, and only the number of recurrent events occurring before each examination time is recorded. We consider a semiparametric accelerated mean model for panel count data in which the effect of the covariates is to transform the time scale of the baseline mean function. Semiparametric inference for the model is inherently challenging because the finite-dimensional regression parameters appear in the argument of the (infinite-dimensional) functional parameter, i.e., the baseline mean function, leading to the phenomenon of bundled parameters. We propose sieve pseudolikelihood and likelihood methods to construct the random criterion function for estimating the model parameters. An inexact block coordinate ascent algorithm is used to obtain these estimators. We establish the consistency and rate of convergence of the proposed estimators, as well as the asymptotic normality of the estimators of the regression parameters. Novel consistent estimators of the asymptotic covariances of the estimated regression parameters are derived by leveraging the counting process associated with the examination times. Comprehensive simulation studies demonstrate that the optimization algorithm is much less sensitive to the initial values than the Newton–Raphson method. The proposed estimators perform well for practical sample sizes, and are more efficient than existing methods. An example based on real data shows that due to this efficiency gain, the proposed method is better able to detect the significance of practically meaningful covariates than an existing method.
Volume 237, Article 106247
Citations: 0
The proximal bootstrap for constrained estimators
IF 0.8 Zone 4 (Mathematics) Q3 STATISTICS & PROBABILITY Pub Date: 2024-10-28 DOI: 10.1016/j.jspi.2024.106245
Jessie Li
We demonstrate how to conduct uniformly asymptotically valid inference for √n-consistent estimators defined as the solution to a constrained optimization problem with a possibly nonsmooth or nonconvex sample objective function and a possibly nonconvex constraint set. We allow for the solution to the problem to be on the boundary of the constraint set or to drift towards the boundary of the constraint set as the sample size goes to infinity. We construct a confidence set by benchmarking a test statistic against critical values that can be obtained from a simple unconstrained quadratic programming problem. Monte Carlo simulations illustrate the uniformly correct coverage of our method in a boundary constrained maximum likelihood model, a boundary constrained nonsmooth GMM model, and a conditional logit model with capacity constraints.
Volume 236, Article 106245
Citations: 0
Testing the equality of distributions using integrated maximum mean discrepancy
IF 0.8 Zone 4 (Mathematics) Q3 STATISTICS & PROBABILITY Pub Date: 2024-10-25 DOI: 10.1016/j.jspi.2024.106246
Tianxuan Ding, Zhimei Li, Yaowu Zhang
Comparing and testing for the homogeneity of two independent random samples is a fundamental statistical problem with many applications across various fields. However, existing methods may not be effective when the data is complex or high-dimensional. We propose a new method that integrates the maximum mean discrepancy (MMD) with a Gaussian kernel over all one-dimensional projections of the data. We derive the closed-form expression of the integrated MMD and prove its validity as a distributional similarity metric. We estimate the integrated MMD with the U-statistic theory and study its asymptotic behaviors under the null and two kinds of alternative hypotheses. We demonstrate that our method has the benefits of the MMD, and outperforms existing methods on both synthetic and real datasets, especially when the data is complex and high-dimensional.
Volume 236, Article 106246
Citations: 0
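As background for the abstract above, the sketch below computes the usual U-statistic estimate of the squared MMD with a Gaussian kernel for two scalar samples. It is the plain one-dimensional building block only; the integration over all one-dimensional projections that defines the paper's integrated MMD is not reproduced here. The function names, bandwidth, and sample values are hypothetical.

```python
import math


def gaussian_kernel(a, b, bandwidth=1.0):
    """Gaussian kernel exp(-(a - b)^2 / (2 * bandwidth^2)) for scalars."""
    return math.exp(-((a - b) ** 2) / (2.0 * bandwidth ** 2))


def mmd2_unbiased(xs, ys, bandwidth=1.0):
    """U-statistic estimate of squared MMD; needs at least two points per sample."""
    m, n = len(xs), len(ys)
    within_x = sum(
        gaussian_kernel(xs[i], xs[j], bandwidth)
        for i in range(m) for j in range(m) if i != j
    ) / (m * (m - 1))
    within_y = sum(
        gaussian_kernel(ys[i], ys[j], bandwidth)
        for i in range(n) for j in range(n) if i != j
    ) / (n * (n - 1))
    cross = sum(
        gaussian_kernel(x, y, bandwidth) for x in xs for y in ys
    ) / (m * n)
    return within_x + within_y - 2.0 * cross


near = mmd2_unbiased([0.0, 0.1, 0.2], [0.05, 0.15, 0.25])  # similar samples
far = mmd2_unbiased([0.0, 0.1, 0.2], [5.0, 5.1, 5.2])      # well-separated samples
print(near, far)
```

The estimate is close to zero for similar samples and can be slightly negative, since the U-statistic is unbiased rather than constrained to be nonnegative.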
Semiparametric estimation of a principal functional coefficient panel data model with cross-sectional dependence and its application to cigarette demand
IF 0.8 Zone 4 (Mathematics) Q3 STATISTICS & PROBABILITY Pub Date: 2024-10-05 DOI: 10.1016/j.jspi.2024.106244
Yan-Yong Zhao, Ling-Ling Ge, Kong-Sheng Zhang
In this paper, we consider the estimation of functional coefficient panel data models with cross-sectional dependence. Borrowing the principal component structure, the functional coefficient panel data models can be transformed into a semiparametric panel data model. Combining the local linear dummy variable technique and profile least squares method, we develop a semiparametric profile method to estimate the coefficient functions. A gradient-descent iterative algorithm is employed to enhance computation speed and estimation accuracy. The main results show that the resulting parameter estimator enjoys asymptotic normality with a √(NT) convergence rate and the nonparametric estimator is asymptotically normal with a nonparametric convergence rate √(NTh) when both the number of cross-sectional units N and the length of time series T go to infinity, under some regularity conditions. Monte Carlo simulations are carried out to evaluate the proposed methods, and an application to cigarette demand is investigated for illustration.
本文考虑了具有横截面依赖性的函数系数面板数据模型的估计。借用主成分结构,函数系数面板数据模型可以转化为半参数面板数据模型。结合局部线性虚拟变量技术和剖面最小二乘法,我们开发了一种估计系数函数的半参数剖面方法。我们采用梯度迭代算法来提高计算速度和估计精度。主要结果表明,在一些正则性条件下,当横截面单位数 N 和时间序列长度 T 都达到无穷大时,所得到的参数估计器具有渐近正态性和 NT 收敛率,而非参数估计器具有渐近正态性和非参数收敛率 NTh。为了评估所提出的方法,我们进行了蒙特卡罗模拟,并对卷烟需求的应用进行了研究以作说明。
{"title":"Semiparametric estimation of a principal functional coefficient panel data model with cross-sectional dependence and its application to cigarette demand","authors":"Yan-Yong Zhao ,&nbsp;Ling-Ling Ge ,&nbsp;Kong-Sheng Zhang","doi":"10.1016/j.jspi.2024.106244","DOIUrl":"10.1016/j.jspi.2024.106244","url":null,"abstract":"<div><div>In this paper, we consider the estimation of functional coefficient panel data models with cross-sectional dependence. Borrowing the principal component structure, the functional coefficient panel data models can be transformed into a semiparametric panel data model. Combining the local linear dummy variable technique and profile least squares method, we develop a semiparametric profile method to estimate the coefficient functions. A gradient-descent iterative algorithm is employed to enhance computation speed and estimation accuracy. The main results show that the resulting parameter estimator enjoys asymptotic normality with a <span><math><msqrt><mrow><mi>N</mi><mi>T</mi></mrow></msqrt></math></span> convergence rate and the nonparametric estimator is asymptotically normal with a nonparametric convergence rate <span><math><msqrt><mrow><mi>N</mi><mi>T</mi><mi>h</mi></mrow></msqrt></math></span> when both the number of cross-sectional units <span><math><mi>N</mi></math></span> and the length of time series <span><math><mi>T</mi></math></span> go to infinity, under some regularity conditions. 
Monte Carlo simulations are carried out to evaluate the proposed methods, and an application to cigarette demand is investigated for illustration.</div></div>","PeriodicalId":50039,"journal":{"name":"Journal of Statistical Planning and Inference","volume":"236 ","pages":"Article 106244"},"PeriodicalIF":0.8,"publicationDate":"2024-10-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142416590","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
A family of discrete maximum-entropy distributions
IF 0.8 Zone 4 Mathematics Q3 STATISTICS & PROBABILITY Pub Date: 2024-10-01 DOI: 10.1016/j.jspi.2024.106243
David J. Hessen
In this paper, a family of maximum-entropy distributions with general discrete support is derived. Members of the family are distinguished by the number of specified non-central moments. In addition, a subfamily of discrete symmetric distributions is defined. Attention is paid to maximum likelihood estimation of the parameters of any member of the general family. It is shown that the parameters of any special case with infinite support can be estimated using a conditional distribution given a finite subset of the total support. In an empirical data example, the procedures proposed are demonstrated.
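As a rough illustration of the kind of model the abstract describes, the sketch below assumes the standard result that a maximum-entropy distribution with its first K non-central moments specified has the form p(x) ∝ exp(θ_1 x + ... + θ_K x^K), and fits θ by maximum likelihood on a finite support. The function names `maxent_pmf` and `fit_maxent`, the undamped Newton iteration, and the toy data are assumptions made for this example; the paper's own parameterization and estimation procedure may differ. Restricting a distribution of this form to a finite subset of its support leaves the same exponential form with the same θ, which is consistent with the abstract's remark about estimating infinite-support cases from a conditional distribution.

```python
import numpy as np

def maxent_pmf(theta, support):
    """pmf proportional to exp(sum_k theta_k * x**k) on a finite support."""
    x = np.asarray(support, dtype=float)
    powers = np.arange(1, len(theta) + 1)
    T = x[:, None] ** powers[None, :]          # sufficient statistics x, x^2, ...
    logits = T @ np.asarray(theta, dtype=float)
    logits -= logits.max()                     # stabilise the exponentials
    w = np.exp(logits)
    return w / w.sum()

def fit_maxent(sample, support, n_moments, n_iter=100, tol=1e-10):
    """Maximum likelihood by undamped Newton steps (illustrative only).

    At the optimum the model's first n_moments non-central moments
    equal the sample's; harder problems may need step-size control.
    """
    x = np.asarray(support, dtype=float)
    s = np.asarray(sample, dtype=float)
    powers = np.arange(1, n_moments + 1)
    T = x[:, None] ** powers[None, :]
    target = (s[:, None] ** powers[None, :]).mean(axis=0)   # sample moments
    theta = np.zeros(n_moments)
    for _ in range(n_iter):
        p = maxent_pmf(theta, support)
        mean = p @ T                                         # model moments
        cov = (T * p[:, None]).T @ T - np.outer(mean, mean)  # Fisher information
        step = np.linalg.solve(cov, target - mean)
        theta += step
        if np.max(np.abs(step)) < tol:
            break
    return theta

# Toy usage: made-up data on the support {0, 1, 2, 3, 4}, two moments specified.
support = list(range(5))
sample = [0, 1, 1, 2, 2, 2, 3, 3, 4]
theta_hat = fit_maxent(sample, support, n_moments=2)
print(theta_hat, maxent_pmf(theta_hat, support))
```

Because the model is an exponential family, the likelihood equations reduce to matching the specified moments, which is why the Newton step uses the gap between sample and model moments.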
Citations: 0