
Test: latest articles

Jackknife empirical likelihood for the correlation coefficient with additive distortion measurement errors
IF 1.3 CAS Zone 4 (Mathematics) Q2 STATISTICS & PROBABILITY Pub Date: 2024-09-04 DOI: 10.1007/s11749-024-00941-x
Da Chen, Linlin Dai, Yichuan Zhao

The correlation coefficient is fundamental in advanced statistical analysis. However, traditional methods of calculating correlation coefficients can be biased due to the existence of confounding variables. Such confounding variables could act in an additive or multiplicative fashion. To study the additive model, previous research has proposed residual-based estimation of correlation coefficients. The powerful tool of empirical likelihood (EL) has been used to construct the confidence interval for the correlation coefficient. However, the methods so far only perform well when sample sizes are large. With small sample sizes, the coverage probability of EL, for instance, can be below 90% at the 95% confidence level. On the basis of previous research, we propose new methods of interval estimation for the correlation coefficient using jackknife empirical likelihood, mean jackknife empirical likelihood and adjusted jackknife empirical likelihood. For better performance with small sample sizes, we also propose mean adjusted empirical likelihood. The simulation results show the best performance with mean adjusted jackknife empirical likelihood when the sample sizes are as small as 25. Real data analyses are used to illustrate the proposed approach.
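
The jackknife empirical likelihood idea named in this abstract can be sketched in a few lines. This is a minimal illustration, not the authors' procedure: it ignores the additive distortion adjustment (the residual-based step) and the mean and adjusted variants, and the function names `jackknife_pseudo_values` and `neg2_log_el_ratio` are hypothetical. It forms leave-one-out pseudo-values of the Pearson correlation and then applies the standard empirical likelihood ratio for a mean to them.

```python
import numpy as np

def jackknife_pseudo_values(x, y):
    """Jackknife pseudo-values of the Pearson correlation coefficient."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(x)
    r_full = np.corrcoef(x, y)[0, 1]
    idx = np.arange(n)
    # Leave-one-out correlations.
    r_loo = np.array([np.corrcoef(x[idx != i], y[idx != i])[0, 1] for i in range(n)])
    return n * r_full - (n - 1) * r_loo

def neg2_log_el_ratio(v, rho0):
    """-2 log empirical likelihood ratio for H0: E[v] = rho0."""
    z = np.asarray(v, dtype=float) - rho0
    if z.min() >= 0 or z.max() <= 0:
        return np.inf  # rho0 lies outside the convex hull of the pseudo-values
    # The Lagrange multiplier must keep 1 + lam * z_i > 0 for every i.
    lo = (-1.0 / z.max()) * (1.0 - 1e-10)
    hi = (-1.0 / z.min()) * (1.0 - 1e-10)
    # sum(z / (1 + lam * z)) is strictly decreasing in lam, so bisection finds its root.
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if np.sum(z / (1.0 + mid * z)) > 0:
            lo = mid
        else:
            hi = mid
    lam = 0.5 * (lo + hi)
    return 2.0 * np.sum(np.log1p(lam * z))
```

Under the usual chi-square calibration with one degree of freedom, a 95% interval would collect the values `rho0` for which `neg2_log_el_ratio(v, rho0)` does not exceed about 3.84; whether that calibration is adequate at small sample sizes is exactly the question the abstract raises.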

Citations: 0
Nonparametric conditional survival function estimation and plug-in bandwidth selection with multiple covariates
IF 1.3 CAS Zone 4 (Mathematics) Q2 STATISTICS & PROBABILITY Pub Date: 2024-08-31 DOI: 10.1007/s11749-024-00945-7
Dimitrios Bagkavos, Montserrat Guillen, Jens P. Nielsen

The present research provides two methodological advances, simulation evidence and a real data analysis, all contributing to the area of local linear survival function estimation and bandwidth selection. The first contribution is the development of a double smoothed local linear survival function estimator which admits an arbitrary number of covariates and the analytic establishment of its asymptotic properties. The second contribution is the efficient implementation of the estimator in practice. This is achieved by developing an automatic plug-in smoothing parameter selector which optimizes the estimator’s performance in all coordinate directions. The traditional problem that vectorizing higher-order derivatives leads to increasingly intractable matrix algebraic expressions is addressed here by introducing an alternative vectorization that exploits the analytic relationships between the functionals involved. This yields simpler, tractable expressions that are efficient in computing time and greatly facilitate the implementation of the rule in practice. The analytic study of the rule’s rate of convergence shows that, in contrast to the traditional cross-validation approach, the proposed bandwidth selector is functional even for a large number of covariates. The benefits of all methodological advances are illustrated with the analysis of a motivating real-world dataset on credit risk.

Citations: 0
Higher-order spatial autoregressive varying coefficient model: estimation and specification test
IF 1.3 CAS Zone 4 (Mathematics) Q2 STATISTICS & PROBABILITY Pub Date: 2024-08-26 DOI: 10.1007/s11749-024-00944-8
Tizheng Li, Yuping Wang

Conventional higher-order spatial autoregressive models assume that regression coefficients are constant over space, which is overly restrictive and unrealistic in applications. In this paper, we introduce a higher-order spatial autoregressive varying coefficient model in which regression coefficients are allowed to change smoothly over space, which enables us to simultaneously explore different types of spatial dependence and spatial heterogeneity of the regression relationship. We propose a semi-parametric generalized method of moments estimator for the proposed model and derive asymptotic properties of the resulting estimators. Moreover, we propose a testing method to detect spatial heterogeneity of the regression relationship. Simulation studies show that the proposed estimation and testing methods perform quite well in finite samples. Finally, the Boston house price data are analyzed to demonstrate the proposed model and its estimation and testing methods.

Citations: 0
Composite quantile estimation in partially functional linear regression model with randomly censored responses
IF 1.3 CAS Zone 4 (Mathematics) Q2 STATISTICS & PROBABILITY Pub Date: 2024-08-24 DOI: 10.1007/s11749-024-00946-6
Chengxin Wu, Nengxiang Ling, Philippe Vieu, Guoliang Fan

In this paper, we study composite quantile estimation for the partially functional linear regression model with randomly censored responses. Concretely, we adopt inverse probability weighting and estimate the weights from the survival distribution function of the censoring variables, using the Kaplan–Meier, Breslow and local Kaplan–Meier methods respectively. Then, we construct the weighted composite quantile estimators for the slope function and the scalar parameters of the model. Furthermore, large sample properties, such as the convergence rates of the estimators for the slope function and scalar parameters as well as the asymptotic distribution of the estimators for the scalar parameters, are obtained under some mild conditions. In addition, we propose a computationally simple resampling technique to approximate the distribution of the parametric estimators of the model, and establish interval estimates for the scalar parameters. Finally, the finite sample performance of the model and the estimation method is illustrated by simulation studies and a real data analysis, which show that both the model and the estimation methods are effective.
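
The inverse probability weighting step described above can be illustrated with a small sketch. It covers only the plain Kaplan–Meier variant, assumes there are no tied observation times, and leaves out the functional covariate entirely; the function names `ipcw_weights` and `composite_check_loss` are hypothetical and the code is not taken from the paper. Each uncensored response receives weight 1/G(Y-), where G is the Kaplan–Meier estimate of the censoring survival function, and censored responses receive weight zero.

```python
import numpy as np

def ipcw_weights(time, event):
    """Weights event_i / G(Y_i-), with G the Kaplan-Meier censoring survival estimate.

    event is 1 for an observed response and 0 for a censored one; no ties assumed.
    """
    time = np.asarray(time, dtype=float)
    event = np.asarray(event, dtype=int)
    n = len(time)
    order = np.argsort(time, kind="stable")
    censored = 1 - event[order]            # censoring plays the role of the "event" for G
    at_risk = n - np.arange(n)             # risk set sizes n, n-1, ..., 1
    surv = np.cumprod(1.0 - censored / at_risk)
    g_minus = np.concatenate(([1.0], surv[:-1]))  # left limits G(Y_i-)
    w = np.empty(n)
    w[order] = event[order] / g_minus
    return w

def composite_check_loss(residuals, weights, taus, intercepts):
    """Weighted composite quantile objective: sum_k sum_i w_i * rho_{tau_k}(r_i - b_k)."""
    residuals = np.asarray(residuals, dtype=float)
    weights = np.asarray(weights, dtype=float)
    total = 0.0
    for tau, b in zip(taus, intercepts):
        u = residuals - b
        total += np.sum(weights * u * (tau - (u < 0)))
    return total
```

For example, `ipcw_weights([1, 2, 3, 4], [1, 0, 1, 1])` gives weights 1, 0, 1.5 and 1.5: the censored second observation is dropped and its mass is redistributed to the later uncensored ones.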

Citations: 0
Bayesian inference and cure rate modeling for event history data
IF 1.3 CAS Zone 4 (Mathematics) Q2 STATISTICS & PROBABILITY Pub Date: 2024-08-16 DOI: 10.1007/s11749-024-00942-w
Panagiotis Papastamoulis, Fotios S. Milienos

Estimating model parameters of a general family of cure models is always a challenging task, mainly due to flatness and multimodality of the likelihood function. In this work, we propose a fully Bayesian approach in order to overcome these issues. Posterior inference is carried out by constructing a Metropolis-coupled Markov chain Monte Carlo (MCMC) sampler, which combines Gibbs sampling for the latent cure indicators and Metropolis–Hastings steps with Langevin diffusion dynamics for parameter updates. The main MCMC algorithm is embedded within a parallel tempering scheme by considering heated versions of the target posterior distribution. It is demonstrated that, throughout the simulation study, the proposed algorithm freely explores the multimodal posterior distribution and produces robust point estimates, while it outperforms maximum likelihood estimation via the Expectation–Maximization algorithm. A by-product of our Bayesian implementation is control of the False Discovery Rate when classifying items as cured or not. Finally, the proposed method is illustrated on a real dataset on recidivism among offenders released from prison; the event of interest is whether the offender was re-incarcerated after probation or not.
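
The parallel tempering scheme mentioned above can be illustrated on a toy problem. The sketch below is an assumption-laden stand-in, not the authors' sampler: it replaces the cure model posterior with a one-dimensional bimodal target, uses plain random-walk Metropolis updates instead of Gibbs and Langevin steps, and the names `log_target` and `parallel_tempering` are hypothetical. Heated chains target the density raised to an inverse temperature below one, and states of adjacent chains are occasionally swapped so that the cold chain can move between modes.

```python
import numpy as np

def log_target(x):
    # Toy bimodal target: equal mixture of N(-4, 1) and N(4, 1), up to a constant.
    return np.logaddexp(-0.5 * (x + 4.0) ** 2, -0.5 * (x - 4.0) ** 2)

def parallel_tempering(n_iter, betas, step=1.0, seed=0):
    """Return the draws of the cold chain (betas[0] should be 1.0)."""
    rng = np.random.default_rng(seed)
    betas = np.asarray(betas, dtype=float)
    k = len(betas)
    x = np.full(k, -4.0)          # all chains start in the left mode
    cold = np.empty(n_iter)
    for t in range(n_iter):
        # Within-chain random-walk Metropolis update on each heated target.
        prop = x + step * rng.normal(size=k)
        log_acc = betas * (log_target(prop) - log_target(x))
        accept = np.log(rng.uniform(size=k)) < log_acc
        x = np.where(accept, prop, x)
        # Propose swapping the states of one random pair of adjacent chains.
        i = rng.integers(k - 1)
        log_swap = (betas[i] - betas[i + 1]) * (log_target(x[i + 1]) - log_target(x[i]))
        if np.log(rng.uniform()) < log_swap:
            x[i], x[i + 1] = x[i + 1], x[i]
        cold[t] = x[0]
    return cold
```

A call such as `parallel_tempering(10000, [1.0, 0.5, 0.25, 0.1])` returns the cold-chain draws; without the swap step, a random walk of this step size started at -4 would rarely cross the low-density region around zero.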

Citations: 0
Extended Hotelling $T^2$ test in distributed frameworks
IF 1.3 CAS Zone 4 (Mathematics) Q2 STATISTICS & PROBABILITY Pub Date: 2024-07-30 DOI: 10.1007/s11749-024-00939-5
Bin Du, Xiumin Liu, Junlong Zhao

Hypothesis testing for a mean vector is a classical problem in data analysis but has been highly underinvestigated in distributed frameworks where samples of size n are located on k local sites. This paper focuses on the one-sample mean test, proposing synthesized test statistics with a much lower communication cost than the centralized Hotelling $T^2$ test. For the homogeneous case, where data on different local sites are independent and identically distributed, the efficiency of our proposed test is comparable to that of the centralized one, and much better than the test constructed from the divide and conquer method. Besides, three heterogeneous cases are considered, where the distributions of the data on local sites can be different. Heterogeneous cases are much more challenging because the local sample means and covariance matrices may be inconsistent estimators. We construct communication-efficient testing procedures for heterogeneous cases, and the power of the proposed test statistics is comparable to that of the centralized one under some conditions. Simulation results verify the effectiveness of the proposed testing procedures.
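
For reference, the centralized Hotelling $T^2$ statistic that this abstract uses as its benchmark can be written down directly. The sketch below is not the paper's synthesized statistic: it shows the centralized test and a naive baseline in which every site transmits its full first and second moment sums, which reproduces the centralized value exactly but costs on the order of p squared numbers per site. The names `site_summary` and `t2_from_site_summaries` are hypothetical, and at least two dimensions are assumed.

```python
import numpy as np

def hotelling_t2(x, mu0):
    """Centralized one-sample Hotelling T^2 statistic for H0: mean = mu0."""
    x = np.asarray(x, dtype=float)
    n = x.shape[0]
    diff = x.mean(axis=0) - mu0
    s = np.cov(x, rowvar=False)            # sample covariance, divisor n - 1
    return n * diff @ np.linalg.solve(s, diff)

def site_summary(x):
    """What one local site would transmit in this naive baseline."""
    x = np.asarray(x, dtype=float)
    return x.shape[0], x.sum(axis=0), x.T @ x

def t2_from_site_summaries(summaries, mu0):
    """Rebuild the centralized statistic from per-site (n_j, sum x, sum x x^T)."""
    n = sum(s[0] for s in summaries)
    total = sum(s[1] for s in summaries)
    outer = sum(s[2] for s in summaries)
    mean = total / n
    cov = (outer - n * np.outer(mean, mean)) / (n - 1)
    diff = mean - mu0
    return n * diff @ np.linalg.solve(cov, diff)
```

Under multivariate normality, (n - p) / (p (n - 1)) times the statistic follows an F distribution with p and n - p degrees of freedom under the null hypothesis, which gives the usual critical values for the centralized test.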

Citations: 0
Optimal subsampling for $L_p$-quantile regression via decorrelated score
IF 1.3 CAS Zone 4 (Mathematics) Q2 STATISTICS & PROBABILITY Pub Date: 2024-07-21 DOI: 10.1007/s11749-024-00940-y
Xing Li, Yujing Shao, Lei Wang

To balance robustness of quantile regression and effectiveness of expectile regression, we consider $L_p$-quantile regression models with large-scale data and develop a unified optimal subsampling method to downsize the data volume and reduce computational burden. For low-dimensional $L_p$-quantile regression models, two optimal subsampling probabilities based on the A- and L-optimality criteria are first proposed. For the preconceived low-dimensional parameter in high-dimensional $L_p$-quantile regression models, a novel optimal subsampling decorrelated score function is proposed to mitigate the effect from nuisance parameter estimation and then two optimal decorrelated score subsampling probabilities are provided. The asymptotic properties of two optimal subsample estimators are established. The finite-sample performance of the proposed estimators is studied through simulations, and an application to the Beijing Air Quality Dataset is also presented.
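
The way $L_p$-quantiles sit between quantiles and expectiles is easiest to see from the loss function. The sketch below assumes the standard asymmetric power loss |tau - 1(u < 0)| |u|^p, which gives the quantile check loss at p = 1 and the expectile loss at p = 2; it computes only an unconditional sample $L_p$-quantile by a simple ternary search and does not touch the subsampling probabilities or the decorrelated score of the paper. The function names are hypothetical.

```python
import numpy as np

def lp_quantile_loss(u, tau, p):
    """Asymmetric power loss |tau - 1(u < 0)| * |u|**p."""
    u = np.asarray(u, dtype=float)
    return np.abs(tau - (u < 0)) * np.abs(u) ** p

def sample_lp_quantile(y, tau, p, n_iter=200):
    """Minimize the summed loss over a location theta (convex for p >= 1)."""
    y = np.asarray(y, dtype=float)
    lo, hi = y.min(), y.max()
    for _ in range(n_iter):
        # Ternary search: drop the outer third that cannot contain the minimizer.
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        if lp_quantile_loss(y - m1, tau, p).sum() < lp_quantile_loss(y - m2, tau, p).sum():
            hi = m2
        else:
            lo = m1
    return 0.5 * (lo + hi)
```

With tau = 0.5, the choice p = 1 recovers the sample median and p = 2 recovers the sample mean, so on data with one large value the two answers differ visibly; intermediate values of p trade robustness against efficiency, which is the balance the abstract refers to.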

Citations: 0
Oracle-efficient M-estimation for single-index models with a smooth simultaneous confidence band
IF 1.3 CAS Zone 4 (Mathematics) Q2 STATISTICS & PROBABILITY Pub Date: 2024-07-12 DOI: 10.1007/s11749-024-00935-9
Li Cai, Lei Jin, Jiuzhou Miao, Suojin Wang

Single-index models are important and popular semiparametric models, as they can handle the problem of the “curse of dimensionality” and enjoy the flexibility of nonparametric modeling and the interpretability of parametric modeling. Most existing methods for single-index models are sensitive to outliers or heavy-tailed distributions because they use the least squares criterion. An oracle-efficient M-estimator is proposed for single-index models, and a smooth simultaneous confidence band is constructed by treating the index coefficients as nuisance parameters. Under general assumptions it is shown that the M-estimator for the nonparametric link function, based on any $\sqrt{n}$-consistent coefficient index parameter estimators, is oracle-efficient. This means that it is uniformly as efficient as the infeasible one obtained by M-regression using the true single-index coefficient parameters. As a result, the asymptotic distribution of the maximal deviation between the M-type kernel estimator and the true link function is derived, and an asymptotically accurate simultaneous confidence band is established as a global inference tool for the link function. The proposed method generalizes the desirable uniform convergence property of ordinary least squares to the M-estimation. Meanwhile, it is a general approach that allows any $\sqrt{n}$-consistent coefficient parameter estimators to be applied in the procedure to make global inferences for the link function. Simulation studies with commonly encountered sample sizes are reported to support the theoretical findings. These numerical results show certain desirable robustness properties against heavy-tailed errors and outliers. As an illustration, the proposed method is applied to the analysis of a car purchasing dataset.
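
To make the contrast with least squares concrete, here is a minimal sketch of an M-type kernel estimate of a link function at a single point. Everything specific in it is an assumption chosen for illustration rather than taken from the paper: the Huber loss with tuning constant 1.345, a residual scale fixed at one, a Gaussian kernel, a local-constant fit, and the hypothetical function name `huber_kernel_estimate`. The input `t` stands for index values already computed from some coefficient estimate.

```python
import numpy as np

def huber_kernel_estimate(t, y, t0, h, c=1.345, n_iter=50):
    """Local-constant Huber M-estimate of the link function at index value t0."""
    t = np.asarray(t, dtype=float)
    y = np.asarray(y, dtype=float)
    k = np.exp(-0.5 * ((t - t0) / h) ** 2)   # Gaussian kernel weights
    theta = np.median(y)                      # robust starting value
    for _ in range(n_iter):
        r = y - theta
        # Huber weights: 1 for |r| <= c, c / |r| beyond that.
        w = np.where(np.abs(r) <= c, 1.0, c / np.maximum(np.abs(r), 1e-12))
        # Iteratively reweighted kernel average.
        theta = np.sum(k * w * y) / np.sum(k * w)
    return theta
```

With five responses 0, 0, 0, 0, 100 observed at the same index value, the kernel-weighted mean is 20, while this estimate settles at 1.345 / 4, about 0.34: the single outlier contributes only a bounded pull.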

Citations: 0
Specifications tests for count time series models with covariates
IF 1.3 CAS Zone 4 (Mathematics) Q2 STATISTICS & PROBABILITY Pub Date: 2024-07-01 DOI: 10.1007/s11749-024-00933-x
Šárka Hudecová, Marie Hušková, Simos G. Meintanis

We propose a goodness-of-fit test for a class of count time series models with covariates which includes the Poisson autoregressive model with covariates (PARX) as a special case. The test criteria are derived from a specific characterization for the conditional probability generating function, and the test statistic is formulated as an \(L_2\) weighting norm of the corresponding sample counterpart. The asymptotic properties of the proposed test statistic are provided under the null hypothesis as well as under specific alternatives. A bootstrap version of the test is explored in a Monte–Carlo study and illustrated on a real data set on road safety.

Citations: 0
Marginal analysis of count time series in the presence of missing observations
IF 1.3 Zone 4 (Mathematics) Q2 STATISTICS & PROBABILITY Pub Date: 2024-06-28 DOI: 10.1007/s11749-024-00938-6
Simon Nik

Time series in real-world applications often have missing observations, making typical analytical methods unsuitable. One method for dealing with missing data is the concept of amplitude modulation. While this principle works with any data, here, missing data for unbounded and bounded count time series are investigated, where tailor-made dispersion and skewness statistics are used for model diagnostics. General closed-form asymptotic formulas are derived for such statistics with only weak assumptions on the underlying process. Moreover, closed-form formulas are derived for the popular special cases of Poisson and binomial autoregressive processes, always under the assumption that missingness occurs. The finite-sample performances of the considered asymptotic approximations are analyzed with simulations. The practical application of the corresponding dispersion and skewness tests under missing data is demonstrated with three real data examples.

Citations: 0