
Statistica Neerlandica: Latest Articles

Usual stochastic ordering of the sample maxima from dependent distribution‐free random variables
IF 1.5 Zone 3 Mathematics Q2 STATISTICS & PROBABILITY Pub Date: 2022-07-21 DOI: 10.1111/stan.12275
Longxiang Fang, N. Balakrishnan, Wenyu Huang, Shuai Zhang
In this paper, we discuss stochastic comparison of the largest order statistics arising from two sets of dependent distribution‐free random variables with respect to multivariate chain majorization, where the dependency structure can be defined by Archimedean copulas. When the parameter matrix of a distribution‐free model with possibly two parameter vectors changes to another parameter matrix in a certain mathematical sense, we show that, under certain conditions, the first sample maximum is larger than the second sample maximum with respect to the usual stochastic order. Applications of our results to the scale proportional reverse hazards model, the exponentiated gamma distribution, the Gompertz–Makeham distribution, and the location‐scale model are also given. We also provide two numerical examples to illustrate the results established here.
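As a rough illustration of the objects named in this abstract, the sketch below evaluates the distribution function of a sample maximum when the dependence is given by an Archimedean copula, and checks the usual stochastic order on a grid. The Clayton family, the exponential margins, and all function names are assumptions chosen for illustration; they are not taken from the paper, which treats a more general setting.

```python
import math

def clayton_max_cdf(marginal_cdfs, theta):
    """P(max_i X_i <= x) under a Clayton copula (theta > 0), given u_i = F_i(x)."""
    if any(u <= 0.0 for u in marginal_cdfs):
        return 0.0
    # Clayton copula: C(u) = (1 + sum_i (u_i^(-theta) - 1))^(-1/theta)
    total = sum(u ** (-theta) - 1.0 for u in marginal_cdfs)
    return (1.0 + total) ** (-1.0 / theta)

def exponential_max_cdf(x, rates, theta):
    """CDF of the sample maximum for exponential margins (illustrative choice)."""
    return clayton_max_cdf([1.0 - math.exp(-r * x) for r in rates], theta)

def is_st_larger(rates_a, rates_b, theta, grid):
    """Grid check of the usual stochastic order for the two maxima:
    max_a >=_st max_b  iff  P(max_a <= x) <= P(max_b <= x) for all x."""
    return all(
        exponential_max_cdf(x, rates_a, theta)
        <= exponential_max_cdf(x, rates_b, theta) + 1e-12
        for x in grid
    )

grid = [0.1 * i for i in range(1, 100)]
print(is_st_larger([1.0, 1.0], [2.0, 2.0], theta=2.0, grid=grid))
```

A grid check of this kind can only refute an ordering or make it plausible; it does not prove it, which is what the paper's majorization conditions are for.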
Citations: 0
Inverse‐probability‐weighted logrank test for stratified survival data with missing measurements
IF 1.5 Zone 3 Mathematics Q2 STATISTICS & PROBABILITY Pub Date: 2022-07-21 DOI: 10.1111/stan.12276
Rim Ben Elouefi, Foued Saâdaoui
The stratified logrank test can be used to compare survival distributions of several groups of patients, while adjusting for the effect of some discrete variable that may be predictive of the survival outcome. In practice, this discrete variable may be missing for some patients. An inverse‐probability‐weighted version of the stratified logrank statistic is introduced to tackle this issue. Its asymptotic distribution is derived under the null hypothesis of equality of the survival distributions. A simulation study is conducted to assess the behavior of the proposed test statistic in finite samples. An analysis of a medical dataset illustrates the methodology.
Citations: 0
Assessing replicability with the sceptical $p$‐value: Type‐I error control and sample size planning
IF 1.5 Zone 3 Mathematics Q2 STATISTICS & PROBABILITY Pub Date: 2022-07-01 DOI: 10.1111/stan.12312
Charlotte Micheloud, F. Balabdaoui, L. Held
We study a statistical framework for replicability based on a recently proposed quantitative measure of replication success, the sceptical $p$‐value. A recalibration is proposed to obtain exact overall Type‐I error control if the effect is null in both studies, together with additional bounds on the partial and conditional Type‐I error rates, which represent the case where only one study has a null effect. The approach avoids the double dichotomization for significance of the two‐trials rule and has larger project power to detect existing effects over both studies in combination. It can also be used for power calculations and requires a smaller replication sample size than the two‐trials rule for already convincing original studies. We illustrate the performance of the proposed methodology in an application to data from the Experimental Economics Replication Project.
Citations: 3
Automatic bias correction for testing in high‐dimensional linear models
IF 1.5 Zone 3 Mathematics Q2 STATISTICS & PROBABILITY Pub Date: 2022-07-01 DOI: 10.1111/stan.12274
Jing Zhou, G. Claeskens
Hypothesis testing is challenging due to the test statistic's complicated asymptotic distribution when it is based on a regularized estimator in high dimensions. We propose a robust testing framework for $\ell_1$‐regularized M‐estimators to cope with non‐Gaussian distributed regression errors, using the robust approximate message passing algorithm. The proposed framework enjoys an automatically built‐in bias correction and is applicable to general convex nondifferentiable loss functions, which also allows inference when the focus is a conditional quantile instead of the mean of the response. The estimator compares numerically well with the debiased and desparsified approaches when using the least squares loss function. The use of the Huber loss function demonstrates that the proposed construction provides stable confidence intervals under different regression error distributions.
Citations: 0
Assessing skewness in financial markets
IF 1.5 Zone 3 Mathematics Q2 STATISTICS & PROBABILITY Pub Date: 2022-05-30 DOI: 10.1111/stan.12273
Giovanni Campisi, L. La Rocca, S. Muzzioli
It is a matter of common observation that investors value substantial gains but are averse to heavy losses. Obvious as it may sound, this translates into an interesting preference for right‐skewed return distributions, whose right tails are heavier than their left tails. Skewness is thus not only a way to describe the shape of a distribution, but also a tool for risk measurement. We review the statistical literature on skewness and provide a comprehensive framework for its assessment. Then, we present a new measure of skewness, based on the decomposition of variance into its upward and downward components. We argue that this measure fills a gap in the literature and show in a simulation study that it strikes a good balance between robustness and sensitivity.
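The abstract does not state the new measure itself, so the sketch below only shows the underlying idea it mentions: splitting the (population) variance about the mean into an upward and a downward component. The function name and the sample returns are hypothetical, and how the paper combines the two components into a skewness measure is not reproduced here.

```python
def variance_components(returns):
    """Split the population variance about the mean into the part
    contributed by observations above the mean (upward) and the part
    contributed by observations below it (downward)."""
    n = len(returns)
    mean = sum(returns) / n
    upward = sum((r - mean) ** 2 for r in returns if r > mean) / n
    downward = sum((r - mean) ** 2 for r in returns if r < mean) / n
    return upward, downward

# Hypothetical right-skewed sample: many small losses, one large gain.
sample = [-1.0, -1.0, -1.0, 0.0, 0.0, 8.0]
up, down = variance_components(sample)
print(up, down, up + down)
```

By construction the two components add up to the population variance, so comparing them indicates which tail drives the dispersion.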
Citations: 0
Autoregressive and moving average models for zero‐inflated count time series
IF 1.5 Zone 3 Mathematics Q2 STATISTICS & PROBABILITY Pub Date: 2022-05-01 DOI: 10.1111/stan.12255
Vurukonda Sathish, S. Mukhopadhyay, R. Tiwari
Zero inflation is a common nuisance when monitoring disease progression over time. This article proposes a new observation‐driven model for zero‐inflated and over‐dispersed count time series. Given the past history of the process and available information on covariates, the counts are assumed to be distributed as a mixture of a Poisson distribution and a distribution degenerate at zero, with a time‐dependent mixing probability, $\pi_t$. Since count data usually suffer from overdispersion, a Gamma distribution is used to model the excess variation, resulting in a zero‐inflated negative binomial regression model with mean parameter $\lambda_t$. Linear predictors with autoregressive and moving average (ARMA) type terms, covariates, seasonality and trend are fitted to $\lambda_t$ and $\pi_t$ through canonical link generalized linear models. Estimation is done using maximum likelihood aided by iterative algorithms, such as Newton‐Raphson (NR) and expectation‐maximization (EM). Theoretical results on the consistency and asymptotic normality of the estimators are given. The proposed model is illustrated using in‐depth simulation studies and two disease datasets.
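The mixture described in this abstract can be made concrete with a short sketch of a zero‐inflated negative binomial probability mass function at a single time point. The mean/size parametrization, the parameter name `size`, and the function names are assumptions made here for illustration; the ARMA‐type linear predictors that drive $\lambda_t$ and $\pi_t$ in the paper are not modelled.

```python
import math

def nb_pmf(y, mean, size):
    """Negative binomial pmf in the mean/size parametrization
    (variance = mean + mean**2 / size), i.e. a Poisson-Gamma mixture."""
    log_p = (
        math.lgamma(y + size) - math.lgamma(size) - math.lgamma(y + 1)
        + size * math.log(size / (size + mean))
        + y * math.log(mean / (size + mean))
    )
    return math.exp(log_p)

def zinb_pmf(y, pi_t, lambda_t, size):
    """Zero-inflated negative binomial pmf: with probability pi_t the
    count is a structural zero, otherwise it is NB with mean lambda_t."""
    base = (1.0 - pi_t) * nb_pmf(y, lambda_t, size)
    return pi_t + base if y == 0 else base

# Illustrative values only.
probs = [zinb_pmf(y, pi_t=0.3, lambda_t=3.0, size=2.0) for y in range(6)]
print(probs)
```

The extra mass at zero is `pi_t`, so the overall mean of the count is `(1 - pi_t) * lambda_t`.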
Citations: 1
Threshold estimation for continuous three‐phase polynomial regression models with constant mean in the middle regime
IF 1.5 Zone 3 Mathematics Q2 STATISTICS & PROBABILITY Pub Date: 2022-04-21 DOI: 10.1111/stan.12268
Chih‐Hao Chang, Kam-Fai Wong, Wei‐Yee Lim
This paper considers a continuous three‐phase polynomial regression model with two threshold points for dependent data with heteroscedasticity. We assume the model is polynomial of order zero in the middle regime, and is polynomial of higher orders elsewhere. We denote this model by $\mathcal{M}_2$, which includes models with one or no threshold points, denoted by $\mathcal{M}_1$ and $\mathcal{M}_0$, respectively, as special cases. We provide an ordered iterative least squares (OiLS) method for estimating $\mathcal{M}_2$ and establish the consistency of the OiLS estimators under mild conditions. When the underlying model is $\mathcal{M}_1$ and is $(d_0-1)$th‐order differentiable but not $d_0$th‐order differentiable at the threshold point, we further show the $O_p(N^{-1/(d_0+2)})$ convergence rate of the OiLS estimators, which can be faster than the $O_p(N^{-1/(2d_0)})$ convergence rate given in Feder when $d_0 \ge 3$. We also apply a model‐selection procedure for selecting $\mathcal{M}_\kappa$, $\kappa = 0, 1, 2$. When the underlying model exists, we establish the selection consistency under the aforementioned conditions. Finally, we conduct simulation experiments to demonstrate the finite‐sample performance of our asymptotic results.
Citations: 0
Optimal subsampling for multiplicative regression with massive data
IF 1.5 Zone 3 Mathematics Q2 STATISTICS & PROBABILITY Pub Date: 2022-03-12 DOI: 10.1111/stan.12266
Tianzhen Wang, Haixiang Zhang
Faced with massive data, subsampling is a popular way to downsize the data volume and reduce the computational burden. The key idea of subsampling is to perform statistical analysis on a representative subsample drawn from the full data. It provides a practical solution for extracting useful information from big data. In this article, we develop an efficient subsampling method for large‐scale multiplicative regression models, which can largely reduce the computational burden due to massive data. Under some regularity conditions, we establish consistency and asymptotic normality of the subsample‐based estimator, and derive the optimal subsampling probabilities according to the L‐optimality criterion. A two‐step algorithm is developed to approximate the optimal subsampling procedure. The convergence rate and asymptotic normality of the two‐step subsample estimator are also established. Numerical studies and two real data applications are carried out to evaluate the performance of our subsampling method.
Citations: 6
Change-point analysis through integer-valued autoregressive process with application to some COVID-19 data.
IF 1.5 Zone 3 Mathematics Q2 STATISTICS & PROBABILITY Pub Date: 2022-02-01 Epub Date: 2021-07-11 DOI: 10.1111/stan.12251
Subhankar Chattopadhyay, Raju Maiti, Samarjit Das, Atanu Biswas

In this article, we consider the problem of change-point analysis for count time series data through an integer-valued autoregressive process of order 1 (INAR(1)) with time-varying covariates. We observe these types of features in many real-life scenarios, especially in COVID-19 data sets, where the number of active cases starts falling over time and then increases again. In order to capture those features, we use a Poisson INAR(1) process with a time-varying smoothing covariate. By using such a model, we can model both components of the active cases at time-point $t$, namely, (i) the number of nonrecovery cases from the previous time-point and (ii) the number of new cases at time-point $t$. We study some theoretical properties of the proposed model along with forecasting. Some simulation studies are performed to study the effectiveness of the proposed method. Finally, we analyze two COVID-19 data sets and compare our proposed model with another PINAR(1) process which has a time-varying covariate but no change-point, to demonstrate the overall performance of our proposed model.
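The two components named in this abstract correspond to the standard Poisson INAR(1) recursion, in which each previous case survives with some probability (binomial thinning) and new cases arrive as Poisson innovations. The sketch below simulates that basic recursion with constant parameters; the time-varying covariate and the change-point of the paper are deliberately left out, and the names `alpha`, `lam`, and `simulate_inar1` are assumptions for illustration.

```python
import math
import random

def poisson_draw(rng, lam):
    """Draw from Poisson(lam) with Knuth's multiplication method
    (adequate for small lam)."""
    threshold = math.exp(-lam)
    k, prod = 0, rng.random()
    while prod > threshold:
        k += 1
        prod *= rng.random()
    return k

def binomial_thinning(rng, alpha, count):
    """Each of `count` cases is retained independently with probability alpha."""
    return sum(1 for _ in range(count) if rng.random() < alpha)

def simulate_inar1(n, alpha, lam, x0=0, seed=0):
    """Simulate X_t = (alpha thinned X_{t-1}) + eps_t with eps_t ~ Poisson(lam)."""
    rng = random.Random(seed)
    path, x = [], x0
    for _ in range(n):
        carried_over = binomial_thinning(rng, alpha, x)  # (i) nonrecovered cases
        new_cases = poisson_draw(rng, lam)               # (ii) new cases
        x = carried_over + new_cases
        path.append(x)
    return path

path = simulate_inar1(n=200, alpha=0.5, lam=2.0)
print(path[:10])
```

For `alpha < 1` the stationary mean of this constant-parameter process is `lam / (1 - alpha)`, which gives a quick sanity check on simulated paths.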
Citations: 4
Average ordinary least squares‐centered penalized regression: A more efficient way to address multicollinearity than ridge regression
IF 1.5 Zone 3 Mathematics Q2 STATISTICS & PROBABILITY Pub Date: 2022-01-23 DOI: 10.1111/stan.12263
Wei Wang, Linjiang Li, Sheng Li, F. Yin, Fang Liao, Zhang Tao, Xiaosong Li, Xiong Xiao, Yue Ma
We developed a novel method to address multicollinearity in linear models called average ordinary least squares (OLS)‐centered penalized regression (AOPR). AOPR penalizes the cost function to shrink the estimators toward the weighted‐average OLS estimator. The commonly used ridge regression (RR) shrinks the estimators toward zero, that is, in the Bayesian view it employs the penalization prior $\beta \sim N(0, 1/k)$, which contradicts the common real prior $\beta \neq 0$. Therefore, RR selects small penalization coefficients to relieve such a contradiction and thus makes the penalizations inadequate. Mathematical derivations indicate that AOPR could improve on the performance of RR and OLS regression. A simulation study shows that AOPR obtains more accurate estimators than OLS regression in most situations; compared with RR, it is more accurate when the signs of the true $\beta$s are identical and slightly less accurate when the signs of the true $\beta$s are different. Additionally, a case study shows that AOPR obtains more stable estimators and stronger statistical power and predictive ability than RR and OLS regression. Based on these results, we recommend using AOPR to address multicollinearity more efficiently than RR and OLS regression, especially when the true $\beta$s have identical signs.
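The contrast drawn in this abstract, shrinking toward zero versus shrinking toward a nonzero centre, can be sketched with a generic ridge-type estimator whose penalty is centred at an arbitrary target vector. This is not the authors' AOPR: how AOPR builds its weighted-average OLS target is not given in the abstract, so `target` is a hypothetical user-supplied argument, and `centered_ridge` is an illustrative name.

```python
import numpy as np

def centered_ridge(X, y, k, target):
    """Minimise ||y - X b||^2 + k * ||b - target||^2.

    Setting the gradient to zero gives the closed form
    b = (X'X + k I)^{-1} (X'y + k * target).
    target = 0 recovers ordinary ridge regression; k = 0 recovers OLS.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    target = np.asarray(target, dtype=float)
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + k * np.eye(p), X.T @ y + k * target)

# Illustrative data with two nearly collinear predictors.
rng = np.random.default_rng(0)
x1 = rng.normal(size=100)
X = np.column_stack([x1, x1 + 0.01 * rng.normal(size=100)])
y = X @ np.array([1.0, 1.0]) + rng.normal(size=100)

print(centered_ridge(X, y, k=1.0, target=np.zeros(2)))      # shrinks toward 0
print(centered_ridge(X, y, k=1.0, target=np.array([1.0, 1.0])))  # shrinks toward target
```

As `k` grows the estimate moves from the OLS solution to `target`, so a centre close to the true coefficients allows heavier penalization without the bias toward zero that the abstract attributes to ridge regression.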
Statistica Neerlandica, pp. 347-368. Journal Article.
Citations: 0
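The contrast drawn in the abstract above, shrinking toward zero (RR) versus shrinking toward a non-zero centre, can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not define the weighted-average OLS estimator, so the centre used here (the plain mean of the OLS coefficients) is a hypothetical stand-in, and the function names and toy data are assumptions made for illustration. The sketch relies only on the standard result that minimising ||y - Xb||^2 + k||b - t||^2 gives (X'X + kI)b = X'y + kt.

```python
def solve_2x2(a, b):
    """Solve the 2x2 linear system a @ x = b by Cramer's rule."""
    det = a[0][0] * a[1][1] - a[0][1] * a[1][0]
    x0 = (b[0] * a[1][1] - a[0][1] * b[1]) / det
    x1 = (a[0][0] * b[1] - b[0] * a[1][0]) / det
    return [x0, x1]


def centered_penalized_fit(X, y, k, target):
    """Minimise ||y - X b||^2 + k * ||b - target||^2 for two predictors.

    Setting the gradient to zero gives (X'X + k I) b = X'y + k * target.
    target = [0, 0] is ordinary ridge regression; k = 0 is plain OLS.
    """
    xtx = [[sum(r[i] * r[j] for r in X) for j in range(2)] for i in range(2)]
    xty = [sum(r[i] * yi for r, yi in zip(X, y)) for i in range(2)]
    lhs = [[xtx[i][j] + (k if i == j else 0.0) for j in range(2)]
           for i in range(2)]
    rhs = [xty[i] + k * target[i] for i in range(2)]
    return solve_2x2(lhs, rhs)


# Toy data (assumed): two nearly collinear predictors, y roughly x1 + x2 plus noise.
X = [[1.0, 1.1], [2.0, 1.9], [3.0, 3.2], [4.0, 3.9], [5.0, 5.1]]
y = [2.2, 3.8, 6.3, 7.8, 10.2]

ols = centered_penalized_fit(X, y, k=0.0, target=[0.0, 0.0])
ridge = centered_penalized_fit(X, y, k=1.0, target=[0.0, 0.0])

# Hypothetical non-zero centre: the plain average of the OLS coefficients.
centre = [sum(ols) / 2] * 2
centered = centered_penalized_fit(X, y, k=1.0, target=centre)

print("OLS:               ", ols)
print("Ridge (centre 0):  ", ridge)
print("Centred at OLS avg:", centered)
```

Both penalized fits use the same k; the only difference is the point toward which the coefficients are pulled, which is the design choice the abstract argues about.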