Pub Date : 2024-08-31 DOI: 10.1007/s00362-024-01606-5
Guogen Shan, Xinlin Lu, Yahui Zhang, Samuel S. Wu
High placebo responses can substantially reduce the observed treatment effect in a parallel randomized trial. To address this challenge, several approaches have been developed, including the sequential parallel comparison design (SPCD), which has been shown to increase statistical power compared with the traditional randomized trial. A linear combination of the response rate differences from the two phases of the SPCD is commonly used to measure the overall treatment effect size. The traditional approach to calculating the confidence interval for the overall rate difference is based on the delta method using the variance–covariance matrix of all outcomes. As outcomes from a multinomial distribution are correlated, we suggest utilizing a constrained variance–covariance matrix in the delta method. Having observed anti-conservative coverage from the asymptotic intervals, we further propose using importance sampling to develop accurate intervals. Simulation studies show that the accurate intervals have better coverage probabilities than the others, with similar interval widths. Two real trials to treat major depressive disorder are used to illustrate the application of the proposed intervals.
Title: Confidence intervals for overall response rate difference in the sequential parallel comparison design. Journal: Statistical Papers.
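The role of the multinomial covariance in a delta-method interval can be illustrated with a minimal sketch. This is not the SPCD interval from the paper: it only assumes a single multinomial sample and a linear contrast of its cell proportions, and the function name `multinomial_linear_ci` and the example counts are hypothetical. For a linear function the delta method is exact in form, and the variance is a'(diag(p) - pp')a / n, which accounts for the negative correlation between cells.

```python
import math


def multinomial_linear_ci(counts, weights, z=1.959964):
    """Wald-type interval for sum_i weights[i] * p_i from one multinomial sample.

    Uses the multinomial covariance (diag(p) - p p') / n, so the correlation
    between cell proportions is taken into account. Illustrative only.
    """
    n = sum(counts)
    p = [c / n for c in counts]
    estimate = sum(w * pi for w, pi in zip(weights, p))
    # a' diag(p) a - (a' p)^2, divided by n
    variance = (sum(w * w * pi for w, pi in zip(weights, p)) - estimate ** 2) / n
    half_width = z * math.sqrt(max(variance, 0.0))
    return estimate, estimate - half_width, estimate + half_width


# Hypothetical counts for three outcome categories; contrast p_1 - p_2.
est, lower, upper = multinomial_linear_ci([30, 20, 50], [1, -1, 0])
print(est, lower, upper)
```

Treating the two proportions as independent binomials would give a different variance, which is the kind of discrepancy a constrained covariance matrix is meant to avoid.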
Pub Date : 2024-08-27 DOI: 10.1007/s00362-024-01597-3
David R. Bickel
Using statistical methods to analyze data requires considering the data set to be randomly generated from a probability distribution that is unknown but idealized according to a mathematical model consisting of constraints, assumptions about the distribution. Since the choice of such a model is up to the scientist, there is an understandable bias toward choosing models that make scientific conclusions appear more certain than they really are. There is a similar bias in the scientist’s choice of whether to use Bayesian or frequentist methods. This article provides tools to mitigate both of those biases on the basis of a principle of information theory. It is found that the same principle unifies Bayesianism with the fiducial version of frequentism. The principle arguably overcomes not only the main objections against fiducial inference but also the main Bayesian objection against the use of confidence intervals.
Title: Bayesian and frequentist inference derived from the maximum entropy principle with applications to propagating uncertainty about statistical methods. Journal: Statistical Papers.
Pub Date : 2024-08-26 DOI: 10.1007/s00362-024-01593-7
Asma Saleh
Analysis of binary matched pairs data is problematic due to infinite maximum likelihood estimates of the log odds ratio and potentially biased estimates, especially for small samples. We propose a penalised version of the log-likelihood function based on adjusted responses, which always results in a finite estimator of the log odds ratio. The probability limit of the adjusted log-likelihood estimator is derived, and it is shown that in certain settings the maximum likelihood, conditional and modified profile log-likelihood estimators drop out as special cases of this estimator. We apply indirect inference to the adjusted log-likelihood estimator. It is shown, through a complete enumeration study, that the indirect inference estimator is competitive in terms of bias and variance in comparison to the maximum likelihood, conditional, modified profile log-likelihood and Firth’s penalised log-likelihood estimators.
Title: Reduced bias estimation of the log odds ratio. Journal: Statistical Papers.
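The infinite-estimate problem mentioned in the abstract can be seen in a small sketch. It uses the classical conditional estimate for matched pairs, the log of the ratio of the two discordant counts, and contrasts it with a simple add-a-constant correction. The correction is a swapped-in textbook device chosen for illustration, not the adjusted-response penalty proposed in the paper; the function names and the constant 0.5 are assumptions.

```python
import math


def conditional_log_or(n10, n01):
    """Conditional ML estimate of the log odds ratio from discordant pair counts."""
    if n10 == 0 and n01 == 0:
        return math.nan          # no discordant pairs: estimate undefined
    if n01 == 0:
        return math.inf          # estimate diverges to +infinity
    if n10 == 0:
        return -math.inf         # estimate diverges to -infinity
    return math.log(n10 / n01)


def corrected_log_or(n10, n01, a=0.5):
    """Add-a-constant version (illustrative only); finite for any counts."""
    return math.log((n10 + a) / (n01 + a))


# Hypothetical small sample: 6 pairs discordant one way, none the other way.
print(conditional_log_or(6, 0))   # infinite
print(corrected_log_or(6, 0))     # finite: log(6.5 / 0.5)
```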
Pub Date : 2024-08-23 DOI: 10.1007/s00362-024-01601-w
Abdul Haq, William H. Woodall
In this short note, we reevaluate the run-length performance of the EWMA and exponentiated EWMA (Exp-EWMA) charts using the conditional expected delay metric. It is found that the enhancements offered by the Exp-EWMA chart over the EWMA chart in the zero-state setup are marginal. Given its simplicity in implementation and its ability to encompass the functionality of the Exp-EWMA chart in detecting delayed shifts in the process mean, the EWMA chart remains the preferred choice over the Exp-EWMA chart.
Title: A critical note on the exponentiated EWMA chart. Journal: Statistical Papers.
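For readers unfamiliar with the baseline chart, a minimal sketch of a standard two-sided EWMA chart with time-varying control limits follows. It does not implement the Exp-EWMA chart or the conditional expected delay computation discussed in the note; the function name `ewma_first_signal`, the smoothing constant 0.2 and the limit multiplier 3 are illustrative assumptions.

```python
import math


def ewma_first_signal(xs, mu0=0.0, sigma=1.0, lam=0.2, L=3.0):
    """Return the 1-based time of the first EWMA signal, or None if none occurs.

    z_t = lam * x_t + (1 - lam) * z_{t-1}, with z_0 = mu0.
    Limits: mu0 +/- L * sigma * sqrt(lam / (2 - lam) * (1 - (1 - lam)^(2t))).
    """
    z = mu0
    for t, x in enumerate(xs, start=1):
        z = lam * x + (1.0 - lam) * z
        half_width = L * sigma * math.sqrt(
            lam / (2.0 - lam) * (1.0 - (1.0 - lam) ** (2 * t))
        )
        if abs(z - mu0) > half_width:
            return t
    return None


# Hypothetical stream: in control for three observations, then a large shift.
print(ewma_first_signal([0.0, 0.0, 0.0, 5.0, 5.0, 5.0]))
```

A delayed shift, as in this stream, is the situation in which the chart's detection delay is measured conditional on no earlier false alarm.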
Pub Date : 2024-08-21 DOI: 10.1007/s00362-024-01602-9
Abbas Alhakim
The symbolic partitioning of the Pearson chi-square statistic with unequal cell probabilities into asymptotically independent component tests is revisited. We introduce Hadamard-like matrices whose resulting component tests compare the full vector of cell counts. This contributes to making these component tests intuitively interpretable. We present a simple way to construct the Hadamard-like matrices when the number of cell counts is 2, 4 or 8 without assuming any relations between cell probabilities. For higher powers of 2, the theory of orthogonal designs is used to set a priori relations between cell probabilities, in order to establish the construction. Simulations are given to illustrate the sensitivity of various components to changes in location, scale, skewness and tail probability, as well as to illustrate the potential improvement in power when the cell probabilities are changed.
Title: Hadamard matrices, quaternions, and the Pearson chi-square statistic. Journal: Statistical Papers.
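The idea of partitioning the Pearson statistic into components can be sketched in the simplest classical case: four cells with equal probabilities and an ordinary 4x4 Hadamard matrix. The unequal-probability construction with Hadamard-like matrices is the paper's contribution and is not reproduced here; the names `H4`, `pearson_statistic` and `pearson_components` are hypothetical.

```python
import math

# Sylvester-type Hadamard matrix of order 4; rows are mutually orthogonal.
H4 = [
    [1, 1, 1, 1],
    [1, -1, 1, -1],
    [1, 1, -1, -1],
    [1, -1, -1, 1],
]


def pearson_statistic(counts):
    """Pearson chi-square statistic under equal cell probabilities."""
    expected = sum(counts) / len(counts)
    return sum((c - expected) ** 2 / expected for c in counts)


def pearson_components(counts):
    """Three components whose squares add up to the Pearson statistic (4 cells)."""
    if len(counts) != 4:
        raise ValueError("this sketch only handles four cells")
    expected = sum(counts) / 4
    z = [(c - expected) / math.sqrt(expected) for c in counts]
    # Skip the all-ones row: its component is identically zero.
    return [sum(h * zi for h, zi in zip(row, z)) / 2.0 for row in H4[1:]]


counts = [10, 20, 30, 40]  # hypothetical cell counts
components = pearson_components(counts)
print(components, sum(v * v for v in components), pearson_statistic(counts))
```

Each component contrasts all four cells at once, which is the interpretability property the abstract highlights.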
Pub Date : 2024-07-25 DOI: 10.1007/s00362-024-01581-x
Agah Kozan, Burak Uyar, Halil Tanil
Similar to usual lower records, bottom-$k$-lists (Kozan and Tanil, İstatistik J Turk Stat Assoc 13:73–79, 2013) have a wide range of practical applications in meteorology, hydrology, sports, etc. Exceedance statistics can also be viewed as a close relative of tolerance limits, an important field of statistical science. This study combines these two subjects: an exceedance statistic is defined based on bottom-$k$-lists in an independent and identically distributed (iid) continuous random sequence. The probability mass function (pmf) of a selected exceedance statistic is obtained, and an illustrative application of the exceedance statistic is given.
Title: Exceedance statistics based on bottom-$k$-lists. Journal: Statistical Papers.
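A rough sketch of the two ingredients may help. It assumes that a bottom-$k$-list is the list of the $k$ smallest observations seen so far, and it uses a hypothetical exceedance count: the number of later observations that fall below the largest element of that list. The exact statistic studied in the paper may be defined differently; the function names are illustrative.

```python
import heapq


def bottom_k_list(values, k):
    """Return the k smallest values seen in the sequence, in increasing order."""
    heap = []  # max-heap of the current k smallest values, stored negated
    for v in values:
        if len(heap) < k:
            heapq.heappush(heap, -v)
        elif v < -heap[0]:
            # v enters the list and displaces the current largest member
            heapq.heapreplace(heap, -v)
    return sorted(-h for h in heap)


def count_below_list_maximum(bottom_list, future_values):
    """Hypothetical exceedance count: future values below the list's largest entry."""
    threshold = bottom_list[-1]
    return sum(1 for v in future_values if v < threshold)


# Hypothetical data: six past observations, four future ones.
current = bottom_k_list([5, 1, 4, 2, 8, 3], k=3)
print(current, count_below_list_maximum(current, [0.5, 2.5, 3.5, 10]))
```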
Pub Date : 2024-07-20 DOI: 10.1007/s00362-024-01589-3
Na Young Yoo, Ji Hwan Cha
In analyzing bivariate data sets, data with common observations are frequently encountered and, in this case, existing absolutely continuous bivariate distributions are not applicable. Only a few models, such as the bivariate distribution proposed by Marshall and Olkin (J Am Stat Assoc 62(317):30–44, 1967), have been developed to model such data sets and the choice of models to fit data sets having common observations is very limited. In this paper, three general classes of bivariate distributions for modeling data with common observations are developed. To develop the bivariate distributions, we employ a probability model in reliability. Considering a system with two components, it is assumed that, when the first failure of the components occurs, with some probability, it immediately causes the failure of the remaining component, and, with complementary probability, the residual lifetime of the remaining component is shortened according to some stochastic order. It will be shown that, by specifying the underlying distributions contained in the joint distribution, numerous families of bivariate distributions can be generated. Therefore, this work provides substantially increased flexibility in modeling data sets with common observations. The developed models are fitted to two real-life data sets and it is shown that these models outperform the existing models in terms of fitting performance and their performances are satisfactory.
Title: General classes of bivariate distributions for modeling data with common observations. Journal: Statistical Papers.
Pub Date : 2024-07-19 DOI: 10.1007/s00362-024-01587-5
Reza Modarres
We consider the Hotelling $T^2$ test in the low-sample-size, high-dimensional setting. We partition the $p$ variables into $b>1$ blocks of $p/b$ variables and use the union-intersection principle to propose a testing procedure that computes the $T^2$ test in each block. We show that the proposed method is more powerful than the Hotelling $T^2$ test. We also consider Wilks' method of outlier detection and use the union-intersection principle to search for outliers in blocks of variables. The significance level and the power function of the new test are investigated. We show that the new outlier detection method has more power than Wilks' test.
Title: Hotelling $T^2$ test in high dimensions with application to Wilks outlier method. Journal: Statistical Papers.
Pub Date : 2024-07-10 DOI: 10.1007/s00362-024-01567-9
José R. Berrendero, Alejandro Cholaquidis, Antonio Cuevas
The problem of linearly predicting a scalar response $Y$ from a functional (random) explanatory variable $X = X(t)$, $t \in I$, is considered. It is argued that the term “linearly” can be interpreted in several meaningful ways. Thus, one could interpret that (up to a random noise) $Y$ could be expressed as a linear combination of a finite family of marginals $X(t_i)$ of the process $X$, or a limit of a sequence of such linear combinations. This simple point of view (which has some precedents in the literature) leads to a formulation of the linear model in terms of the reproducing kernel Hilbert space (RKHS) generated by the covariance function of the process $X(t)$. It turns out that such an RKHS-based formulation includes the standard functional linear model, based on the inner product in the space $L^2[0,1]$, as a particular case. It also includes all models in which $Y$ is assumed to be (up to an additive noise) a linear combination of a finite number of linear projections of $X$. Some consistency results are proved which, in particular, lead to an asymptotic approximation of the predictions derived from the general (functional) linear model in terms of finite-dimensional models based on a finite family of marginals $X(t_i)$, for an increasing grid of points $t_j$ in $I$. We also include a discussion on the crucial notion of coefficient of determination (aimed at assessing the fit of the model) in this setting. A few experimental results are given.
Title: On the functional regression model and its finite-dimensional approximations. Journal: Statistical Papers.
Pub Date : 2024-07-09 DOI: 10.1007/s00362-024-01590-w
Shu Wei Chou-Chen, Rodrigo A. Oliveira, Irina Raicher, Gilberto A. Paula
Additive partial linear models with symmetric autoregressive errors of order $p$ are proposed in this paper for modeling time series data. Specifically, we apply this model class to explain the weekly hospitalizations for respiratory diseases in Sorocaba, São Paulo, Brazil, by incorporating climate and pollution as covariates, along with trend and seasonality. The main feature of this model class is its capability of considering a set of explanatory variables with linear and nonlinear structures, which allows, for example, joint modeling of the trend and seasonality of a time series, with additive functions for the nonlinear explanatory variables and a predictor to accommodate discrete and linear explanatory variables. Additionally, the conditional symmetric errors allow the possibility of fitting data with high correlation order, as well as error distributions with heavier or lighter tails than the normal ones. We present the model class and derive a novel iterative process for parameter estimation by combining a P-GAM type algorithm with a quasi-Newton procedure. The inferential results and diagnostic procedures, including conditional quantile residual analysis and local influence analysis for sensitivity, are discussed. Simulation studies are performed to assess finite sample properties of parametric and nonparametric estimators. Finally, the data set analysis and concluding remarks are given.
Title: Additive partial linear models with autoregressive symmetric errors and its application to the hospitalizations for respiratory diseases. Journal: Statistical Papers.