
Statistical Modelling: Latest Articles

Principal component regression in GAMLSS applied to Greek–German government bond yield spreads
IF: 1; Zone 4 (Mathematics); Q2, Statistics & Probability; Pub Date: 2021-06-21; DOI: 10.1177/1471082X211022980
D. S. Mikis, A. Robert, Georgikopoulos Nikolaos, Demaio Fernanda
A solution to the problem of having to deal with a large number of interrelated explanatory variables within a generalized additive model for location, scale and shape (GAMLSS) is given here using as an example the Greek–German government bond yield spreads from 25 April 2005 to 31 March 2010. Those were turbulent financial years, and in order to capture the spreads behaviour, a model has to be able to deal with the complex nature of the financial indicators used to predict the spreads. Fitting a model, using principal components regression of both main and first order interaction terms, for all the parameters of the assumed distribution of the response variable seems to produce promising results.
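The principal component regression idea in this abstract can be sketched as follows. This is a minimal illustration assuming NumPy and simulated data; the helper names `interaction_design` and `pcr_fit` are hypothetical, and the sketch regresses only a mean on the leading components, whereas the abstract describes doing this for every parameter of the response distribution within GAMLSS.

```python
import itertools
import numpy as np

def interaction_design(X):
    """Return main effects plus all first-order (pairwise) interaction columns."""
    pairs = itertools.combinations(range(X.shape[1]), 2)
    inter = [X[:, i] * X[:, j] for i, j in pairs]
    return np.column_stack([X] + inter)

def pcr_fit(D, y, n_components):
    """Least-squares regression of y on the leading principal components of D."""
    Z = (D - D.mean(axis=0)) / D.std(axis=0)        # standardize the columns
    _, _, Vt = np.linalg.svd(Z, full_matrices=False)
    loadings = Vt[:n_components].T                   # columns are PC directions
    scores = Z @ loadings                            # component scores
    A = np.column_stack([np.ones(len(y)), scores])   # intercept + scores
    gamma, *_ = np.linalg.lstsq(A, y, rcond=None)
    return gamma, loadings, A @ gamma

# Simulated, strongly interrelated explanatory variables (illustrative only).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))
X[:, 3] = X[:, 0] + 0.1 * rng.normal(size=200)
D = interaction_design(X)                            # 4 main + 6 interaction columns
y = 1.0 + X[:, 0] - 0.5 * X[:, 1] * X[:, 2] + rng.normal(scale=0.3, size=200)

gamma, loadings, fitted = pcr_fit(D, y, n_components=5)
```

Keeping fewer components than columns is what tames the collinearity; with all components retained, the fit coincides with ordinary least squares on the full design.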
Journal: Statistical Modelling, vol. 22, pp. 127–145
Citations: 2
A spatially explicit N-mixture model for the estimation of disease prevalence
IF: 1; Zone 4 (Mathematics); Q2, Statistics & Probability; Pub Date: 2021-06-20; DOI: 10.1177/1471082X211020872
Ben Brintz, L. Madsen, Claudio Fuentes
This article develops an approximate N-mixture model for infectious disease counts that accounts for under-reporting as well as spatial dependence induced by person-to-person spread of disease. We employ the model to estimate actual case counts in Oregon of chlamydia, an easily-treated but usually asymptomatic sexually transmitted disease. We describe a combined parametric bootstrap to account for uncertainty in parameter estimates as well as sampling variability in actual case counts. A simulation study illustrates that our method performs well in many scenarios when the model is correctly specified, and also gives reasonable results when the model is misspecified, and no spatial dependence exists.
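To make the under-reporting mechanism concrete, here is a minimal sketch of the basic, non-spatial N-mixture building block: reported counts are Binomial(N, p) given the actual count N, and N is Poisson(lam). The function names, the truncation point `n_max` and the numeric values are assumptions for illustration; the spatial dependence and the approximation developed in the article are not reproduced.

```python
import math

def log_binom_pmf(y, n, p):
    """Log of the Binomial(n, p) probability of y reported cases."""
    return (math.lgamma(n + 1) - math.lgamma(y + 1) - math.lgamma(n - y + 1)
            + y * math.log(p) + (n - y) * math.log1p(-p))

def log_pois_pmf(n, lam):
    """Log of the Poisson(lam) probability of n actual cases."""
    return n * math.log(lam) - lam - math.lgamma(n + 1)

def joint_weights(y, lam, p, n_max=500):
    """Joint probabilities P(y, N) for N = y, ..., n_max (truncated sum)."""
    return [math.exp(log_binom_pmf(y, n, p) + log_pois_pmf(n, lam))
            for n in range(y, n_max + 1)]

def nmix_marginal(y, lam, p, n_max=500):
    """Marginal probability of the reported count, summing out N."""
    return sum(joint_weights(y, lam, p, n_max))

def posterior_mean_actual(y, lam, p, n_max=500):
    """Posterior mean of the actual count N given the reported count y."""
    w = joint_weights(y, lam, p, n_max)
    return sum(n * wi for n, wi in zip(range(y, n_max + 1), w)) / sum(w)

# Hypothetical values: 12 reported cases, mean 40 actual cases, 30% reporting.
marginal = nmix_marginal(12, 40.0, 0.3)
expected_actual = posterior_mean_actual(12, 40.0, 0.3)
```

In this simple setting the marginal of the reported count is Poisson with mean lam * p, and the posterior mean of the actual count is y + lam * (1 - p), which gives a quick check on the truncated sums.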
Journal: Statistical Modelling, vol. 23, pp. 31–52
Citations: 1
Recurrent Events Analysis with Piece-wise exponential Additive Mixed Models
IF: 1; Zone 4 (Mathematics); Q2, Statistics & Probability; Pub Date: 2021-06-08; DOI: 10.21203/RS.3.RS-563303/V1
J. Ramjith, Andreas Bender, Roes Kcb, Jonker Ma
Background: Recurrent events analysis plays an important role in many applications, including the study of chronic diseases or recurrence of infections. Historically, most models for the analysis of time-to-event data, including recurrent events, have been based on Cox proportional hazards regression. Recently, however, the Piece-wise exponential Additive Mixed Model (PAMM) has gained popularity as a flexible framework for survival analysis. While many papers and tutorials on the application of Cox-based models have been presented in the literature, few papers have provided detailed instructions for the application of PAMMs and, to our knowledge, none exist for recurrent events analysis. Methods: The PAMM is introduced as a framework for recurrent events analysis. We describe the application of the model to unstratified and stratified shared frailty models for recurrent events. We illustrate how penalized splines can be used to estimate non-linear and time-varying covariate effects without a priori assumptions about their functional shape. The model is motivated for analysis on both the gap timescale ("clock-reset") and the calendar timescale ("clock-forward"). The data augmentation necessary for the application to recurrent events is described and explained in detail. Results: Simulations confirmed that the model provides unbiased estimates of covariate effects and the frailty variance, as well as equivalence to the Cox model when proportional hazards are assumed. Applications to recurrence of Staphylococcus aureus and malaria in children illustrate the estimation of seasonality, bivariate non-linear effects, multiple timescales and relaxation of the proportional hazards assumption via time-varying effects. The R package pammtools has been extended to facilitate estimation, visualization and interpretation of PAMMs for recurrent events analysis. Conclusion: PAMMs provide a flexible framework for the analysis of time-to-event and recurrent events data. The estimation of PAMMs is based on Generalized Additive Mixed Models and thus extends the researcher’s toolbox for recurrent events analysis.
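The data augmentation mentioned in the abstract can be illustrated with a small sketch of the piece-wise exponential split for a single spell. The helper `to_ped` and its field names are hypothetical and are not the pammtools API; for recurrent events on the gap timescale, one would apply such a split to each spell after resetting its clock to zero.

```python
import math

def to_ped(time, status, cuts):
    """Split one spell (0, time] at the interval end points in `cuts`.

    Returns one row per interval the spell is at risk in, with an event
    indicator and offset = log(time at risk in that interval), ready for
    a Poisson model with a log link. Follow-up beyond the last cut point
    is dropped, so the spell is then treated as censored at that point.
    """
    rows = []
    start = 0.0
    for end in cuts:
        if time <= start:                 # spell ended before this interval
            break
        at_risk = min(time, end) - start  # exposure within (start, end]
        event = int(status == 1 and time <= end)
        rows.append({"tstart": start, "tend": end,
                     "ped_status": event, "offset": math.log(at_risk)})
        start = end
    return rows

# One spell with an event at t = 2.5 and the same spell censored at t = 2.5.
event_rows = to_ped(2.5, 1, [1.0, 2.0, 3.0, 4.0])
censored_rows = to_ped(2.5, 0, [1.0, 2.0, 3.0, 4.0])
```

Each subject contributes one row per interval survived, so the hazard can be estimated as a Poisson rate per interval, which is what allows additive mixed model software to fit the model.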
Journal: Statistical Modelling, no page range listed
Citations: 4
Quantile regression for longitudinal data via the multivariate generalized hyperbolic distribution
IF: 1; Zone 4 (Mathematics); Q2, Statistics & Probability; Pub Date: 2021-06-07; DOI: 10.1177/1471082X211015454
Alvaro J. Flórez, I. Van Keilegom, G. Molenberghs, A. Verhasselt
While extensive research has been devoted to univariate quantile regression, this is considerably less the case for the multivariate (longitudinal) version, even though there are many potential applications, such as the joint examination of growth curves for two or more growth characteristics, such as body weight and length in infants. Quantile functions are easier to interpret for a population of curves than mean functions. While the connection between multivariate quantiles and the multivariate asymmetric Laplace distribution is known, it is less well known that its use for maximum likelihood estimation poses mathematical as well as computational challenges. Therefore, we study a broader family of multivariate generalized hyperbolic distributions, of which the multivariate asymmetric Laplace distribution is a limiting case. We offer an asymptotic treatment. Simulations and a data example supplement the modelling and theoretical considerations.
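As background for the univariate case that the abstract contrasts with, the following sketch shows the check (pinball) loss whose minimizer is a sample quantile. It assumes NumPy and simulated data, and the function name `check_loss` is an illustrative choice; the multivariate generalized hyperbolic model of the article is not implemented here.

```python
import numpy as np

def check_loss(u, tau):
    """Check loss rho_tau(u) = u * (tau - 1{u < 0}), applied elementwise."""
    return u * (tau - (u < 0).astype(float))

rng = np.random.default_rng(1)
y = rng.normal(size=1001)
tau = 0.25

# Evaluate the total loss at every observed value and keep the minimizer.
total_loss = np.array([check_loss(y - q, tau).sum() for q in y])
best = y[np.argmin(total_loss)]

# Reference value from NumPy's sample quantile for comparison.
reference = np.quantile(y, tau)
```

In the univariate setting, the log-density of an asymmetric Laplace distribution with location q and scale sigma is log(tau * (1 - tau) / sigma) minus the check loss of (y - q) / sigma, so maximizing that likelihood in q is equivalent to minimizing the check loss above.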
Journal: Statistical Modelling, vol. 22, pp. 566–584
Citations: 0
Interactively visualizing distributional regression models with distreg.vis
IF: 1; Zone 4 (Mathematics); Q2, Statistics & Probability; Pub Date: 2021-05-27; DOI: 10.1177/1471082X211007308
Stanislaus Stadlmann, T. Kneib
A newly emerging field in statistics is distributional regression, where not only the mean but each parameter of a parametric response distribution can be modelled using a set of predictors. As an extension of generalized additive models, distributional regression utilizes the known link functions (log, logit, etc.), model terms (fixed, random, spatial, smooth, etc.) and available types of distributions but allows us to go well beyond the exponential family and to model potentially all distributional parameters. Due to this increase in model flexibility, the interpretation of covariate effects on the shape of the conditional response distribution, its moments and other features derived from this distribution is more challenging than with traditional mean-based methods. In particular, such quantities of interest often do not directly equate to the modelled parameters but are rather a (potentially complex) combination of them. To ease the post-estimation model analysis, we propose a framework and subsequently feature an implementation in R for the visualization of Bayesian and frequentist distributional regression models fitted using the bamlss, gamlss and betareg R packages.
Journal: Statistical Modelling, vol. 22, pp. 527–545
Citations: 4
Bayesian adjustment for measurement error in an offset variable in a Poisson regression model
IF: 1; Zone 4 (Mathematics); Q2, Statistics & Probability; Pub Date: 2021-05-24; DOI: 10.1177/1471082X211008011
Kangjie Zhang, Juxin Liu, Yang Liu, Peng Zhang, R. Carroll
Fatal car crashes are the leading cause of death among teenagers in the USA. The Graduated Driver Licensing (GDL) programme is one effective policy for reducing the number of teen fatal car crashes. Our study focuses on the number of fatal car crashes in Michigan during 1990–2004 excluding 1997, when the GDL started. We use Poisson regression with spatially dependent random effects to model the county-level teen car crash counts. We develop a measurement error model to account for the fact that the total teenage population at the county level is used as a proxy for the teenage driver population. To the best of our knowledge, there is no existing literature that considers adjustment for measurement error in an offset variable. Furthermore, limited work has addressed measurement errors in the context of spatial data. In our modelling, a Berkson measurement error model with spatial random effects is applied to adjust for the error-prone offset variable in a Bayesian paradigm. The Bayesian Markov chain Monte Carlo (MCMC) sampling is implemented in rstan. To assess the consequence of adjusting for measurement error, we compared two models with and without adjustment for measurement error. We found that the effect of a time indicator becomes less significant with the measurement-error adjustment. This leads to our conclusion that the reduced number of teen drivers can help explain, to some extent, the effectiveness of GDL.
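The role of the offset in such a model can be sketched with a plain Poisson regression in which the log of the population at risk enters the linear predictor with a fixed coefficient of one. The sketch below assumes NumPy, simulated counts and a hypothetical helper `poisson_irls`, and it treats the offset as known exactly; the Berkson measurement-error adjustment, the spatial random effects and the rstan sampling described in the abstract are not reproduced.

```python
import numpy as np

def poisson_irls(X, y, offset, n_iter=50, tol=1e-10):
    """Fit log(mu) = X @ beta + offset by iteratively reweighted least squares."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        eta = X @ beta + offset
        mu = np.exp(eta)
        z = (eta - offset) + (y - mu) / mu   # working response, offset removed
        XtW = X.T * mu                        # Poisson working weights are mu
        new_beta = np.linalg.solve(XtW @ X, XtW @ z)
        converged = np.max(np.abs(new_beta - beta)) < tol
        beta = new_beta
        if converged:
            break
    return beta

# Simulated areas: population at risk, one covariate, and a true log-rate model.
rng = np.random.default_rng(2)
n = 400
population = rng.integers(1000, 5000, size=n)
x = rng.normal(size=n)
X = np.column_stack([np.ones(n), x])
true_beta = np.array([-6.0, 0.5])
y = rng.poisson(population * np.exp(X @ true_beta))

beta_hat = poisson_irls(X, y, offset=np.log(population))
```

If the recorded population is only a proxy for the true number at risk, the offset itself carries error, which is the situation the abstract's measurement error model is designed to address.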
Journal: Statistical Modelling, vol. 22, pp. 509–526
Citations: 1
Mixed effect modelling and variable selection for quantile regression
IF: 1; Zone 4 (Mathematics); Q2, Statistics & Probability; Pub Date: 2021-04-17; DOI: 10.1177/1471082X211033490
H. Bar, J. Booth, M. Wells
It is known that the estimating equations for quantile regression (QR) can be solved using an EM algorithm in which the M-step is computed via weighted least squares, with weights computed at the E-step as the expectation of independent generalized inverse-Gaussian variables. This fact is exploited here to extend QR to allow for random effects in the linear predictor. Convergence of the algorithm in this setting is established by showing that it is a generalized alternating minimization (GAM) procedure. Another modification of the EM algorithm also allows us to adapt a recently proposed method for variable selection in mean regression models to the QR setting. Simulations show that the resulting method significantly outperforms variable selection in QR models using the lasso penalty. Applications to real data include a frailty QR analysis of hospital stays, and variable selection for age at onset of lung cancer and for riboflavin production rate using high-dimensional gene expression arrays for prediction.
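To show how quantile regression can be reduced to repeated weighted least squares, here is a sketch of a majorize-minimize (iteratively reweighted least squares) scheme. It is a swapped-in, closely related technique rather than the EM algorithm with generalized inverse-Gaussian expectations described in the abstract, and it has no random effects or variable selection. The function name `qr_irls`, the smoothing constant `eps` and the simulated data are assumptions.

```python
import numpy as np

def qr_irls(X, y, tau, n_iter=200, eps=1e-6):
    """Approximate quantile regression by reweighted least squares.

    Uses rho_tau(r) = |r| / 2 + (tau - 1/2) * r and bounds |r| by the
    quadratic r**2 / (2 * c) + c / 2 with c = |r_old| + eps, so each
    update solves a weighted least-squares problem.
    """
    beta = np.linalg.lstsq(X, y, rcond=None)[0]       # least-squares start
    linear_term = (tau - 0.5) * X.sum(axis=0)
    for _ in range(n_iter):
        r = y - X @ beta
        w = 1.0 / (2.0 * (np.abs(r) + eps))           # weights from residuals
        XtW = X.T * w
        beta = np.linalg.solve(XtW @ X, XtW @ y + linear_term)
    return beta

# Simulated data: the 0.8 quantile line has slope 2.
rng = np.random.default_rng(3)
n = 1000
x = rng.normal(size=n)
X = np.column_stack([np.ones(n), x])
y = 1.0 + 2.0 * x + rng.normal(size=n)

tau = 0.8
beta_hat = qr_irls(X, y, tau)
share_below = np.mean(y - X @ beta_hat < 0)           # should be close to tau
```

As in the abstract's description, the weights are recomputed from the current residuals at every pass, and observations close to the fitted quantile line receive the largest weights.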
Journal: Statistical Modelling, vol. 23, pp. 53–80
Citations: 0
Modelling changes over time in a multivariate paired comparison: An application to window display design
IF: 1; Zone 4 (Mathematics); Q2, Statistics & Probability; Pub Date: 2021-03-31; DOI: 10.1177/1471082X21995675
A. Grand, R. Dittrich
This article proposes an alternative method of making comparative judgements in multivariate paired comparisons (PCs) where judgements about change are made directly by comparing an object at two time points for each of a series of attributes. The application deals with the design of shop window displays where products should be arranged by teams of vocational students according to aesthetic principles (attributes). The photos of the students’ window displays at time 1 (before feedback) and at time 2 (after feedback) were compared by judging each attribute as to whether it was fulfilled better at time 1 or at time 2. An advantage of this PC approach over an alternative of a scoring system is the possibility to assess even subtle changes of various aspects of attractiveness, which cannot easily be measured using a score. To analyse these data, we used earlier work which developed both a multivariate PC pattern model for multi-attribute data and a PC model over time and defined a multivariate PC model of changes (MPCC). The model can be fitted as a non-standard Poisson log-linear model and provides estimates of change for the three attributes for time 2 and we were able to check for possible interaction effects between these attributes.
Journal: Statistical Modelling, vol. 22, pp. 95–106
Citations: 0
Multivariate functional additive mixed models
IF: 1; Zone 4 (Mathematics); Q2, Statistics & Probability; Pub Date: 2021-03-11; DOI: 10.1177/1471082X211056158
A. Volkmann, Almond Stöcker, F. Scheipl, S. Greven
Multivariate functional data can be intrinsically multivariate like movement trajectories in 2D or complementary such as precipitation, temperature and wind speeds over time at a given weather station. We propose a multivariate functional additive mixed model (multiFAMM) and show its application to both data situations using examples from sports science (movement trajectories of snooker players) and phonetic science (acoustic signals and articulation of consonants). The approach includes linear and nonlinear covariate effects and models the dependency structure between the dimensions of the responses using multivariate functional principal component analysis. Multivariate functional random intercepts capture both the auto-correlation within a given function and cross-correlations between the multivariate functional dimensions. They also allow us to model between-function correlations as induced by, for example, repeated measurements or crossed study designs. Modelling the dependency structure between the dimensions can generate additional insight into the properties of the multivariate functional process, improves the estimation of random effects, and yields corrected confidence bands for covariate effects. Extensive simulation studies indicate that a multivariate modelling approach is more parsimonious than fitting independent univariate models to the data while maintaining or improving model fit.
Journal: Statistical Modelling, vol. 23, pp. 303-326. https://doi.org/10.1177/1471082X211056158
Citations: 8
Semi-supervised clustering of time-dependent categorical sequences with application to discovering education-based life patterns
IF 1; Zone 4 (Mathematics); Q2 Statistics & Probability; Pub Date: 2021-03-08; DOI: 10.1177/1471082X21989170
Yingying Zhang, Volodymyr Melnykov, Igor Melnykov
A new approach to the analysis of heterogeneous categorical sequences is proposed. The first-order Markov model is employed in a finite mixture setting with initial state and transition probabilities being expressed as functions of time. The expectation–maximization algorithm approach to parameter estimation is implemented in the presence of positive equivalence constraints that determine which observations must be placed in the same class in the solution. The proposed model is applied to a dataset from the British Household Panel Survey to evaluate the association between the education background and life outcomes of study participants. The analysis of the survey data reveals many interesting relationships between the level of education and major life events.
Journal: Statistical Modelling, vol. 22, pp. 457-476. https://doi.org/10.1177/1471082X21989170
Citations: 0