
Statistical Modeling: Latest Articles

Estimation in generalised linear mixed models with binary outcomes by simulated maximum likelihood
Pub Date : 2006-04-01 DOI: 10.1191/1471082X06st106oa
E. S. Ng, J. Carpenter, H. Goldstein, J. Rasbash
Fitting multilevel models to discrete outcome data is problematic because the discrete distribution of the response variable implies an analytically intractable log-likelihood function. Among a number of approximate methods proposed, second-order penalised quasi-likelihood (PQL) is commonly used and is one of the most accurate. Unfortunately, even the second-order PQL approximation has been shown to produce estimates biased toward zero in certain circumstances. This bias can be marked especially when the data are sparse. One option to reduce this bias is to use Monte-Carlo simulation. A bootstrap bias correction method proposed by Kuk has been implemented in MLwiN. However, a similar technique based on the Robbins-Monro (RM) algorithm is potentially more efficient. An alternative is to use simulated maximum likelihood (SML), either alone or to refine estimates identified by other methods. In this article, we first compare bias correction using the RM algorithm, Kuk’s method and SML. We find that SML performs as efficiently as the other two methods and also yields standard errors of the bias-corrected parameter estimates and an estimate of the log-likelihood at the maximum, with which nested models can be compared. Secondly, using simulated and real data examples, we compare SML, second-order Laplace approximation (as implemented in HLM), Markov Chain Monte-Carlo (MCMC) (in MLwiN) and numerical integration using adaptive quadrature methods (in Stata’s GLLAMM and in SAS’s proc NLMIXED). We find that when the data are sparse, the second-order Laplace approximation produces markedly lower parameter estimates, whereas the MCMC method produces estimates that are noticeably higher than those from the SML and quadrature methods. Although proc NLMIXED is much faster than GLLAMM, it is not designed to fit models of more than two levels. SML produces parameter estimates and log-likelihoods very similar to those from quadrature methods. 
Further, our SML approach extends to other link functions, discrete data distributions, non-normal random effects and higher-level models.
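The idea behind simulated maximum likelihood can be sketched for the simplest case. The following is a minimal illustration, assuming a two-level logistic model with only an intercept `beta0` and a normal random intercept with standard deviation `sigma`; it uses crude Monte Carlo integration over the random effect, which is not necessarily the simulation scheme used in the article. The function name and arguments are hypothetical.

```python
import math

def simulated_loglik(beta0, sigma, clusters, draws):
    """Monte Carlo estimate of the log-likelihood of a random-intercept logistic model.

    clusters: list of lists of 0/1 responses, one inner list per cluster.
    draws: standard-normal draws z_r, reused for every cluster and every
           parameter value so the simulated surface is smooth in (beta0, sigma).
    """
    total = 0.0
    for ys in clusters:
        acc = 0.0
        for z in draws:
            # Random intercept u = sigma * z enters the linear predictor.
            eta = beta0 + sigma * z
            p = 1.0 / (1.0 + math.exp(-eta))
            lik = 1.0
            for y in ys:
                lik *= p if y == 1 else 1.0 - p
            acc += lik
        # Average the conditional likelihood over the draws, then take logs.
        total += math.log(acc / len(draws))
    return total
```

Maximising this function over `beta0` and `sigma` with a general-purpose optimiser would give the simulated maximum likelihood estimates; keeping `draws` fixed across evaluations is a design choice that avoids a noisy objective.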
Citations: 76
Modelling of repeated ordered measurements by isotonic sequential regression
Pub Date : 2005-12-01 DOI: 10.1191/1471082X05st101oa
G. Tutz
This article introduces a simple model for repeated observations of an ordered categorical response variable which is isotonic over time. It is assumed that the measurements represent an irreversible process such that the response at time t is never lower than the response observed at the previous time point t − 1. Observations of this type occur, for example, in treatment studies when improvement is measured on an ordinal scale. As the response at time t depends on the previous outcome, the number of ordered response categories also depends on it, leading to severe problems when simple threshold models for ordered data are used. To avoid these problems, the isotonic sequential model is introduced. It accounts for the irreversible process by considering the binary transitions to higher scores and allows a parsimonious parameterization. It is shown how the model may easily be estimated using existing software. Moreover, the model is extended to a random effects version which explicitly takes heterogeneity of individuals and potential correlations into account.
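One way to picture the binary transitions to higher scores is as a data expansion. The sketch below is an assumed encoding in the style of a sequential (continuation-ratio) model, not a confirmed reproduction of the article's formulation: each non-decreasing path is turned into binary records that a standard binary regression routine could then be fitted to. The function name and record layout are hypothetical.

```python
def expand_transitions(path, top):
    """Turn one subject's non-decreasing ordinal path into binary transition records.

    Returns tuples (time, category, moved_up): moved_up is 1 if the response
    at `time` exceeds `category`, given that it has reached `category`.
    """
    records = []
    for t in range(1, len(path)):
        prev, cur = path[t - 1], path[t]
        if cur < prev:
            raise ValueError("path must be non-decreasing")
        # Start at the previous score: lower categories are no longer attainable.
        for k in range(prev, top):
            moved_up = 1 if cur > k else 0
            records.append((t, k, moved_up))
            if not moved_up:
                break
    return records
```

A subject already in the top category contributes no further records, which reflects the irreversibility assumption.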
Citations: 35
Mining epidemiological time series: an approach based on dynamic regression
Pub Date : 2005-12-01 DOI: 10.1191/1471082X05st103oa
M. Chiogna, C. Gaetan
In epidemiology, time-series regression models are especially suitable for evaluating short-term effects of time-varying exposures to pollution. To summarize findings from different studies on different cities, the techniques of designed meta-analyses have been employed. In this context, city-specific findings are summarized by an ‘effect size’ measured on a common scale. Such effects are then pooled together on a second hierarchy of analysis. The objective of this article is to exploit exploratory analysis of city-specific time series. In fact, when dealing with many sources of data, that is, many cities, an exploratory analysis becomes almost unaffordable. Our idea is to explore the time series by fitting complete dynamic regression models. These models are easier to fit than the models usually employed and allow implementation of very fast automated model selection algorithms. The idea is to highlight the common features across cities through this analysis, which might then be used to design the meta-analysis. The proposal is illustrated by analysing data on the relationship between daily nonaccidental deaths and air pollution in the 20 largest US cities.
Citations: 9
Latent variable models for mixed categorical and survival responses, with an application to fertility preferences and family planning in Bangladesh
Pub Date : 2005-12-01 DOI: 10.1191/1471082X05st100oa
I. Moustaki, F. Steele
In this article, we discuss a latent variable model with continuous latent variables for manifest variables that are a mixture of categorical and survival outcomes. Models for censored and uncensored survival data are discussed. The model allows for covariate effects both on the manifest variables (direct effects) and on the latent variable(s) (indirect effects). The methodological developments are motivated by a demographic application: an exploration of women’s fertility preferences and family planning behaviour in Bangladesh.
Citations: 25
Analyzing lifetime data with long-tailed skewed distribution: the logistic-sinh family
Pub Date : 2005-12-01 DOI: 10.1191/1471082X05st099oa
Kahadawala Cooray
A new two-parameter family of distributions is presented. It is derived to model highly negatively skewed data with extreme observations. The new family is referred to as the logistic-sinh distribution, as it is derived from the logistic distribution by appropriately replacing an exponential term with a hyperbolic sine term. The resulting family provides not only negatively skewed densities with thick tails but also a variety of monotonic density shapes. The space of the shape parameter, λ > 0, is divided by the boundary line λ = 1 into two regions over which the hazard function is, respectively, increasing and bathtub shaped. Maximum likelihood parameter estimation techniques are discussed, and approximate coverage probabilities are provided for uncensored samples. The advantages of the new family are demonstrated and compared using well-known examples.
Citations: 10
Understanding past ocean circulations: a nonparametric regression case study
Pub Date : 2005-12-01 DOI: 10.1191/1471082X05st102oa
R. Samworth, H. Poore
Oceanographers study past ocean circulations and their effect on global climate through carbon isotope records obtained from microfossils deposited on the ocean floor. An initial goal is to estimate the carbon isotope levels for the Pacific, Southern and North Atlantic Oceans over the last 23 million years and to provide confidence bands. We consider a nonparametric regression model and demonstrate how several recent developments in methodology make local linear kernel regression an attractive approach for tackling the problem. The results are used to estimate a quantity called the proportion of Northern Component Water and its effect on global climate. Several interesting and important geophysical and oceanographic conclusions are suggested by the study.
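Local linear kernel regression, the method named above, can be sketched in a few lines. This is a generic textbook version with a Gaussian kernel and a fixed bandwidth `h`, both of which are assumptions made for illustration; the bandwidth selection and confidence bands discussed in the article are not reproduced here.

```python
import math

def local_linear(x0, xs, ys, h):
    """Local linear estimate of E[y | x = x0] with a Gaussian kernel of bandwidth h."""
    s0 = s1 = s2 = t0 = t1 = 0.0
    for x, y in zip(xs, ys):
        d = x - x0
        w = math.exp(-0.5 * (d / h) ** 2)  # kernel weight of this observation
        s0 += w
        s1 += w * d
        s2 += w * d * d
        t0 += w * y
        t1 += w * d * y
    # Intercept of the kernel-weighted least squares line centred at x0.
    return (s2 * t0 - s1 * t1) / (s0 * s2 - s1 * s1)
```

Because a straight line is fitted locally, the estimator reproduces exactly linear data, which is one reason it behaves better near the boundaries than a local average.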
Citations: 9
A statistical framework for the analysis of multivariate infectious disease surveillance counts
Pub Date : 2005-10-01 DOI: 10.1191/1471082X05st098oa
L. Held, M. Höhle, Mathias W. Hofmann
A framework for the statistical analysis of counts from infectious disease surveillance databases is proposed. In its simplest form, the model can be seen as a Poisson branching process model with immigration. Extensions to include seasonal effects, time trends and overdispersion are outlined. The model is shown to provide an adequate fit and reliable one-step-ahead prediction intervals for a typical infectious disease time series. In addition, a multivariate formulation is proposed, which is well suited to capture space-time dependence caused by the spatial spread of a disease over time. An analysis of two multivariate time series is described. All analyses have been done using general optimization routines, where ML estimates and corresponding standard errors are readily available.
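A Poisson branching process with immigration can be simulated as follows. The sketch assumes the simplest form, in which the count at time t is Poisson with mean `nu + lam * y[t-1]` (an immigration rate `nu` plus offspring of the previous count at rate `lam`); these parameter names and this exact mean structure are assumptions made for illustration, and the seasonal, trend and overdispersion extensions are left out.

```python
import math
import random

def poisson_draw(mean, rng):
    """Knuth's multiplication method; adequate for the small means used here."""
    limit = math.exp(-mean)
    k, prod = 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k

def simulate_counts(nu, lam, n, y0=0, seed=1):
    """Simulate n counts with conditional mean nu + lam * previous count."""
    rng = random.Random(seed)
    ys = [y0]
    for _ in range(n):
        ys.append(poisson_draw(nu + lam * ys[-1], rng))
    return ys[1:]
```

For `lam` below one the series is stationary with mean `nu / (1 - lam)`, which gives a quick sanity check on simulated output.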
Citations: 222
Two slice-EM algorithms for fitting generalized linear mixed models with binary response
Pub Date : 2005-10-01 DOI: 10.1191/1471082X05st097oa
F. Vaida, X. Meng
The celebrated simplicity of the EM algorithm is somewhat lost in its common use for generalized linear mixed models (GLMMs) because of its analytically intractable E-step. A natural and typical strategy in practice is to implement the E-step via Monte Carlo by drawing the unobserved random effects from their conditional distribution as specified by the E-step. In this paper, we show that further augmenting the missing data (e.g., the random effects) used by the M-step leads to a quite attractive and general slice sampler for implementing the Monte Carlo E-step. The slice sampler scheme is straightforward to implement, and it is neither restricted to the particular choice of the link function (e.g., probit) nor to the distribution of the random effects (e.g., normal). We apply this scheme to the standard EM algorithm as well as to an alternative EM algorithm which treats the variance-standardized random effects, rather than the random effects themselves, as missing data. The alternative EM algorithm not only converges faster but also leads to generalized linear model-like variance estimation, because it converts the random-effect standard deviations into linear regression parameters. Using the well-known salamander mating problem, we compare these two algorithms with each other, as well as with a variety of methods given in the literature in terms of the resulting point and interval estimates.
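For readers unfamiliar with slice sampling, the sketch below shows a generic univariate slice sampler with stepping-out and shrinkage. It is a swapped-in textbook variant, not the augmentation-based scheme developed in the paper, and it assumes a unimodal target supplied as a hypothetical `log_density` function.

```python
import math
import random

def slice_sample(log_density, x0, n, width=1.0, seed=0):
    """Draw n dependent samples from a univariate density known up to a constant."""
    rng = random.Random(seed)
    x, out = x0, []
    for _ in range(n):
        # Auxiliary level: log of a uniform height under the density at x.
        level = log_density(x) + math.log(1.0 - rng.random())
        # Step out until both ends lie outside the slice.
        left = x - width * rng.random()
        right = left + width
        while log_density(left) > level:
            left -= width
        while log_density(right) > level:
            right += width
        # Shrink the interval toward x until a point inside the slice is drawn.
        while True:
            cand = left + (right - left) * rng.random()
            if log_density(cand) > level:
                x = cand
                break
            if cand < x:
                left = cand
            else:
                right = cand
        out.append(x)
    return out
```

In a Monte Carlo E-step, draws of this kind would stand in for samples of the random effects from their conditional distribution; no tuning of a proposal distribution is needed, only an initial interval width.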
Citations: 10
A pairwise likelihood approach to generalized linear models with crossed random effects
Pub Date : 2005-10-01 DOI: 10.1191/1471082X05st095oa
R. Bellio, C. Varin
Inference in generalized linear models with crossed effects is often made cumbersome by the high-dimensional intractable integrals involved in the likelihood function. We propose an inferential strategy based on the pairwise likelihood, which only requires the computation of bivariate distributions. The benefits of our approach are the simplicity of implementation and the potential to handle large data sets. The estimators based on the pairwise likelihood are generally consistent and asymptotically normally distributed. The pairwise likelihood makes it possible to improve on standard inferential procedures by means of bootstrap methods. The performance of the proposed methodology is illustrated by simulations and application to the well-known salamander mating data set.
Citations: 61
Joint modelling of recurrence and progression of adenomas: a latent variable approach
Pub Date : 2005-10-01 DOI: 10.1191/1471082X05st094oa
Chiu-Hsieh Hsu
In this paper, we treat the number of recurrent adenomatous polyps as a latent variable and then use a mixture distribution to model the number of observed recurrent adenomatous polyps. This approach is equivalent to zero-inflated Poisson regression, which is a method used to analyse count data with excess zeros. In a zero-inflated Poisson model, a count response variable is assumed to be distributed as a mixture of a Poisson distribution and a distribution with point mass of one at zero. In many cancer studies, patients often have variable follow-up. When the disease of interest is subject to late onset, ignoring the length of follow-up will underestimate the recurrence rate. In this paper, we modify zero-inflated Poisson regression through a weight function to incorporate the length of follow-up into analysis. We motivate, develop, and illustrate the methods described here with an example from a colon cancer study.
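The mixture described above can be written out directly. The sketch below gives the probability mass function of a standard zero-inflated Poisson distribution, with `pi` the weight of the point mass at zero and `mu` the Poisson mean; both names are chosen for illustration, and the follow-up weight function proposed in the paper is not included.

```python
import math

def zip_pmf(k, pi, mu):
    """P(Y = k) for a zero-inflated Poisson: point mass at zero with
    probability pi, Poisson(mu) count with probability 1 - pi."""
    poisson = math.exp(-mu) * mu ** k / math.factorial(k)
    return (pi if k == 0 else 0.0) + (1.0 - pi) * poisson
```

Zeros therefore arise from two sources, the point mass and the Poisson component, which is what allows the model to accommodate excess zeros.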
Citations: 9