
Statistical Modeling: latest articles

Bayesian inference for stochastic epidemics in closed populations
Pub Date : 2004-04-01 DOI: 10.1191/1471082X04st065oa
G. Streftaris, G. Gibson
We consider continuous-time stochastic compartmental models that can be applied in veterinary epidemiology to model the within-herd dynamics of infectious diseases. We focus on an extension of Markovian epidemic models, allowing the infectious period of an individual to follow a Weibull distribution, resulting in a more flexible model for many diseases. Following a Bayesian approach we show how approximation methods can be applied to design efficient MCMC algorithms with favourable mixing properties for fitting non-Markovian models to partial observations of epidemic processes. The methodology is used to analyse real data concerning a smallpox outbreak in a human population, and a simulation study is conducted to assess the effects of the frequency and accuracy of diagnostic tests on the information yielded on the epidemic process.
Citations: 70
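The model class in the abstract above can be illustrated with a small simulation. The following is a minimal sketch, assuming a closed, homogeneously mixing population with one initial infective and a frequency-dependent infection rate `beta * S * I / n`; these assumptions, and the function and parameter names, are illustrative and not taken from the paper. Infectious periods are drawn from a Weibull distribution, so the process is non-Markovian, but infection events between removals still occur at a piecewise-constant rate and can be redrawn after every event.

```python
import heapq
import random


def simulate_sir_weibull(n, beta, shape, scale, seed=0):
    """Simulate one SIR epidemic with Weibull infectious periods.

    Returns (final number removed, time of the last removal).
    All names and the mixing assumption are hypothetical.
    """
    rng = random.Random(seed)
    susceptible = n - 1
    t = 0.0
    # Min-heap of scheduled removal times, one per current infective.
    # random.weibullvariate takes (scale, shape) in that order.
    removals = [rng.weibullvariate(scale, shape)]
    removed = 0
    while removals:
        infectious = len(removals)
        rate = beta * susceptible * infectious / n
        # Exponential waiting time to the next infection; by memorylessness
        # it is valid to redraw it after each event.
        t_inf = t + rng.expovariate(rate) if rate > 0 else float("inf")
        if t_inf < removals[0]:
            t = t_inf
            susceptible -= 1
            heapq.heappush(removals, t + rng.weibullvariate(scale, shape))
        else:
            t = heapq.heappop(removals)
            removed += 1
    return removed, t


if __name__ == "__main__":
    print(simulate_sir_weibull(n=50, beta=1.5, shape=2.0, scale=1.0, seed=1))
```

With `shape=1.0` the infectious period is exponential and the sketch reduces to the Markovian case that the paper extends.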
An empirical model for underdispersed count data
Pub Date : 2004-04-01 DOI: 10.1191/1471082X04st064oa
M. Ridout, P. Besbeas
We present a novel distribution for modelling count data that are underdispersed relative to the Poisson distribution. The distribution is a form of weighted Poisson distribution and is shown to have advantages over other weighted Poisson distributions that have been proposed to model underdispersion. One key difference is that the weights in our distribution are centred on the mean of the underlying Poisson distribution. Several examples are presented that illustrate the consistently good performance of the distribution.
Citations: 101
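A weighted Poisson distribution has probability mass proportional to `w(y) * Poisson(y; lam)`. The sketch below shows how weights centred on the Poisson mean produce underdispersion. The weight function `exp(-beta * |y - lam|)` is a hypothetical choice made for illustration; the abstract does not state the authors' weights, and the function names are likewise assumptions.

```python
import math


def weighted_poisson_pmf(lam, beta, y_max=200):
    """Probability mass function on 0..y_max of a weighted Poisson distribution.

    Uses the hypothetical weight w(y) = exp(-beta * |y - lam|), which is
    centred on the mean of the underlying Poisson distribution.
    """
    log_terms = [
        -lam + y * math.log(lam) - math.lgamma(y + 1) - beta * abs(y - lam)
        for y in range(y_max + 1)
    ]
    shift = max(log_terms)  # subtract the maximum for numerical stability
    unnormalised = [math.exp(v - shift) for v in log_terms]
    total = sum(unnormalised)
    return [u / total for u in unnormalised]


def mean_and_variance(pmf):
    mean = sum(y * p for y, p in enumerate(pmf))
    var = sum((y - mean) ** 2 * p for y, p in enumerate(pmf))
    return mean, var


if __name__ == "__main__":
    for beta in (0.0, 0.5, 1.0):
        print(beta, mean_and_variance(weighted_poisson_pmf(5.0, beta)))
```

With `beta = 0` the weights are constant and the ordinary Poisson distribution is recovered (variance equal to the mean); positive `beta` pulls mass toward `lam` and the variance falls below the mean.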
Generalized log-linear models with random effects, with application to smoothing contingency tables
Pub Date : 2003-12-01 DOI: 10.1191/1471082X03st059oa
B. Coull, A. Agresti
We define a class of generalized log-linear models with random effects. For a vector of Poisson or multinomial means μ and matrices of constants C and A, the model has the form C log(Aμ) = Xβ + Zu, where β are fixed effects and u are random effects. The model contains most standard models currently used for categorical data analysis. We suggest some new models that are special cases of this model and are useful for applications such as smoothing large contingency tables and modeling heterogeneity in odds ratios. We present examples of its use for such applications. In many cases, maximum likelihood model fitting can be handled with existing methods and software. We outline extensions of model fitting methods for other cases. We also summarize several challenges for future research, such as fitting the model in its most general form and deriving properties of estimates used in smoothing contingency tables.
Citations: 10
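The left-hand side C log(Aμ) of the model above is easiest to read on a small table. The sketch below evaluates it for a 2 x 2 table of expected counts, once with A as the identity (giving the log odds ratio) and once with A summing over columns (giving a marginal logit). The table values and helper names are made up for illustration and are not from the paper.

```python
import math


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, vector)) for row in matrix]


def generalized_loglinear_lhs(c, a, mu):
    """Evaluate C log(A mu) for matrices of constants C and A."""
    return matvec(c, [math.log(x) for x in matvec(a, mu)])


# Hypothetical expected counts, ordered (mu11, mu12, mu21, mu22).
mu = [20.0, 10.0, 5.0, 15.0]

# A = identity, C = (1, -1, -1, 1): the log odds ratio of the table.
identity = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]
log_odds_ratio = generalized_loglinear_lhs([[1, -1, -1, 1]], identity, mu)[0]

# A sums each row of the table, C = (1, -1): the marginal logit of the row variable.
row_sums = [[1, 1, 0, 0], [0, 0, 1, 1]]
row_logit = generalized_loglinear_lhs([[1, -1]], row_sums, mu)[0]

if __name__ == "__main__":
    print(log_odds_ratio, row_logit)
```

Setting both C and A to identity matrices gives the ordinary log-linear model log μ = Xβ + Zu, which is one way to see that standard models are special cases.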
Longitudinal analysis of repeated binary data using autoregressive and random effect modelling
Pub Date : 2003-12-01 DOI: 10.1191/1471082X03st061oa
M. Aitkin, M. Alfò
In this paper we extend random coefficient models for binary repeated responses to include serial dependence of Markovian form, with the aim of defining a general association structure among responses recorded on the same individual. We do not adopt a parametric specification for the random coefficients distribution and this allows us to overcome inconsistencies due to misspecification of this component. Model parameters are estimated by means of an EM algorithm for nonparametric maximum likelihood (NPML), which is extended to deal with serial correlation among repeated measures, with an explicit focus on those situations where short individual time series have been observed. The approach is described by presenting a reanalysis of the well-known Muscatine (Iowa) longitudinal study on childhood obesity.
Citations: 28
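In nonparametric maximum likelihood the random-effect distribution is replaced by a discrete distribution on a few mass points, so each individual's likelihood becomes a finite mixture. The sketch below computes that mixture likelihood for one binary sequence with first-order Markov dependence on the logit scale. It assumes a random intercept only and conditions on the first observation; both simplifications, and all names, are assumptions of this sketch rather than the authors' specification.

```python
import math


def expit(x):
    return 1.0 / (1.0 + math.exp(-x))


def individual_likelihood(y, beta0, alpha, mass_points, masses):
    """Mixture likelihood of a binary sequence y, conditional on y[0].

    logit P(y_t = 1 | y_{t-1}, b) = beta0 + alpha * y_{t-1} + b, where the
    random intercept b takes the value mass_points[k] with probability masses[k].
    """
    total = 0.0
    for b, pi in zip(mass_points, masses):
        lik = 1.0
        for prev, cur in zip(y, y[1:]):
            p = expit(beta0 + alpha * prev + b)
            lik *= p if cur == 1 else 1.0 - p
        total += pi * lik
    return total


if __name__ == "__main__":
    print(individual_likelihood([1, 1, 0], beta0=-0.5, alpha=1.0,
                                mass_points=[-1.0, 1.0], masses=[0.6, 0.4]))
```

In an EM algorithm, the terms `pi * lik` divided by `total` would serve as the posterior component weights in the E-step.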
Generalized linear mixed models for strawberry inflorescence data
Pub Date : 2003-12-01 DOI: 10.1191/1471082X03st060oa
D. Cole, B. Morgan, M. Ridout
Strawberry inflorescences have a variable branching structure. This paper demonstrates how the inflorescence structure can be modelled concisely using binomial logistic generalized linear mixed models. Many different procedures exist for estimating the parameters of generalized linear mixed models, including penalized likelihood, EM, Bayesian techniques, and simulated maximum likelihood. The main methods are reviewed and compared for fitting binomial logistic generalized linear mixed models to strawberry inflorescence data. Simulations matched to the original data are used to show that a modified EM method due to Steele (1996) is clearly the best, in terms of speed and mean-squared-error performance, for data of this kind.
Citations: 5
Point and interval estimation of the population size using the truncated Poisson regression model
Pub Date : 2003-12-01 DOI: 10.1191/1471082X03st057oa
P. V. D. van der Heijden, R. Bustami, Maarten Cruyff, G. Engbersen, H. V. van Houwelingen
A method is presented to derive point and interval estimates of the total number of individuals in a heterogeneous Poisson population. The method is based on the Horvitz-Thompson approach. The zero-truncated Poisson regression model is fitted and results are used to obtain point and interval estimates for the total number of individuals in the population. The method is assessed by performing a simulation experiment computing coverage probabilities of Horvitz-Thompson confidence intervals for cases with different sample sizes and Poisson parameters. We illustrate our method using capture-recapture data from the police registration system providing information on illegal immigrants in four large cities in the Netherlands.
Citations: 94
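The point estimate described above can be sketched for the simplest case, an intercept-only model with no covariates, which is a simplification of the regression model in the paper. The zero-truncated Poisson maximum likelihood estimate of the rate solves `lam / (1 - exp(-lam)) = mean(counts)`, and the Horvitz-Thompson estimate divides each observed individual by their probability of being observed at least once. Function names and the example counts are hypothetical.

```python
import math


def fit_zero_truncated_poisson(counts, tol=1e-12):
    """MLE of the Poisson rate from counts that are all >= 1 (zeros unobserved)."""
    ybar = sum(counts) / len(counts)
    if ybar <= 1.0:
        raise ValueError("sample mean must exceed 1 for a finite positive estimate")
    # The truncated mean lam / (1 - exp(-lam)) is increasing in lam and
    # exceeds lam, so the root lies in (0, ybar); solve by bisection.
    lo, hi = 1e-9, ybar
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if mid / -math.expm1(-mid) < ybar:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def horvitz_thompson_population_size(counts):
    """Estimate the total population size, including never-observed individuals."""
    lam = fit_zero_truncated_poisson(counts)
    p_observed = -math.expm1(-lam)  # 1 - exp(-lam)
    # Each observed individual represents 1 / p_observed individuals.
    return sum(1.0 / p_observed for _ in counts)


if __name__ == "__main__":
    # Hypothetical capture counts: 60 seen once, 25 twice, 10 three times, 5 four times.
    counts = [1] * 60 + [2] * 25 + [3] * 10 + [4] * 5
    print(fit_zero_truncated_poisson(counts), horvitz_thompson_population_size(counts))
```

With covariates, `lam` would differ between individuals and the sum would run over individual-specific probabilities `1 - exp(-lam_i)`.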
Modelling paired release-recovery data in the presence of survival and capture heterogeneity with application to marked juvenile salmon
Pub Date : 2003-10-01 DOI: 10.1191/1471082X03st055oa
K. Newman
Products of multinomial models have been the standard approach to analysing animal release-recovery data. Two alternatives, a pseudo-likelihood model and a Bayesian nonlinear hierarchical model, are developed. Both approaches can to some degree account for heterogeneity in survival and capture probabilities over and above that accounted for by covariates. The pseudo-likelihood approach allows for recovery period specific overdispersion. The hierarchical approach treats survival and capture rates as a sum of fixed and random effects. The standard and alternative approaches were applied to a set of paired release-recovery salmon data. Marked juvenile chinook salmon (Oncorhynchus tshawytscha) were released, with some recovered in freshwater as juveniles and others in marine waters as adults. Interest centered on modelling freshwater survival rates as a function of biological and hydrological covariates. Under the product multinomial formulation, most covariates were statistically significant. In contrast, under the pseudo-likelihood and hierarchical formulations, the standard errors for the coefficients were considerably larger, with pseudo-likelihood standard errors five to eight times larger, and fewer coefficients were statistically significant. Covariates, significant under all formulations, with important management implications included water temperature, water flow and amount of water exported for human use. The hierarchical model was considerably more stable with regard to estimated coefficients of training subsets used in a cross-validation.
Citations: 42
Regression analysis of variates observed on (0, 1): percentages, proportions and fractions
Pub Date : 2003-10-01 DOI: 10.1191/1471082X03st053oa
R. Kieschnick, Bruce D. McCullough
Many types of studies examine the influence of selected variables on the conditional expectation of a proportion or vector of proportions, for example, market shares, rock composition, and so on. We identify four distributional categories into which such data can be put, and focus on regression models for the first category, for proportions observed on the open interval (0, 1). For these data, we identify different specifications used in prior research and compare these specifications using two common samples and specifications of the regressors. Based upon our analysis, we recommend that researchers use either a parametric regression model based upon the beta distribution or a quasi-likelihood regression model developed by Papke and Wooldridge (1997) for these data. Concerning the choice between these two regression models, we recommend that researchers use the parametric regression model unless their sample size is large enough to justify the asymptotic arguments underlying the quasi-likelihood approach.
Citations: 461
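The two recommended approaches differ in their objective functions, which the sketch below writes out for a single observation. The beta log-likelihood uses a mean and precision parameterisation (`mu`, `phi`), which is an assumption of this sketch and may differ from the parameterisation in the paper. The quasi-likelihood is the Bernoulli log-likelihood evaluated at a fractional response, which is the form usually associated with Papke and Wooldridge. Function names are hypothetical.

```python
import math


def beta_loglik(y, mu, phi):
    """Log density of Beta(mu * phi, (1 - mu) * phi) at y in (0, 1)."""
    a, b = mu * phi, (1.0 - mu) * phi
    return (math.lgamma(phi) - math.lgamma(a) - math.lgamma(b)
            + (a - 1.0) * math.log(y) + (b - 1.0) * math.log(1.0 - y))


def bernoulli_quasi_loglik(y, mu):
    """Bernoulli quasi-log-likelihood for a fractional response y."""
    return y * math.log(mu) + (1.0 - y) * math.log(1.0 - mu)


def logit_mean(x, beta0, beta1):
    """Conditional mean under a logit link, keeping predictions inside (0, 1)."""
    return 1.0 / (1.0 + math.exp(-(beta0 + beta1 * x)))


if __name__ == "__main__":
    ys = [0.12, 0.35, 0.40, 0.73]  # hypothetical proportions
    mu = sum(ys) / len(ys)
    print(sum(beta_loglik(y, mu, 5.0) for y in ys))
    print(sum(bernoulli_quasi_loglik(y, mu) for y in ys))
```

The quasi-likelihood only requires the conditional mean to be correctly specified, whereas the beta likelihood also models the shape of the distribution; this is the trade-off behind the sample-size advice in the abstract.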
Correcting for covariate measurement error in logistic regression using nonparametric maximum likelihood estimation
Pub Date : 2003-10-01 DOI: 10.1191/1471082X03st056oa
S. Rabe-Hesketh, A. Pickles, A. Skrondal
When covariates are measured with error, inference based on conventional generalized linear models can yield biased estimates of regression parameters. This problem can potentially be rectified by using generalized linear latent and mixed models (GLLAMM), including a measurement model for the relationship between observed and true covariates. However, the models are typically estimated under the assumption that both the true covariates and the measurement errors are normally distributed, although skewed covariate distributions are often observed in practice. In this article we relax the normality assumption for the true covariates by developing nonparametric maximum likelihood estimation (NPMLE) for GLLAMMs. The methodology is applied to estimating the effect of dietary fibre intake on coronary heart disease. We also assess the performance of estimation of regression parameters and empirical Bayes prediction of the true covariate. Normal as well as skewed covariate distributions are simulated and inference is performed based on both maximum likelihood assuming normality and NPMLE. Both estimators are unbiased and have similar root mean square errors when the true covariate is normal. With a skewed covariate, the conventional estimator is biased but has a smaller mean square error than the NPMLE. NPMLE produces substantially improved empirical Bayes predictions of the true covariate when its distribution is skewed.
Citations: 84
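The structure of such a model can be written down compactly. The following is a generic sketch under assumed notation, not the authors' exact specification: y_i is the binary outcome, x_i the true covariate, w_ij the j-th error-prone measurement of it, and the nonparametric step replaces a normal distribution for x_i by K mass points e_k with probabilities π_k.

```latex
\begin{align*}
\operatorname{logit}\,\Pr(y_i = 1 \mid x_i) &= \beta_0 + \beta_1 x_i, \\
w_{ij} &= x_i + \varepsilon_{ij}, \qquad \varepsilon_{ij} \sim N(0, \sigma^2), \\
L(\beta_0, \beta_1, \sigma^2, e, \pi) &= \prod_{i} \sum_{k=1}^{K} \pi_k \,
  \Pr(y_i \mid x_i = e_k) \prod_{j} \phi\!\left(w_{ij};\, e_k, \sigma^2\right).
\end{align*}
```

Here φ(·; e_k, σ²) denotes the normal density with mean e_k and variance σ². The measurement errors remain normal; only the distribution of the true covariate is left unspecified.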
A mixed model formulation for designing cluster randomized trials with binary outcomes
Pub Date : 2003-10-01 DOI: 10.1191/1471082X03st054oa
T. Braun
Cluster randomized trials (CRTs) are unlike traditional individually randomized trials because observations within the same cluster are positively correlated and the sample size (number of clusters) is relatively small. Although formulae for sample size and power estimates of CRT designs do exist, these formulae rely upon first-order asymptotic approximations for the distribution of the average intervention effect and are inaccurate for CRTs that have a small number of clusters. These formulae also assume that the intracluster correlation (ICC) is the same for each cluster in the CRT. However, for CRTs in which the clusters are classrooms or medical practices, the degree of ICC is often a factor of how many students are in each classroom or how many patients are in each practice. Specifically, smaller clusters are expected to have larger ICC than larger clusters. A weighted sum of the cluster means, D, is the statistic often used to estimate the average intervention effect in a CRT. Therefore, we propose that a saddlepoint approximation is a natural choice to approximate the distributions of the cluster means more precisely than a standard large-sample approximation. We parameterize the ICC for each cluster as a random effect with a predefined prior distribution that is dependent upon the size of each cluster. After integrating over the range of the random effect, we use Monte Carlo methods to generate sample cluster means, which are in turn used to approximate the distribution of D with saddlepoint methods. Through numerical examples and an actual application, we show that our method has accuracy that is equal to or better than that of existing methods. Furthermore, our method accommodates CRTs in which the correlation within cluster is expected to diminish with the cluster size.
Citations: 4