Bayesian inference for stochastic epidemics in closed populations
G. Streftaris, G. Gibson
Pub Date: 2004-04-01 | DOI: 10.1191/1471082X04st065oa
We consider continuous-time stochastic compartmental models that can be applied in veterinary epidemiology to model the within-herd dynamics of infectious diseases. We focus on an extension of Markovian epidemic models, allowing the infectious period of an individual to follow a Weibull distribution, resulting in a more flexible model for many diseases. Following a Bayesian approach we show how approximation methods can be applied to design efficient MCMC algorithms with favourable mixing properties for fitting non-Markovian models to partial observations of epidemic processes. The methodology is used to analyse real data concerning a smallpox outbreak in a human population, and a simulation study is conducted to assess the effects of the frequency and accuracy of diagnostic tests on the information yielded on the epidemic process.
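As a rough illustration of the kind of non-Markovian process the paper fits, the sketch below simulates a closed-population SIR epidemic whose infectious periods are Weibull rather than exponential. The function and parameter names (beta, shape, scale) are illustrative assumptions, not the authors' notation, and the Bayesian MCMC inference step is not shown.

```python
import heapq
import numpy as np

def simulate_sir_weibull(n, i0, beta, shape, scale, seed=None):
    """Event-driven simulation of a closed-population SIR epidemic in
    which each infectious period is Weibull(shape, scale) rather than
    exponential, making the process non-Markovian. Returns the list of
    (time, S, I) states at event times."""
    rng = np.random.default_rng(seed)
    s, i, t = n - i0, i0, 0.0
    removals = []  # min-heap of scheduled removal times, one per infective
    for _ in range(i0):
        heapq.heappush(removals, rng.weibull(shape) * scale)
    history = [(t, s, i)]
    while i > 0:
        rate = beta * s * i / n  # infection hazard is Markovian given (S, I)
        t_inf = t + rng.exponential(1.0 / rate) if rate > 0 else np.inf
        if removals[0] < t_inf:
            t = heapq.heappop(removals)  # next event: a removal
            i -= 1
        else:
            t = t_inf  # next event: an infection
            s -= 1
            i += 1
            heapq.heappush(removals, t + rng.weibull(shape) * scale)
        history.append((t, s, i))
    return history

# e.g., simulate_sir_weibull(n=100, i0=1, beta=1.5, shape=2.0, scale=1.0)
```

Drawing an exponential candidate infection time and comparing it with the earliest scheduled removal is valid here because, given the current (S, I), the infection process is still a memoryless Poisson process; only the removals carry the Weibull memory.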
{"title":"Bayesian inference for stochastic epidemics in closed populations","authors":"G. Streftaris, G. Gibson","doi":"10.1191/1471082X04st065oa","DOIUrl":"https://doi.org/10.1191/1471082X04st065oa","url":null,"abstract":"We consider continuous-time stochastic compartmental models that can be applied in veterinary epidemiology to model the within-herd dynamics of infectious diseases. We focus on an extension of Markovian epidemic models, allowing the infectious period of an individual to follow a Weibull distribution, resulting in a more flexible model for many diseases. Following a Bayesian approach we show how approximation methods can be applied to design efficient MCMC algorithms with favourable mixing properties for fitting non-Markovian models to partial observations of epidemic processes. The methodology is used to analyse real data concerning a smallpox outbreak in a human population, and a simulation study is conducted to assess the effects of the frequency and accuracy of diagnostic tests on the information yielded on the epidemic process.","PeriodicalId":354759,"journal":{"name":"Statistical Modeling","volume":"4 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2004-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130647200","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
An empirical model for underdispersed count data
M. Ridout, P. Besbeas
Pub Date: 2004-04-01 | DOI: 10.1191/1471082X04st064oa
We present a novel distribution for modelling count data that are underdispersed relative to the Poisson distribution. The distribution is a form of weighted Poisson distribution and is shown to have advantages over other weighted Poisson distributions that have been proposed to model underdispersion. One key difference is that the weights in our distribution are centred on the mean of the underlying Poisson distribution. Several examples illustrate the consistently good performance of the distribution.
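The abstract does not give the paper's weight function, so the sketch below uses a hypothetical mean-centred weight, w(k) = exp(-theta * (k - mu)^2), purely to show how a weighted Poisson pmf with weights centred on the Poisson mean produces underdispersion.

```python
import numpy as np
from scipy.stats import poisson

def weighted_poisson_pmf(x, mu, theta, upper=500):
    """PMF of a weighted Poisson distribution with the hypothetical
    mean-centred weight w(k) = exp(-theta * (k - mu)**2). For theta > 0
    the weight shrinks mass toward mu, giving underdispersion. The
    normalizing constant is computed over a truncated support."""
    k = np.arange(upper + 1)
    norm = np.sum(np.exp(-theta * (k - mu) ** 2) * poisson.pmf(k, mu))
    x = np.asarray(x)
    return np.exp(-theta * (x - mu) ** 2) * poisson.pmf(x, mu) / norm

k = np.arange(201)
p = weighted_poisson_pmf(k, mu=5.0, theta=0.1)
mean, var = np.sum(k * p), np.sum(k ** 2 * p) - np.sum(k * p) ** 2
# var < mean: underdispersed relative to Poisson(5)
```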
{"title":"An empirical model for underdispersed count data","authors":"M. Ridout, P. Besbeas","doi":"10.1191/1471082X04st064oa","DOIUrl":"https://doi.org/10.1191/1471082X04st064oa","url":null,"abstract":"We present a novel distribution for modelling count data that are underdispersed relative to the Poisson distribution. The distribution is a form of weighted Poisson distribution and is shown to have advantages over other weighted Poisson distributions that have been proposed to model underdispersion. One key difference is that the weights in our distribution are centred on the mean of the underlying Poisson distribution. Several illustrative examples are presented that illustrate the consistently good performance of the distribution.","PeriodicalId":354759,"journal":{"name":"Statistical Modeling","volume":"11 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2004-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132854786","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Generalized log-linear models with random effects, with application to smoothing contingency tables
B. Coull, A. Agresti
Pub Date: 2003-12-01 | DOI: 10.1191/1471082X03st059oa
We define a class of generalized log-linear models with random effects. For a vector μ of Poisson or multinomial means and matrices of constants C and A, the model has the form C log(Aμ) = Xβ + Zu, where β are fixed effects and u are random effects. The model contains most standard models currently used for categorical data analysis. We suggest some new models that are special cases of this model and are useful for applications such as smoothing large contingency tables and modeling heterogeneity in odds ratios. We present examples of its use for such applications. In many cases, maximum likelihood model fitting can be handled with existing methods and software. We outline extensions of model fitting methods for other cases. We also summarize several challenges for future research, such as fitting the model in its most general form and deriving properties of estimates used in smoothing contingency tables.
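To make the model form concrete, the snippet below evaluates C log(Aμ) in its simplest instance, where it reduces to the log odds ratio of a 2×2 table; the counts are made up for illustration.

```python
import numpy as np

# Simplest instance of C log(A mu) = X beta + Z u: with A = I and C a
# single contrast row, C log(mu) is the log odds ratio of a 2x2 table
# of Poisson means.
mu = np.array([30.0, 10.0, 15.0, 45.0])  # 2x2 expected counts, row-major
A = np.eye(4)
C = np.array([[1.0, -1.0, -1.0, 1.0]])
log_odds_ratio = C @ np.log(A @ mu)  # = log((30 * 45) / (10 * 15)) = log 9
```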
{"title":"Generalized log-linear models with random effects, with application to smoothing contingency tables","authors":"B. Coull, A. Agresti","doi":"10.1191/1471082X03st059oa","DOIUrl":"https://doi.org/10.1191/1471082X03st059oa","url":null,"abstract":"We define a class of generalized log-linear models with random effects. For a vector of Poisson or multinomial means m and matrices of constants C and A, the model has the form C log A μ = X β + Zu, where β are fixed effects and u are random effects. The model contains most standard models currently used for categorical data analysis. We suggest some new models that are special cases of this model and are useful for applications such as smoothing large contingency tables and modeling heterogeneity in odds ratios. We present examples of its use for such applications. In many cases, maximum likelihood model fitting can be handled with existing methods and software. We outline extensions of model fitting methods for other cases. We also summarize several challenges for future research, such as fitting the model in its most general form and deriving properties of estimates used in smoothing contingency tables.","PeriodicalId":354759,"journal":{"name":"Statistical Modeling","volume":"7 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2003-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116824223","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Longitudinal analysis of repeated binary data using autoregressive and random effect modelling
M. Aitkin, M. Alfò
Pub Date: 2003-12-01 | DOI: 10.1191/1471082X03st061oa
In this paper we extend random coefficient models for binary repeated responses to include serial dependence of Markovian form, with the aim of defining a general association structure among responses recorded on the same individual. We do not adopt a parametric specification for the random coefficients distribution, which allows us to overcome inconsistencies due to misspecification of this component. Model parameters are estimated by means of an EM algorithm for nonparametric maximum likelihood (NPML), which is extended to deal with serial correlation among repeated measures, with an explicit focus on situations where short individual time series have been observed. The approach is illustrated by a reanalysis of the well-known Muscatine (Iowa) longitudinal study on childhood obesity.
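A heavily simplified sketch of this model class, under stated assumptions: NPML mass points enter as component-specific intercepts, serial dependence enters as a shared coefficient on the lagged response, and EM alternates posterior membership weights with a numerical M-step. The paper's actual specification (random coefficients, covariates, and its particular Markov structure) is richer than this.

```python
import numpy as np
from scipy.optimize import minimize

def npml_em_markov(Y, K=3, n_iter=50, seed=0):
    """EM for a simplified NPML-style finite mixture of K mass points
    (intercepts alpha_k with weights pi_k) for repeated binary data,
    with first-order Markov dependence through a shared coefficient
    gamma on the lagged response. Y: (n_subjects, T) array of 0/1."""
    rng = np.random.default_rng(seed)
    n, T = Y.shape
    alpha, gamma, pi = rng.normal(0.0, 1.0, K), 0.0, np.full(K, 1.0 / K)

    def seq_loglik(alpha_k, g):
        # Sequence log-likelihood under one component: the first response
        # uses the intercept alone, later ones add g * y_{t-1}.
        eta = alpha_k + g * np.column_stack([np.zeros(n), Y[:, :-1]])
        ll = -Y * np.logaddexp(0.0, -eta) - (1 - Y) * np.logaddexp(0.0, eta)
        return ll.sum(axis=1)

    for _ in range(n_iter):
        # E-step: posterior component membership probabilities.
        logp = np.column_stack([np.log(pi[k]) + seq_loglik(alpha[k], gamma)
                                for k in range(K)])
        w = np.exp(logp - logp.max(axis=1, keepdims=True))
        w /= w.sum(axis=1, keepdims=True)
        # M-step: mass-point weights in closed form; (alpha, gamma) numerically.
        pi = w.mean(axis=0)
        def neg_q(theta):
            return -sum(w[:, k] @ seq_loglik(theta[k], theta[K]) for k in range(K))
        res = minimize(neg_q, np.append(alpha, gamma), method="Nelder-Mead")
        alpha, gamma = res.x[:K], res.x[K]
    return pi, alpha, gamma
```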
{"title":"Longitudinal analysis of repeated binary data using autoregressive and random effect modelling","authors":"M. Aitkin, M. Alfò","doi":"10.1191/1471082X03st061oa","DOIUrl":"https://doi.org/10.1191/1471082X03st061oa","url":null,"abstract":"In this paper we extend random coefficient models for binary repeated responses to include serial dependence of Markovian form, with the aim of defining a general association structure among responses recorded on the same individual. We do not adopt a parametric specification for the random coefficients distribution and this allows us to overcome inconsistencies due to misspecification of this component. Model parameters are estimated by means of an EM algorithm for nonparametric maximum likelihood (NPML), which is extended to deal with serial correlation among repeated measures, with an explicit focus on those situations where short individual time series have been observed. The approach is described by presenting a reanalysis of the well-known Muscatine (Iowa) longitudinal study on childhood obesity.","PeriodicalId":354759,"journal":{"name":"Statistical Modeling","volume":"43 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2003-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115091040","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Generalized linear mixed models for strawberry inflorescence data
D. Cole, B. Morgan, M. Ridout
Pub Date: 2003-12-01 | DOI: 10.1191/1471082X03st060oa
Strawberry inflorescences have a variable branching structure. This paper demonstrates how the inflorescence structure can be modelled concisely using binomial logistic generalized linear mixed models. Many different procedures exist for estimating the parameters of generalized linear mixed models, including penalized likelihood, EM, Bayesian techniques, and simulated maximum likelihood. The main methods are reviewed and compared for fitting binomial logistic generalized linear mixed models to strawberry inflorescence data. Simulations matched to the original data are used to show that a modified EM method due to Steele (1996) is clearly the best, in terms of speed and mean-squared-error performance, for data of this kind.
{"title":"Generalized linear mixed models for strawberry inflorescence data","authors":"D. Cole, B. Morgan, M. Ridout","doi":"10.1191/1471082X03st060oa","DOIUrl":"https://doi.org/10.1191/1471082X03st060oa","url":null,"abstract":"Strawberry inflorescences have a variable branching structure. This paper demonstrates how the inflorescence structure can be modelled concisely using binomial logistic generalized linear mixed models. Many different procedures exist for estimating the parameters of generalized linear mixed models, including penalized likelihood, EM, Bayesian techniques, and simulated maximum likelihood. The main methods are reviewed and compared for fitting binomial logistic generalized linear mixed models to strawberry inflorescence data. Simulations matched to the original data are used to show that a modified EM method due to Steele (1996) is clearly the best, in terms of speed and mean-squared-error performance, for data of this kind.","PeriodicalId":354759,"journal":{"name":"Statistical Modeling","volume":"2 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2003-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128934770","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Point and interval estimation of the population size using the truncated Poisson regression model
P. van der Heijden, R. Bustami, Maarten Cruyff, G. Engbersen, H. van Houwelingen
Pub Date: 2003-12-01 | DOI: 10.1191/1471082X03st057oa
A method is presented to derive point and interval estimates of the total number of individuals in a heterogeneous Poisson population. The method is based on the Horvitz-Thompson approach. The zero-truncated Poisson regression model is fitted and the results are used to obtain point and interval estimates for the total number of individuals in the population. The method is assessed by a simulation experiment computing coverage probabilities of Horvitz-Thompson confidence intervals for cases with different sample sizes and Poisson parameters. We illustrate our method using capture-recapture data from the police registration system providing information on illegal immigrants in four large cities in the Netherlands.
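A minimal sketch of the point estimate under these definitions: fit the zero-truncated Poisson regression by maximum likelihood, then apply the Horvitz-Thompson estimator N_hat = sum_i 1 / (1 - exp(-lambda_i)), since each observed unit is included with probability 1 - exp(-lambda_i). Interval estimation, which the paper also develops, is omitted here.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.special import gammaln

def ztp_population_estimate(X, y):
    """Fit a zero-truncated Poisson regression, log(lambda_i) = x_i' beta,
    by maximum likelihood and return (beta_hat, N_hat), where N_hat is
    the Horvitz-Thompson point estimate of the population size.
    X: (n, p) design matrix with intercept; y: observed positive counts."""
    def negloglik(beta):
        lam = np.exp(X @ beta)
        # log pmf of y given y > 0 under Poisson(lam)
        ll = y * np.log(lam) - lam - gammaln(y + 1) - np.log1p(-np.exp(-lam))
        return -ll.sum()
    res = minimize(negloglik, np.zeros(X.shape[1]), method="BFGS")
    lam_hat = np.exp(X @ res.x)
    return res.x, np.sum(1.0 / (1.0 - np.exp(-lam_hat)))
```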
{"title":"Point and interval estimation of the population size using the truncated Poisson regression model","authors":"P. V. D. van der Heijden, R. Bustami, Maarten Cruyff, G. Engbersen, H. V. van Houwelingen","doi":"10.1191/1471082X03st057oa","DOIUrl":"https://doi.org/10.1191/1471082X03st057oa","url":null,"abstract":"A method is presented to derive point and interval estimates of the total number of individuals in a heterogenous Poisson population. The method is based on the Horvitz-Thompson approach. The zero-truncated Poisson regression model is fitted and results are used to obtain point and interval estimates for the total number of individuals in the population. The method is assessed by performing a simulation experiment computing coverage probabilities of Horvitz-Thompson confidence intervals for cases with different sample sizes and Poisson parameters. We illustrate our method using capture-recapture data from the police registration system providing information on illegal immigrants in four large cities in the Netherlands.","PeriodicalId":354759,"journal":{"name":"Statistical Modeling","volume":"3 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2003-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129548425","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Modelling paired release-recovery data in the presence of survival and capture heterogeneity with application to marked juvenile salmon
K. Newman
Pub Date: 2003-10-01 | DOI: 10.1191/1471082X03st055oa
Products of multinomial models have been the standard approach to analysing animal release-recovery data. Two alternatives, a pseudo-likelihood model and a Bayesian nonlinear hierarchical model, are developed. Both approaches can to some degree account for heterogeneity in survival and capture probabilities over and above that accounted for by covariates. The pseudo-likelihood approach allows for recovery-period-specific overdispersion. The hierarchical approach treats survival and capture rates as a sum of fixed and random effects. The standard and alternative approaches were applied to a set of paired release-recovery salmon data. Marked juvenile chinook salmon (Oncorhynchus tshawytscha) were released, with some recovered in freshwater as juveniles and others in marine waters as adults. Interest centered on modelling freshwater survival rates as a function of biological and hydrological covariates. Under the product multinomial formulation, most covariates were statistically significant. In contrast, under the pseudo-likelihood and hierarchical formulations, the standard errors for the coefficients were considerably larger, with pseudo-likelihood standard errors five to eight times larger, and fewer coefficients were statistically significant. Covariates that were significant under all formulations, and that have important management implications, included water temperature, water flow and the amount of water exported for human use. The hierarchical model was considerably more stable with regard to coefficients estimated from training subsets in a cross-validation.
{"title":"Modelling paired release-recovery data in the presence of survival and capture heterogeneity with application to marked juvenile salmon","authors":"K. Newman","doi":"10.1191/1471082X03st055oa","DOIUrl":"https://doi.org/10.1191/1471082X03st055oa","url":null,"abstract":"Products of multinomial models have been the standard approach to analysing animal release-recovery data. Two alternatives, a pseudo-likelihood model and a Bayesian nonlinear hierarchical model, are developed. Both approaches can to some degree account for heterogeneity in survival and capture probabilities over and above that accounted for by covariates. The pseudo-likelihood approach allows for recovery period specific overdispersion. The hierarchical approach treats survival and capture rates as a sum of fixed and random effects. The standard and alternative approaches were applied to a set of paired release-recovery salmon data. Marked juvenile chinook salmon (Oncorhynchus tshawytscha) were released, with some recovered in freshwater as juveniles and others in marine waters as adults. Interest centered on modelling freshwater survival rates as a function of biological and hydrological covariates. Under the product multinomial formulation, most covariates were statistically significant. In contrast, under the pseudo-likelihood and hierarchical formulations, the standard errors for the coefficients were considerably larger, with pseudo-likelihood standard errors five to eight times larger, and fewer coefficients were statistically significant. Covariates, significant under all formulations, with important management implications included water temperature, water flow and amount of water exported for human use. The hierarchical model was considerably more stable with regard to estimated coefficients of training subsets used in a cross-validation.","PeriodicalId":354759,"journal":{"name":"Statistical Modeling","volume":"25 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2003-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125349955","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Regression analysis of variates observed on (0, 1): percentages, proportions and fractions
R. Kieschnick, Bruce D. McCullough
Pub Date: 2003-10-01 | DOI: 10.1191/1471082X03st053oa
Many types of studies examine the influence of selected variables on the conditional expectation of a proportion or vector of proportions, for example, market shares, rock composition, and so on. We identify four distributional categories into which such data can be put, and focus on regression models for the first category, for proportions observed on the open interval (0, 1). For these data, we identify different specifications used in prior research and compare these specifications using two common samples and specifications of the regressors. Based upon our analysis, we recommend that researchers use either a parametric regression model based upon the beta distribution or a quasi-likelihood regression model developed by Papke and Wooldridge (1997) for these data. Concerning the choice between these two regression models, we recommend that researchers use the parametric regression model unless their sample size is large enough to justify the asymptotic arguments underlying the quasi-likelihood approach.
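A minimal sketch of the recommended parametric option: beta regression with a logit mean link and a common precision φ, fitted by maximum likelihood. The (μφ, (1 − μ)φ) parameterization is a standard choice assumed here, not taken from the paper, and the Papke-Wooldridge quasi-likelihood competitor is not shown.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.special import expit, gammaln

def fit_beta_regression(X, y):
    """Maximum-likelihood beta regression for responses on (0, 1):
    mean mu_i = logistic(x_i' beta) with a common precision phi, i.e.
    y_i ~ Beta(mu_i * phi, (1 - mu_i) * phi). Returns (beta_hat, phi_hat).
    X: (n, p) design matrix with intercept; y: responses in (0, 1)."""
    def negloglik(theta):
        beta, phi = theta[:-1], np.exp(theta[-1])  # log-phi keeps phi > 0
        mu = expit(X @ beta)
        a, b = mu * phi, (1.0 - mu) * phi
        ll = (gammaln(phi) - gammaln(a) - gammaln(b)
              + (a - 1.0) * np.log(y) + (b - 1.0) * np.log1p(-y))
        return -ll.sum()
    res = minimize(negloglik, np.zeros(X.shape[1] + 1), method="BFGS")
    return res.x[:-1], np.exp(res.x[-1])
```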
{"title":"Regression analysis of variates observed on (0, 1): percentages, proportions and fractions","authors":"R. Kieschnick, Bruce D. McCullough","doi":"10.1191/1471082X03st053oa","DOIUrl":"https://doi.org/10.1191/1471082X03st053oa","url":null,"abstract":"Many types of studies examine the influence of selected variables on the conditional expectation of a proportion or vector of proportions, for example, market shares, rock composition, and so on. We identify four distributional categories into which such data can be put, and focus on regression models for the first category, for proportions observed on the open interval (0, 1). For these data, we identify different specifications used in prior research and compare these specifications using two common samples and specifications of the regressors. Based upon our analysis, we recommend that researchers use either a parametric regression model based upon the beta distribution or a quasi-likelihood regression model developed by Papke and Wooldridge (1997) for these data. Concerning the choice between these two regression models, we recommend that researchers use the parametric regression model unless their sample size is large enough to justify the asymptotic arguments underlying the quasi-likelihood approach.","PeriodicalId":354759,"journal":{"name":"Statistical Modeling","volume":"47 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2003-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114610102","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Correcting for covariate measurement error in logistic regression using nonparametric maximum likelihood estimation
S. Rabe-Hesketh, A. Pickles, A. Skrondal
Pub Date: 2003-10-01 | DOI: 10.1191/1471082X03st056oa
When covariates are measured with error, inference based on conventional generalized linear models can yield biased estimates of regression parameters. This problem can potentially be rectified by using generalized linear latent and mixed models (GLLAMM), including a measurement model for the relationship between observed and true covariates. However, the models are typically estimated under the assumption that both the true covariates and the measurement errors are normally distributed, although skewed covariate distributions are often observed in practice. In this article we relax the normality assumption for the true covariates by developing nonparametric maximum likelihood estimation (NPMLE) for GLLAMMs. The methodology is applied to estimating the effect of dietary fibre intake on coronary heart disease. We also assess the performance of estimation of regression parameters and empirical Bayes prediction of the true covariate. Normal as well as skewed covariate distributions are simulated and inference is performed based on both maximum likelihood assuming normality and NPMLE. Both estimators are unbiased and have similar root mean square errors when the true covariate is normal. With a skewed covariate, the conventional estimator is biased but has a smaller mean square error than the NPMLE. NPMLE produces substantially improved empirical Bayes predictions of the true covariate when its distribution is skewed.
{"title":"Correcting for covariate measurement error in logistic regression using nonparametric maximum likelihood estimation","authors":"S. Rabe-Hesketh, A. Pickles, A. Skrondal","doi":"10.1191/1471082X03st056oa","DOIUrl":"https://doi.org/10.1191/1471082X03st056oa","url":null,"abstract":"When covariates are measured with error, inference based on conventional generalized linear models can yield biased estimates of regression parameters. This problem can potentially be rectified by using generalized linear latent and mixed models (GLLAMM), including a measurement model for the relationship between observed and true covariates. However, the models are typically estimated under the assumption that both the true covariates and the measurement errors are normally distributed, although skewed covariate distributions are often observed in practice. In this article we relax the normality assumption for the true covariates by developing nonparametric maximum likelihood estimation (NPMLE) for GLLAMMs. The methodology is applied to estimating the effect of dietary fibre intake on coronary heart disease. We also assess the performance of estimation of regression parameters and empirical Bayes prediction of the true covariate. Normal as well as skewed covariate distributions are simulated and inference is performed based on both maximum likelihood assuming normality and NPMLE. Both estimators are unbiased and have similar root mean square errors when the true covariate is normal. With a skewed covariate, the conventional estimator is biased but has a smaller mean square error than the NPMLE. NPMLE produces substantially improved empirical Bayes predictions of the true covariate when its distribution is skewed.","PeriodicalId":354759,"journal":{"name":"Statistical Modeling","volume":"29 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2003-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114783979","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A mixed model formulation for designing cluster randomized trials with binary outcomes
T. Braun
Pub Date: 2003-10-01 | DOI: 10.1191/1471082X03st054oa
Cluster randomized trials (CRTs) differ from traditional individually randomized trials because observations within the same cluster are positively correlated and the sample size (number of clusters) is relatively small. Although formulae for sample size and power estimates of CRT designs do exist, these formulae rely upon first-order asymptotic approximations for the distribution of the average intervention effect and are inaccurate for CRTs that have a small number of clusters. These formulae also assume that the intracluster correlation (ICC) is the same for each cluster in the CRT. However, for CRTs in which the clusters are classrooms or medical practices, the degree of ICC often depends on how many students are in each classroom or how many patients are in each practice. Specifically, smaller clusters are expected to have larger ICC than larger clusters. A weighted sum of the cluster means, D, is the statistic often used to estimate the average intervention effect in a CRT. Therefore, we propose that a saddlepoint approximation is a natural choice to approximate the distributions of the cluster means more precisely than a standard large-sample approximation. We parameterize the ICC for each cluster as a random effect with a predefined prior distribution that depends on the size of each cluster. After integrating over the range of the random effect, we use Monte Carlo methods to generate sample cluster means, which are in turn used to approximate the distribution of D with saddlepoint methods. Through numerical examples and an actual application, we show that our method has accuracy that is equal to or better than that of existing methods. Furthermore, our method accommodates CRTs in which the within-cluster correlation is expected to diminish with cluster size.
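The Monte Carlo step can be sketched as follows: each cluster's ICC is drawn from a size-dependent prior (the prior below is hypothetical), converted to beta-binomial parameters whose implied ICC equals the drawn value, and a cluster mean is sampled; draws of the statistic D follow by weighting the cluster means. The saddlepoint approximation applied to those draws is omitted.

```python
import numpy as np

def simulate_cluster_means(sizes, p, rho_prior, n_sims=10000, seed=0):
    """Monte Carlo draws of binary-outcome cluster means for one CRT arm.
    Each cluster's ICC is drawn from a size-dependent prior rho_prior,
    converted to beta-binomial parameters (for which the ICC equals rho),
    and a cluster total is sampled. Rows are replicates; weighted row
    sums give draws of the statistic D."""
    rng = np.random.default_rng(seed)
    sizes = np.asarray(sizes)
    means = np.empty((n_sims, sizes.size))
    for j, n in enumerate(sizes):
        rho = rho_prior(n, rng, n_sims)    # one ICC draw per replicate
        a = p * (1.0 - rho) / rho          # beta-binomial parameters implied
        b = (1.0 - p) * (1.0 - rho) / rho  # by mean p and ICC rho
        q = rng.beta(a, b)                 # cluster-level success probability
        means[:, j] = rng.binomial(n, q) / n
    return means

# Hypothetical prior: smaller clusters tend to have larger ICC.
rho_prior = lambda n, rng, size: 1e-3 + 0.2 * rng.beta(2.0, 2.0, size) / np.sqrt(n)

# Equal weights for simplicity; the paper's D is a general weighted sum.
D = simulate_cluster_means([10, 25, 40], p=0.3, rho_prior=rho_prior).mean(axis=1)
```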
{"title":"A mixed model formulation for designing cluster randomized trials with binary outcomes","authors":"T. Braun","doi":"10.1191/1471082X03st054oa","DOIUrl":"https://doi.org/10.1191/1471082X03st054oa","url":null,"abstract":"Cluster randomized trials (CRTs) are unlike traditional individually randomized trials because observations within the same cluster are positively correlated and the sample size (number of clusters) is relatively small. Although formulae for sample size and power estimates of CRT designs do exist, these formulae rely upon first-order asymptotic approximations for the distribution of the average intervention effect and are inaccurate for CRTs that have a small number of clusters. These formulae also assume that the intracluster correlation (ICC) is the same for each cluster in the CRT. However, for CRTs in which the clusters are classrooms or medical practices, the degree of ICC is often a factor of how many students are in each classroom or how many patients are in each practice. Specifically, smaller clusters are expected to have larger ICC than larger clusters. A weighted sum of the cluster means, D, is the statistic often used to estimate the average intervention effect in a CRT. Therefore, we propose that a saddlepoint approximation is a natural choice to approximate the distributions of the cluster means more precisely than a standard large-sample approximation. We parameterize the ICC for each cluster as a random effect with a predefined prior distribution that is dependent upon the size of each cluster. After integrating over the range of the random effect, we use Monte Carlo methods to generate sample cluster means, which are in turn used to approximate the distribution of D with saddlepoint methods. Through numerical examples and an actual application, we show that our method has accuracy that is equal to or better than that of existing methods. Futhermore, our method accommodates CRTs in which the correlation within cluster is expected to diminish with the cluster size.","PeriodicalId":354759,"journal":{"name":"Statistical Modeling","volume":"70 5 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2003-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129639702","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}