Negative binomial loglinear mixed models
J. Booth, G. Casella, H. Friedl, J. Hobert
Statistical Modeling. Pub Date: 2003-10-01. DOI: 10.1191/1471082X03st058oa
The Poisson loglinear model is a common choice for explaining variability in counts. However, in many practical circumstances the restriction that the mean and variance are equal is not realistic. Overdispersion with respect to the Poisson distribution can be modelled explicitly by integrating with respect to a mixture distribution, and use of the conjugate gamma mixing distribution leads to a negative binomial loglinear model. This paper extends the negative binomial loglinear model to the case of dependent counts, where dependence among the counts is handled by including linear combinations of random effects in the linear predictor. If the vector of random effects is assumed to be multivariate normal, then complex forms of dependence can be modelled by appropriate specification of the covariance structure. Although the likelihood function for the resulting model is not tractable, maximum likelihood estimates (and standard errors) can be found using the NLMIXED procedure in SAS or, in more complicated examples, using a Monte Carlo EM algorithm. An alternative approach is to leave the random effects completely unspecified and to estimate them by nonparametric maximum likelihood. The methodologies are illustrated with several examples.
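The gamma-Poisson mixture described above can be checked numerically: with a gamma mixing distribution of shape k and mean mu, the marginal counts are negative binomial with mean mu and variance mu + mu^2/k. The following minimal sketch uses illustrative parameter values, not values from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
k, mu, n = 2.0, 4.0, 200_000   # illustrative shape, mean, sample size

lam = rng.gamma(shape=k, scale=mu / k, size=n)  # subject-specific Poisson rates
y = rng.poisson(lam)                            # marginally negative binomial

print(y.mean())   # close to mu = 4
print(y.var())    # close to mu + mu**2 / k = 12, well above the Poisson value mu
```

The empirical variance exceeding the mean is exactly the overdispersion that motivates replacing the Poisson loglinear model with its negative binomial extension.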
Use of fractional polynomials for dose-response modelling and quantitative risk assessment in developmental toxicity studies
C. Faes, H. Geys, M. Aerts, G. Molenberghs
Statistical Modeling. Pub Date: 2003-07-01. DOI: 10.1191/1471082X03st051oa
Developmental toxicity studies are designed to assess the potential adverse effects of an exposure on developing fetuses. Safe dose levels can be determined using dose-response modelling. To this end, it is important to investigate the effect of misspecifying the dose-response model on the safe dose. Since classical polynomial predictors are often of poor quality, there is a clear need for alternative specifications of the predictors, such as fractional polynomials. By means of simulations, we show how fractional polynomial predictors may resolve possible model misspecifications and may thus yield more reliable estimates of the benchmark doses.
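A first-degree fractional polynomial (FP1) can be sketched as follows: the predictor is beta0 + beta1 * dose**p, with p chosen from the usual Royston–Altman power set and p = 0 read as log(dose). This is a generic illustration on simulated toy dose-response data, not the paper's models or data.

```python
import numpy as np

rng = np.random.default_rng(1)
POWERS = (-2, -1, -0.5, 0, 0.5, 1, 2, 3)   # conventional FP1 power set

def fp1_basis(x, p):
    # power 0 is conventionally interpreted as the logarithm
    return np.log(x) if p == 0 else x ** p

def fit_fp1(x, y):
    """Least-squares FP1 fit; returns the best power and its residual sum of squares."""
    best = None
    for p in POWERS:
        X = np.column_stack([np.ones_like(x), fp1_basis(x, p)])
        beta, rss, *_ = np.linalg.lstsq(X, y, rcond=None)
        rss = rss[0] if rss.size else np.sum((y - X @ beta) ** 2)
        if best is None or rss < best[1]:
            best = (p, rss)
    return best

dose = rng.uniform(0.1, 4.0, size=400)
resp = 1.0 + 2.0 * np.sqrt(dose) + rng.normal(0, 0.05, size=400)
p_hat, _ = fit_fp1(dose, resp)
print(p_hat)   # the square-root dose-response favours p = 0.5
```

Searching over the small power set is what gives fractional polynomials their flexibility relative to conventional low-degree polynomials.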
Extensions of the Bartlett-Lewis model for rainfall processes
A. Salim, Y. Pawitan
Statistical Modeling. Pub Date: 2003-07-01. DOI: 10.1191/1471082X02st049oa
While the Bartlett-Lewis model has been widely used for modelling rainfall processes at a fixed point in space over time, there are observed features, such as longer-scale dependence, which are not well fitted by the model. In this paper, we study an extension where we put an extra layer in the clustered Poisson process of storm origins. We also investigate the Pareto inter-arrival time for the storm origins, which has been used to model web-traffic data. We derive the theoretical first- and second-order properties of the multi-layer clustered Poisson processes, but generally we have to rely on Monte Carlo techniques. The models are fitted to hourly rainfall data from Valentia observatory in southwest Ireland, where the extensions are shown to improve on the standard models. We generalize these models further by allowing some parameters of the models to be functions of covariates. An application using data from Valentia observatory and Belmullet shows how to use this class of models to analyze the association between the rainfall pattern and the North Atlantic Oscillation (NAO) index.
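The clustered-Poisson structure underlying Bartlett-Lewis-type rainfall models can be sketched as a toy simulation: storm origins arrive as a Poisson process in time, each storm spawns rain cells during an exponential activity window, and each cell deposits rain at a constant intensity for an exponential duration. This is a stylized illustration (cells are drawn as a Poisson number within the storm window rather than by the exact Bartlett-Lewis renewal mechanism), and all parameter values are invented, not fitted.

```python
import numpy as np

rng = np.random.default_rng(2)

def simulate_clustered_rainfall(hours, storm_rate=0.02, cell_rate=1.0,
                                storm_life=5.0, cell_life=1.5, intensity=2.0):
    """Toy clustered-Poisson rainfall process; returns hourly totals."""
    rain = np.zeros(hours)
    t = rng.exponential(1 / storm_rate)            # first storm origin
    while t < hours:
        life = rng.exponential(storm_life)         # storm activity window
        n_cells = rng.poisson(cell_rate * life)    # rain cells within the storm
        starts = t + rng.uniform(0, life, n_cells)
        durs = rng.exponential(cell_life, n_cells)
        for s, d in zip(starts, durs):
            lo, hi = int(s), min(int(np.ceil(s + d)), hours)
            for h in range(lo, hi):
                # overlap of cell interval [s, s+d) with hour [h, h+1)
                rain[h] += intensity * max(0.0, min(s + d, h + 1) - max(s, h))
        t += rng.exponential(1 / storm_rate)       # next storm origin
    return rain

r = simulate_clustered_rainfall(24 * 365)
print(r.mean(), (r == 0).mean())   # mean hourly depth and dry-hour proportion
```

The paper's extensions amount to adding a further layer of clustering above the storm origins and replacing the exponential inter-arrival times with Pareto ones.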
Likelihood-based analysis of longitudinal count data using a generalized Poisson model
P. Toscas, M. Faddy
Statistical Modeling. Pub Date: 2003-07-01. DOI: 10.1191/1471082X03st050oa
Models based on a generalization of the simple Poisson process are discussed and illustrated with an analysis of some longitudinal count data on frequencies of epileptic fits. The models enable a broad class of discrete distributions to be constructed, which cover a variety of dispersion properties that can be characterized in an intuitive and appealing way by a simple parameterization. This class includes the Poisson and negative binomial distributions as well as other distributions with greater dispersion than Poisson, and also distributions underdispersed relative to the Poisson distribution. A comparison of several analyses of the data shows that some covariates have a more significant effect under this modelling than under mixed Poisson models. It is argued that this could be because the mixed Poisson models used in the other analyses do not provide an appropriate description of the residual variation; the greater flexibility of the generalized Poisson modelling generally enables a more critical assessment of covariate effects than standard mixed Poisson modelling.
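The dispersion behaviour of generalized Poisson-process models can be illustrated by simulation: in a pure birth process with state-dependent rates, rates that increase with the current count yield overdispersed counts, while rates that decrease with the count are self-limiting and yield underdispersion. The rate functions below are illustrative choices, not the paper's parameterization.

```python
import numpy as np

rng = np.random.default_rng(3)

def birth_count(rate_fn, t_end=1.0):
    """Number of events by t_end in a pure birth process with rate rate_fn(n)."""
    t, n = 0.0, 0
    while True:
        t += rng.exponential(1.0 / rate_fn(n))   # exponential holding time in state n
        if t > t_end:
            return n
        n += 1

reps = 20_000
over = np.array([birth_count(lambda n: 4.0 * (1 + 0.5 * n)) for _ in range(reps)])
under = np.array([birth_count(lambda n: 8.0 / (1.0 + n)) for _ in range(reps)])
print(over.var() / over.mean())    # > 1: increasing rates give overdispersion
print(under.var() / under.mean())  # < 1: decreasing rates give underdispersion
```

A constant rate recovers the Poisson distribution (dispersion index 1), which is why a single rate parameterization can span under-, equi-, and overdispersed counts.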
Multilevel ordinal models for examination grades
A. Fielding, Min Yang, H. Goldstein
Statistical Modeling. Pub Date: 2003-07-01. DOI: 10.1191/1471082X03st052oa
In multilevel situations, graded category responses are often converted to points scores, and linear models for continuous normal responses are then fitted. This is particularly prevalent in educational research. Generalized multilevel ordinal models for response categories are developed and contrasted in some respects with these normal models. Attention is given to the analysis of a large database of the General Certificate of Education Advanced Level examinations in England and Wales. Ordinal models appear to have advantages in facilitating the study of institutional differences in more detail. Of particular importance is the flexibility offered by logit models with nonproportionally changing odds. Examples are given of the richer contrasts of institutional and subgroup differences that may be evaluated. Appropriate widely available software for this approach is also discussed.
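The cumulative-logit machinery behind such ordinal models, and the contrast between proportional and nonproportional odds, can be sketched in a few lines. All cutpoints and slopes below are illustrative values, not estimates from the examinations data.

```python
import numpy as np

def category_probs(eta, cutpoints):
    """Cumulative-logit category probabilities: P(Y <= k) = logistic(c_k - eta)."""
    cum = 1.0 / (1.0 + np.exp(-(np.asarray(cutpoints, dtype=float) - eta)))
    cum = np.concatenate([cum, [1.0]])        # top category closes the scale
    return np.diff(cum, prepend=0.0)          # category probabilities

# Proportional odds: one linear predictor shifts every cumulative logit equally.
p_po = category_probs(eta=0.8, cutpoints=[-1.0, 0.5, 2.0])

# Nonproportional odds: the covariate's coefficient differs across the
# cumulative logits (the shifted cutpoints must stay increasing).
x = 1.0
gammas = np.array([0.2, 0.6, 1.1])            # logit-specific slopes
cuts = np.array([-1.0, 0.5, 2.0]) - gammas * x
p_npo = category_probs(eta=0.0, cutpoints=cuts)
print(p_po, p_npo)   # each vector sums to 1
```

Letting the slopes vary by logit is what gives the nonproportional-odds model its extra flexibility for contrasting institutions at different points of the grade scale.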
Modelling ranks using the inverse hypergeometric distribution
Angela D'Elia
Statistical Modeling. Pub Date: 2003-04-01. DOI: 10.1191/1471082X03st047oa
A statistical model for ranks is presented, and some results on its parameter are discussed. In particular, maximum likelihood inference is developed, with and without covariates, so that the expressed ranks can be linked to the main features of the raters. Some empirical evidence from a marketing survey confirms the usefulness of the proposal in the study of preferences.
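The inverse (negative) hypergeometric distribution arises as a waiting time in sampling without replacement: the number of draws from an urn until the first ball of a marked type appears. The urn simulation below illustrates that generic distribution; the paper's exact parameterization for rank data may differ.

```python
import numpy as np

rng = np.random.default_rng(4)

def inverse_hypergeometric_draws(n_white, n_black, reps):
    """Simulate draws-without-replacement until the first white ball appears."""
    urn = np.array([1] * n_white + [0] * n_black)
    out = np.empty(reps, dtype=int)
    for i in range(reps):
        perm = rng.permutation(urn)
        out[i] = np.argmax(perm) + 1   # 1-based position of the first white ball
    return out

W, B = 3, 12
r = inverse_hypergeometric_draws(W, B, 50_000)
# classical result: E[draws] = (W + B + 1) / (W + 1) = 4 for these values
print(r.mean())
```

The support runs from 1 to B + 1, which matches the bounded range of a rank, and a single parameter governs the whole shape — the feature the model exploits.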
A simulation-based method for model evaluation
D. Allcroft, C. Glasbey
Statistical Modeling. Pub Date: 2003-04-01. DOI: 10.1191/1471082X03st044oa
We wish to evaluate and compare models that are non-nested and fitted to data using different fitting criteria. We first estimate parameters in all models by optimizing goodness-of-fit to a dataset. Then, to assess a candidate model, we simulate a population of datasets from it and evaluate the goodness-of-fit of all the models, without re-estimating parameter values. Finally, we see whether the vector of goodness-of-fit criteria for the original data is compatible with the multivariate distribution of these criteria for the simulated datasets. By simulating from each model in turn, we determine whether any, or several, models are consistent with the data. We apply the method to compare three models, fitted at different temporal resolutions to binary time series of animal behaviour data, concluding that a semi-Markov model gives a better fit than latent Gaussian and hidden Markov models.
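The core idea — simulate datasets from a fitted candidate model and ask whether the observed goodness-of-fit criterion is compatible with their distribution — can be sketched in its simplest univariate form. The data, candidate model, and criterion below are invented for illustration, not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(5)

def dispersion(y):
    return y.var() / y.mean()   # goodness-of-fit criterion: index of dispersion

# "Observed" counts: overdispersed (a gamma-Poisson mixture), mean about 5.
obs = rng.poisson(rng.gamma(shape=1.0, scale=5.0, size=200))

# Candidate model: simple Poisson, fitted once by matching the mean;
# parameters are NOT re-estimated on the simulated datasets.
lam_hat = obs.mean()

# Simulate a population of datasets from the fitted candidate and locate the
# observed criterion within the simulated distribution.
sims = np.array([dispersion(rng.poisson(lam_hat, size=obs.size))
                 for _ in range(2000)])
p_value = np.mean(sims >= dispersion(obs))
print(p_value)   # near 0: the Poisson candidate is not consistent with the data
```

The paper's method applies the same logic with a vector of criteria — one per competing model — and checks compatibility against the resulting multivariate distribution.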
Reduced-rank vector generalized linear models
T. Yee, T. Hastie
Statistical Modeling. Pub Date: 2003-04-01. DOI: 10.1191/1471082X03st045oa
Reduced-rank regression is a method with great potential for dimension reduction but has found few applications in applied statistics. To address this, reduced-rank regression is proposed for the very large class of vector generalized linear models (VGLMs). The resulting class, which we call reduced-rank VGLMs (RR-VGLMs), enables the benefits of reduced-rank regression to be conveyed to a wide range of data types, including categorical data. RR-VGLMs are illustrated by focussing on models for categorical data, and especially the multinomial logit model. General algorithmic details are provided and software written by the first author is described. The reduced-rank multinomial logit model is illustrated with real data in two contexts: a regression analysis of workforce data and a classification problem.
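The Gaussian special case conveys the idea: classical reduced-rank regression constrains the coefficient matrix to have rank r, and the solution (with identity weighting) projects the least-squares coefficients onto the leading singular directions of the fitted values. This sketch is the linear analogue only — the RR-VGLM class of the paper generalizes it to non-Gaussian likelihoods such as the multinomial logit.

```python
import numpy as np

rng = np.random.default_rng(6)

def reduced_rank_fit(X, Y, rank):
    """Rank-constrained least squares via SVD of the fitted values."""
    B_ols = np.linalg.pinv(X) @ Y
    _, _, Vt = np.linalg.svd(X @ B_ols, full_matrices=False)
    P = Vt[:rank].T @ Vt[:rank]     # projection onto leading response directions
    return B_ols @ P                # coefficient matrix of rank <= rank

n, p, q, r = 500, 8, 6, 2
A = rng.normal(size=(p, r))         # true low-rank structure: B = A @ C
C = rng.normal(size=(r, q))
X = rng.normal(size=(n, p))
Y = X @ A @ C + 0.1 * rng.normal(size=(n, q))

B_rr = reduced_rank_fit(X, Y, rank=2)
print(np.linalg.matrix_rank(B_rr))  # 2: the rank constraint holds
```

Writing B = A C with A of p x r and C of r x q is the factorization that carries over to the VGLM setting, where the rank-r term enters the linear predictor.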
Modelling data from inside the Earth: local smoothing of mean and dispersion structure in deep drill data
G. Kauermann, H. Küchenhoff
Statistical Modeling. Pub Date: 2003-04-01. DOI: 10.1191/1471082X03st048oa
The paper describes the analysis of data originating from the German Deep Drill Program. The amount of ‘cataclastic rocks’ is modelled with data resulting from a series of measurements taken from deep drill samples ranging from 1000 to 5000 m in depth. The measurements describe the amount of strongly deformed rock particles and serve as an indicator for the occurrence of cataclastic shear zones, which are areas of severely ‘ground’ stones due to movements of different layers in the Earth's crust. The data represent a ‘depth series’ as an analogue to a ‘time series’, with mean, dispersion and correlation structure varying in depth. The general smooth structure is disturbed by peaks and outliers, so that robust procedures have to be applied for estimation. In terms of statistical modelling technology, three different peculiarities of the data have to be tackled simultaneously, namely estimation of the correlation structure, local bandwidth selection and robust smoothing. To do so, existing routines are adapted and combined in new ‘two-stage’ estimation procedures.
Flexible smoothing with P-splines: a unified approach
I. Currie, M. Durbán
Statistical Modeling. Pub Date: 2002-12-01. DOI: 10.1191/1471082x02st039ob
We consider the application of P-splines (Eilers and Marx, 1996) to three classes of models with smooth components: semiparametric models, models with serially correlated errors, and models with heteroscedastic errors. We show that P-splines provide a common approach to these problems. We set out a simple nonparametric strategy for the choice of the P-spline parameters (the number of knots, the degree of the P-spline, and the order of the penalty) and use mixed model (REML) methods for smoothing parameter selection. We give an example of a model in each of the three classes and analyse appropriate data sets.
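The basic Eilers-Marx P-spline — a B-spline basis on equally spaced knots combined with a difference penalty on adjacent coefficients — can be sketched as a penalized least-squares fit. This sketch fixes the smoothing parameter by hand rather than selecting it by REML as the paper does, uses `scipy.interpolate.BSpline.design_matrix` (SciPy 1.8+), and runs on simulated data.

```python
import numpy as np
from scipy.interpolate import BSpline

rng = np.random.default_rng(7)

def pspline_fit(x, y, nseg=20, deg=3, lam=1.0, pord=2):
    """Eilers-Marx P-spline fit: B-spline basis plus difference penalty."""
    xl, xr = x.min(), x.max() + 1e-9          # tiny pad keeps x inside the basis support
    dx = (xr - xl) / nseg
    knots = np.linspace(xl - deg * dx, xr + deg * dx, nseg + 2 * deg + 1)
    B = BSpline.design_matrix(x, knots, deg).toarray()
    D = np.diff(np.eye(B.shape[1]), n=pord, axis=0)   # order-pord difference penalty
    beta = np.linalg.solve(B.T @ B + lam * D.T @ D, B.T @ y)
    return B @ beta

x = np.sort(rng.uniform(0, 1, 200))
y = np.sin(2 * np.pi * x) + rng.normal(0, 0.2, 200)
fit = pspline_fit(x, y)
print(np.mean((fit - np.sin(2 * np.pi * x)) ** 2))   # small relative to the noise variance
```

The penalty order pord controls the null space of the smoother (pord = 2 shrinks toward a straight line), which is what makes the mixed-model/REML representation of smoothing parameter selection possible.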