Pub Date: 2021-06-21. DOI: 10.1177/1471082X211022980
D. S. Mikis, A. Robert, Georgikopoulos Nikolaos, Demaio Fernanda
A solution to the problem of having to deal with a large number of interrelated explanatory variables within a generalized additive model for location, scale and shape (GAMLSS) is given here, using as an example the Greek–German government bond yield spreads from 25 April 2005 to 31 March 2010. Those were turbulent financial years, and in order to capture the spreads' behaviour, a model has to be able to deal with the complex nature of the financial indicators used to predict the spreads. Fitting a model using principal components regression of both main and first-order interaction terms, for all the parameters of the assumed distribution of the response variable, seems to produce promising results.
Title: Principal component regression in GAMLSS applied to Greek–German government bond yield spreads
Journal: Statistical Modelling, vol. 22, pp. 127–145
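The principal components regression idea in this abstract can be illustrated with a minimal sketch. It is not the authors' GAMLSS procedure: it regresses a single response mean on one principal component, ignores interaction terms and the other distribution parameters, and all function names and data are hypothetical. The example uses two perfectly collinear predictors, a case where ordinary least squares has no unique solution but a one-component fit does.

```python
import math

def first_pc(columns, iters=200):
    """Leading principal direction of centred columns, found by power iteration.

    Assumes the covariance matrix is non-zero and the start vector is not
    orthogonal to the leading direction.
    """
    n, p = len(columns[0]), len(columns)
    cov = [[sum(a * b for a, b in zip(columns[i], columns[j])) / (n - 1)
            for j in range(p)] for i in range(p)]
    v = [1.0 / math.sqrt(p)] * p
    for _ in range(iters):
        w = [sum(cov[i][j] * v[j] for j in range(p)) for i in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v

def pcr_one_component(X, y):
    """Regress y on the first principal component of X (list of rows).

    Returns (intercept, coefficients) expressed on the original predictor scale.
    """
    n, p = len(X), len(X[0])
    means = [sum(row[j] for row in X) / n for j in range(p)]
    y_mean = sum(y) / n
    cols = [[row[j] - means[j] for row in X] for j in range(p)]
    v = first_pc(cols)
    # Component scores: projection of each centred row onto the direction v.
    scores = [sum(cols[j][i] * v[j] for j in range(p)) for i in range(n)]
    gamma = (sum(s * (yi - y_mean) for s, yi in zip(scores, y))
             / sum(s * s for s in scores))
    coefs = [gamma * vj for vj in v]
    intercept = y_mean - sum(c * m for c, m in zip(coefs, means))
    return intercept, coefs

# Hypothetical data: the second predictor is exactly twice the first.
x1 = [0.0, 1.0, 2.0, 3.0, 4.0]
X = [[a, 2.0 * a] for a in x1]
y = [1.0 + 3.0 * a for a in x1]
intercept, coefs = pcr_one_component(X, y)
```

The fitted coefficients split the effect along the shared direction of the two predictors, so the predictions reproduce `y` even though the individual effects are not separately identifiable.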
Pub Date: 2021-06-20. DOI: 10.1177/1471082X211020872
Ben Brintz, L. Madsen, Claudio Fuentes
This article develops an approximate N-mixture model for infectious disease counts that accounts for under-reporting as well as spatial dependence induced by person-to-person spread of disease. We employ the model to estimate actual case counts in Oregon of chlamydia, an easily treated but usually asymptomatic sexually transmitted disease. We describe a combined parametric bootstrap to account for uncertainty in parameter estimates as well as sampling variability in actual case counts. A simulation study illustrates that our method performs well in many scenarios when the model is correctly specified, and also gives reasonable results when the model is misspecified and no spatial dependence exists.
Title: A spatially explicit N-mixture model for the estimation of disease prevalence
Journal: Statistical Modelling, vol. 23, pp. 31–52
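To make the under-reporting idea concrete, here is a minimal sketch of the classical (non-spatial) N-mixture likelihood, which is an assumption on our part and not the approximate spatial model of the article. Reported counts are treated as binomial draws from a latent true count N, which is Poisson distributed; the latent N is summed out up to a truncation point. Function names and parameter values are hypothetical.

```python
import math

def poisson_pmf(k, lam):
    """Poisson probability mass, computed on the log scale for stability."""
    return math.exp(-lam + k * math.log(lam) - math.lgamma(k + 1))

def binom_pmf(y, n, p):
    """Binomial probability of y reported cases out of n true cases."""
    return math.comb(n, y) * p ** y * (1 - p) ** (n - y)

def nmix_site_likelihood(counts, lam, p, n_max=200):
    """Marginal likelihood of repeated counts at one site.

    Sums over the latent true count N from max(counts) to n_max;
    n_max must be large enough that the Poisson tail beyond it is negligible.
    """
    total = 0.0
    for n in range(max(counts), n_max + 1):
        lik_n = poisson_pmf(n, lam)
        for y in counts:
            lik_n *= binom_pmf(y, n, p)
        total += lik_n
    return total

# With a single count, the marginal reduces to a Poisson with mean lam * p.
single = nmix_site_likelihood([3], lam=10.0, p=0.4)
thinned = poisson_pmf(3, 10.0 * 0.4)
```

The single-count check shows why repeated counts (or extra structure) are needed: one count alone only identifies the product of the true mean and the reporting probability.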
Pub Date: 2021-06-08. DOI: 10.21203/RS.3.RS-563303/V1
J. Ramjith, Andreas Bender, Roes Kcb, Jonker Ma
Background: Recurrent events analysis plays an important role in many applications, including the study of chronic diseases or recurrence of infections. Historically, most models for the analysis of time-to-event data, including recurrent events, have been based on Cox proportional hazards regression. Recently, however, the Piece-wise exponential Additive Mixed Model (PAMM) has gained popularity as a flexible framework for survival analysis. While many papers and tutorials have been presented in the literature on the application of Cox-based models, few papers have provided detailed instructions for the application of PAMMs and, to our knowledge, none exist for recurrent events analysis. Methods: The PAMM is introduced as a framework for recurrent events analysis. We describe the application of the model to unstratified and stratified shared frailty models for recurrent events. We illustrate how penalized splines can be used to estimate non-linear and time-varying covariate effects without a priori assumptions about their functional shape. The model is motivated for analysis on both the gap timescale ("clock-reset") and the calendar timescale ("clock-forward"). The data augmentation necessary for the application to recurrent events is described and explained in detail. Results: Simulations confirmed that the model provides unbiased estimates of covariate effects and the frailty variance, as well as equivalence to the Cox model when proportional hazards are assumed. Applications to recurrence of Staphylococcus aureus and malaria in children illustrate the estimation of seasonality, bivariate non-linear effects, multiple timescales and relaxation of the proportional hazards assumption via time-varying effects. The R package pammtools has been extended to facilitate estimation, visualization and interpretation of PAMMs for recurrent events analysis. Conclusion: PAMMs provide a flexible framework for the analysis of time-to-event and recurrent events data. The estimation of PAMMs is based on Generalized Additive Mixed Models and thus extends the researcher’s toolbox for recurrent events analysis.
Title: Recurrent Events Analysis with Piece-wise exponential Additive Mixed Models
Journal: Statistical Modelling
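The data augmentation mentioned in this abstract can be sketched as follows for the gap ("clock-reset") timescale. This is a simplified illustration and not the pammtools API: the function names, the row layout and the cut points are hypothetical. Each gap time is split at the cut points into interval rows carrying an event indicator and a log-exposure offset, which is the format a Poisson-type additive model would then be fitted to.

```python
import math

def split_follow_up(time, status, cuts):
    """Split one spell (0, time] at sorted, positive cut points.

    Returns one row per interval at risk, with the event indicator set only
    in the interval where the spell ends.
    """
    rows = []
    start = 0.0
    for end in list(cuts) + [math.inf]:
        if time <= start:
            break
        stop = min(time, end)
        event = status if stop == time else 0
        rows.append({"tstart": start, "tend": stop, "event": event,
                     "offset": math.log(stop - start)})
        start = end
    return rows

def augment_recurrent(event_times, censor_time, cuts):
    """Gap-time augmentation for one subject.

    event_times are calendar times of successive events; the clock restarts
    at zero after each event, and the last spell is censored at censor_time.
    """
    rows, previous = [], 0.0
    spells = [(t, 1) for t in event_times] + [(censor_time, 0)]
    for number, (t, status) in enumerate(spells, start=1):
        for row in split_follow_up(t - previous, status, cuts):
            row["spell"] = number
            rows.append(row)
        previous = t
    return rows

# Hypothetical subject: events at times 1.5 and 2.0, followed until time 4.0.
rows = augment_recurrent([1.5, 2.0], censor_time=4.0, cuts=[1.0, 2.0])
```

For the calendar ("clock-forward") timescale, the same splitting would be applied to the calendar interval between successive events instead of to the gap time.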
Pub Date: 2021-06-07. DOI: 10.1177/1471082X211015454
Alvaro J. Flórez, I. Van Keilegom, G. Molenberghs, A. Verhasselt
While extensive research has been devoted to univariate quantile regression, this is considerably less the case for the multivariate (longitudinal) version, even though there are many potential applications, for example the joint examination of growth curves for two or more growth characteristics, such as body weight and length in infants. Quantile functions are easier to interpret for a population of curves than mean functions. While the connection between multivariate quantiles and the multivariate asymmetric Laplace distribution is known, it is less well known that its use for maximum likelihood estimation poses mathematical as well as computational challenges. Therefore, we study a broader family of multivariate generalized hyperbolic distributions, of which the multivariate asymmetric Laplace distribution is a limiting case. We offer an asymptotic treatment. Simulations and a data example supplement the modelling and theoretical considerations.
Title: Quantile regression for longitudinal data via the multivariate generalized hyperbolic distribution
Journal: Statistical Modelling, vol. 22, pp. 566–584
Pub Date: 2021-05-27. DOI: 10.1177/1471082X211007308
Stanislaus Stadlmann, T. Kneib
A newly emerging field in statistics is distributional regression, where not only the mean but each parameter of a parametric response distribution can be modelled using a set of predictors. As an extension of generalized additive models, distributional regression utilizes the known link functions (log, logit, etc.), model terms (fixed, random, spatial, smooth, etc.) and available types of distributions but allows us to go well beyond the exponential family and to model potentially all distributional parameters. Due to this increase in model flexibility, the interpretation of covariate effects on the shape of the conditional response distribution, its moments and other features derived from this distribution is more challenging than with traditional mean-based methods. In particular, such quantities of interest often do not directly equate to the modelled parameters but are rather a (potentially complex) combination of them. To ease the post-estimation model analysis, we propose a framework and subsequently feature an implementation in R for the visualization of Bayesian and frequentist distributional regression models fitted using the bamlss, gamlss and betareg R packages.
Title: Interactively visualizing distributional regression models with distreg.vis
Journal: Statistical Modelling, vol. 22, pp. 527–545
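A small example of why quantities of interest are combinations of the modelled parameters: in a beta regression with a mean parameter and a precision parameter, the response variance depends on both. The sketch below assumes a logit link for the mean and a log link for the precision, and the predictor values are hypothetical; it is unrelated to the distreg.vis implementation itself.

```python
import math

def beta_mean_variance(eta_mu, eta_phi):
    """Mean and variance of a beta response from two linear predictors.

    Assumes mu = logistic(eta_mu) and phi = exp(eta_phi), in the
    mean-precision parameterization where Var(Y) = mu * (1 - mu) / (1 + phi).
    """
    mu = 1.0 / (1.0 + math.exp(-eta_mu))
    phi = math.exp(eta_phi)
    return mu, mu * (1.0 - mu) / (1.0 + phi)

# The same mean predictor with two different precision predictors.
mean_a, var_a = beta_mean_variance(0.0, math.log(3.0))
mean_b, var_b = beta_mean_variance(0.0, math.log(9.0))
```

A covariate that enters only the precision predictor leaves the mean unchanged but still alters the variance, which is the kind of effect that is hard to read from coefficient tables alone.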
Pub Date: 2021-05-24. DOI: 10.1177/1471082X211008011
Kangjie Zhang, Juxin Liu, Yang Liu, Peng Zhang, R. Carroll
Fatal car crashes are the leading cause of death among teenagers in the USA. The Graduated Driver Licensing (GDL) programme is one effective policy for reducing the number of teen fatal car crashes. Our study focuses on the number of fatal car crashes in Michigan during 1990–2004 excluding 1997, when the GDL started. We use Poisson regression with spatially dependent random effects to model the county-level teen car crash counts. We develop a measurement error model to account for the fact that the total teenage population at the county level is used as a proxy for the teenage driver population. To the best of our knowledge, there is no existing literature that considers adjustment for measurement error in an offset variable. Furthermore, limited work has addressed measurement errors in the context of spatial data. In our modelling, a Berkson measurement error model with spatial random effects is applied to adjust for the error-prone offset variable in a Bayesian paradigm. The Bayesian Markov chain Monte Carlo (MCMC) sampling is implemented in rstan. To assess the consequence of adjusting for measurement error, we compared two models, with and without adjustment for measurement error. We found that the effect of a time indicator becomes less significant with the measurement-error adjustment. This leads us to conclude that the reduced number of teen drivers can help explain, to some extent, the effectiveness of GDL.
Title: Bayesian adjustment for measurement error in an offset variable in a Poisson regression model
Journal: Statistical Modelling, vol. 22, pp. 509–526
Pub Date: 2021-04-17. DOI: 10.1177/1471082X211033490
H. Bar, J. Booth, M. Wells
It is known that the estimating equations for quantile regression (QR) can be solved using an EM algorithm in which the M-step is computed via weighted least squares, with weights computed at the E-step as the expectation of independent generalized inverse-Gaussian variables. This fact is exploited here to extend QR to allow for random effects in the linear predictor. Convergence of the algorithm in this setting is established by showing that it is a generalized alternating minimization (GAM) procedure. Another modification of the EM algorithm also allows us to adapt a recently proposed method for variable selection in mean regression models to the QR setting. Simulations show that the resulting method significantly outperforms variable selection in QR models using the lasso penalty. Applications to real data include a frailty QR analysis of hospital stays, and variable selection for age at onset of lung cancer and for riboflavin production rate using high-dimensional gene expression arrays for prediction.
Title: Mixed effect modelling and variable selection for quantile regression
Journal: Statistical Modelling, vol. 23, pp. 53–80
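The EM scheme described in this abstract can be sketched for the simplest case, an intercept-only model that estimates a single sample quantile. This is our own reduction under the asymmetric Laplace working likelihood, not the authors' mixed-effects algorithm; the function name, the `eps` guard against zero residuals and the iteration count are hypothetical choices. The E-step gives weights proportional to the inverse absolute residuals, and the M-step is a weighted least-squares update with a shift that depends on the quantile level `tau`.

```python
def em_quantile(y, tau, iters=500, eps=1e-8):
    """Intercept-only quantile estimate by EM (iteratively reweighted least squares)."""
    n = len(y)
    beta = sum(y) / n  # start from the sample mean
    for _ in range(iters):
        # E-step: expected inverse mixing variables, proportional to 1 / |residual|.
        w = [1.0 / max(abs(yi - beta), eps) for yi in y]
        # M-step: weighted least squares; the (1 - 2 * tau) * n term encodes the asymmetry.
        beta = (sum(wi * yi for wi, yi in zip(w, y)) - (1.0 - 2.0 * tau) * n) / sum(w)
    return beta

# Hypothetical data: the integers 1 to 99.
data = [float(v) for v in range(1, 100)]
median = em_quantile(data, tau=0.5)
lower_quartile = em_quantile(data, tau=0.25)
```

At a fixed point the update balances the signs of the residuals so that a fraction `tau` of the observations lies below the estimate, which is the usual quantile estimating equation. With covariates, the M-step would become a weighted regression of the same form.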
Pub Date: 2021-03-31. DOI: 10.1177/1471082X21995675
A. Grand, R. Dittrich
This article proposes an alternative method of making comparative judgements in multivariate paired comparisons (PCs) where judgements about change are made directly by comparing an object at two time points for each of a series of attributes. The application deals with the design of shop window displays where products should be arranged by teams of vocational students according to aesthetic principles (attributes). The photos of the students’ window displays at time 1 (before feedback) and at time 2 (after feedback) were compared by judging each attribute as to whether it was fulfilled better at time 1 or at time 2. An advantage of this PC approach over a scoring system is the possibility of assessing even subtle changes in various aspects of attractiveness, which cannot easily be measured using a score. To analyse these data, we used earlier work which developed both a multivariate PC pattern model for multi-attribute data and a PC model over time, and defined a multivariate PC model of changes (MPCC). The model can be fitted as a non-standard Poisson log-linear model and provides estimates of change for the three attributes at time 2. We were also able to check for possible interaction effects between these attributes.
Title: Modelling changes over time in a multivariate paired comparison: An application to window display design
Journal: Statistical Modelling, vol. 22, pp. 95–106
Pub Date: 2021-03-11. DOI: 10.1177/1471082X211056158
A. Volkmann, Almond Stöcker, F. Scheipl, S. Greven
Multivariate functional data can be intrinsically multivariate, like movement trajectories in 2D, or complementary, such as precipitation, temperature and wind speeds over time at a given weather station. We propose a multivariate functional additive mixed model (multiFAMM) and show its application to both data situations using examples from sports science (movement trajectories of snooker players) and phonetic science (acoustic signals and articulation of consonants). The approach includes linear and nonlinear covariate effects and models the dependency structure between the dimensions of the responses using multivariate functional principal component analysis. Multivariate functional random intercepts capture both the auto-correlation within a given function and cross-correlations between the multivariate functional dimensions. They also allow us to model between-function correlations as induced by, for example, repeated measurements or crossed study designs. Modelling the dependency structure between the dimensions can generate additional insight into the properties of the multivariate functional process, improve the estimation of random effects, and yield corrected confidence bands for covariate effects. Extensive simulation studies indicate that a multivariate modelling approach is more parsimonious than fitting independent univariate models to the data while maintaining or improving model fit.
Title: Multivariate functional additive mixed models
Journal: Statistical Modelling, vol. 23, pp. 303–326
Pub Date: 2021-03-08. DOI: 10.1177/1471082X21989170
Yingying Zhang, Volodymyr Melnykov, Igor Melnykov
A new approach to the analysis of heterogeneous categorical sequences is proposed. The first-order Markov model is employed in a finite mixture setting with initial state and transition probabilities being expressed as functions of time. The expectation–maximization algorithm approach to parameter estimation is implemented in the presence of positive equivalence constraints that determine which observations must be placed in the same class in the solution. The proposed model is applied to a dataset from the British Household Panel Survey to evaluate the association between the education background and life outcomes of study participants. The analysis of the survey data reveals many interesting relationships between the level of education and major life events.
Title: Semi-supervised clustering of time-dependent categorical sequences with application to discovering education-based life patterns
Journal: Statistical Modelling, vol. 22, pp. 457–476
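One common way to honour positive equivalence constraints inside an EM algorithm is to compute the E-step per block of linked observations, so that all members of a block share a single set of posterior class probabilities. The sketch below assumes that approach; it is not taken from the article. The component log densities are supplied as plain numbers, where in the article's setting they would come from the time-dependent Markov model, and all names and values are hypothetical.

```python
import math

def block_posteriors(log_dens, blocks, mix):
    """E-step under must-link constraints.

    log_dens[i][k]: log density of sequence i under mixture component k.
    blocks: lists of sequence indices that must share a class.
    mix: mixing proportions, one per component.
    Returns one posterior probability vector per block.
    """
    result = []
    for members in blocks:
        log_joint = [math.log(mix[k]) + sum(log_dens[i][k] for i in members)
                     for k in range(len(mix))]
        top = max(log_joint)  # subtract the maximum before exponentiating
        weights = [math.exp(v - top) for v in log_joint]
        total = sum(weights)
        result.append([w / total for w in weights])
    return result

# Hypothetical densities for three sequences and two components;
# sequences 0 and 1 are constrained to the same class.
log_dens = [[math.log(0.8), math.log(0.2)],
            [math.log(0.6), math.log(0.4)],
            [math.log(0.1), math.log(0.9)]]
post = block_posteriors(log_dens, blocks=[[0, 1], [2]], mix=[0.5, 0.5])
```

Pooling the evidence within a block means a sequence with weak evidence on its own inherits the classification of the sequences it is linked to.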