Clustering Longitudinal Data for Growth Curve Modelling by Gibbs Sampler and Information Criterion
Authors: Yu Fei, Rongli Li, Zhouhong Li, Guoqi Qian
Journal: Journal of Classification (Impact Factor 1.8; JCR Q2, Mathematics, Interdisciplinary Applications)
Published: 2024-06-19
DOI: 10.1007/s00357-024-09477-z (https://doi.org/10.1007/s00357-024-09477-z)
This paper considers clustering longitudinal data for growth curve modelling, with the aim of optimally estimating the underlying unknown group partition matrix. The conventional soft clustering approach assumes that the columns of the partition matrix have i.i.d. multinomial or categorical prior distributions, and it uses a regression model whose response follows a finite mixture distribution to estimate the posterior distribution of the partition matrix. Instead, we propose an iterative partition and regression procedure that finds the best partition matrix and the associated best growth curve regression model for each identified cluster. We show that the best partition matrix is the one that minimizes a recently developed empirical Bayes information criterion (eBIC). Because of the combinatorial explosion involved, this minimizer is difficult to compute by enumerating all candidate partition matrices. We therefore develop a Gibbs sampling method that generates a Markov chain of candidate partition matrices whose equilibrium probability distribution equals the one induced by eBIC. We further show that, given the number of latent clusters a priori, the best partition matrix can be estimated consistently from this Markov chain and that the estimation is computationally scalable. The number of latent clusters is also best estimated by minimizing eBIC. The proposed iterative clustering and regression method is assessed in a comprehensive simulation study and then applied to two real-world growth curve modelling examples involving longitudinal data clustering.
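The iterative partition and regression idea in the abstract can be illustrated with a minimal sketch. It rests on several assumptions that are not taken from the paper: a plain BIC-style score stands in for eBIC, each cluster's growth curve is a straight line fitted by least squares, all individuals share the same time points, and the sampler targets a distribution proportional to exp(-score/2). The function names (`pooled_line_rss`, `criterion`, `gibbs_partition`) and the simulated data are hypothetical.

```python
import math
import random

def pooled_line_rss(curves, times):
    """RSS after fitting one least-squares line to all curves in a cluster."""
    if not curves:
        return 0.0
    t_bar = sum(times) / len(times)
    y_bar = sum(sum(c) for c in curves) / (len(curves) * len(times))
    sxx = len(curves) * sum((t - t_bar) ** 2 for t in times)
    sxy = sum((t - t_bar) * y for c in curves for t, y in zip(times, c))
    slope = sxy / sxx
    intercept = y_bar - slope * t_bar
    return sum((y - intercept - slope * t) ** 2
               for c in curves for t, y in zip(times, c))

def criterion(labels, curves, times, n_clusters):
    """BIC-style score of a partition (an assumed stand-in for eBIC); lower is better."""
    n_obs = len(curves) * len(times)
    rss = 1e-12  # guard against log(0) on a perfect fit
    for k in range(n_clusters):
        members = [c for c, lab in zip(curves, labels) if lab == k]
        rss += pooled_line_rss(members, times)
    # Two regression parameters per cluster; for fixed n_clusters this penalty
    # is constant and only matters when comparing different numbers of clusters.
    return n_obs * math.log(rss / n_obs) + 2 * n_clusters * math.log(n_obs)

def gibbs_partition(curves, times, n_clusters, sweeps=50, seed=0):
    """Gibbs sampler over label vectors with target proportional to exp(-score/2)."""
    rng = random.Random(seed)
    labels = [rng.randrange(n_clusters) for _ in curves]
    best, best_score = list(labels), criterion(labels, curves, times, n_clusters)
    for _ in range(sweeps):
        for i in range(len(curves)):
            # Score every possible label for individual i, others held fixed.
            scores = []
            for k in range(n_clusters):
                labels[i] = k
                scores.append(criterion(labels, curves, times, n_clusters))
            low = min(scores)
            weights = [math.exp(-(s - low) / 2) for s in scores]
            labels[i] = rng.choices(range(n_clusters), weights=weights)[0]
            # Keep the lowest-scoring partition visited by the chain.
            if scores[labels[i]] < best_score:
                best, best_score = list(labels), scores[labels[i]]
    return best, best_score

# Simulated example: two groups of six curves with opposite slopes.
data_rng = random.Random(1)
times = [0, 1, 2, 3, 4]
group_a = [[0.0 + 1.0 * t + data_rng.gauss(0, 0.1) for t in times] for _ in range(6)]
group_b = [[5.0 - 1.0 * t + data_rng.gauss(0, 0.1) for t in times] for _ in range(6)]
curves = group_a + group_b
best, score = gibbs_partition(curves, times, n_clusters=2)
print(best, round(score, 2))
```

Each Gibbs update draws one individual's cluster label from its full conditional distribution, so no enumeration of all partitions is needed. Running the sampler for several values of `n_clusters` and comparing the best scores would mirror, under the same assumptions, the idea of choosing the number of clusters by minimizing the criterion.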
About the journal:
The Journal of Classification publishes original and valuable papers in the fields of classification, numerical taxonomy, multidimensional scaling and other ordination techniques, clustering, and tree structures and other network models (with somewhat less emphasis on principal components analysis, factor analysis, and discriminant analysis), as well as associated models and algorithms for fitting them. Articles support advances in methodology while demonstrating compelling substantive applications. Comprehensive review articles are also acceptable. Contributions represent disciplines such as statistics, psychology, biology, information retrieval, anthropology, archeology, astronomy, business, chemistry, computer science, economics, engineering, geography, geology, linguistics, marketing, mathematics, medicine, political science, psychiatry, sociology, and soil science.