Model‐based clustering of time‐dependent categorical sequences with application to the analysis of major life event patterns
Authors: Yingying Zhang, Volodymyr Melnykov, Xuwen Zhu
Journal: Statistical Analysis and Data Mining: The ASA Data Science Journal
Publication date: 2021-03-08
DOI: 10.1002/sam.11502 (https://doi.org/10.1002/sam.11502)
Citation count: 2
Abstract
Clustering categorical sequences is a problem that arises in many fields. A few techniques are available in this framework, but none of them takes into account the possible temporal character of transitions from one state to another. A mixture of Markov models is proposed in which transition probabilities are represented as functions of time. The corresponding expectation–maximization algorithm is discussed along with related computational challenges. The effectiveness of the proposed procedure is illustrated in a set of simulation studies, in which it outperforms four alternative approaches. The method is applied to major life event sequences from the British Household Panel Survey. As reflected by the Bayesian information criterion, the proposed model performs substantially better than its competitors. Analysis of the results and the related transition probability plots reveals two groups of individuals: people with a conventional life course and those encountering some challenges.
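
To make the idea of a mixture of Markov models with time-dependent transitions more concrete, the following is a minimal illustrative sketch of the E-step (posterior cluster membership) and a BIC computation. It is not the authors' implementation. The abstract does not state how transition probabilities depend on time, so the logit-linear (softmax) form used here is purely an assumption, as are all function names, parameter values, and toy sequences. The BIC convention shown (smaller is better) is likewise an assumption.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of real scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def transition_row(intercepts, slopes, t):
    """Probabilities of moving from one fixed state to every state at time t.

    ASSUMPTION: scores are linear in t and passed through a softmax.
    The functional form used in the paper may differ.
    """
    return softmax([a + b * t for a, b in zip(intercepts, slopes)])


def sequence_loglik(seq, init, intercepts, slopes):
    """Log-likelihood of one categorical sequence under one mixture component.

    init[s] is the initial-state probability; intercepts[i][j] and
    slopes[i][j] are the (hypothetical) parameters for the move i -> j.
    """
    ll = math.log(init[seq[0]])
    for t in range(1, len(seq)):
        prev, cur = seq[t - 1], seq[t]
        ll += math.log(transition_row(intercepts[prev], slopes[prev], t)[cur])
    return ll


def e_step(sequences, weights, components):
    """Posterior probability that each sequence belongs to each component."""
    resp = []
    for seq in sequences:
        logs = [math.log(w) + sequence_loglik(seq, *comp)
                for w, comp in zip(weights, components)]
        m = max(logs)  # subtract the max before exponentiating (log-sum-exp)
        unnorm = [math.exp(v - m) for v in logs]
        total = sum(unnorm)
        resp.append([u / total for u in unnorm])
    return resp


def bic(loglik, n_params, n_obs):
    """BIC as -2 log L + p log n (assumed convention: smaller is better)."""
    return -2.0 * loglik + n_params * math.log(n_obs)


# Toy setup with two states (0, 1) and two components; all values are made up.
sticky = ([0.5, 0.5],
          [[2.0, -2.0], [-2.0, 2.0]],   # intercepts: strong tendency to stay
          [[0.0, 0.0], [0.0, 0.0]])     # slopes: no dependence on time
drifting = ([0.5, 0.5],
            [[2.0, -2.0], [-2.0, 2.0]],
            [[-1.0, 1.0], [0.0, 0.0]])  # leaving state 0 gets likelier with t

sequences = [[0, 0, 0, 0, 0, 0], [0, 0, 0, 1, 1, 1]]
resp = e_step(sequences, [0.5, 0.5], [sticky, drifting])
for seq, r in zip(sequences, resp):
    print(seq, [round(p, 3) for p in r])
```

In a full EM algorithm, these posterior probabilities would weight each sequence's contribution in the M-step. With time-dependent transition parameters, that update generally has no closed form and requires numerical optimization, which is presumably one source of the computational challenges the abstract mentions.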