Clustering Longitudinal Life-Course Sequences Using Mixtures of Exponential-Distance Models

Keefe Murphy, Brendan Murphy, R. Piccarreta, I. C. Gormley
Journal: arXiv: Methodology. Published: 2019-08-21. DOI: 10.31235/osf.io/f5n8k (https://doi.org/10.31235/osf.io/f5n8k)
Citations: 4

Abstract

Sequence analysis is an increasingly popular approach for analysing life courses represented by ordered collections of activities experienced by subjects over time. Here, we analyse a survey data set containing information on the career trajectories of a cohort of Northern Irish youths tracked between the ages of 16 and 22. We propose a novel, model-based clustering approach suited to the analysis of such data from a holistic perspective, with the aims of estimating the number of typical career trajectories, identifying the relevant features of these patterns, and assessing the extent to which such patterns are shaped by background characteristics.

Several criteria exist for measuring pairwise dissimilarities among categorical sequences. Typically, dissimilarity matrices are employed as input to heuristic clustering algorithms. The family of methods we develop instead clusters sequences directly using mixtures of exponential-distance models. Basing the models on weighted variants of the Hamming distance metric permits closed-form expressions for parameter estimation. Simultaneously allowing the component membership probabilities to depend on fixed covariates and accommodating sampling weights in the clustering process yields new insights on the Northern Irish data. In particular, we find that school examination performance is the single most important predictor of cluster membership.
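To make the closed-form claim concrete, the following is a minimal sketch of a single exponential-distance component under the plain (unweighted) Hamming distance. It is not the authors' implementation: the function names, the state labels, and the toy sequences are hypothetical, and the weighted distance variants, covariates, sampling weights, and the mixture itself are all omitted. It assumes the density of a sequence s of length T over v states is proportional to exp(-lam * d_H(s, centre)), in which case the normalising constant is (1 + (v - 1) * exp(-lam))^T, the central sequence can be estimated by the position-wise mode, and the precision lam has an explicit maximum-likelihood solution.

```python
import math
from collections import Counter


def hamming(a, b):
    """Number of positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


def log_density(seq, centre, lam, n_states):
    """Log-density of seq under exp(-lam * Hamming distance to centre).

    The normalising constant sums exp(-lam * d) over all n_states**T
    sequences, which factorises to (1 + (n_states - 1) * exp(-lam)) ** T.
    """
    T = len(centre)
    log_norm = T * math.log1p((n_states - 1) * math.exp(-lam))
    return -lam * hamming(seq, centre) - log_norm


def fit_single_component(seqs, n_states):
    """Closed-form estimates for one unweighted component (illustrative only)."""
    T = len(seqs[0])
    # Central sequence: the most common state at each time point.
    centre = tuple(Counter(s[t] for s in seqs).most_common(1)[0][0]
                   for t in range(T))
    # p is the observed mismatch rate per position.
    p = sum(hamming(s, centre) for s in seqs) / (len(seqs) * T)
    if p == 0:
        lam = math.inf  # every sequence equals the centre
    elif p >= (n_states - 1) / n_states:
        lam = 0.0       # no more concentrated than uniform; clamp at zero
    else:
        # Solves p = (v - 1) e^{-lam} / (1 + (v - 1) e^{-lam}) for lam.
        lam = math.log((n_states - 1) * (1 - p) / p)
    return centre, lam


# Hypothetical toy data: four states labelled S, T, E, U.
centre, lam = fit_single_component(["SSEE", "SSEE", "SSEU", "STEE"], n_states=4)
```

In a full mixture, estimates of this kind would presumably be computed per component inside an EM-style algorithm using responsibility-weighted counts; that step is not shown here.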