Multiple states clustering analysis (MSCA), an unsupervised approach to multiple time-to-event electronic health records applied to multimorbidity associated with myocardial infarction.
Authors: Marc Delord, Abdel Douiri
Journal: BMC Medical Research Methodology, 25(1), article 32
Publication date: 2025-02-04
Publication type: Journal Article
DOI: https://doi.org/10.1186/s12874-025-02476-7
Full text (PDF): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11792209/pdf/
Journal impact factor: 3.9; JCR quartile: Q1 (Health Care Sciences & Services)
Citations: 0
Abstract
Multimorbidity is characterized by the accrual of two or more long-term conditions (LTCs) in an individual. This state of health is increasingly prevalent and poses public health challenges. Adapting approaches to effectively analyse electronic health records is needed to better understand multimorbidity. We propose a novel unsupervised clustering approach to multiple time-to-event health records denoted as multiple state clustering analysis (MSCA). In MSCA, patients' pairwise dissimilarities are computed using patients' state matrices which are composed of multiple censored time-to-event indicators reflecting patients' health history. The use of state matrices enables the analysis of an arbitrary number of LTCs without reducing patients' health trajectories to a particular sequence of events. MSCA was applied to analyse multimorbidity associated with myocardial infarction using electronic health records of 26 LTCs, including conventional cardiovascular risk factors (CVRFs) such as diabetes and hypertension, collected from south London general practices between 2005 and 2021 in 5087 patients using the MSCA R library. We identified a typology of 11 clusters, characterised by age at onset of myocardial infarction, sequences of conventional CVRFs and non-conventional risk factors including physical and mental health conditions. Interestingly, multivariate analysis revealed that clusters were also associated with various combinations of socio-demographic characteristics including gender and ethnicity. By identifying meaningful sequences of LTCs associated with myocardial infarction and distinct socio-demographic characteristics, MSCA proves to be an effective approach to the analysis of electronic health records, with the potential to enhance our understanding of multimorbidity for improved prevention and management.
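The abstract describes the pipeline only in outline: state matrices built from censored time-to-event indicators, pairwise dissimilarities between patients, then clustering. The following is a minimal Python sketch of that idea, not the MSCA R library's implementation. The toy records, the age grid, the choice of Jaccard dissimilarity, the average-linkage merging, and the decision to code cells after censoring as 0 are all assumptions made for illustration; the abstract does not specify any of them.

```python
from itertools import combinations

# Hypothetical toy data: age at onset per condition and age at censoring
# (end of follow-up). Condition names and values are illustrative only.
CONDITIONS = ["diabetes", "hypertension", "depression", "myocardial_infarction"]
PATIENTS = {
    "p1": {"onset": {"diabetes": 50, "hypertension": 55, "myocardial_infarction": 62}, "censor": 70},
    "p2": {"onset": {"diabetes": 52, "hypertension": 54, "myocardial_infarction": 60}, "censor": 68},
    "p3": {"onset": {"depression": 40, "myocardial_infarction": 45}, "censor": 66},
    "p4": {"onset": {"depression": 42, "myocardial_infarction": 47}, "censor": 72},
}
AGE_GRID = range(30, 80, 5)  # assumed discretisation: ages 30, 35, ..., 75


def state_matrix(record):
    """Rows are conditions, columns are ages on AGE_GRID.

    A cell is 1 when the condition has started and the patient is still
    under observation at that age, else 0 (assumed handling of censoring).
    """
    rows = []
    for cond in CONDITIONS:
        onset = record["onset"].get(cond)
        rows.append([
            1 if onset is not None and onset <= age <= record["censor"] else 0
            for age in AGE_GRID
        ])
    return rows


def jaccard_dissimilarity(a, b):
    """1 - |intersection| / |union| over the cells of two state matrices."""
    flat_a = [x for row in a for x in row]
    flat_b = [x for row in b for x in row]
    inter = sum(1 for x, y in zip(flat_a, flat_b) if x and y)
    union = sum(1 for x, y in zip(flat_a, flat_b) if x or y)
    return 0.0 if union == 0 else 1.0 - inter / union


def average_linkage(ids, dist, n_clusters):
    """Naive agglomerative clustering: merge the closest pair until n_clusters remain."""
    clusters = [[i] for i in ids]

    def mean_distance(pair):
        left, right = clusters[pair[0]], clusters[pair[1]]
        total = sum(dist[frozenset((a, b))] for a in left for b in right)
        return total / (len(left) * len(right))

    while len(clusters) > n_clusters:
        i, j = min(combinations(range(len(clusters)), 2), key=mean_distance)
        clusters[i] = clusters[i] + clusters[j]  # i < j, so deleting j is safe
        del clusters[j]
    return clusters


matrices = {pid: state_matrix(rec) for pid, rec in PATIENTS.items()}
dist = {
    frozenset((a, b)): jaccard_dissimilarity(matrices[a], matrices[b])
    for a, b in combinations(matrices, 2)
}
clusters = average_linkage(list(matrices), dist, n_clusters=2)
print(clusters)
```

Because each patient is compared through a full condition-by-age matrix, adding a condition only adds a row. This mirrors the abstract's point that an arbitrary number of LTCs can be analysed without reducing a trajectory to one ordered sequence of events.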
About the journal:
BMC Medical Research Methodology is an open access journal publishing original peer-reviewed research articles in methodological approaches to healthcare research. Articles on the methodology of epidemiological research, clinical trials and meta-analysis/systematic review are particularly encouraged, as are empirical studies of the associations between choice of methodology and study outcomes. BMC Medical Research Methodology does not aim to publish articles describing scientific methods or techniques: these should be directed to the BMC journal covering the relevant biomedical subject area.