Title: Identifying heterogeneous subgroups of systemic autoimmune diseases by applying a joint dimension reduction and clustering approach to immunomarkers
Authors: Chia-Wei Chang, Hsin-Yao Wang, Wan-Ying Lin, Yu-Chiang Wang, Wei-Lin Lo, Ting-Wei Lin, Jia-Ruei Yu, Yi-Ju Tseng
Journal: BioData Mining (volume 117 1; impact factor 4.0; JCR Q1, Mathematical & Computational Biology)
Publication date: 2024-09-16
Publication type: Journal Article
DOI: 10.1186/s13040-024-00389-7
URL: https://doi.org/10.1186/s13040-024-00389-7
Citations: 0
Abstract
The high complexity of systemic autoimmune diseases (SADs) has hindered precise management. This study aims to investigate heterogeneity in SADs. We applied a joint cluster analysis, which combines multiple correspondence analysis and k-means, to immunomarkers and measured the heterogeneity of clusters by examining differences in immunomarkers and clinical manifestations. The electronic health records of patients who received an antinuclear antibody test and were diagnosed with SADs, namely systemic lupus erythematosus (SLE), rheumatoid arthritis (RA), and Sjögren’s syndrome (SS), were retrieved between 2001 and 2016 from hospitals in Taiwan. On the basis of distinctive immunomarker patterns, a total of 11,923 patients with the three SADs were grouped into six clusters. No cluster consisted of a single SAD, and the clusters demonstrated considerable differences in clinical manifestation. Patients with SLE and patients with SS were both more dispersed across the six clusters. Among patients with SLE, the occurrence of renal compromise was higher in Clusters 3 and 6 (52% and 51%) than in the other clusters (p < 0.001). Cluster 3 also had a higher proportion of patients with discoid lupus (60%) than did Cluster 6 (39%; p < 0.001). Patients with SS in Cluster 3 were the most distinctive because of the high occurrence of immunity disorders (63%) and other and unspecified benign neoplasm (58%), both significantly different from the other clusters (all p < 0.05). The immunomarker-driven clustering method could recognise more clinically relevant subgroups of the SADs and could provide a more precise basis for diagnosis.
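To make the combination of multiple correspondence analysis (MCA) and k-means more concrete, the following is a minimal sketch of the simpler two-step ("tandem") variant: MCA first, then k-means on the resulting patient coordinates. This is not the authors' joint procedure, which fits the low-dimensional space and the clusters together. The marker columns, the eight toy patients, and all function names are hypothetical and chosen only for illustration.

```python
import numpy as np

def one_hot(rows):
    """Indicator matrix: one 0/1 column per (variable, category) pair."""
    columns = []
    for v in range(len(rows[0])):
        for cat in sorted({row[v] for row in rows}):
            columns.append([1.0 if row[v] == cat else 0.0 for row in rows])
    return np.array(columns).T

def mca_row_coordinates(indicator, n_components=2):
    """Row principal coordinates from correspondence analysis of the indicator matrix.

    Assumes every category occurs at least once (no all-zero columns).
    """
    Z = np.asarray(indicator, dtype=float)
    P = Z / Z.sum()                       # correspondence matrix
    r = P.sum(axis=1)                     # row masses
    c = P.sum(axis=0)                     # column masses
    # Standardised residuals from the independence model r c^T
    S = (P - np.outer(r, c)) / np.sqrt(np.outer(r, c))
    U, sigma, _ = np.linalg.svd(S, full_matrices=False)
    return (U[:, :n_components] * sigma[:n_components]) / np.sqrt(r)[:, None]

def kmeans(X, k, n_iter=100):
    """Plain Lloyd's k-means with deterministic farthest-first initialisation."""
    centers = [X[0]]
    for _ in range(1, k):
        d = np.min([((X - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(X[d.argmax()])
    centers = np.array(centers)
    for _ in range(n_iter):
        dist = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)
        new_centers = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels, centers

# Hypothetical toy data: three categorical immunomarker results per patient
# (placeholder columns marker_a, marker_b, marker_c; not data from the study).
rows = [
    ("pos", "pos", "neg"),
    ("pos", "pos", "neg"),
    ("pos", "pos", "pos"),
    ("pos", "pos", "neg"),
    ("neg", "neg", "pos"),
    ("neg", "neg", "pos"),
    ("neg", "neg", "neg"),
    ("neg", "neg", "pos"),
]

coords = mca_row_coordinates(one_hot(rows), n_components=2)
labels, _ = kmeans(coords, k=2)
print(labels.tolist())
```

In this toy example the first four patients and the last four end up in separate clusters, driven by the two markers that always agree. A known weakness of the tandem approach is that the MCA dimensions are chosen without regard to cluster structure, which is the motivation for joint methods such as the one the abstract describes.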
About the journal:
BioData Mining is an open access, open peer-reviewed journal encompassing research on all aspects of data mining applied to high-dimensional biological and biomedical data, focusing on computational aspects of knowledge discovery from large-scale genetic, transcriptomic, genomic, proteomic, and metabolomic data.
Topical areas include, but are not limited to:
-Development, evaluation, and application of novel data mining and machine learning algorithms.
-Adaptation, evaluation, and application of traditional data mining and machine learning algorithms.
-Open-source software for the application of data mining and machine learning algorithms.
-Design, development and integration of databases, software and web services for the storage, management, retrieval, and analysis of data from large scale studies.
-Pre-processing, post-processing, modeling, and interpretation of data mining and machine learning results for biological interpretation and knowledge discovery.