Identifying heterogeneous subgroups of systemic autoimmune diseases by applying a joint dimension reduction and clustering approach to immunomarkers

IF 6.1 Zone 3 (Biology) Q1 MATHEMATICAL & COMPUTATIONAL BIOLOGY Biodata Mining Pub Date: 2024-09-16 DOI: 10.1186/s13040-024-00389-7

Chia-Wei Chang, Hsin-Yao Wang, Wan-Ying Lin, Yu-Chiang Wang, Wei-Lin Lo, Ting-Wei Lin, Jia-Ruei Yu, Yi-Ju Tseng


Citations: 0

Abstract

The high complexity of systemic autoimmune diseases (SADs) has hindered precise management. This study aims to investigate heterogeneity in SADs. We applied a joint cluster analysis, which combined multiple correspondence analysis and k-means, to immunomarkers and measured the heterogeneity of clusters by examining differences in immunomarkers and clinical manifestations. The electronic health records of patients who received an antinuclear antibody test and were diagnosed with SADs, namely systemic lupus erythematosus (SLE), rheumatoid arthritis (RA), and Sjögren’s syndrome (SS), were retrieved between 2001 and 2016 from hospitals in Taiwan. On the basis of distinctive immunomarker patterns, a total of 11,923 patients with the three SADs were grouped into six clusters. None of the clusters was composed of only a single SAD, and the clusters demonstrated considerable differences in clinical manifestations. Patients with SLE and patients with SS both had a more dispersed distribution across the six clusters. Among patients with SLE, the occurrence of renal compromise was higher in Clusters 3 and 6 (52% and 51%) than in the other clusters (p < 0.001). Cluster 3 also had a higher proportion of patients with discoid lupus (60%) than did Cluster 6 (39%; p < 0.001). Patients with SS in Cluster 3 were the most distinctive because of their high occurrence of immunity disorders (63%) and of other and unspecified benign neoplasms (58%), each differing significantly from the other clusters (all p < 0.05). The immunomarker-driven clustering method could recognise more clinically relevant subgroups of the SADs and would provide a more precise basis for diagnosis.
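To make the idea of combining multiple correspondence analysis (MCA) with k-means more concrete, the following is a minimal sketch and not the authors' implementation. It assumes a simplified tandem procedure (MCA first, then k-means on the reduced coordinates), whereas the abstract describes a joint analysis, which would typically optimise both steps together. The data are synthetic binary "immunomarker" results, and the function names (`one_hot`, `mca_row_coordinates`, `kmeans`) and parameter choices are hypothetical.

```python
import numpy as np

def one_hot(categories: np.ndarray) -> np.ndarray:
    """Build the indicator matrix from an (n, p) array of category codes."""
    blocks = []
    for j in range(categories.shape[1]):
        levels = np.unique(categories[:, j])
        blocks.append((categories[:, j][:, None] == levels[None, :]).astype(float))
    return np.hstack(blocks)

def mca_row_coordinates(indicator: np.ndarray, n_dims: int) -> np.ndarray:
    """MCA as correspondence analysis of the indicator matrix.

    Returns the row (patient) principal coordinates on the first n_dims axes.
    """
    P = indicator / indicator.sum()      # correspondence matrix
    r = P.sum(axis=1)                    # row masses
    c = P.sum(axis=0)                    # column masses
    expected = np.outer(r, c)
    S = (P - expected) / np.sqrt(expected)   # standardized residuals
    U, s, _ = np.linalg.svd(S, full_matrices=False)
    return (U[:, :n_dims] * s[:n_dims]) / np.sqrt(r)[:, None]

def kmeans(X: np.ndarray, k: int, n_iter: int = 100, seed: int = 0):
    """Plain Lloyd's k-means; returns (labels, centers)."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        dist = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)
        # Keep the old center if a cluster happens to be empty.
        new_centers = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    # Final assignment against the final centers.
    dist = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return dist.argmin(axis=1), centers

# Synthetic example (assumption): 200 patients, 5 binary markers (0 = negative, 1 = positive).
rng = np.random.default_rng(42)
markers = rng.integers(0, 2, size=(200, 5))

indicator = one_hot(markers)
coords = mca_row_coordinates(indicator, n_dims=2)
labels, centers = kmeans(coords, k=3)

print("cluster sizes:", np.bincount(labels, minlength=3))
```

In this sketch the number of retained dimensions and the number of clusters are arbitrary; in practice both would be chosen with a selection criterion, and a joint method would re-estimate the low-dimensional space so that it separates the clusters rather than fixing it beforehand.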




Source journal

Biodata Mining (MATHEMATICAL & COMPUTATIONAL BIOLOGY)

CiteScore: 7.90

Self-citation rate: 0.00%

Articles published:

Review time: 23 weeks

Journal description: BioData Mining is an open access, open peer-reviewed journal encompassing research on all aspects of data mining applied to high-dimensional biological and biomedical data, focusing on computational aspects of knowledge discovery from large-scale genetic, transcriptomic, genomic, proteomic, and metabolomic data. Topical areas include, but are not limited to:
- Development, evaluation, and application of novel data mining and machine learning algorithms.
- Adaptation, evaluation, and application of traditional data mining and machine learning algorithms.
- Open-source software for the application of data mining and machine learning algorithms.
- Design, development and integration of databases, software and web services for the storage, management, retrieval, and analysis of data from large scale studies.
- Pre-processing, post-processing, modeling, and interpretation of data mining and machine learning results for biological interpretation and knowledge discovery.