Deep Clustering of Electronic Health Records Tabular Data for Clinical Interpretation.

... IEEE International Conference on Telecommunications and Photonics. Pub Date: 2023-12-01. Epub Date: 2024-04-11. DOI: 10.1109/ictp60248.2023.10490723

Ibna Kowsar, Shourav B Rabbani, Kazi Fuad B Akhter, Manar D Samad

Full text (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11255553/pdf/


Abstract

Machine learning applications are widespread due to straightforward supervised learning of known data labels. Many data samples in real-world scenarios, including medicine, are unlabeled because data annotation can be time-consuming and error-prone. The application and evaluation of unsupervised clustering methods are not trivial and are limited to traditional methods (e.g., k-means) when clinicians demand deeper insights into patient data beyond classification accuracy. The contribution of this paper is three-fold: 1) to introduce a patient stratification strategy based on a clinical variable instead of a diagnostic label, 2) to evaluate clustering performance using within-cluster homogeneity and between-cluster statistical difference, and 3) to compare widely used traditional clustering algorithms (e.g., k-means) with a state-of-the-art deep learning solution for clustering tabular data. The deep clustering method achieves superior within-cluster homogeneity and between-cluster separation compared to k-means and identifies three statistically distinct and clinically interpretable high blood pressure patient clusters. The proposed clustering strategy and evaluation metrics will facilitate the stratification of large patient cohorts in health science research without requiring explicit diagnostic labels.
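The evaluation idea in the abstract, scoring a clustering by how homogeneous a clinical variable is within each cluster and how different it is between clusters, can be illustrated with a small sketch. The abstract does not name the exact statistics used, so this example assumes the per-cluster standard deviation as the homogeneity measure and the one-way ANOVA F statistic as the between-cluster difference measure. The blood pressure values, the two label assignments, and all function names are hypothetical and are not taken from the paper.

```python
from collections import defaultdict
from statistics import mean, pstdev


def group_by_cluster(values, labels):
    """Collect the clinical variable's values under each cluster label."""
    groups = defaultdict(list)
    for value, label in zip(values, labels):
        groups[label].append(value)
    return dict(groups)


def within_cluster_spread(values, labels):
    """Population standard deviation of the variable inside each cluster.

    Assumed proxy for within-cluster homogeneity: lower spread means a
    more homogeneous cluster.
    """
    groups = group_by_cluster(values, labels)
    return {label: pstdev(vals) for label, vals in groups.items()}


def anova_f_statistic(values, labels):
    """One-way ANOVA F statistic of the variable across clusters.

    Assumed proxy for between-cluster statistical difference: a larger F
    means cluster means differ more relative to within-cluster variation.
    """
    groups = group_by_cluster(values, labels)
    n, k = len(values), len(groups)
    grand_mean = mean(values)
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups.values())
    ss_within = sum((v - mean(g)) ** 2 for g in groups.values() for v in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))


# Hypothetical systolic blood pressure readings (mmHg), not real patient data.
systolic = [118, 122, 120, 141, 139, 143, 165, 168, 162]
labels_a = [0, 0, 0, 1, 1, 1, 2, 2, 2]  # groups patients with similar readings
labels_b = [0, 1, 2, 0, 1, 2, 0, 1, 2]  # mixes readings across clusters

for name, labels in (("A", labels_a), ("B", labels_b)):
    spread = within_cluster_spread(systolic, labels)
    f_stat = anova_f_statistic(systolic, labels)
    print(name, spread, round(f_stat, 4))
```

Under these assumptions, the assignment that groups similar readings together shows a smaller spread in every cluster and a much larger F statistic than the mixed assignment. The same two functions could be applied to labels produced by k-means or by a deep clustering model to compare them on a chosen clinical variable.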


