Ibna Kowsar, Shourav B Rabbani, Kazi Fuad B Akhter, Manar D Samad
Title: Deep Clustering of Electronic Health Records Tabular Data for Clinical Interpretation
Venue: IEEE International Conference on Telecommunications and Photonics (vol. 2023)
DOI: https://doi.org/10.1109/ictp60248.2023.10490723
Published: 2023-12-01 (Epub 2024-04-11)
Full text (PDF): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11255553/pdf/
Citations: 0
Abstract
Machine learning applications are widespread due to straightforward supervised learning of known data labels. Many data samples in real-world scenarios, including medicine, are unlabeled because data annotation can be time-consuming and error-prone. The application and evaluation of unsupervised clustering methods are not trivial and are limited to traditional methods (e.g., k-means) when clinicians demand deeper insights into patient data beyond classification accuracy. The contribution of this paper is three-fold: 1) to introduce a patient stratification strategy based on a clinical variable instead of a diagnostic label, 2) to evaluate clustering performance using within-cluster homogeneity and between-cluster statistical difference, and 3) to compare widely used traditional clustering algorithms (e.g., k-means) with a state-of-the-art deep learning solution for clustering tabular data. The deep clustering method achieves superior within-cluster homogeneity and between-cluster separation compared to k-means and identifies three statistically distinct and clinically interpretable high blood pressure patient clusters. The proposed clustering strategy and evaluation metrics will facilitate the stratification of large patient cohorts in health science research without requiring explicit diagnostic labels.
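To make the two evaluation ideas in the abstract concrete, the following is a minimal sketch on synthetic data, not the authors' pipeline. It stands in a plain k-means baseline for the clustering step (the deep clustering model is not reproduced here). It assumes that within-cluster homogeneity can be summarised as the mean per-cluster standard deviation of a held-out clinical variable, and that between-cluster difference can be summarised by a one-way ANOVA F statistic. The abstract does not name the specific statistics used, so both choices, the synthetic `bp` variable, and all function names are illustrative assumptions.

```python
import random
import statistics


def kmeans(points, k, iters=50, seed=0):
    """Plain Lloyd's k-means on a list of equal-length tuples; returns one label per point."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # initialise centers from random data points
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: nearest center by squared Euclidean distance.
        labels = [
            min(range(k), key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])))
            for p in points
        ]
        # Update step: move each center to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old center if a cluster is empty
                centers[c] = tuple(statistics.fmean(dim) for dim in zip(*members))
    return labels


def group_by_label(values, labels):
    """Collect the values of one variable into a dict keyed by cluster label."""
    groups = {}
    for v, lab in zip(values, labels):
        groups.setdefault(lab, []).append(v)
    return groups


def within_cluster_std(values, labels):
    """Mean per-cluster standard deviation (lower means more homogeneous clusters)."""
    groups = group_by_label(values, labels)
    return statistics.fmean(statistics.pstdev(g) for g in groups.values())


def anova_f(values, labels):
    """One-way ANOVA F statistic across clusters (higher means clusters differ more)."""
    groups = list(group_by_label(values, labels).values())
    n, k = len(values), len(groups)
    grand = statistics.fmean(values)
    ss_between = sum(len(g) * (statistics.fmean(g) - grand) ** 2 for g in groups)
    ss_within = sum(sum((v - statistics.fmean(g)) ** 2 for v in g) for g in groups)
    return (ss_between / (k - 1)) / (ss_within / (n - k))


# Synthetic cohort (hypothetical): two tabular features per patient, plus a
# blood pressure value that is held out of clustering and used only for evaluation.
rng = random.Random(42)
features, bp = [], []
for center, bp_mean in [((0.0, 0.0), 120.0), ((5.0, 5.0), 140.0), ((10.0, 0.0), 165.0)]:
    for _ in range(50):
        features.append((rng.gauss(center[0], 1.0), rng.gauss(center[1], 1.0)))
        bp.append(rng.gauss(bp_mean, 5.0))

labels = kmeans(features, k=3)
print("overall std of bp:       ", round(statistics.pstdev(bp), 2))
print("mean within-cluster std: ", round(within_cluster_std(bp, labels), 2))
print("between-cluster F:       ", round(anova_f(bp, labels), 2))
```

Evaluating on a variable that was not used for clustering is one way to judge clusters without diagnostic labels. Under the assumptions above, a useful clustering shows a within-cluster spread well below the overall spread and a large F statistic. The same two functions could be applied to labels from any other clustering method for a side-by-side comparison.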