Comparative assessment of projection and clustering method combinations in the analysis of biomedical data
Authors: Jörn Lötsch, Alfred Ultsch
Journal: Informatics in Medicine Unlocked, volume 50, Article 101573 (2024)
DOI: 10.1016/j.imu.2024.101573
URL: https://www.sciencedirect.com/science/article/pii/S2352914824001291
Citation count: 0
Abstract
Background
Clustering on projected data is common in biomedical research analysis. Principal component analysis (PCA) is widely used for projection, focusing on data dispersion (variance), while clustering identifies data concentrations (neighborhood). These are conflicting aims. This study re-evaluates combinations of PCA and other projection methods with common clustering algorithms.
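The tension between variance and neighborhood can be illustrated with a small synthetic case. The sketch below is not taken from the paper: the data set, the helper names (`pca_components`, `make_elongated_clusters`, `class_separation`) and the separation measure are assumptions chosen for illustration. Two classes differ only along a low-variance axis, so the first principal component follows the high-variance axis and carries almost no class information.

```python
import numpy as np


def pca_components(X):
    """Return principal axes (rows) and explained-variance ratios via SVD."""
    Xc = X - X.mean(axis=0)
    _, s, vt = np.linalg.svd(Xc, full_matrices=False)
    explained = s**2 / np.sum(s**2)
    return vt, explained


def make_elongated_clusters(n_per_class=200, seed=0):
    """Hypothetical data: wide spread along x, classes split along y."""
    rng = np.random.default_rng(seed)
    labels = np.repeat([0, 1], n_per_class)
    x = rng.normal(0.0, 10.0, size=2 * n_per_class)
    y = np.where(labels == 0, -1.0, 1.0) + rng.normal(0.0, 0.2, size=2 * n_per_class)
    return np.column_stack([x, y]), labels


def class_separation(scores, labels):
    """Gap between the two class means in units of pooled standard deviation."""
    a, b = scores[labels == 0], scores[labels == 1]
    pooled = np.sqrt((a.var() + b.var()) / 2.0)
    return abs(a.mean() - b.mean()) / pooled


X, labels = make_elongated_clusters()
axes, explained = pca_components(X)
scores = (X - X.mean(axis=0)) @ axes.T  # coordinates on PC1 and PC2

sep_pc1 = class_separation(scores[:, 0], labels)
sep_pc2 = class_separation(scores[:, 1], labels)
print(f"PC1 explains {explained[0]:.1%} of variance, class separation {sep_pc1:.2f}")
print(f"PC2 explains {explained[1]:.1%} of variance, class separation {sep_pc2:.2f}")
```

In this constructed case, keeping only the first component would discard the direction in which the classes differ. Real data are rarely this extreme, but the example shows why maximal variance need not coincide with cluster structure.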
Methods
Six projection methods (PCA, ICA, isomap, MDS, t-SNE, UMAP) were combined with five clustering algorithms (k-means, k-medoids, single link, Ward's method, average link). Projections and clusterings were evaluated using a numerical criterion of clustering performance and a visual criterion based on plotting the projected data on a Voronoi tessellation plane with class-wise coloring. Nine artificial and five real biomedical datasets were analyzed.
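A projection-by-clustering grid of this kind can be sketched with scikit-learn. This is a minimal illustration and not the authors' pipeline: the function name `evaluate_combinations`, the synthetic `make_blobs` data, the standardization step and the use of the adjusted Rand index as the numerical criterion are all assumptions, since the abstract does not name the criterion. UMAP and k-medoids are left out because they require packages beyond scikit-learn.

```python
from sklearn.cluster import AgglomerativeClustering, KMeans
from sklearn.datasets import make_blobs
from sklearn.decomposition import PCA, FastICA
from sklearn.manifold import MDS, TSNE, Isomap
from sklearn.metrics import adjusted_rand_score
from sklearn.preprocessing import StandardScaler


def evaluate_combinations(X, y, n_clusters, seed=0):
    """Score every projection x clustering pairing against known classes.

    Returns a dict mapping (projection name, clustering name) to the
    adjusted Rand index (an assumed stand-in for the paper's criterion).
    """
    projections = {
        "PCA": PCA(n_components=2, random_state=seed),
        "ICA": FastICA(n_components=2, random_state=seed, max_iter=1000),
        "isomap": Isomap(n_components=2, n_neighbors=10),
        "MDS": MDS(n_components=2, random_state=seed),
        "t-SNE": TSNE(n_components=2, random_state=seed),
    }
    clusterers = {
        "k-means": KMeans(n_clusters=n_clusters, n_init=10, random_state=seed),
        "single link": AgglomerativeClustering(n_clusters=n_clusters, linkage="single"),
        "Ward": AgglomerativeClustering(n_clusters=n_clusters, linkage="ward"),
        "average link": AgglomerativeClustering(n_clusters=n_clusters, linkage="average"),
    }

    X_scaled = StandardScaler().fit_transform(X)
    scores = {}
    for p_name, projection in projections.items():
        Z = projection.fit_transform(X_scaled)  # 2-D projected data
        for c_name, clusterer in clusterers.items():
            labels = clusterer.fit_predict(Z)  # cluster on the projection
            scores[(p_name, c_name)] = adjusted_rand_score(y, labels)
    return scores


if __name__ == "__main__":
    # Hypothetical data set: three Gaussian classes in five dimensions.
    X_demo, y_demo = make_blobs(n_samples=150, centers=3, n_features=5, random_state=0)
    results = evaluate_combinations(X_demo, y_demo, n_clusters=3)
    for (p_name, c_name), ari in sorted(results.items(), key=lambda kv: -kv[1]):
        print(f"{p_name:>7} + {c_name:<12} ARI = {ari:.2f}")
```

Ranking the pairings per data set, rather than fixing one projection in advance, mirrors the data-specific selection the study argues for.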
Results
No combination consistently captured prior classifications in projections and clusters. Visual inspection proved essential. PCA was often, but not always, outperformed or equaled by neighborhood-based methods (UMAP, t-SNE) and manifold learning techniques (isomap).
Conclusions
The results argue against PCA as a standard projection method prior to clustering. Method selection should therefore be data-specific, with projection and clustering tailored to the dataset at hand in biomedical analysis. To aid this process, we propose a novel visualization technique that combines Voronoi tessellation with color coding.
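One way to picture a Voronoi tessellation with class-wise coloring is to fill the Voronoi cell of each projected point with the color of its class. The sketch below uses SciPy and Matplotlib and is an assumption about the general idea, not the authors' implementation. The function name `voronoi_class_plot`, the color map and the four distant helper points (added so that every real cell is bounded) are illustrative choices.

```python
import matplotlib

matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
from scipy.spatial import Voronoi


def voronoi_class_plot(Z, classes, ax=None):
    """Fill the Voronoi cell of each 2-D point with its class color.

    Returns the Matplotlib axes and the number of cells drawn.
    """
    Z = np.asarray(Z, dtype=float)
    classes = np.asarray(classes)

    # Four far-away helper points enclose the data, so every real point
    # lies strictly inside the convex hull and has a bounded cell.
    center = Z.mean(axis=0)
    far = 100.0 * np.ptp(Z, axis=0).max()
    corners = center + far * np.array([[-1, -1], [-1, 1], [1, -1], [1, 1]])
    vor = Voronoi(np.vstack([Z, corners]))

    cmap = plt.get_cmap("tab10")
    color_of = {c: cmap(i % 10) for i, c in enumerate(np.unique(classes))}

    if ax is None:
        _, ax = plt.subplots(figsize=(5, 5))

    n_cells = 0
    for i in range(len(Z)):
        region_index = vor.point_region[i]
        if region_index == -1:  # point without a region (e.g. a duplicate)
            continue
        region = vor.regions[region_index]
        if not region or -1 in region:  # skip empty or unbounded cells
            continue
        polygon = vor.vertices[region]
        ax.fill(polygon[:, 0], polygon[:, 1], color=color_of[classes[i]], alpha=0.4)
        n_cells += 1

    ax.scatter(Z[:, 0], Z[:, 1], c=[color_of[c] for c in classes], s=12, edgecolors="k")
    pad = 0.05 * np.ptp(Z, axis=0)
    ax.set_xlim(Z[:, 0].min() - pad[0], Z[:, 0].max() + pad[0])
    ax.set_ylim(Z[:, 1].min() - pad[1], Z[:, 1].max() + pad[1])
    return ax, n_cells


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    Z_demo = rng.normal(size=(60, 2))          # hypothetical projected data
    classes_demo = rng.integers(0, 3, size=60)  # hypothetical prior classes
    axes, drawn = voronoi_class_plot(Z_demo, classes_demo)
    axes.figure.savefig("voronoi_classes.png", dpi=150)
```

Contiguous patches of a single color suggest that a projection keeps a class together, while a mosaic of mixed colors suggests that the classes are interleaved in the projected plane.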
About the journal:
Informatics in Medicine Unlocked (IMU) is an international gold open access journal covering a broad spectrum of topics within medical informatics, including (but not limited to) papers focusing on imaging, pathology, teledermatology, public health, ophthalmological, nursing and translational medicine informatics. The full papers that are published in the journal are accessible to all who visit the website.