In this article, we expand upon the concepts introduced in Spivak (Metric realization of fuzzy simplicial sets, 2009. http://www.dspivak.net/metric_realization090922.pdf) about the relationship between the category \(\textbf{UM}\) of uber-metric spaces and the category \(\textbf{sFuz}\) of fuzzy simplicial sets. We show that fuzzy simplicial sets can be regarded as natural combinatorial generalizations of metric relations. Furthermore, we take inspiration from UMAP (McInnes et al., UMAP: Uniform manifold approximation and projection for dimension reduction, 2018) to apply the theory to manifold learning, dimension reduction and data visualization, while refining some of their constructions to put the corresponding theory on a more solid footing. A generalization of the adjunction between \(\textbf{UM}\) and \(\textbf{sFuz}\) allows us to view the adjunctions used in both publications as special cases. Moreover, we derive an explicit description of colimits in \(\textbf{UM}\) and of the realization functor \(\text{Re}:\textbf{sFuz}\rightarrow \textbf{UM}\), and show that \(\textbf{UM}\) can be embedded into \(\textbf{sFuz}\). We prove analogous results for the category \(\textbf{EPMet}\) of extended pseudo-metric spaces. We also provide rigorous definitions of functors that make it possible to recursively merge sets of fuzzy simplicial sets, and we describe the adjunctions between the category of truncated fuzzy simplicial sets and \(\textbf{sFuz}\), which we relate to persistent homology. Combining these constructions, we show a surprising connection between the well-known dimension reduction methods UMAP and Isomap (Tenenbaum et al. in Science 290(5500):2319–2323, 2000) and derive an alternative algorithm, which we call IsUMap, that combines some of the strengths of both methods. Additionally, we develop a new embedding method that preserves the clusters detected in the original metric space that we construct from the data.
Visualizing the optimization process gives the user information about both the intra-cluster distributions in the original metric space and the inter-cluster relations. We compare our new method with UMAP, Isomap and t-SNE on a series of low- and high-dimensional datasets and provide explanations for the observed differences and improvements.
