时间非齐次扩散几何与拓扑

IF 1.9 Q1 MATHEMATICS, APPLIED SIAM journal on mathematics of data science Pub Date : 2023-05-22 DOI:10.1137/21m1462945

Guillaume Huguet, Alexander Tong, Bastian Rieck, Jessie Huang, Manik Kuchroo, Matthew Hirn, Guy Wolf, Smita Krishnaswamy

{"title":"时间非齐次扩散几何与拓扑","authors":"Guillaume Huguet, Alexander Tong, Bastian Rieck, Jessie Huang, Manik Kuchroo, Matthew Hirn, Guy Wolf, Smita Krishnaswamy","doi":"10.1137/21m1462945","DOIUrl":null,"url":null,"abstract":"Diffusion condensation is a dynamic process that yields a sequence of multiscale data representations that aim to encode meaningful abstractions. It has proven effective for manifold learning, denoising, clustering, and visualization of high-dimensional data. Diffusion condensation is constructed as a time-inhomogeneous process where each step first computes a diffusion operator and then applies it to the data. We theoretically analyze the convergence and evolution of this process from geometric, spectral, and topological perspectives. From a geometric perspective, we obtain convergence bounds based on the smallest transition probability and the radius of the data, whereas from a spectral perspective, our bounds are based on the eigenspectrum of the diffusion kernel. Our spectral results are of particular interest since most of the literature on data diffusion is focused on homogeneous processes. From a topological perspective, we show that diffusion condensation generalizes centroid-based hierarchical clustering. We use this perspective to obtain a bound based on the number of data points, independent of their location. To understand the evolution of the data geometry beyond convergence, we use topological data analysis. We show that the condensation process itself defines an intrinsic condensation homology. We use this intrinsic topology, as well as the ambient persistent homology, of the condensation process to study how the data changes over diffusion time. We demonstrate both types of topological information in well-understood toy examples. Our work gives theoretical insight into the convergence of diffusion condensation and shows that it provides a link between topological and geometric data analysis.","PeriodicalId":74797,"journal":{"name":"SIAM journal on mathematics of data science","volume":"22 1","pages":"0"},"PeriodicalIF":1.9000,"publicationDate":"2023-05-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"1","resultStr":"{\"title\":\"Time-Inhomogeneous Diffusion Geometry and Topology\",\"authors\":\"Guillaume Huguet, Alexander Tong, Bastian Rieck, Jessie Huang, Manik Kuchroo, Matthew Hirn, Guy Wolf, Smita Krishnaswamy\",\"doi\":\"10.1137/21m1462945\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Diffusion condensation is a dynamic process that yields a sequence of multiscale data representations that aim to encode meaningful abstractions. It has proven effective for manifold learning, denoising, clustering, and visualization of high-dimensional data. Diffusion condensation is constructed as a time-inhomogeneous process where each step first computes a diffusion operator and then applies it to the data. We theoretically analyze the convergence and evolution of this process from geometric, spectral, and topological perspectives. From a geometric perspective, we obtain convergence bounds based on the smallest transition probability and the radius of the data, whereas from a spectral perspective, our bounds are based on the eigenspectrum of the diffusion kernel. Our spectral results are of particular interest since most of the literature on data diffusion is focused on homogeneous processes. From a topological perspective, we show that diffusion condensation generalizes centroid-based hierarchical clustering. We use this perspective to obtain a bound based on the number of data points, independent of their location. To understand the evolution of the data geometry beyond convergence, we use topological data analysis. We show that the condensation process itself defines an intrinsic condensation homology. We use this intrinsic topology, as well as the ambient persistent homology, of the condensation process to study how the data changes over diffusion time. We demonstrate both types of topological information in well-understood toy examples. Our work gives theoretical insight into the convergence of diffusion condensation and shows that it provides a link between topological and geometric data analysis.\",\"PeriodicalId\":74797,\"journal\":{\"name\":\"SIAM journal on mathematics of data science\",\"volume\":\"22 1\",\"pages\":\"0\"},\"PeriodicalIF\":1.9000,\"publicationDate\":\"2023-05-22\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"1\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"SIAM journal on mathematics of data science\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1137/21m1462945\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q1\",\"JCRName\":\"MATHEMATICS, APPLIED\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"SIAM journal on mathematics of data science","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1137/21m1462945","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"MATHEMATICS, APPLIED","Score":null,"Total":0}

引用次数: 1

摘要

扩散凝聚是一个动态过程，它产生一系列旨在编码有意义抽象的多尺度数据表示。它已被证明是有效的流形学习，去噪，聚类和高维数据的可视化。扩散凝聚被构造为一个时间非均匀过程，其中每一步首先计算一个扩散算子，然后将其应用于数据。我们从几何、光谱和拓扑的角度对这一过程的收敛和演化进行了理论分析。从几何角度来看，我们基于最小转移概率和数据半径得到收敛界，而从光谱角度来看，我们的边界是基于扩散核的特征谱。我们的光谱结果特别有趣，因为大多数关于数据扩散的文献都集中在均匀过程上。从拓扑学的角度，我们证明了扩散凝聚推广了基于质心的分层聚类。我们使用这个透视图来获得一个基于数据点数量的边界，与它们的位置无关。为了理解数据几何超越收敛的演变，我们使用拓扑数据分析。我们证明了缩合过程本身定义了一个本征缩合同源性。我们使用凝聚过程的这种内在拓扑以及环境持续同源性来研究数据随扩散时间的变化。我们在易于理解的玩具示例中演示了两种类型的拓扑信息。我们的工作为扩散凝聚的收敛提供了理论见解，并表明它提供了拓扑和几何数据分析之间的联系。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文

微信好友朋友圈 QQ好友复制链接

本刊更多论文

Time-Inhomogeneous Diffusion Geometry and Topology

Diffusion condensation is a dynamic process that yields a sequence of multiscale data representations that aim to encode meaningful abstractions. It has proven effective for manifold learning, denoising, clustering, and visualization of high-dimensional data. Diffusion condensation is constructed as a time-inhomogeneous process where each step first computes a diffusion operator and then applies it to the data. We theoretically analyze the convergence and evolution of this process from geometric, spectral, and topological perspectives. From a geometric perspective, we obtain convergence bounds based on the smallest transition probability and the radius of the data, whereas from a spectral perspective, our bounds are based on the eigenspectrum of the diffusion kernel. Our spectral results are of particular interest since most of the literature on data diffusion is focused on homogeneous processes. From a topological perspective, we show that diffusion condensation generalizes centroid-based hierarchical clustering. We use this perspective to obtain a bound based on the number of data points, independent of their location. To understand the evolution of the data geometry beyond convergence, we use topological data analysis. We show that the condensation process itself defines an intrinsic condensation homology. We use this intrinsic topology, as well as the ambient persistent homology, of the condensation process to study how the data changes over diffusion time. We demonstrate both types of topological information in well-understood toy examples. Our work gives theoretical insight into the convergence of diffusion condensation and shows that it provides a link between topological and geometric data analysis.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

SIAM journal on mathematics of data science

自引率

0.00%

发文量