Fitting trees to $\ell_1$-hyperbolic distances
Joon-Hyeok Yim, Anna C. Gilbert
arXiv:2409.01010, arXiv - MATH - Metric Geometry, published 2024-09-02
Building trees to represent or to fit distances is a critical component of
phylogenetic analysis, metric embeddings, approximation algorithms, geometric
graph neural nets, and the analysis of hierarchical data. Much of the previous
algorithmic work, however, has focused on generic metric spaces (i.e., those
with no a priori constraints). Leveraging several ideas from the mathematical
analysis of hyperbolic geometry and geometric group theory, we study the tree
fitting problem by relating the hyperbolicity (ultrametricity) vector to the
error of the best tree (ultrametric) embedding. That is,
we define a vector of hyperbolicity (ultrametric) values over all triples of
points and compare the $\ell_p$ norms of this vector with the $\ell_q$ norm of
the distortion of the best tree fit to the distances. This formulation allows
us to define the average hyperbolicity (ultrametricity) in terms of a
normalized $\ell_1$ norm of the hyperbolicity vector. Furthermore, we can
interpret the classical tree fitting result of Gromov as a $p = q = \infty$
result. We present an algorithm HCCRootedTreeFit such that the $\ell_1$ error
of the output embedding is analytically bounded in terms of the $\ell_1$ norm
of the hyperbolicity vector (i.e., $p = q = 1$) and that this result is tight.
Furthermore, this algorithm has significantly different theoretical and
empirical performance as compared to Gromov's result and related algorithms.
Finally, using HCCRootedTreeFit and related tree fitting algorithms, we show
that supposedly standard data sets for hierarchical data analysis and geometric
graph neural networks have radically different tree fits from those of
synthetic, truly tree-like data sets, suggesting that a much more refined
analysis of these standard data sets is called for.
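Since the hyperbolicity vector, its $\ell_p$ norms, and the average hyperbolicity are all defined over triples of points, they can be sketched in a few lines of code. The sketch below assumes one standard convention: relative to a fixed base point $w$, each triple contributes the gap between its two smallest Gromov products, which vanishes exactly when the quadruple fits a tree with no error. Function names are illustrative; this is not the authors' HCCRootedTreeFit implementation.

```python
from itertools import combinations
import math

def gromov_product(d, x, y, w):
    """Gromov product (x|y)_w = (d(x,w) + d(y,w) - d(x,y)) / 2."""
    return (d[x][w] + d[y][w] - d[x][y]) / 2

def hyperbolicity_vector(d, w):
    """One value per triple {x, y, z} of points other than the base w:
    the gap between the two smallest Gromov products, which is zero
    exactly when the quadruple {w, x, y, z} fits a tree with no error."""
    points = [p for p in range(len(d)) if p != w]
    vec = []
    for x, y, z in combinations(points, 3):
        gps = sorted([gromov_product(d, x, y, w),
                      gromov_product(d, x, z, w),
                      gromov_product(d, y, z, w)])
        vec.append(gps[1] - gps[0])
    return vec

def lp_norm(vec, p):
    """l_p norm of the vector; p = math.inf gives the sup norm of
    Gromov's classical (p = q = infinity) formulation."""
    if math.isinf(p):
        return max(vec, default=0.0)
    return sum(v ** p for v in vec) ** (1 / p)

# The path metric on 4 points is a tree metric: every triple
# contributes 0, so the average hyperbolicity (the normalized
# l_1 norm over all triples) is 0.
d_path = [[abs(i - j) for j in range(4)] for i in range(4)]
vec = hyperbolicity_vector(d_path, w=0)
avg_hyperbolicity = lp_norm(vec, 1) / len(vec)  # 0.0
```

On a metric that is already a tree metric every entry is zero, while a metric with cycles (e.g., the 4-cycle graph metric) produces nonzero entries, so the normalized $\ell_1$ norm separates tree-like from non-tree-like data.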
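The $\ell_1$ error of an output embedding can likewise be stated concretely: it is the entrywise $\ell_1$ distance between the input metric and the metric induced by the fitted tree, summed over unordered pairs of points. A minimal sketch with illustrative names (independent of HCCRootedTreeFit itself):

```python
from itertools import combinations

def l1_embedding_error(d, d_tree):
    """Sum of |d(i,j) - d_tree(i,j)| over unordered pairs: the l1 norm
    of the distortion vector of a candidate tree metric d_tree."""
    n = len(d)
    return sum(abs(d[i][j] - d_tree[i][j])
               for i, j in combinations(range(n), 2))
```

A tree fitting algorithm in this framework seeks a tree metric `d_tree` that makes this quantity small, and the paper's bound controls it by the $\ell_1$ norm of the hyperbolicity vector.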