Vincent Cohen-Addad, Debarati Das, Evangelos Kipouridis, Nikos Parotsidis, Mikkel Thorup
Fitting Distances by Tree Metrics Minimizing the Total Error within a Constant Factor
Journal of the ACM. Published 2024-01-02. DOI: https://doi.org/10.1145/3639453
Abstract
We consider the numerical taxonomy problem of fitting a positive distance function \(\mathcal{D}\colon \binom{S}{2} \rightarrow \mathbb{R}_{>0}\) by a tree metric. We want a tree T with positive edge weights that includes S among its vertices, so that the distances between points of S in T match those in \(\mathcal{D}\). One application is in evolutionary biology, where the tree T aims to approximate the branching process leading to the observed distances in \(\mathcal{D}\) [Cavalli-Sforza and Edwards 1967]. We consider the total error, that is, the sum of distance errors over all pairs of points. We present a deterministic polynomial-time algorithm minimizing the total error within a constant factor. We can do this both for general trees and for the special case of ultrametrics, with a root having the same distance to all vertices in S.
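To make the objective concrete, the following minimal sketch evaluates the total error of one candidate tree against a given distance function. It assumes the error of a pair is the absolute difference between its input distance and its tree distance. The helper names (`build_tree`, `tree_distances`, `total_error`) and the toy instance are hypothetical and chosen for illustration only.

```python
from itertools import combinations


def build_tree(weighted_edges):
    """Adjacency lists for an undirected weighted tree given as (u, v, weight) triples."""
    adj = {}
    for u, v, w in weighted_edges:
        adj.setdefault(u, []).append((v, w))
        adj.setdefault(v, []).append((u, w))
    return adj


def tree_distances(adj, source):
    """Distances from `source` to every vertex of the tree."""
    dist = {source: 0.0}
    stack = [source]
    while stack:
        u = stack.pop()
        for v, w in adj[u]:
            if v not in dist:  # in a tree, the first path found is the only path
                dist[v] = dist[u] + w
                stack.append(v)
    return dist


def total_error(D, adj, S):
    """Sum over unordered pairs {u, v} of S of |D(u, v) - dist_T(u, v)|."""
    from_u = {u: tree_distances(adj, u) for u in S}
    return sum(abs(D[frozenset((u, v))] - from_u[u][v]) for u, v in combinations(S, 2))


# Hypothetical toy instance: three points of S plus one extra internal vertex "x".
S = ["a", "b", "c"]
D = {
    frozenset(("a", "b")): 2.0,
    frozenset(("a", "c")): 4.0,
    frozenset(("b", "c")): 5.0,
}
T = build_tree([("a", "x", 1.0), ("b", "x", 1.0), ("c", "x", 3.0)])
print(total_error(D, T, S))  # 1.0: only the pair (b, c) is off, by 5 - 4
```

The sketch only scores a given tree. It does not search for a good tree and is not the approximation algorithm described in the abstract.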
The problems are APX-hard, so a constant factor is the best we can hope for in polynomial time. The best previous approximation factor was O((log n)(log log n)), by Ailon and Charikar [2005], who wrote “Determining whether an O(1) approximation can be obtained is a fascinating question”.