A consistent least-squares criterion for calibrating edge lengths in phylogenetic networks
Jingcheng Xu, Cécile Ané
arXiv - QuanBio - Populations and Evolution, published 2024-07-27
Identifier: arxiv-2407.19343 (https://doi.org/arxiv-2407.19343)
Abstract
In phylogenetic networks, it is desirable to estimate edge lengths in
substitutions per site or calendar time. Yet, there is a lack of scalable
methods that provide such estimates. Here we consider the problem of obtaining
edge length estimates from genetic distances, in the presence of rate variation
across genes and lineages, when the network topology is known. We propose a
novel least-squares criterion that is both consistent and
computationally tractable. The crux of our approach is to decompose the genetic
distances into two parts, one of which is invariant across displayed trees of
the network. The scaled genetic distances are then fitted to the invariant
part, while the average scaled genetic distances are fitted to the
non-invariant part. We show that this criterion is consistent provided that
there exists a tree path between some pair of tips in the network, and that
edge lengths in the network are identifiable from average distances. We also
provide a constrained variant of this criterion assuming a molecular clock,
which can be used to obtain relative edge lengths in calendar time.
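
To make the idea of fitting edge lengths to genetic distances concrete, here is a minimal sketch of ordinary least squares on a fixed tree, which is the simplest special case of the setting described above. It is not the criterion proposed in the paper: it ignores reticulations, rate variation, and the decomposition into invariant and non-invariant parts. The 4-tip tree, the edge names, the numeric values, and both function names are assumptions made up for illustration.

```python
# Illustrative sketch only: ordinary least squares on a fixed 4-tip tree,
# NOT the network criterion proposed in the paper.

def solve_linear_system(matrix, rhs):
    """Solve matrix @ x = rhs by Gauss-Jordan elimination with partial pivoting."""
    n = len(rhs)
    # Build an augmented copy so the inputs are left untouched.
    aug = [list(row) + [b] for row, b in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("singular system: edge lengths are not identifiable")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for row in range(n):
            if row != col:
                factor = aug[row][col] / aug[col][col]
                for k in range(col, n + 1):
                    aug[row][k] -= factor * aug[col][k]
    return [aug[i][n] / aug[i][i] for i in range(n)]


def least_squares_edge_lengths(design, distances):
    """Minimise ||design @ lengths - distances||^2 via the normal equations."""
    n_edges = len(design[0])
    xtx = [[sum(row[i] * row[j] for row in design) for j in range(n_edges)]
           for i in range(n_edges)]
    xty = [sum(row[i] * d for row, d in zip(design, distances))
           for i in range(n_edges)]
    return solve_linear_system(xtx, xty)


# Hypothetical unrooted tree ((A,B),(C,D)). Columns are the pendant edges
# a, b, c, d and the internal edge e; each row marks the edges on one
# tip-to-tip path.
EDGES = ["a", "b", "c", "d", "e"]
DESIGN = [
    [1, 1, 0, 0, 0],  # A-B
    [1, 0, 1, 0, 1],  # A-C
    [1, 0, 0, 1, 1],  # A-D
    [0, 1, 1, 0, 1],  # B-C
    [0, 1, 0, 1, 1],  # B-D
    [0, 0, 1, 1, 0],  # C-D
]

# Made-up edge lengths, used to generate noise-free pairwise distances.
true_lengths = [0.10, 0.20, 0.15, 0.05, 0.30]
distances = [sum(x * t for x, t in zip(row, true_lengths)) for row in DESIGN]

estimates = least_squares_edge_lengths(DESIGN, distances)
for name, value in zip(EDGES, estimates):
    print(f"edge {name}: {value:.4f}")
```

With noise-free distances the fit recovers the generating edge lengths, because the six pairwise distances on a 4-tip tree determine its five edge lengths. On a network, distances generally differ between displayed trees, which is the situation the decomposition described in the abstract is designed to handle.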