Advances in Estimating Level-1 Phylogenetic Networks from Unrooted SNPs.

Journal of Computational Biology; Pub Date: 2024-11-25; DOI: 10.1089/cmb.2024.0710; IF 1.4; Zone 4 (Biology); JCR Q4 (Biochemical Research Methods)
Tandy Warnow, Yasamin Tabatabaee, Steven N Evans

Abstract

We address the problem of how to estimate a phylogenetic network when given single-nucleotide polymorphisms (i.e., SNPs, or bi-allelic markers that have evolved under the infinite sites assumption). We focus on level-1 phylogenetic networks (i.e., networks where the cycles are node-disjoint), since more complex networks are unidentifiable. We provide a polynomial-time quartet-based method that we prove correct for reconstructing the semi-directed level-1 phylogenetic network N when given a set of SNPs that covers all the bipartitions of N, even if the ancestral state is not known, provided that all cycles have length at least 5. We also prove that an algorithm published by Dan Gusfield in the Journal of Computer and System Sciences in 2005 correctly recovers semi-directed level-1 phylogenetic networks in polynomial time in this case. We present a stochastic model for DNA evolution, and we prove that the two methods (our quartet-based method and Gusfield's method) are statistically consistent estimators of the semi-directed level-1 phylogenetic network. For the case of multi-state homoplasy-free characters, we prove that our quartet-based method correctly constructs semi-directed level-1 networks under the same condition (all cycles of length at least 5), while Gusfield's algorithm cannot be used in that case. These results assume that we have access to an oracle that indicates which sites in the DNA alignment are homoplasy-free, and we show that the methods are robust, under some conditions, to oracle errors.
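
To make the input to a quartet-based method more concrete, the following is a minimal sketch of one plausible preprocessing step: reading the bipartition induced by each bi-allelic SNP column when the ancestral state is unknown, and listing the quartet splits each bipartition implies. It is not the authors' algorithm and does not reconstruct a network. The function names (`snp_bipartition`, `induced_quartets`, `quartets_from_alignment`) and the toy alignment are assumptions made for illustration only.

```python
from itertools import combinations


def snp_bipartition(column):
    """Return the unordered bipartition induced by one bi-allelic site.

    `column` maps taxon name -> allele symbol. Because the ancestral
    state is treated as unknown, the two sides are not ordered.
    Returns None if the site does not show exactly two alleles.
    """
    alleles = sorted(set(column.values()))
    if len(alleles) != 2:
        return None
    side_a = frozenset(t for t, s in column.items() if s == alleles[0])
    side_b = frozenset(t for t, s in column.items() if s == alleles[1])
    return frozenset({side_a, side_b})


def induced_quartets(bipartition):
    """Yield quartet splits ab|cd with a, b on one side and c, d on the other."""
    side_a, side_b = bipartition
    for pair_a in combinations(sorted(side_a), 2):
        for pair_b in combinations(sorted(side_b), 2):
            yield frozenset({frozenset(pair_a), frozenset(pair_b)})


def quartets_from_alignment(alignment):
    """Collect distinct bipartitions and quartet splits from an alignment.

    `alignment` maps taxon name -> string of allele symbols, one per site
    (hypothetical input format; all rows are assumed to have equal length).
    """
    taxa = sorted(alignment)
    n_sites = len(alignment[taxa[0]])
    splits = set()
    for i in range(n_sites):
        bp = snp_bipartition({t: alignment[t][i] for t in taxa})
        if bp is not None:
            splits.add(bp)
    quartets = set()
    for bp in splits:
        quartets.update(induced_quartets(bp))
    return splits, quartets


# Toy data (invented for illustration): five taxa, three sites.
alignment = {"a": "001", "b": "001", "c": "011", "d": "110", "e": "110"}
splits, quartets = quartets_from_alignment(alignment)
print(len(splits), "bipartitions;", len(quartets), "quartet splits")
```

Sites 1 and 3 of the toy alignment induce the same bipartition with the allele labels swapped, which is why unordered sets are used: without a known ancestral state, only the split itself carries information. Bipartitions with a single taxon on one side yield no quartet splits.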

Source journal: Journal of Computational Biology
Category: Biology - Computer Science, Interdisciplinary Applications
CiteScore: 3.60
Self-citation rate: 5.90%
Articles published: 113
Review time: 6-12 weeks
About the journal: Journal of Computational Biology is the leading peer-reviewed journal in computational biology and bioinformatics, publishing in-depth statistical, mathematical, and computational analysis of methods, as well as their practical impact. Available only online, this is an essential journal for scientists and students who want to keep abreast of developments in bioinformatics. Journal of Computational Biology coverage includes:
-Genomics
-Mathematical modeling and simulation
-Distributed and parallel biological computing
-Designing biological databases
-Pattern matching and pattern detection
-Linking disparate databases and data
-New tools for computational biology
-Relational and object-oriented database technology for bioinformatics
-Biological expert system design and use
-Reasoning by analogy, hypothesis formation, and testing by machine
-Management of biological databases