Exact and efficient phylodynamic simulation from arbitrarily large populations.

ArXiv Pub Date: 2024-08-10
Michael Celentano, William S DeWitt, Sebastian Prillo, Yun S Song
Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10925381/pdf/

Abstract

Many biological studies involve inferring the evolutionary history of a sample of individuals from a large population and interpreting the reconstructed tree. Such an ascertained tree typically represents only a small part of a comprehensive population tree and is distorted by survivorship and sampling biases. Inferring evolutionary parameters from ascertained trees requires modeling both the underlying population dynamics and the ascertainment process. A crucial component of this phylodynamic modeling is tree simulation, which is used to benchmark probabilistic inference methods. To simulate an ascertained tree, one must first simulate the full population tree and then prune unobserved lineages. Consequently, the computational cost is determined not by the size of the final simulated tree, but by the size of the population tree in which it is embedded. In most biological scenarios, simulations of the entire population are prohibitively expensive due to computational demands placed on lineages without sampled descendants. Here, we address this challenge by proving that, for any partially ascertained process from a general multi-type birth-death-mutation-sampling (BDMS) model, there exists an equivalent pure-birth process (that is, one with no death) with mutation and complete sampling. The final trees generated under the two processes have exactly the same distribution. We leverage this property to develop a highly efficient algorithm for simulating trees under a general BDMS model. Our algorithm scales linearly with the size of the final simulated tree and is independent of the population size, enabling simulations from extremely large populations beyond the reach of current methods but essential for various biological applications. We anticipate that this unprecedented speedup will significantly advance the development of novel inference methods that require extensive training data.
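
To make the cost argument concrete, here is a minimal Python sketch of the naive simulate-then-prune approach that the abstract describes as the bottleneck. It is not the authors' algorithm. It assumes a single-type process with constant birth and death rates and sampling of extant lineages at a fixed end time with probability `rho`. The function names, the parameters, and the `max_nodes` safety cap are all hypothetical and chosen for illustration only.

```python
import random


def simulate_full_tree(birth, death, t_max, rng, max_nodes=100_000):
    """Forward-simulate a single-type birth-death tree up to time t_max.

    Returns (parent, alive): parent[i] is the index of node i's parent
    (None for the root), and alive lists the lineages extant at t_max.
    Illustrative assumption: constant rates, no mutation, no types.
    """
    parent = [None]        # node 0 is the root lineage
    stack = [(0, 0.0)]     # (node index, time at which the lineage starts)
    alive = []
    while stack:
        node, t_start = stack.pop()
        # Waiting time to the next event on this lineage.
        t_event = t_start + rng.expovariate(birth + death)
        if t_event >= t_max:
            alive.append(node)             # survives to the present
        elif rng.random() < birth / (birth + death):
            for _ in range(2):             # birth: two daughter lineages
                parent.append(node)
                stack.append((len(parent) - 1, t_event))
            if len(parent) > max_nodes:
                raise RuntimeError("population tree exceeded max_nodes")
        # Otherwise the lineage dies and leaves no descendants.
    return parent, alive


def prune_to_sample(parent, alive, rho, rng):
    """Sample each extant lineage with probability rho, then keep only
    nodes that are ancestral to (or are) a sampled lineage.

    Nodes with a single retained child are not collapsed here.
    """
    sampled = [n for n in alive if rng.random() < rho]
    kept = set()
    for n in sampled:
        # Walk towards the root, stopping once we join a retained path.
        while n is not None and n not in kept:
            kept.add(n)
            n = parent[n]
    return sampled, kept


if __name__ == "__main__":
    rng = random.Random(1)
    parent, alive = simulate_full_tree(birth=1.0, death=0.5, t_max=6.0, rng=rng)
    sampled, kept = prune_to_sample(parent, alive, rho=0.05, rng=rng)
    print("nodes simulated:", len(parent))
    print("nodes retained after pruning:", len(kept))
```

In this sketch every lineage is simulated whether or not it ends up with sampled descendants, so the work done grows with `len(parent)` rather than with `len(kept)`. That gap is the cost the abstract says the equivalent complete-sampling, no-death process removes.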
