Accelerating Bayesian inference of dependency between mixed-type biological traits.

IF 4.3 · Zone 2 (Biology) · PLoS Computational Biology · Published: 2023-08-28 · eCollection date: 2023-08-01 · DOI: 10.1371/journal.pcbi.1011419
Zhenyu Zhang, Akihiko Nishimura, Nídia S Trovão, Joshua L Cherry, Andrew J Holbrook, Xiang Ji, Philippe Lemey, Marc A Suchard
PLoS Computational Biology 19(8): e1011419. Full-text PDF (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10491301/pdf/
Citations: 0

Abstract


Inferring dependencies between mixed-type biological traits while accounting for evolutionary relationships between specimens is of great scientific interest yet remains infeasible when trait and specimen counts grow large. The state-of-the-art approach uses a phylogenetic multivariate probit model to accommodate binary and continuous traits via a latent variable framework, and utilizes an efficient bouncy particle sampler (BPS) to tackle the computational bottleneck: integrating many latent variables from a high-dimensional truncated normal distribution. This approach breaks down as the number of specimens grows and fails to reliably characterize conditional dependencies between traits. Here, we propose an inference pipeline for phylogenetic probit models that greatly outperforms BPS. The novelty lies in 1) a combination of the recent Zigzag Hamiltonian Monte Carlo (Zigzag-HMC) with linear-time gradient evaluations and 2) a joint sampling scheme for highly correlated latent variables and correlation matrix elements. In an application exploring HIV-1 evolution from 535 viruses, the inference requires joint sampling from an 11,235-dimensional truncated normal and a 24-dimensional covariance matrix. Our method yields a 5-fold speedup compared to BPS and makes it possible to learn partial correlations between candidate viral mutations and virulence. Computational speedup now enables us to tackle even larger problems: we study the evolution of influenza H1N1 glycosylations on around 900 viruses. For broader applicability, we extend the phylogenetic probit model to incorporate categorical traits, and demonstrate its use to study Aquilegia flower and pollinator co-evolution.
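The abstract names the phylogenetic multivariate probit model without spelling it out. The following is a minimal sketch of its basic generative structure under simplifying assumptions of our own (zero means, a latent threshold at 0 for binary traits, no categorical traits, no residual variance). It is not the authors' inference pipeline: it implements neither BPS nor Zigzag-HMC. Latent traits for n taxa and p traits are drawn from a matrix normal whose row covariance `C` comes from the tree and whose column covariance `Sigma` couples the traits. A binary trait is observed only through the sign of its latent value. The second helper converts `Sigma` into the pairwise partial correlations (conditional dependencies) the abstract refers to. The function names, the 3-taxon tree and the trait precision matrix are hypothetical.

```python
import numpy as np


def simulate_phylo_probit(C, Sigma, is_binary, rng):
    """Draw one dataset from a simplified (hypothetical) phylogenetic probit model.

    C         : (n, n) tree covariance; C[i, j] = shared root-to-MRCA path length.
    Sigma     : (p, p) covariance among traits.
    is_binary : length-p boolean mask; True marks a binary trait.
    Returns (Z, Y): latent traits and observed traits, both shaped (n, p).
    """
    n, p = C.shape[0], Sigma.shape[0]
    L_C = np.linalg.cholesky(C)        # lower-triangular factor of the tree covariance
    L_S = np.linalg.cholesky(Sigma)    # lower-triangular factor of the trait covariance
    E = rng.standard_normal((n, p))
    # Matrix-normal draw: with column-stacked vec, vec(Z) ~ N(0, Sigma kron C).
    Z = L_C @ E @ L_S.T
    Y = Z.copy()
    # Binary traits keep only the sign of their latent value (threshold at 0).
    Y[:, is_binary] = (Z[:, is_binary] > 0).astype(float)
    return Z, Y


def partial_correlations(Sigma):
    """Correlation of each trait pair conditional on all remaining traits."""
    Omega = np.linalg.inv(Sigma)               # trait precision matrix
    d = np.sqrt(np.diag(Omega))
    P = -Omega / np.outer(d, d)                # rho_ij|rest = -Omega_ij / sqrt(Omega_ii Omega_jj)
    np.fill_diagonal(P, 1.0)
    return P


# Hypothetical 3-taxon tree ((A:1,B:1):1,C:2).
C = np.array([[2.0, 1.0, 0.0],
              [1.0, 2.0, 0.0],
              [0.0, 0.0, 2.0]])
# Hypothetical trait precision: traits 0 and 2 are conditionally independent given trait 1.
Omega = np.array([[2.0, -1.0, 0.0],
                  [-1.0, 2.0, -1.0],
                  [0.0, -1.0, 2.0]])
Sigma = np.linalg.inv(Omega)
is_binary = np.array([True, True, False])      # two binary traits, one continuous trait

rng = np.random.default_rng(0)
Z, Y = simulate_phylo_probit(C, Sigma, is_binary, rng)
P = partial_correlations(Sigma)
print(np.round(P, 3))
```

Inference runs this in reverse. Once `Y` is observed, each latent entry of a binary trait is known only up to its sign. Its conditional distribution is therefore a multivariate normal truncated to an orthant, with one dimension per (taxon, binary trait) pair. That is why the truncated normal in the abstract becomes very high-dimensional as the number of specimens grows.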

Source journal
PLoS Computational Biology (subject category: Biology, Biochemical Research Methods)
CiteScore: 7.10
Self-citation rate: 4.70%
Articles published: 820
About the journal: PLOS Computational Biology features works of exceptional significance that further our understanding of living systems at all scales, from molecules and cells to patient populations and ecosystems, through the application of computational methods. Readers include life and computational scientists, who can take the important findings presented here to the next level of discovery. Research articles must be declared as belonging to a relevant section. More information about the sections can be found in the submission guidelines. Research articles should model aspects of biological systems, demonstrate both methodological and scientific novelty, and provide profound new biological insights. Generally, reliability and significance of biological discovery through computation should be validated and enriched by experimental studies. Inclusion of experimental validation is not required for publication, but should be referenced where possible. Inclusion of experimental validation of a modest biological discovery through computation does not render a manuscript suitable for PLOS Computational Biology. Research articles specifically designated as Methods papers should describe outstanding methods of exceptional importance that have been shown, or have the promise to provide new biological insights. The method must already be widely adopted, or have the promise of wide adoption by a broad community of users. Enhancements to existing published methods will only be considered if those enhancements bring exceptional new capabilities.