Bayesian reconstruction of transmission trees from genetic sequences and uncertain infection times.

IF 0.8 · Zone 4, Mathematics · Q4 BIOCHEMISTRY & MOLECULAR BIOLOGY · Statistical Applications in Genetics and Molecular Biology · Pub Date: 2020-10-21 · DOI: 10.1515/sagmb-2019-0026
Hesam Montazeri, Susan Little, Mozhgan Mozaffarilegha, Niko Beerenwinkel, Victor DeGruttola
Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8212962/pdf/nihms-1709644.pdf
Citations: 0

Abstract

Bayesian reconstruction of transmission trees from genetic sequences and uncertain infection times.

Genetic sequence data of pathogens are increasingly used to investigate transmission dynamics in both endemic diseases and disease outbreaks. Such research can aid in the development of appropriate interventions and in the design of studies to evaluate them. Several computational methods have been proposed to infer transmission chains from sequence data; however, existing methods generally do not reconstruct transmission trees reliably, because genetic sequence data, or phylogenetic trees inferred from such data, contain insufficient information for accurate estimation of transmission chains. Here, we show by simulation studies that incorporating infection times, even when they are uncertain, can greatly improve the accuracy of transmission tree reconstruction. To achieve this improvement, we propose a Bayesian inference method using Markov chain Monte Carlo that draws samples directly from the space of transmission trees under the assumption of complete sampling of the outbreak. The likelihood of each transmission tree is computed with a phylogenetic model by treating its internal nodes as transmission events. In a simulation study, we demonstrate that the accuracy of the reconstructed transmission trees depends mainly on the amount of information available on times of infection; we show that the proposed method is superior to two alternative approaches when infection times are known up to specified degrees of certainty. In addition, we illustrate the use of a multiple imputation framework to study features of epidemic dynamics, such as the relationship between characteristics of nodes and the average number of outbound or inbound edges, which signify possible transmission events from and to nodes. We apply the proposed method to a transmission cluster in San Diego and to a dataset from the 2014 Sierra Leone Ebola virus outbreak, and investigate the impact of biological, behavioral, and demographic factors.
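
The sampling idea in the abstract can be illustrated with a small Python sketch. It is not the authors' method: in place of their phylogenetic likelihood it assumes a toy model in which the number of mutations along each infector-to-infectee edge is Poisson with mean proportional to the gap between infection times. It also assumes flat priors, a fixed index case, complete sampling, and interval midpoints that are all distinct. The names `sample_trees`, `log_lik`, and `infector_support`, and the `rate` parameter, are hypothetical and chosen only for this illustration.

```python
import math
import random

def hamming(a, b):
    """Number of sites at which two aligned sequences differ."""
    return sum(x != y for x, y in zip(a, b))

def log_lik(parents, times, seqs, rate):
    """Toy log-likelihood (an assumption, not the paper's phylogenetic model):
    mutations on each infector -> infectee edge are Poisson with mean
    rate * (gap between infection times). Invalid orderings get -inf."""
    total = 0.0
    for child, parent in enumerate(parents):
        if parent is None:            # the index case has no infector
            continue
        gap = times[child] - times[parent]
        if gap <= 0:                  # an infector must be infected first
            return -math.inf
        mu = rate * gap
        k = hamming(seqs[child], seqs[parent])
        total += k * math.log(mu) - mu - math.lgamma(k + 1)
    return total

def sample_trees(seqs, intervals, n_iter=2000, rate=1.0, seed=0):
    """Metropolis sampler over (infector assignment, infection times)."""
    rng = random.Random(seed)
    n = len(seqs)
    # Start at interval midpoints; the earliest case is kept as the root.
    times = [(lo + hi) / 2 for lo, hi in intervals]
    root = min(range(n), key=times.__getitem__)
    parents = [None if i == root else root for i in range(n)]
    cur = log_lik(parents, times, seqs, rate)
    samples = []
    for _ in range(n_iter):
        new_parents, new_times = parents[:], times[:]
        i = rng.randrange(n)
        if i != root and rng.random() < 0.5:
            # Move 1: propose a new infector for case i.
            new_parents[i] = rng.choice([j for j in range(n) if j != i])
        else:
            # Move 2: redraw the infection time of case i within its interval.
            new_times[i] = rng.uniform(*intervals[i])
        new = log_lik(new_parents, new_times, seqs, rate)
        # Proposals are symmetric and priors flat, so accept with min(1, L'/L).
        if new > cur or rng.random() < math.exp(new - cur):
            parents, times, cur = new_parents, new_times, new
        samples.append((tuple(parents), tuple(times)))
    return samples

def infector_support(samples, case):
    """Fraction of sampled trees in which each candidate infects `case`."""
    counts = {}
    for parents, _ in samples:
        counts[parents[case]] = counts.get(parents[case], 0) + 1
    return {p: c / len(samples) for p, c in counts.items()}

if __name__ == "__main__":
    seqs = ["AAAA", "AAAT", "AATT"]
    intervals = [(0.0, 1.0), (1.5, 3.0), (3.5, 6.0)]
    draws = sample_trees(seqs, intervals)
    print(infector_support(draws, 2))
```

Because a proposed infector is rejected whenever its infection time is not strictly earlier than the infectee's, every accepted state is an acyclic tree rooted at the index case. Narrower infection-time intervals rule out more candidate infectors, which is one informal way to see why timing information can sharpen the reconstruction. A practical analysis would also discard burn-in draws and allow the index case to change.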

Source journal
Statistical Applications in Genetics and Molecular Biology
Category: BIOCHEMISTRY & MOLECULAR BIOLOGY-MATHEMATICAL & COMPUTATIONAL BIOLOGY
Self-citation rate: 11.10%
Articles published: 8
About the journal: Statistical Applications in Genetics and Molecular Biology seeks to publish significant research on the application of statistical ideas to problems arising from computational biology. Papers should focus on the relevant statistical issues but should contain a succinct description of the biological problem being considered. The range of topics is wide and includes linkage mapping, association studies, gene finding and sequence alignment, protein structure prediction, design and analysis of microarray data, molecular evolution and phylogenetic trees, DNA topology, and database search strategies. Both original research and review articles will be warmly received.