Using phylogenetic summary statistics for epidemiological inference

Rafael C. Núñez, Gregory R. Hart, Michael Famulare, Christopher Lorton, Joshua T. Herbeck
bioRxiv - Genetics. Published 2024-08-07. DOI: 10.1101/2024.08.07.607080 (https://doi.org/10.1101/2024.08.07.607080)

Abstract

Since the coining of the term phylodynamics, the use of phylogenies to understand infectious disease dynamics has steadily increased. As methods for phylodynamics and genomic epidemiology have proliferated and grown more computationally expensive, the epidemiological information they extract has also evolved to better complement what can be learned through traditional epidemiological data. However, for genomic epidemiology to continue to grow, and for the accumulating number of pathogen genetic sequences to fulfill their potential widespread utility, the extraction of epidemiological information from phylogenies needs to be simpler and more efficient. Summary statistics provide a straightforward way of extracting information from a phylogenetic tree, but the relationship between these statistics and epidemiological quantities needs to be better understood. In this work we address this need via simulation. Using two different benchmark scenarios, we evaluate 74 tree summary statistics and their relationship to epidemiological quantities. In addition to evaluating the epidemiological information that can be inferred from each summary statistic, we also assess the computational cost of each statistic. This helps us optimize the selection of summary statistics for specific applications. Our study offers guidelines on essential considerations for designing or choosing summary statistics. The evaluated set of summary statistics, along with additional helpful functions for phylogenetic analysis, is accessible through an open-source Python library. Our research not only illuminates the main characteristics of many tree summary statistics but also provides valuable computational tools for real-world epidemiological analyses. These contributions aim to enhance our understanding of disease spread dynamics and advance the broader utilization of genomic epidemiology in public health efforts.
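To make the idea of a tree summary statistic and its computational cost concrete, here is a minimal Python sketch. It does not use the authors' library. The nested-tuple tree encoding, the function names, and the choice of statistics (the Sackin index and the cherry count, both standard tree-shape measures) are assumptions made for illustration, and they are not necessarily among the 74 statistics the study evaluates.

```python
import time

# Hypothetical toy tree encoded as nested tuples; leaves are strings.
# Equivalent Newick topology: ((A,B),(C,(D,E)));
TREE = (("A", "B"), ("C", ("D", "E")))


def leaf_depths(node, depth=0):
    """Yield the depth (number of edges from the root) of every leaf."""
    if isinstance(node, str):
        yield depth
    else:
        for child in node:
            yield from leaf_depths(child, depth + 1)


def sackin_index(tree):
    """Sackin index: the sum of all leaf depths (higher means less balanced)."""
    return sum(leaf_depths(tree))


def count_cherries(node):
    """Count internal nodes whose children are exactly two leaves."""
    if isinstance(node, str):
        return 0
    is_cherry = len(node) == 2 and all(isinstance(c, str) for c in node)
    return int(is_cherry) + sum(count_cherries(c) for c in node)


def timed(func, tree):
    """Return (statistic value, elapsed seconds) for one evaluation."""
    start = time.perf_counter()
    value = func(tree)
    return value, time.perf_counter() - start


if __name__ == "__main__":
    for stat in (sackin_index, count_cherries):
        value, seconds = timed(stat, TREE)
        print(f"{stat.__name__}: value={value}, time={seconds:.2e} s")
```

On a five-leaf tree the timings are negligible. A comparison of cost across statistics only becomes meaningful on trees of realistic size, averaged over repeated runs.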