Using phylogenetic summary statistics for epidemiological inference

bioRxiv - Genetics | Pub Date: 2024-08-07 | DOI: 10.1101/2024.08.07.607080

Rafael C. Núñez, Gregory R. Hart, Michael Famulare, Christopher Lorton, Joshua T. Herbeck



Abstract

Since the coining of the term phylodynamics, the use of phylogenies to understand infectious disease dynamics has steadily increased. As methods for phylodynamics and genomic epidemiology have proliferated and grown more computationally expensive, the epidemiological information they extract has also evolved to better complement what can be learned through traditional epidemiological data. However, for genomic epidemiology to continue to grow, and for the accumulating number of pathogen genetic sequences to fulfill their potential widespread utility, the extraction of epidemiological information from phylogenies needs to be simpler and more efficient. Summary statistics provide a straightforward way of extracting information from a phylogenetic tree, but the relationship between these statistics and epidemiological quantities needs to be better understood. In this work we address this need via simulation. Using two different benchmark scenarios, we evaluate 74 tree summary statistics and their relationship to epidemiological quantities. In addition to evaluating the epidemiological information that can be inferred from each summary statistic, we also assess the computational cost of each statistic. This helps us optimize the selection of summary statistics for specific applications. Our study offers guidelines on essential considerations for designing or choosing summary statistics. The evaluated set of summary statistics, along with additional helpful functions for phylogenetic analysis, is accessible through an open-source Python library. Our research not only illuminates the main characteristics of many tree summary statistics but also provides valuable computational tools for real-world epidemiological analyses. These contributions aim to enhance our understanding of disease spread dynamics and advance the broader utilization of genomic epidemiology in public health efforts.
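To make the idea of a tree summary statistic concrete, here is a minimal sketch in plain Python. It is not the authors' library, and the abstract does not say which 74 statistics were evaluated; the three computed here (Sackin index, Colless index, and cherry count) are classic tree-shape indices chosen purely for illustration. The nested-tuple tree encoding, the tip labels, and the `timed` helper are all assumptions made for this example.

```python
import time

# Assumption for illustration: a rooted binary tree is a nested 2-tuple,
# and a tip (sampled sequence) is a string label.


def tip_count(tree):
    """Number of tips below (and including) this node."""
    if isinstance(tree, str):
        return 1
    left, right = tree
    return tip_count(left) + tip_count(right)


def sackin_index(tree, depth=0):
    """Sum over tips of the number of edges between the root and the tip."""
    if isinstance(tree, str):
        return depth
    left, right = tree
    return sackin_index(left, depth + 1) + sackin_index(right, depth + 1)


def colless_index(tree):
    """Sum over internal nodes of |tips on the left - tips on the right|."""
    if isinstance(tree, str):
        return 0
    left, right = tree
    imbalance = abs(tip_count(left) - tip_count(right))
    return imbalance + colless_index(left) + colless_index(right)


def cherry_count(tree):
    """Number of internal nodes whose two children are both tips."""
    if isinstance(tree, str):
        return 0
    left, right = tree
    if isinstance(left, str) and isinstance(right, str):
        return 1
    return cherry_count(left) + cherry_count(right)


def timed(stat, tree):
    """Return (value, elapsed seconds) for one statistic on one tree."""
    start = time.perf_counter()
    value = stat(tree)
    return value, time.perf_counter() - start


# Two hypothetical four-tip trees: fully balanced versus ladder-like.
balanced = (("A", "B"), ("C", "D"))
ladder = ((("A", "B"), "C"), "D")

for name, tree in [("balanced", balanced), ("ladder", ladder)]:
    for stat in (sackin_index, colless_index, cherry_count):
        value, seconds = timed(stat, tree)
        print(f"{name:9s} {stat.__name__:14s} value={value} time={seconds:.2e}s")
```

The balanced tree scores lower on both imbalance indices and has more cherries than the ladder-like tree, which is the kind of shape signal that could then be related to epidemiological quantities. Timing each statistic alongside its value mirrors, in a very simplified way, the abstract's point that computational cost matters when selecting statistics.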
