Scaling Hawkes processes to one million COVID-19 cases

Seyoon Ko, Marc A. Suchard, Andrew J. Holbrook
arXiv - STAT - Computation, published 2024-07-16. DOI: arxiv-2407.11349 (https://doi.org/arxiv-2407.11349)

Abstract

Hawkes stochastic point process models have emerged as valuable statistical tools for analyzing viral contagion. The spatiotemporal Hawkes process characterizes the speeds at which viruses spread within human populations. Unfortunately, likelihood-based inference using these models requires $O(N^2)$ floating-point operations, where $N$ is the number of observed cases. Recent work responds to the Hawkes likelihood's computational burden by developing efficient graphics processing unit (GPU)-based routines that enable Bayesian analysis of tens of thousands of observations. We build on this work and develop a high-performance computing (HPC) strategy that divides 30 Markov chains between 4 GPU nodes, each of which uses multiple GPUs to accelerate its chain's likelihood computations. We use this framework to apply two spatiotemporal Hawkes models to the analysis of one million COVID-19 cases in the United States between March 2020 and June 2023. In addition to brute-force HPC, we advocate for two simple strategies as scalable alternatives to successful approaches proposed for small-data settings. First, we use known county-specific population densities to build a spatially varying triggering kernel in a manner that avoids computationally costly nearest-neighbor search. Second, we use a cut-posterior inference routine that accounts for infections' spatial location uncertainty by iteratively sampling latent locations uniformly within their respective counties of occurrence, thereby avoiding full-blown latent variable inference for 1,000,000 infection locations.
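
To make the $O(N^2)$ cost concrete, here is a minimal NumPy sketch of a spatiotemporal Hawkes log-likelihood. It is not the authors' model or their GPU code. It assumes a toy specification chosen only for illustration: a constant background rate `mu`, an exponential temporal kernel with rate `omega`, an isotropic Gaussian spatial kernel with bandwidth `h`, a self-excitation weight `theta`, and a compensator that ignores spatial edge effects. The function name and all parameter names are hypothetical.

```python
import numpy as np


def hawkes_loglik(times, locs, mu, theta, omega, h, area, horizon):
    """Log-likelihood of a toy spatiotemporal Hawkes process (illustrative only).

    times: (N,) event times in [0, horizon]; locs: (N, 2) event coordinates.
    Assumed intensity: mu + theta * sum over earlier events j of
    omega * exp(-omega * (t - t_j)) * Gaussian(x - x_j; bandwidth h).
    """
    times = np.asarray(times, dtype=float)
    locs = np.asarray(locs, dtype=float)

    # Every pair (i, j) is visited once: this is the O(N^2) step,
    # in both floating-point operations and (here) memory.
    dt = times[:, None] - times[None, :]
    d2 = ((locs[:, None, :] - locs[None, :, :]) ** 2).sum(axis=-1)

    # Only strictly earlier events j can trigger event i.
    past = dt > 0
    temporal = omega * np.exp(-omega * np.where(past, dt, 0.0))
    spatial = np.exp(-d2 / (2.0 * h**2)) / (2.0 * np.pi * h**2)
    triggered = theta * np.where(past, temporal * spatial, 0.0).sum(axis=1)

    # Sum of log-intensities evaluated at the observed events.
    log_intensity = np.log(mu + triggered).sum()

    # Integral of the intensity over the observation window, assuming the
    # spatial kernel integrates to one inside the region (no edge correction).
    compensator = mu * area * horizon + theta * (
        1.0 - np.exp(-omega * (horizon - times))
    ).sum()

    return log_intensity - compensator
```

Doubling the number of cases quadruples the size of the pairwise arrays `dt` and `d2`, which is why a dense evaluation like this one is only practical for small $N$ and why the abstract turns to GPU and multi-node strategies at the one-million-case scale.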