
Foundations of data science (Springfield, Mo.): latest articles

An international initiative of predicting the SARS-CoV-2 pandemic using ensemble data assimilation
Q2 MATHEMATICS, APPLIED Pub Date : 2020-12-11 DOI: 10.3934/fods.2021001
G. Evensen, Javier Amezcua, M. Bocquet, A. Carrassi, A. Farchi, A. Fowler, P. Houtekamer, C. Jones, R. Moraes, M. Pulido, C. Sampson, F. Vossepoel
This work demonstrates the efficiency of using iterative ensemble smoothers to estimate the parameters of an SEIR model. We have extended a standard SEIR model with age-classes and compartments of sick, hospitalized, and dead. The data conditioned on are the daily numbers of accumulated deaths and the number of hospitalized. Also, it is possible to condition the model on the number of cases obtained from testing. We start from a wide prior distribution for the model parameters; then, the ensemble conditioning leads to a posterior ensemble of estimated parameters yielding model predictions in close agreement with the observations. The updated ensemble of model simulations has predictive capabilities and includes uncertainty estimates. In particular, we estimate the effective reproductive number as a function of time, and we can assess the impact of different intervention measures. By starting from the updated set of model parameters, we can make accurate short-term predictions of the epidemic development assuming knowledge of the future effective reproductive number. Also, the model system allows for the computation of long-term scenarios of the epidemic under different assumptions. We have applied the model system to data sets from several countries, i.e., the four European countries Norway, England, The Netherlands, and France; the province of Quebec in Canada; the South American countries Argentina and Brazil; and the four US states Alabama, North Carolina, California, and New York. These countries and states all have vastly different developments of the epidemic, and we could accurately model the SARS-CoV-2 outbreak in all of them. We realize that more complex models, e.g., with regional compartments, may be desirable, and we suggest that the approach used here should be applicable also for these models.
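For orientation, the standard SEIR model that the abstract takes as its starting point can be sketched as follows. This is only a minimal illustration and not the authors' extended model: it has no age classes, no sick/hospitalized/dead compartments, and no ensemble smoother. The function names, the forward Euler scheme, and all parameter values are assumptions chosen for the example.

```python
def simulate_seir(beta, sigma, gamma, s0, e0, i0, r0, days, dt=0.1):
    """Integrate the basic SEIR equations with forward Euler.

    beta: transmission rate, sigma: rate E -> I, gamma: rate I -> R (all per day).
    Returns a list of (S, E, I, R) tuples, one per day, including day 0.
    """
    s, e, i, r = float(s0), float(e0), float(i0), float(r0)
    n = s + e + i + r                    # total population, conserved by the model
    steps_per_day = round(1 / dt)
    history = [(s, e, i, r)]
    for _ in range(days):
        for _ in range(steps_per_day):
            new_exposed = beta * s * i / n * dt   # S -> E
            new_infectious = sigma * e * dt       # E -> I
            new_removed = gamma * i * dt          # I -> R
            s -= new_exposed
            e += new_exposed - new_infectious
            i += new_infectious - new_removed
            r += new_removed
        history.append((s, e, i, r))
    return history


def basic_reproduction_number(beta, gamma):
    """R0 of the basic SEIR model (no deaths, constant population)."""
    return beta / gamma


if __name__ == "__main__":
    # Illustrative values only; they are not taken from the paper.
    run = simulate_seir(beta=0.5, sigma=0.2, gamma=0.1,
                        s0=9990, e0=0, i0=10, r0=0, days=100)
    print("infectious on day 100:", run[-1][2])
```

In a parameter-estimation setting such as the one described above, quantities like beta would be treated as uncertain and sampled from a prior; here they are fixed inputs.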
Citations: 15
A surrogate-based approach to nonlinear, non-Gaussian joint state-parameter data assimilation
Q2 MATHEMATICS, APPLIED Pub Date : 2020-12-08 DOI: 10.3934/fods.2021019
J. Maclean, E. Spiller
Many recent advances in sequential assimilation of data into nonlinear high-dimensional models are modifications to particle filters which employ efficient searches of a high-dimensional state space. In this work, we present a complementary strategy that combines statistical emulators and particle filters. The emulators are used to learn and offer a computationally cheap approximation to the forward dynamic mapping. This emulator-particle filter (Emu-PF) approach requires a modest number of forward-model runs, but yields well-resolved posterior distributions even in non-Gaussian cases. We explore several modifications to the Emu-PF that utilize mechanisms for dimension reduction to efficiently fit the statistical emulator, and present a series of simulation experiments on an atypical Lorenz-96 system to demonstrate their performance. We conclude with a discussion on how the Emu-PF can be paired with modern particle filtering algorithms.
Citations: 1
Estimating linear response statistics using orthogonal polynomials: An RKHS formulation
Q2 MATHEMATICS, APPLIED Pub Date : 2020-12-08 DOI: 10.3934/fods.2020021
He Zhang, J. Harlim, Xiantao Li
We study the problem of estimating linear response statistics under external perturbations using time series of unperturbed dynamics. Based on the fluctuation-dissipation theory, this problem is reformulated as an unsupervised learning task of estimating a density function. We consider a nonparametric density estimator formulated by the kernel embedding of distributions with "Mercer-type" kernels, constructed based on the classical orthogonal polynomials defined on non-compact domains. While the resulting representation is analogous to Polynomial Chaos Expansion (PCE), the connection to the reproducing kernel Hilbert space (RKHS) theory allows one to establish the uniform convergence of the estimator and to systematically address a practical question of identifying the PCE basis for a consistent estimation. We also provide practical conditions for the well-posedness of not only the estimator but also of the underlying response statistics. Finally, we provide a statistical error bound for the density estimation that accounts for the Monte-Carlo averaging over non-i.i.d. time series and the biases due to a finite basis truncation. This error bound provides a means to understand the feasibility as well as the limitations of the kernel embedding with Mercer-type kernels. Numerically, we verify the effectiveness of the estimator on two stochastic dynamics with known yet non-trivial equilibrium densities.
Citations: 8
ANAPT: Additive noise analysis for persistence thresholding
Q2 MATHEMATICS, APPLIED Pub Date : 2020-12-07 DOI: 10.3934/fods.2022005
Audun D. Myers, Firas A. Khasawneh, Brittany Terese Fasy
We introduce a novel method for Additive Noise Analysis for Persistence Thresholding (ANAPT) which separates significant features in the sublevel set persistence diagram of a time series based on a statistical analysis of the persistence of a noise distribution. Specifically, we consider an additive noise model and leverage the statistical analysis to provide a noise cutoff or confidence interval in the persistence diagram for the observed time series. This analysis is done for several common noise models including Gaussian, uniform, exponential, and Rayleigh distributions. ANAPT is computationally efficient, does not require any signal pre-filtering, is widely applicable, and has open-source software available. We demonstrate the functionality of ANAPT with both numerically simulated examples and an experimental data set. Additionally, we provide an efficient $\Theta(n\log(n))$ algorithm for calculating the zero-dimensional sublevel set persistence homology.
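To make the last sentence concrete, here is a minimal sketch of one well-known way to compute zero-dimensional sublevel set persistence of a time series: sort the samples, add them in increasing order, and merge neighbouring components with a union-find structure. It is not claimed to be the authors' algorithm or their open-source code. The function name and the handling of ties are assumptions made for this example; the sort is what makes the cost roughly n log n.

```python
import math


def sublevel_persistence(values):
    """Zero-dimensional sublevel set persistence pairs of a 1-D series.

    Returns (birth, death) pairs. The component born at the global minimum
    never dies and is reported with death = inf. Zero-persistence pairs
    are omitted.
    """
    n = len(values)
    order = sorted(range(n), key=lambda k: values[k])  # indices by increasing value
    added_at = [0] * n
    for pos, k in enumerate(order):
        added_at[k] = pos
    parent = [-1] * n  # -1 marks a sample not yet added to the filtration

    def find(k):
        # Union-find lookup with path halving.
        while parent[k] != k:
            parent[k] = parent[parent[k]]
            k = parent[k]
        return k

    pairs = []
    for k in order:
        parent[k] = k
        for nb in (k - 1, k + 1):
            if 0 <= nb < n and parent[nb] != -1:
                a, b = find(k), find(nb)
                if a == b:
                    continue
                # Elder rule: the component that appeared first survives.
                older, younger = (a, b) if added_at[a] < added_at[b] else (b, a)
                if values[younger] < values[k]:
                    pairs.append((values[younger], values[k]))
                parent[younger] = older  # root stays the component's minimum
    if n:
        pairs.append((values[order[0]], math.inf))
    return pairs


if __name__ == "__main__":
    print(sorted(sublevel_persistence([0.0, 2.0, 1.0, 3.0, 0.5])))
```

Each local minimum gives a birth and each local maximum that joins two components gives a death, so short-lived pairs correspond to small wiggles of the kind a noise cutoff is meant to discard.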
Citations: 1
Mean field limit of Ensemble Square Root filters - discrete and continuous time
Q2 MATHEMATICS, APPLIED Pub Date : 2020-11-20 DOI: 10.3934/FODS.2021003
Theresa Lange, W. Stannat
Consider the class of Ensemble Square Root filtering algorithms for the numerical approximation of the posterior distribution of nonlinear Markovian signals partially observed with linear observations corrupted with independent measurement noise. We analyze the asymptotic behavior of these algorithms in the large ensemble limit both in discrete and continuous time. We identify limiting mean-field processes on the level of the ensemble members, prove corresponding propagation of chaos results and derive associated convergence rates in terms of the ensemble size. In continuous time we also identify the stochastic partial differential equation driving the distribution of the mean-field process and perform a comparison with the Kushner-Stratonovich equation.
Citations: 15
Feedback particle filter for collective inference
Q2 MATHEMATICS, APPLIED Pub Date : 2020-10-13 DOI: 10.3934/fods.2021018
Jin W. Kim, P. Mehta

The purpose of this paper is to describe the feedback particle filter algorithm for problems where there are a large number ($M$) of non-interacting agents (targets) with a large number ($M$) of non-agent specific observations (measurements) that originate from these agents. In its basic form, the problem is characterized by data association uncertainty whereby the association between the observations and agents must be deduced in addition to the agent state. In this paper, the large-$M$ limit is interpreted as a problem of collective inference. This viewpoint is used to derive the equation for the empirical distribution of the hidden agent states. A feedback particle filter (FPF) algorithm for this problem is presented and illustrated via numerical simulations. Results are presented for the Euclidean and the finite state-space cases, both in continuous-time settings. The classical FPF algorithm is shown to be the special case (with $M = 1$) of these more general results. The simulations help show that the algorithm well approximates the empirical distribution of the hidden states for large $M$.

Citations: 4
Multiple hypothesis testing with persistent homology
Q2 MATHEMATICS, APPLIED Pub Date : 2020-10-10 DOI: 10.3934/fods.2022018
Mikael Vejdemo-Johansson, Sayan Mukherjee
Multiple hypothesis testing requires a control procedure: the error probabilities in statistical testing compound when several tests are performed for the same conclusion. A common multiple hypothesis testing error rate is the Family-Wise Error Rate (FWER), which measures the probability that any one of the performed tests rejects its null hypothesis erroneously. It is often controlled using Bonferroni's method or later, more sophisticated approaches, all of which involve replacing the test level α with α/k, reducing it by a factor of the number of simultaneous tests performed. Common paradigms for hypothesis testing in persistent homology are often based on permutation testing; however, increasing the number of permutations to meet a Bonferroni-style threshold can be prohibitively expensive. In this paper we propose a null model based approach to testing for acyclicity (i.e., trivial homology), coupled with an FWER control method that does not suffer from these computational costs.
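The α/k replacement mentioned above is Bonferroni's correction, and a small sketch shows why it is needed. This illustrates only the classical correction, not the null-model method the paper proposes. The function names are assumptions made for this example, and the compounding formula assumes independent tests whose null hypotheses are all true.

```python
def bonferroni_reject(p_values, alpha=0.05):
    """Reject each null hypothesis whose p-value is at most alpha / k."""
    k = len(p_values)
    if k == 0:
        return []
    threshold = alpha / k  # per-test level after the Bonferroni correction
    return [p <= threshold for p in p_values]


def fwer_if_independent(alpha, k):
    """Probability of at least one false rejection among k independent
    tests of true null hypotheses, each run at level alpha."""
    return 1 - (1 - alpha) ** k


if __name__ == "__main__":
    k = 10
    print("uncorrected FWER:", fwer_if_independent(0.05, k))
    print("Bonferroni FWER: ", fwer_if_independent(0.05 / k, k))
    print(bonferroni_reject([0.001, 0.02, 0.04]))
```

With a permutation test, the smallest attainable p-value is on the order of one over the number of permutations, which is why meeting a threshold of α/k for large k drives up the permutation count.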
Citations: 3
Wave-shape oscillatory model for nonstationary periodic time series analysis
Q2 MATHEMATICS, APPLIED Pub Date : 2020-07-13 DOI: 10.3934/FODS.2021009
Yu-Ting Lin, John Malik, Hau‐Tieng Wu
The oscillations observed in many time series, particularly in biomedicine, exhibit morphological variations over time. These morphological variations are caused by intrinsic or extrinsic changes to the state of the generating system, henceforth referred to as dynamics. To model these time series (including and specifically pathophysiological ones) and estimate the underlying dynamics, we provide a novel wave-shape oscillatory model. In this model, time-dependent variations in cycle shape occur along a manifold called the wave-shape manifold. To estimate the wave-shape manifold associated with an oscillatory time series, study the dynamics, and visualize the time-dependent changes along the wave-shape manifold, we apply the well-established diffusion maps (DM) algorithm to the set of all observed oscillations. We provide a theoretical guarantee on the dynamical information recovered by the DM algorithm under the proposed model. Applying the proposed model and algorithm to arterial blood pressure (ABP) signals recorded during general anesthesia leads to the extraction of nociception information. Applying the wave-shape oscillatory model and the DM algorithm to cardiac cycles in the electrocardiogram (ECG) leads to ectopy detection and a new ECG-derived respiratory signal, even when the subject has atrial fibrillation.
Citations: 17
The (homological) persistence of gerrymandering
Q2 MATHEMATICS, APPLIED Pub Date : 2020-07-05 DOI: 10.3934/FODS.2021007
M. Duchin, Tom Needham, Thomas Weighill
We apply persistent homology, the dominant tool from the field of topological data analysis, to study electoral redistricting. Our method combines the geographic information from a political districting plan with election data to produce a persistence diagram. We are then able to visualize and analyze large ensembles of computer-generated districting plans of the type commonly used in modern redistricting research (and court challenges). We set out three applications: zoning a state at each scale of districting, comparing elections, and seeking signals of gerrymandering. Our case studies focus on redistricting in Pennsylvania and North Carolina, two states whose legal challenges to enacted plans have raised considerable public interest in the last few years. To address the question of robustness of the persistence diagrams to perturbations in vote data and in district boundaries, we translate the classical stability theorem of Cohen-Steiner et al. into our setting and find that it can be phrased in a manner that is easy to interpret. We accompany the theoretical bound with an empirical demonstration to illustrate diagram stability in practice.
Citations: 9
Posterior contraction rates for non-parametric state and drift estimation
Q2 MATHEMATICS, APPLIED Pub Date : 2020-03-20 DOI: 10.3934/fods.2020016
S. Reich, P. Rozdeba
We consider a combined state and drift estimation problem for the linear stochastic heat equation. The infinite-dimensional Bayesian inference problem is formulated in terms of the Kalman-Bucy filter over an extended state space, and its long-time asymptotic properties are studied. Asymptotic posterior contraction rates in the unknown drift function are the main contribution of this paper. Such rates have been studied before for stationary non-parametric Bayesian inverse problems, and here we demonstrate the consistency of our time-dependent formulation with these previous results building upon scale separation and a slow manifold approximation.
Citations: 4