arXiv - STAT - Computation: latest papers, page 5

Efficient variance-based reliability sensitivity analysis for Monte Carlo methods

arXiv - STAT - Computation

Pub Date : 2024-08-13 DOI: arxiv-2408.06664

Thomas Most

In this paper, a Monte Carlo-based approach is presented for quantifying the importance of the scattering input parameters with respect to the failure probability. Using the basic idea of the alpha-factors of the First Order Reliability Method, this approach was developed to analyze correlated input variables as well as arbitrary marginal parameter distributions. Based on an efficient transformation scheme using the importance sampling principle, only a single analysis run by a plain or variance-reduced Monte Carlo method is required to give a sufficient estimate of the introduced parameter sensitivities. Several application examples are presented and discussed in the paper.
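A minimal Python sketch of the underlying idea, under strong simplifying assumptions: independent standard normal inputs, a hypothetical linear limit state g(u) = beta - alpha^T u (failure when g < 0), plain Monte Carlo for the failure probability, and the normalized mean of the failed samples as an alpha-factor-like sensitivity vector. It does not reproduce the paper's importance-sampling transformation for correlated inputs or arbitrary marginals; `mc_alpha_factors` and the example values are illustrative only.

```python
import math
import random

def mc_alpha_factors(g, dim, n_samples=200_000, seed=0):
    """Plain Monte Carlo estimate of P[g(U) < 0] plus alpha-like sensitivities.

    Assumes U has independent standard normal components (standard normal space).
    Sensitivities = mean of the failed samples, normalized to unit length.
    """
    rng = random.Random(seed)
    n_fail = 0
    fail_sum = [0.0] * dim
    for _ in range(n_samples):
        u = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        if g(u) < 0.0:                      # sample lies in the failure domain
            n_fail += 1
            for i in range(dim):
                fail_sum[i] += u[i]
    pf = n_fail / n_samples
    norm = math.sqrt(sum(s * s for s in fail_sum)) or 1.0
    alphas = [s / norm for s in fail_sum]
    return pf, alphas

# Hypothetical linear limit state with reliability index beta = 2 and alpha = (0.8, 0.6).
beta, alpha = 2.0, (0.8, 0.6)
pf, alphas = mc_alpha_factors(lambda u: beta - sum(a * x for a, x in zip(alpha, u)), dim=2)
exact_pf = 0.5 * math.erfc(beta / math.sqrt(2.0))  # Phi(-beta) for a linear limit state
print(pf, exact_pf, alphas)
```

For a linear limit state the failed samples' mean points along alpha, so the estimated sensitivities should recover (0.8, 0.6) up to sampling noise; for nonlinear limit states this simple estimator is only a rough proxy.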

Citations: 0

Fast fitting of phylogenetic mixed effects models

arXiv - STAT - Computation

Pub Date : 2024-08-09 DOI: arxiv-2408.05333

Bert van der Veen, Robert Brian O'Hara

Mixed effects models are among the most commonly used statistical methods for the exploration of multispecies data. In recent years, Joint Species Distribution Models and Generalized Linear Latent Variable Models have also gained in popularity when the goal is to incorporate residual covariation between species that cannot be explained by measured environmental covariates. Few software implementations of such models exist that can additionally incorporate phylogenetic information, and those that exist tend to utilize Markov chain Monte Carlo methods for estimation, so that model fitting takes a long time. In this article we develop new methods for quickly and flexibly fitting phylogenetic mixed models, potentially incorporating residual covariation between species using latent variables, with the possibility to estimate the strength of phylogenetic structuring in species responses per environmental covariate, and while incorporating correlation between different covariate effects. By combining variational approximations with a reduced rank matrix normal covariance structure, Nearest Neighbours Gaussian Processes, and parallel computation, phylogenetic mixed models can be fitted much more quickly than the current state-of-the-art. Two simulation studies demonstrate that the proposed combination of approximations is not only fast, but also enjoys high accuracy. Finally, we demonstrate use of the method with a real-world dataset of wood-decaying fungi.
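One ingredient such models need is a phylogenetic covariance for species-specific covariate effects. A minimal Python sketch, assuming Brownian-motion evolution on the tree (covariance equals the branch length shared by two species' root-to-tip paths) and a hypothetical mixing parameter rho in [0, 1] that sets the strength of phylogenetic structuring for one covariate. The paper's actual parameterization, its reduced-rank matrix normal structure, and its NNGP approximation are not reproduced; the toy tree and function names are illustrative.

```python
def bm_covariance(paths, edge_len):
    """Brownian-motion covariance: C[a][b] = total length of the edges shared by the
    root-to-tip paths of species a and b. `paths` maps species -> set of edge ids."""
    species = list(paths)
    return [[sum((edge_len[e] for e in paths[a] & paths[b]), 0.0) for b in species]
            for a in species]

def species_effect_cov(C, sigma2, rho):
    """Covariance of one covariate's species effects (hypothetical form):
    sigma2 * (rho * C + (1 - rho) * I); rho = 0 means no phylogenetic signal."""
    n = len(C)
    return [[sigma2 * (rho * C[i][j] + (1.0 - rho) * (i == j)) for j in range(n)]
            for i in range(n)]

# Toy ultrametric tree ((A, B), C) with root-to-tip depth 1.
edge_len = {"root_AB": 0.5, "A": 0.5, "B": 0.5, "C": 1.0}
paths = {"A": {"root_AB", "A"}, "B": {"root_AB", "B"}, "C": {"C"}}
C = bm_covariance(paths, edge_len)
Sigma = species_effect_cov(C, sigma2=2.0, rho=0.8)
```

Estimating a separate rho per covariate is one simple way to express "strength of phylogenetic structuring per environmental covariate"; whether the paper uses this exact form is an assumption.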

Citations: 0

Boosting Earth System Model Outputs And Saving PetaBytes in their Storage Using Exascale Climate Emulators

arXiv - STAT - Computation

Pub Date : 2024-08-08 DOI: arxiv-2408.04440

Sameh Abdulah, Allison H. Baker, George Bosilca, Qinglei Cao, Stefano Castruccio, Marc G. Genton, David E. Keyes, Zubair Khalid, Hatem Ltaief, Yan Song, Georgiy L. Stenchikov, Ying Sun

We present the design and scalable implementation of an exascale climate emulator for addressing the escalating computational and storage requirements of high-resolution Earth System Model simulations. We utilize the spherical harmonic transform to stochastically model spatio-temporal variations in climate data. This provides tunable spatio-temporal resolution and significantly improves the fidelity and granularity of climate emulation, achieving an ultra-high spatial resolution of 0.034 (approximately 3.5 km) in space. Our emulator, trained on 318 billion hourly temperature data points from a 35-year global simulation ensemble and 31 billion daily data points from an 83-year global simulation ensemble, generates statistically consistent climate emulations. We extend linear solver software to exploit mixed-precision arithmetic on GPUs, applying different precisions within a single solver to adapt to different correlation strengths. The PaRSEC runtime system supports efficient parallel matrix operations by optimizing the dynamic balance between computation, communication, and memory requirements. Our BLAS3-rich code is optimized for systems equipped with four different families and generations of GPUs, scaling well to achieve 0.976 EFlop/s on 9,025 nodes (36,100 AMD MI250X multichip module (MCM) GPUs) of Frontier (nearly full system), 0.739 EFlop/s on 1,936 nodes (7,744 Grace-Hopper Superchips (GH200)) of Alps, 0.243 EFlop/s on 1,024 nodes (4,096 A100 GPUs) of Leonardo, and 0.375 EFlop/s on 3,072 nodes (18,432 V100 GPUs) of Summit.
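As a quick sanity check on the scaling figures, dividing each sustained rate by the quoted GPU count gives the average throughput per GPU (for Frontier, per MI250X module). This is plain arithmetic on the numbers in the abstract; the four GPU generations differ in peak rate and in the precision mix used, so these per-GPU figures are not directly comparable efficiencies.

```python
# Sustained rate (EFlop/s) and GPU count per system, as quoted in the abstract.
runs = {
    "Frontier": (0.976, 36_100),   # AMD MI250X modules
    "Alps": (0.739, 7_744),        # GH200 Grace-Hopper Superchips
    "Leonardo": (0.243, 4_096),    # A100 GPUs
    "Summit": (0.375, 18_432),     # V100 GPUs
}

# 1 EFlop/s = 1e6 TFlop/s, so per-GPU TFlop/s = rate * 1e6 / gpus.
per_gpu_tflops = {name: rate * 1e6 / gpus for name, (rate, gpus) in runs.items()}
for name, tf in per_gpu_tflops.items():
    print(f"{name:9s} {tf:6.1f} TFlop/s per GPU")
```

This works out to roughly 27, 95, 59, and 20 TFlop/s per GPU on Frontier, Alps, Leonardo, and Summit, respectively.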

Citations: 0

Automated Techniques for Efficient Sampling of Piecewise-Deterministic Markov Processes

arXiv - STAT - Computation

Pub Date : 2024-08-07 DOI: arxiv-2408.03682

Charly Andral, Kengo Kamatani

Piecewise deterministic Markov processes (PDMPs) are a class of continuous-time Markov processes that were recently used to develop a new class of Markov chain Monte Carlo algorithms. However, implementing these processes is challenging because of their continuous-time nature and the need to integrate the rate function. Recently, Corbella, Spencer, and Roberts (2022) proposed a new algorithm to automate the implementation of the Zig-Zag sampler. However, the efficiency of that algorithm depends heavily on a hyperparameter ($t_{\text{max}}$) that is fixed throughout the run and must be tuned with preliminary runs. In this work, we relax this assumption and propose a new variant of their algorithm that lets this parameter change over time and automatically adapt to the target distribution. We also replace the Brent optimization algorithm with a grid-based method to compute the upper bound of the rate function. This method is more robust to the regularity of the function and gives a tighter upper bound while being quicker to compute. We also extend the algorithm to other PDMPs and provide a Python implementation of the algorithm based on JAX.
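The thinning step that automated Zig-Zag samplers rely on can be sketched in one dimension. This minimal Python sketch assumes a target proportional to exp(-U(x)) with known derivative `grad_u`, keeps the horizon `t_max` fixed (exactly the hyperparameter the paper makes adaptive; that adaptation is not reproduced), and bounds the switching rate on [0, t_max] by its maximum over a grid. A grid maximum is a guaranteed upper bound only when the rate is monotone on the window, which holds for the Gaussian example used here; the paper's bound construction may differ, and all names are hypothetical.

```python
import math
import random

def zigzag_1d(grad_u, x0, t_total, t_max=1.0, n_grid=11, seed=0):
    """1D Zig-Zag sampler via Poisson thinning with a grid-based rate bound.

    Returns the time averages of x and x**2 along the piecewise-linear path.
    """
    rng = random.Random(seed)
    x, v, t = x0, 1.0, 0.0
    int_x = int_x2 = 0.0

    def advance(dt):
        # Move deterministically for dt, accumulating exact path integrals.
        nonlocal x, t, int_x, int_x2
        int_x += x * dt + 0.5 * v * dt ** 2
        int_x2 += x * x * dt + x * v * dt ** 2 + v * v * dt ** 3 / 3.0
        x += v * dt
        t += dt

    def rate(s):
        # Switching rate s time units ahead of the current state.
        return max(0.0, v * grad_u(x + v * s))

    while t < t_total:
        bound = max(rate(t_max * k / (n_grid - 1)) for k in range(n_grid))
        tau = rng.expovariate(bound) if bound > 0.0 else math.inf
        if tau >= t_max:
            advance(min(t_max, t_total - t))          # no candidate event in this window
        elif tau >= t_total - t:
            advance(t_total - t)                      # time budget exhausted
        else:
            accept = rng.random() * bound < rate(tau)  # thinning test
            advance(tau)
            if accept:
                v = -v                                # flip the velocity
    return int_x / t_total, int_x2 / t_total

# Standard normal target: U(x) = x**2 / 2, so grad_u(x) = x.
mean, second_moment = zigzag_1d(lambda x: x, x0=0.0, t_total=20_000.0)
```

For this target the rate max(0, v*x + s) is nondecreasing in s, so the grid maximum equals the value at t_max and the bound is exact; a too-small `t_max` wastes iterations, while a too-large one loosens the bound, which is the trade-off the paper's adaptive scheme addresses.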

Citations: 0

Monotonic warpings for additive and deep Gaussian processes

arXiv - STAT - Computation

Pub Date : 2024-08-02 DOI: arxiv-2408.01540

Steven D. Barnett, Lauren J. Beesley, Annie S. Booth, Robert B. Gramacy, Dave Osthus

Gaussian processes (GPs) are canonical as surrogates for computer experiments because they enjoy a degree of analytic tractability. But that breaks when the response surface is constrained, say to be monotonic. Here, we provide a mono-GP construction for a single input that is highly efficient even though the calculations are non-analytic. Key ingredients include transformation of a reference process and elliptical slice sampling. We then show how mono-GP may be deployed effectively in two ways. One is additive, extending monotonicity to more inputs; the other is as a prior on injective latent warping variables in a deep Gaussian process for (non-monotonic, multi-input) non-stationary surrogate modeling. We provide illustrative and benchmarking examples throughout, showing that our methods yield improved performance over the state-of-the-art on examples from those two classes of problems.
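Elliptical slice sampling, one of the ingredients named above, is a generic MCMC update for models with a zero-mean Gaussian prior. The following minimal Python sketch implements the standard update of Murray, Adams and MacKay (2010), not the paper's mono-GP sampler; the toy one-dimensional model (prior N(0, 1), one observation y = 1 with unit noise variance, so the exact posterior is N(0.5, 0.5)) is an illustrative assumption.

```python
import math
import random

def elliptical_slice(f, prior_sample, log_lik, rng):
    """One elliptical slice sampling update for a zero-mean Gaussian prior.

    f: current state (list of floats); prior_sample: callable returning a prior draw;
    log_lik: callable returning the log-likelihood of a state.
    """
    nu = prior_sample()                                 # auxiliary draw defining the ellipse
    log_y = log_lik(f) + math.log(1.0 - rng.random())   # slice level, log(u) with u in (0, 1]
    theta = rng.uniform(0.0, 2.0 * math.pi)
    lo, hi = theta - 2.0 * math.pi, theta
    while True:
        prop = [fi * math.cos(theta) + ni * math.sin(theta) for fi, ni in zip(f, nu)]
        if log_lik(prop) > log_y:
            return prop
        # Shrink the angle bracket towards theta = 0, i.e. the current state.
        if theta < 0.0:
            lo = theta
        else:
            hi = theta
        theta = rng.uniform(lo, hi)

# Toy model: prior f ~ N(0, 1), observation y = 1 with N(f, 1) noise.
rng = random.Random(1)
f, draws = [0.0], []
for _ in range(20_000):
    f = elliptical_slice(f, lambda: [rng.gauss(0.0, 1.0)],
                         lambda g: -0.5 * (1.0 - g[0]) ** 2, rng)
    draws.append(f[0])
post_mean = sum(draws) / len(draws)
```

The update needs no step-size tuning, which is one reason it suits latent GP values such as the reference process behind a monotone warping.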

Citations: 0

Gradient-free optimization via integration

arXiv - STAT - Computation

Pub Date : 2024-08-01 DOI: arxiv-2408.00888

Christophe Andrieu, Nicolas Chopin, Ettore Fincato, Mathieu Gerber

In this paper we propose a novel, general-purpose algorithm to optimize functions $l\colon \mathbb{R}^d \rightarrow \mathbb{R}$ not assumed to be convex or differentiable or even continuous. The main idea is to sequentially fit a sequence of parametric probability densities, possessing a concentration property, to $l$ using a Bayesian update followed by a reprojection back onto the chosen parametric sequence. Remarkably, with the sequence chosen to be from the exponential family, reprojection essentially boils down to the computation of expectations. Our algorithm therefore lends itself to Monte Carlo approximation, ranging from plain to Sequential Monte Carlo (SMC) methods. As a result, the algorithm is particularly simple to implement, and we illustrate its performance on a challenging Machine Learning classification problem. Our methodology naturally extends to the scenario where only noisy measurements of $l$ are available and retains ease of implementation and performance. At a theoretical level we establish, in a fairly general scenario, that our framework can be viewed as implicitly implementing a time-inhomogeneous gradient descent algorithm on a sequence of smoothed approximations of $l$. This opens the door to establishing convergence of the algorithm and providing theoretical guarantees. Along the way, we establish new results for inhomogeneous gradient descent algorithms of independent interest.
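The "Bayesian update, then reproject onto an exponential family" loop can be illustrated with a one-dimensional Gaussian family, where the KL reprojection reduces to matching the mean and variance, i.e. computing expectations. This is a minimal sketch under stated assumptions, not the authors' algorithm: the pseudo-likelihood exp(-beta * l(x)), the fixed `beta`, the plain self-normalized Monte Carlo weights, and the name `gf_optimize` are all hypothetical choices. The example objective is non-differentiable at its minimizer x = 2 and discontinuous at x = 0.

```python
import math
import random

def gf_optimize(l, m0, s0, n_iter=100, n_samples=500, beta=1.0, seed=0):
    """Fit a sequence of Gaussians N(m, s**2) that concentrate on the minimizers of l."""
    rng = random.Random(seed)
    m, s = m0, s0
    for _ in range(n_iter):
        xs = [rng.gauss(m, s) for _ in range(n_samples)]
        losses = [l(x) for x in xs]
        l_min = min(losses)
        # "Bayesian update": reweight draws from the current Gaussian by exp(-beta * l);
        # subtracting l_min avoids underflow and cancels after normalization.
        ws = [math.exp(-beta * (li - l_min)) for li in losses]
        w_sum = sum(ws)
        # Reprojection onto the Gaussian family: match the weighted mean and variance.
        m = sum(w * x for w, x in zip(ws, xs)) / w_sum
        var = sum(w * (x - m) ** 2 for w, x in zip(ws, xs)) / w_sum
        s = max(math.sqrt(var), 1e-12)
    return m, s

def objective(x):
    # Non-differentiable at x = 2, discontinuous at x = 0.
    return abs(x - 2.0) + (1.0 if x < 0.0 else 0.0)

m, s = gf_optimize(objective, m0=-3.0, s0=3.0)
```

No gradient of the objective is ever evaluated; the repeated reweighting shrinks the Gaussian around the minimizer, and the shrinking spread plays the role of the smoothing scale mentioned in the abstract.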

Citations: 0

Within-vector viral dynamics challenges how to model the extrinsic incubation period for major arboviruses: dengue, Zika, and chikungunya

arXiv - STAT - Computation

Pub Date : 2024-08-01 DOI: arxiv-2408.00409

Léa Loisel, Vincent Raquin, Maxime Ratinier, Pauline Ezanno, Gaël Beaunée

Arboviruses represent a significant threat to human, animal, and plant health worldwide. To elucidate transmission, anticipate their spread and efficiently control them, mechanistic modelling has proven its usefulness. However, most models rely on assumptions about how the extrinsic incubation period (EIP) is represented: the intra-vector viral dynamics (IVD), occurring during the EIP, is approximated by a single state. After an average duration, all exposed vectors become infectious. Two strong hypotheses are hidden behind this: (i) the EIP is exponentially distributed in the vector population; (ii) viruses successfully cross the infection, dissemination, and transmission barriers in all exposed vectors. To assess these hypotheses, we developed a stochastic compartmental model which represents successive IVD stages, associated with whether or not each of these three barriers is crossed. We calibrated the model using an ABC-SMC (Approximate Bayesian Computation - Sequential Monte Carlo) method with model selection. We systematically searched for literature data on experimental infections of Aedes mosquitoes infected by either dengue, chikungunya, or Zika viruses. We demonstrated the discrepancy between the exponential hypothesis and observed EIP distributions for dengue and Zika viruses and identified more relevant EIP distributions. We also quantified the fraction of infected mosquitoes eventually becoming infectious, highlighting that often only a small fraction crosses the three barriers. This work provides a generic modelling framework applicable to other arboviruses for which similar data are available. Our model can also be coupled to population-scale models to aid future arbovirus control.
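The contrast between a single exponential EIP compartment and successive intra-vector stages can be illustrated by simulation. This minimal Python sketch assumes k sequential exponentially distributed stages with a fixed total mean (so the EIP is Erlang-distributed, with coefficient of variation 1/sqrt(k)) and independent per-mosquito probabilities of crossing the infection, dissemination, and transmission barriers. The parameter values and the names `simulate_eip` and `mean_and_cv` are hypothetical, and the paper's calibrated model and its ABC-SMC fitting are not reproduced.

```python
import math
import random

def simulate_eip(mean_eip, k, p_barriers, n, seed=0):
    """Simulate n exposed mosquitoes; return the EIPs of those that become infectious.

    A mosquito must cross every barrier in turn (each with its own probability);
    its EIP is the sum of k exponential stages, each with mean mean_eip / k.
    """
    rng = random.Random(seed)
    eips = []
    for _ in range(n):
        if all(rng.random() < p for p in p_barriers):
            eips.append(sum(rng.expovariate(k / mean_eip) for _ in range(k)))
    return eips

def mean_and_cv(xs):
    mu = sum(xs) / len(xs)
    sd = math.sqrt(sum((x - mu) ** 2 for x in xs) / (len(xs) - 1))
    return mu, sd / mu

# Hypothetical values: mean EIP of 10 days, barrier-crossing probabilities 0.8, 0.6, 0.5.
p = (0.8, 0.6, 0.5)
single = simulate_eip(10.0, k=1, p_barriers=p, n=50_000)   # exponential EIP
staged = simulate_eip(10.0, k=6, p_barriers=p, n=50_000)   # Erlang(6) EIP, same mean
frac_infectious = len(staged) / 50_000                      # expected 0.8 * 0.6 * 0.5 = 0.24
```

Both variants share the same mean EIP, but the staged version is far less dispersed, and only about a quarter of exposed mosquitoes become infectious under these assumed probabilities, which is the kind of discrepancy the single-compartment assumption hides.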

Citations: 0

Supervised brain node and network construction under voxel-level functional imaging

arXiv - STAT - Computation

Pub Date : 2024-07-30 DOI: arxiv-2407.21242

Wanwan Xu, Selena Wang, Chichun Tan, Xilin Shen, Wenjing Luo, Todd Constable, Tianxi Li, Yize Zhao

Recent advancements in understanding the brain's functional organization related to behavior have been pivotal, particularly in the development of predictive models based on brain connectivity. Traditional methods in this domain often involve a two-step process: first constructing a connectivity matrix from predefined brain regions, and then linking these connections to behaviors or clinical outcomes. However, because these approaches partition nodes without supervision and establish connectivity independently of the outcome, they predict outcomes inefficiently. In this paper, we introduce the Supervised Brain Parcellation (SBP), a brain node parcellation scheme informed by the downstream predictive task. With voxel-level functional time courses generated under resting-state or cognitive tasks as input, our approach clusters voxels into nodes in a manner that maximizes the correlation between inter-node connections and the behavioral outcome, while also accommodating intra-node homogeneity. We rigorously evaluate the SBP approach using resting-state and task-based fMRI data from both the Adolescent Brain Cognitive Development (ABCD) study and the Human Connectome Project (HCP). Our analyses show that SBP significantly improves out-of-sample connectome-based predictive performance compared to conventional step-wise methods under various brain atlases. This advancement holds promise for enhancing our understanding of how brain functional architecture relates to behavior and for establishing more informative network neuromarkers for clinical applications.
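For orientation, the quantities a supervised parcellation trades off can be computed for a fixed voxel-to-node assignment: node time courses as voxel averages, node-to-node connectivity as Pearson correlation, and, across subjects, the correlation between one edge and the behavioral outcome. This minimal Python sketch assumes exactly those conventions (averaging and Pearson correlation are common choices, not confirmed details of SBP) and does not implement the SBP optimization itself; all names are hypothetical.

```python
import math

def pearson(a, b):
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)

def node_connectivity(voxel_ts, labels, n_nodes):
    """voxel_ts[v] is the time course of voxel v and labels[v] its node. Returns the
    node-by-node Pearson correlation matrix of the node-averaged time courses."""
    n_time = len(voxel_ts[0])
    node_ts = []
    for k in range(n_nodes):
        members = [ts for ts, lab in zip(voxel_ts, labels) if lab == k]
        node_ts.append([sum(ts[t] for ts in members) / len(members) for t in range(n_time)])
    return [[pearson(a, b) for b in node_ts] for a in node_ts]

def edge_outcome_correlation(conn_mats, outcomes, i, j):
    """Across subjects, correlate the strength of edge (i, j) with the outcome."""
    return pearson([c[i][j] for c in conn_mats], outcomes)

# Toy subject: voxels 0-1 form node 0, voxels 2-3 form node 1 (anti-correlated).
base = [0.0, 1.0, 0.5, -0.3, 2.0]
voxel_ts = [base, [x + 0.1 for x in base], [-x for x in base], [-x - 0.2 for x in base]]
conn = node_connectivity(voxel_ts, labels=[0, 0, 1, 1], n_nodes=2)
```

A supervised scheme would change `labels` to increase the edge-outcome association across subjects while keeping voxels within a node homogeneous; a two-step method fixes `labels` from an atlas first.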

Citations: 0

Multilevel Monte Carlo in Sample Average Approximation: Convergence, Complexity and Application

arXiv - STAT - Computation

Pub Date : 2024-07-26 DOI: arxiv-2407.18504

Devang Sinha, Siddhartha P. Chakrabarty

In this paper, we examine the Sample Average Approximation (SAA) procedure within a framework where the Monte Carlo estimator of the expectation is biased. We also introduce Multilevel Monte Carlo (MLMC) in the SAA setup to enhance the computational efficiency of solving optimization problems. In this context, we conduct a thorough analysis, exploiting Cramér's large deviation theory, to establish uniform convergence, quantify the convergence rate, and determine the sample complexity for both standard Monte Carlo and MLMC paradigms. Additionally, we perform a root-mean-squared error analysis utilizing tools from empirical process theory to derive sample complexity without relying on the finite moment condition typically required for uniform convergence results. Finally, we validate our findings and demonstrate the advantages of the MLMC estimator through numerical examples, estimating Conditional Value-at-Risk (CVaR) in the Geometric Brownian Motion and nested expectation frameworks.
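The telescoping idea behind MLMC can be sketched in a Geometric Brownian Motion setting, though here only for the plain expectation E[S_T], which has the closed form S_0 exp(mu T), rather than for the paper's SAA/CVaR problem. Level l uses an Euler-Maruyama scheme with 2**l steps, and each correction term couples a fine and a coarse path through shared Brownian increments. The sample sizes, parameters, and function names are illustrative assumptions.

```python
import math
import random

def euler_gbm(s0, mu, sigma, dt, dws):
    """Euler-Maruyama path of dS = mu S dt + sigma S dW driven by increments dws."""
    s = s0
    for dw in dws:
        s += mu * s * dt + sigma * s * dw
    return s

def mlmc_mean(s0, mu, sigma, T, n_per_level, seed=0):
    """MLMC estimate of E[S_T] = E[P_0] + sum_l E[P_l - P_{l-1}]; level l uses 2**l steps."""
    rng = random.Random(seed)
    estimate = 0.0
    for level, n in enumerate(n_per_level):
        n_fine = 2 ** level
        dt = T / n_fine
        acc = 0.0
        for _ in range(n):
            dws = [rng.gauss(0.0, math.sqrt(dt)) for _ in range(n_fine)]
            fine = euler_gbm(s0, mu, sigma, dt, dws)
            if level == 0:
                acc += fine
            else:
                # Coarse path on the same Brownian path: merge increments pairwise.
                coarse_dws = [dws[2 * i] + dws[2 * i + 1] for i in range(n_fine // 2)]
                acc += fine - euler_gbm(s0, mu, sigma, 2.0 * dt, coarse_dws)
        estimate += acc / n
    return estimate

# Fewer samples on finer (costlier) levels, since the coupled corrections have small variance.
est = mlmc_mean(1.0, 0.05, 0.2, 1.0, n_per_level=[20_000, 5_000, 2_000, 1_000, 500])
exact = math.exp(0.05)
```

The coupling is what makes MLMC pay off: because fine and coarse paths share noise, the level corrections have small variance and need few samples, so most of the work is spent on the cheap coarsest level.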

Citations: 0

Multi-physics Simulation Guided Generative Diffusion Models with Applications in Fluid and Heat Dynamics

arXiv - STAT - Computation

Pub Date : 2024-07-25 DOI: arxiv-2407.17720

Naichen Shi, Hao Yan, Shenghan Guo, Raed Al Kontar

In this paper, we present a generic physics-informed generative model called MPDM that integrates multi-fidelity physics simulations with diffusion models. MPDM categorizes multi-fidelity physics simulations into inexpensive and expensive simulations, depending on computational costs. The inexpensive simulations, which can be obtained with low latency, directly inject contextual information into denoising diffusion models (DDMs). Furthermore, when results from expensive simulations are available, MPDM refines the quality of generated samples via a guided diffusion process. This design separates the training of a denoising diffusion model from physics-informed conditional probability models, thus lending flexibility to practitioners. MPDM builds on Bayesian probabilistic models and is equipped with a theoretical guarantee that provides upper bounds on the Wasserstein distance between the sample distribution and the underlying true distribution. The probabilistic nature of MPDM also provides a convenient approach for uncertainty quantification in prediction. Our models excel in cases where physics simulations are imperfect and sometimes inaccessible. We use a numerical simulation in fluid dynamics and a case study in heat dynamics within laser-based metal powder deposition additive manufacturing to demonstrate how MPDM seamlessly integrates multi-fidelity physics simulations and observations to obtain surrogates with superior predictive performance.
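For readers less familiar with the denoising diffusion models (DDMs) that MPDM builds on, the standard forward (noising) process of Ho et al. (2020) has a closed form: x_t = sqrt(alpha_bar_t) x_0 + sqrt(1 - alpha_bar_t) eps with eps ~ N(0, I), where alpha_bar_t is the running product of (1 - beta_t). This minimal Python sketch assumes the common linear beta schedule from 1e-4 to 0.02 over 1000 steps; it shows only this generic building block, not MPDM's physics-informed conditioning or guided sampling, and the function names are illustrative.

```python
import math
import random

def linear_alpha_bars(n_steps=1000, beta_start=1e-4, beta_end=0.02):
    """Cumulative products alpha_bar_t = prod_{s<=t} (1 - beta_s) for a linear schedule."""
    alpha_bars, prod = [], 1.0
    for t in range(n_steps):
        beta = beta_start + (beta_end - beta_start) * t / (n_steps - 1)
        prod *= 1.0 - beta
        alpha_bars.append(prod)
    return alpha_bars

def q_sample(x0, t, alpha_bars, rng):
    """Draw x_t ~ q(x_t | x_0) = N(sqrt(alpha_bar_t) x_0, (1 - alpha_bar_t) I) in one shot."""
    a = alpha_bars[t]
    return [math.sqrt(a) * xi + math.sqrt(1.0 - a) * rng.gauss(0.0, 1.0) for xi in x0]

alpha_bars = linear_alpha_bars()
rng = random.Random(0)
x0 = [1.0, -2.0, 0.5]                          # a toy "clean" sample
x_early = q_sample(x0, 10, alpha_bars, rng)    # still close to x0
x_late = q_sample(x0, 999, alpha_bars, rng)    # essentially pure noise
```

A denoising network is trained to invert this process; guidance schemes such as the one MPDM uses then steer the reverse (sampling) process, which this sketch does not cover.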

Citations: 0