arXiv - STAT - Computation: Latest Papers


Sampling parameters of ordinary differential equations with Langevin dynamics that satisfy constraints

arXiv - STAT - Computation

Pub Date : 2024-08-28 DOI: arxiv-2408.15505

Chris Chi, Jonathan Weare, Aaron R. Dinner

Fitting models to data to obtain distributions of consistent parameter values is important for uncertainty quantification, model comparison, and prediction. Standard Markov Chain Monte Carlo (MCMC) approaches for fitting ordinary differential equations (ODEs) to time-series data involve proposing trial parameter sets, numerically integrating the ODEs forward in time, and accepting or rejecting the trial parameter sets. When the model dynamics depend nonlinearly on the parameters, as is generally the case, trial parameter sets are often rejected, and MCMC approaches become prohibitively computationally costly to converge. Here, we build on methods for numerical continuation and trajectory optimization to introduce an approach in which we use Langevin dynamics in the joint space of variables and parameters to sample models that satisfy constraints on the dynamics. We demonstrate the method by sampling Hopf bifurcations and limit cycles of a model of a biochemical oscillator in a Bayesian framework for parameter estimation, and we obtain more than a hundredfold speedup relative to a leading ensemble MCMC approach that requires numerically integrating the ODEs forward in time. We describe numerical experiments that provide insight into the speedup. The method is general and can be used in any framework for parameter estimation and model selection.
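For orientation, Langevin dynamics samples a density by following the gradient of the log density while adding Gaussian noise. The sketch below is a generic unadjusted Langevin sampler for a one-dimensional standard normal target. It is not the constrained joint-space scheme described in the abstract; the function names, the step size, and the target are assumptions chosen only for illustration.

```python
import math
import random


def grad_log_density(x):
    """Gradient of log N(0, 1): d/dx (-x^2 / 2) = -x (illustrative target)."""
    return -x


def unadjusted_langevin(n_steps, step_size, x0=0.0, seed=0):
    """Euler-Maruyama discretization of dX = grad log p(X) dt + sqrt(2) dW."""
    rng = random.Random(seed)
    x = x0
    samples = []
    for _ in range(n_steps):
        noise = rng.gauss(0.0, 1.0)
        x = x + step_size * grad_log_density(x) + math.sqrt(2.0 * step_size) * noise
        samples.append(x)
    return samples


samples = unadjusted_langevin(n_steps=20000, step_size=0.05)
mean = sum(samples) / len(samples)
var = sum((s - mean) ** 2 for s in samples) / len(samples)
```

Because no accept/reject step is used, the discretization introduces a small bias in the sampled variance that shrinks with the step size.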


Citations: 0

Investigating Complex HPV Dynamics Using Emulation and History Matching

arXiv - STAT - Computation

Pub Date : 2024-08-28 DOI: arxiv-2408.15805

Andrew Iskauskas, Jamie A. Cohen, Danny Scarponi, Ian Vernon, Michael Goldstein, Daniel Klein, Richard G. White, Nicky McCreesh

The study of transmission and progression of human papillomavirus (HPV) is crucial for understanding the incidence of cervical cancers, and has been identified as a priority worldwide. The complexity of the disease necessitates a detailed model of HPV transmission and its progression to cancer; to infer properties of the above we require a careful process that can match to imperfect or incomplete observational data. In this paper, we describe the HPVsim simulator to satisfy the former requirement; to satisfy the latter we couple this stochastic simulator to a process of emulation and history matching using the R package hmer. With these tools, we are able to obtain a comprehensive collection of parameter combinations that could give rise to observed cancer data, and explore the implications of the variability of these parameter sets as it relates to future health interventions.
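History matching typically rules out parameter values using an implausibility measure: the distance between an observation and the emulator prediction, scaled by the combined uncertainties. The sketch below shows that calculation in plain Python under stated assumptions. It does not use the hmer API, and the function names, the cutoff of 3, and all numbers are hypothetical.

```python
import math


def implausibility(observed, emulator_mean, emulator_var, obs_var, discrepancy_var=0.0):
    """Standardized distance between an observation and an emulator prediction."""
    total_var = emulator_var + obs_var + discrepancy_var
    return abs(observed - emulator_mean) / math.sqrt(total_var)


def non_implausible(candidates, observed, obs_var, cutoff=3.0):
    """Keep candidate points whose implausibility is at most the cutoff.

    `candidates` maps a parameter value to a hypothetical
    (emulator_mean, emulator_var) pair.
    """
    return [
        x
        for x, (mean, var) in candidates.items()
        if implausibility(observed, mean, var, obs_var) <= cutoff
    ]


# Hypothetical emulator output at three parameter values (made-up numbers).
candidates = {0.1: (10.0, 1.0), 0.2: (14.0, 1.0), 0.3: (30.0, 1.0)}
kept = non_implausible(candidates, observed=12.0, obs_var=1.0)
```

In this toy case the third candidate is discarded because its prediction lies far outside what the combined uncertainty allows.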


Citations: 0

A Model-Free Method to Quantify Memory Utilization in Neural Point Processes

arXiv - STAT - Computation

Pub Date : 2024-08-28 DOI: arxiv-2408.15875

Gorana Mijatovic, Sebastiano Stramaglia, Luca Faes

Quantifying the predictive capacity of a neural system, intended as the capability to store information and actively use it for dynamic system evolution, is a key component of neural information processing. Information storage (IS), the main measure quantifying the active utilization of memory in a dynamic system, is only defined for discrete-time processes. While recent theoretical work laid the foundations for the continuous-time analysis of the predictive capacity stored in a process, methods for the effective computation of the related measures are needed to favor widespread utilization on neural data. This work introduces a method for the model-free estimation of the so-called memory utilization rate (MUR), the continuous-time counterpart of the IS, specifically designed to quantify the predictive capacity stored in neural point processes. The method employs nearest-neighbor entropy estimation applied to the inter-spike intervals measured from point-process realizations to quantify the extent of memory used by a spike train. An empirical procedure based on surrogate data is implemented to compensate for the estimation bias and detect statistically significant levels of memory. The method is validated in simulated Poisson processes and in realistic models of coupled cortical dynamics and heartbeat dynamics. It is then applied to real spike trains reflecting central and autonomic nervous system activities: in spontaneously growing cortical neuron cultures, the MUR detected increasing memory utilization across maturation stages, associated with emergent bursting synchronized activity; in the study of the neuro-autonomic modulation of human heartbeats, the MUR reflected the sympathetic activation occurring with postural but not with mental stress. The proposed approach offers a computationally reliable tool to analyze spike train data in computational neuroscience and physiology.
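The nearest-neighbor entropy estimation mentioned above can be illustrated with the Kozachenko-Leonenko estimator applied to one-dimensional inter-spike intervals. This is only a sketch of the building block, not the MUR estimator itself. The function names, the choice k = 5, and the simulated data are assumptions.

```python
import math
import random

EULER_GAMMA = 0.5772156649015329


def digamma_int(n):
    """Digamma function at a positive integer n: -gamma + sum_{j<n} 1/j."""
    return -EULER_GAMMA + sum(1.0 / j for j in range(1, n))


def knn_entropy_1d(values, k=5):
    """Kozachenko-Leonenko entropy estimate (in nats) for 1-D samples.

    Assumes no tied values, so every k-th neighbor distance is positive.
    """
    xs = sorted(values)
    n = len(xs)
    log_dist_sum = 0.0
    for i, x in enumerate(xs):
        # In sorted 1-D data the k nearest neighbors lie within k positions.
        window = xs[max(0, i - k):i] + xs[i + 1:i + k + 1]
        kth_dist = sorted(abs(x - y) for y in window)[k - 1]
        log_dist_sum += math.log(2.0 * kth_dist)
    return digamma_int(n) - digamma_int(k) + log_dist_sum / n


# Inter-spike intervals of a simulated unit-rate Poisson process are
# exponential, whose differential entropy is exactly 1 nat.
rng = random.Random(1)
intervals = [rng.expovariate(1.0) for _ in range(2000)]
estimate = knn_entropy_1d(intervals, k=5)
```

A memoryless Poisson train is a natural sanity check, since its interval entropy is known in closed form.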


Citations: 0

NetSurvival.jl: A glimpse into relative survival analysis with Julia

arXiv - STAT - Computation

Pub Date : 2024-08-28 DOI: arxiv-2408.15655

Rim Alhajal, Oskar Laverny

In many population-based medical studies, the specific cause of death is unidentified, unreliable or even unavailable. Relative survival analysis addresses this scenario, outside of standard (competing risks) survival analysis, to nevertheless estimate survival with respect to a specific cause. It separates the impact of the disease itself on mortality from other factors, such as age, sex, and general population trends. Different methods were created with the aim to construct consistent and efficient estimators for this purpose. The R package relsurv is the most commonly used today in application. With Julia continuously proving itself to be an efficient and powerful programming language, we felt the need to code a pure Julia take, thus NetSurvival.jl, of the standard routines and estimators in the field. The proposed implementation is clean, future-proof, well tested, and the package is correctly documented inside the rising JuliaSurv GitHub organization, ensuring trustability of the results. Through a comprehensive comparison in terms of performance and interface to relsurv, we highlight the benefits of the Julia developing environment.
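The simplest quantity in this area is the relative survival ratio: observed survival in the patient cohort divided by the survival expected in a comparable general population. The sketch below computes that ratio for made-up numbers. It is written in Python for illustration only, does not reproduce the NetSurvival.jl or relsurv interfaces, and is much cruder than the net survival estimators those packages implement.

```python
def relative_survival(observed_survival, expected_survival):
    """Ratio of observed to expected survival at matching follow-up times."""
    return [o / e for o, e in zip(observed_survival, expected_survival)]


# Made-up yearly survival proportions for a patient cohort and for a
# matched general population (e.g. taken from life tables).
observed = [0.90, 0.80, 0.72]
expected = [0.98, 0.96, 0.94]
ratios = relative_survival(observed, expected)
```

A ratio below 1 indicates mortality in excess of what age, sex, and calendar period alone would predict.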


Citations: 0

An invitation to adaptive Markov chain Monte Carlo convergence theory

arXiv - STAT - Computation

Pub Date : 2024-08-27 DOI: arxiv-2408.14903

Pietari Laitinen, Matti Vihola

Adaptive Markov chain Monte Carlo (MCMC) algorithms, which automatically tune their parameters based on past samples, have proved extremely useful in practice. The self-tuning mechanism makes them 'non-Markovian', which means that their validity cannot be ensured by standard Markov chain theory. Several different techniques have been suggested to analyse their theoretical properties, many of which are technically involved. The technical nature of the theory may make the methods unnecessarily unappealing. We discuss one technique, based on a martingale decomposition, with uniformly ergodic Markov transitions. We provide an accessible and self-contained treatment in this setting, and give detailed proofs of the results discussed in the paper, which only require basic understanding of martingale theory and general state space Markov chain concepts. We illustrate how our conditions can accommodate different types of adaptation schemes, and can give useful insight to the requirements which ensure their validity.
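To make "automatically tune their parameters based on past samples" concrete, the sketch below shows one common adaptation scheme: a random-walk Metropolis sampler whose proposal scale is adjusted toward a target acceptance rate with a decaying step size. It is a generic illustration, not an algorithm taken from the paper. The target density, the acceptance rate of 0.44, and the decay exponent 0.6 are assumptions.

```python
import math
import random


def log_target(x):
    """Log density of N(0, 1) up to a constant (illustrative target)."""
    return -0.5 * x * x


def adaptive_rwm(n_steps, target_accept=0.44, seed=0):
    """Random-walk Metropolis whose log proposal scale is tuned on the fly."""
    rng = random.Random(seed)
    x, log_scale = 0.0, 0.0
    samples = []
    for n in range(1, n_steps + 1):
        proposal = x + math.exp(log_scale) * rng.gauss(0.0, 1.0)
        log_ratio = log_target(proposal) - log_target(x)
        accept_prob = math.exp(min(0.0, log_ratio))
        if rng.random() < accept_prob:
            x = proposal
        # Robbins-Monro update with a step size that decays to zero, so the
        # adaptation diminishes as the chain runs.
        log_scale += n ** -0.6 * (accept_prob - target_accept)
        samples.append(x)
    return samples, math.exp(log_scale)


samples, final_scale = adaptive_rwm(50000)
mean = sum(samples) / len(samples)
var = sum((s - mean) ** 2 for s in samples) / len(samples)
```

Because the scale depends on the whole history, the chain of states alone is no longer Markov, which is the difficulty the abstract refers to.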


Citations: 0

The Traceplot Thickens: MCMC Diagnostics for Non-Euclidean Spaces

arXiv - STAT - Computation

Pub Date : 2024-08-27 DOI: arxiv-2408.15392

Luke Duttweiler, Jonathan Klus, Brent Coull, Sally W. Thurston

MCMC algorithms are frequently used to perform inference under a Bayesian modeling framework. Convergence diagnostics, such as traceplots, the Gelman-Rubin potential scale reduction factor, and effective sample size, are used to visualize mixing and determine how long to run the sampler. However, these classic diagnostics can be ineffective when the sample space of the algorithm is highly discretized (e.g., Bayesian Networks or Dirichlet Process Mixture Models) or the sampler uses frequent non-Euclidean moves. In this article, we develop novel generalized convergence diagnostics produced by mapping the original space to the real line while respecting a relevant distance function and then evaluating the convergence diagnostics on the mapped values. Simulated examples are provided that demonstrate the success of this method in identifying failures to converge that are missed or unavailable by other methods.
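Once samples have been mapped to the real line, a classic scalar diagnostic such as the Gelman-Rubin potential scale reduction factor can be evaluated on the mapped values. The sketch below implements the basic (non-split) version of that factor for equal-length chains. It does not implement the paper's mapping; the function name and the simulated chains are assumptions.

```python
import random


def gelman_rubin(chains):
    """Potential scale reduction factor for equal-length scalar chains."""
    m, n = len(chains), len(chains[0])
    means = [sum(c) / n for c in chains]
    grand_mean = sum(means) / m
    # Between-chain variance B and mean within-chain variance W.
    b = n * sum((mu - grand_mean) ** 2 for mu in means) / (m - 1)
    w = sum(
        sum((x - mu) ** 2 for x in c) / (n - 1) for c, mu in zip(chains, means)
    ) / m
    pooled = (n - 1) / n * w + b / n
    return (pooled / w) ** 0.5


rng = random.Random(0)
# Four chains drawn from the same distribution: R-hat should be close to 1.
mixed = [[rng.gauss(0.0, 1.0) for _ in range(1000)] for _ in range(4)]
# Four chains stuck around different values: R-hat should be well above 1.
stuck = [[rng.gauss(shift, 1.0) for _ in range(1000)] for shift in (0, 3, 6, 9)]
rhat_mixed = gelman_rubin(mixed)
rhat_stuck = gelman_rubin(stuck)
```

Values near 1 suggest the chains agree; values well above 1 indicate that they are exploring different regions.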


Citations: 0

Implementing MCMC: Multivariate estimation with confidence

arXiv - STAT - Computation

Pub Date : 2024-08-27 DOI: arxiv-2408.15396

James M. Flegal, Rebecca P. Kurtz-Garcia

This paper addresses the key challenge of estimating the asymptotic covariance associated with the Markov chain central limit theorem, which is essential for visualizing and terminating Markov Chain Monte Carlo (MCMC) simulations. We focus on summarizing batching, spectral, and initial sequence covariance estimation techniques. We emphasize practical recommendations for modern MCMC simulations, where positive correlation is common and leads to negatively biased covariance estimates. Our discussion is centered on computationally efficient methods that remain viable even when the number of iterations is large, offering insights into improving the reliability and accuracy of MCMC output in such scenarios.
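As a scalar illustration of the batching approach, the sketch below computes a non-overlapping batch-means estimate of the asymptotic variance for a simulated AR(1) chain, where the true value is known. The multivariate estimators discussed in the paper replace the squared deviations with outer products. The function name, the square-root batch size, and the test chain are assumptions.

```python
import math
import random


def batch_means_variance(chain):
    """Batch-means estimate of the asymptotic variance in the Markov chain CLT."""
    n = len(chain)
    batch_size = int(math.sqrt(n))      # a common default choice
    n_batches = n // batch_size
    used = chain[: n_batches * batch_size]
    overall = sum(used) / len(used)
    batch_avgs = [
        sum(used[k * batch_size:(k + 1) * batch_size]) / batch_size
        for k in range(n_batches)
    ]
    spread = sum((avg - overall) ** 2 for avg in batch_avgs)
    return batch_size * spread / (n_batches - 1)


# AR(1) chain x_t = 0.5 * x_{t-1} + noise; its asymptotic variance is
# 1 / (1 - 0.5)^2 = 4, larger than the marginal variance 4/3.
rng = random.Random(0)
x, chain = 0.0, []
for _ in range(100000):
    x = 0.5 * x + rng.gauss(0.0, 1.0)
    chain.append(x)

sigma2_hat = batch_means_variance(chain)
mcse = math.sqrt(sigma2_hat / len(chain))  # Monte Carlo standard error of the mean
```

Using the marginal variance instead would understate the Monte Carlo error for a positively correlated chain like this one.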


Citations: 0

Bayesian inference for the Markov-modulated Poisson process with an outcome process

arXiv - STAT - Computation

Pub Date : 2024-08-27 DOI: arxiv-2408.15314

Yu Luo, Chris Sherlock

In medical research, understanding changes in outcome measurements is crucial for inferring shifts in a patient's underlying health condition. While data from clinical and administrative systems hold promise for advancing this understanding, traditional methods for modelling disease progression struggle with analyzing a large volume of longitudinal data collected irregularly and do not account for the phenomenon where the poorer an individual's health, the more frequently they interact with the healthcare system. In addition, data from the claim and health care system provide no information for terminating events, such as death. To address these challenges, we start from the continuous-time hidden Markov model to understand disease progression by modelling the observed data as an outcome whose distribution depends on the state of a latent Markov chain representing the underlying health state. However, we also allow the underlying health state to influence the timings of the observations via a point process. Furthermore, we create an additional "death" state and model the unobserved terminating event, a transition to this state, via an additional Poisson process whose rate depends on the latent state of the Markov chain. This extension allows us to model disease severity and death not only based on the types of care received but also on the temporal and frequency aspects of different observed events. We present an exact Gibbs sampler procedure that alternates sampling the complete path of the hidden chain (the latent health state throughout the observation window) conditional on the complete paths. When the unobserved, terminating event occurs early in the observation window, there are no more observed events, and naive use of a model with only "live" health states would lead to biases in parameter estimates; our inclusion of a "death" state mitigates against this.
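The idea that the latent health state drives how often observations occur can be illustrated by simulating a plain two-state Markov-modulated Poisson process. The sketch below omits the outcome process and the "death" state, and it is a forward simulation rather than the Gibbs sampler of the paper. All rates and names are hypothetical.

```python
import random


def simulate_mmpp(switch_rates, event_rates, horizon, seed=0):
    """Simulate a two-state Markov-modulated Poisson process on [0, horizon].

    switch_rates[s] is the rate of leaving latent state s and event_rates[s]
    is the observation rate while in state s (all values are illustrative).
    """
    rng = random.Random(seed)
    t, state = 0.0, 0
    event_times, path = [], [(0.0, 0)]
    while True:
        total = switch_rates[state] + event_rates[state]
        t += rng.expovariate(total)
        if t > horizon:
            break
        if rng.random() < event_rates[state] / total:
            event_times.append(t)          # an observation occurs
        else:
            state = 1 - state              # the latent chain switches state
            path.append((t, state))
    return event_times, path


# State 0 = "healthier" (rare visits), state 1 = "sicker" (frequent visits).
events, latent_path = simulate_mmpp(
    switch_rates=(0.1, 0.2), event_rates=(0.5, 5.0), horizon=1000.0
)
```

With these rates the chain spends about two thirds of the time in state 0, so the long-run observation rate is roughly 2 events per unit time, clustered in the periods spent in state 1.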


Citations: 0

A New Perspective to Fish Trajectory Imputation: A Methodology for Spatiotemporal Modeling of Acoustically Tagged Fish Data

arXiv - STAT - Computation

Pub Date : 2024-08-23 DOI: arxiv-2408.13220

Mahshid Ahmadian, Edward L. Boone, Grace S. Chiu

The focus of this paper is a key component of a methodology for understanding, interpolating, and predicting fish movement patterns based on spatiotemporal data recorded by spatially static acoustic receivers. For periods of time, fish may be far from the receivers, resulting in the absence of observations. The lack of information on the fish's location for extended time periods poses challenges to the understanding of fish movement patterns, and hence, the identification of proper statistical inference frameworks for modeling the trajectories. As the initial step in our methodology, in this paper, we implement an imputation strategy that relies on both Markov chain and Brownian motion principles to enhance our dataset over time. This methodology will be generalizable and applicable to all fish species with similar migration patterns or data with similar structures due to the use of static acoustic receivers.
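One standard way to fill a gap between two known positions using Brownian motion is a Brownian bridge: a random path pinned at both detections. The sketch below samples a one-dimensional bridge sequentially. It is only an illustration of the Brownian-motion ingredient, not the authors' combined Markov chain and Brownian motion strategy. The function name, the diffusion scale `sigma`, and the detection data are hypothetical.

```python
import math
import random


def brownian_bridge(t0, x0, t1, x1, times, sigma=1.0, seed=0):
    """Sample a 1-D Brownian bridge at increasing `times` strictly inside (t0, t1)."""
    rng = random.Random(seed)
    t_prev, x_prev = t0, x0
    path = []
    for t in times:
        # Conditional on the last sampled point and the fixed endpoint, the
        # position at time t is Gaussian with this mean and variance.
        frac = (t - t_prev) / (t1 - t_prev)
        mean = x_prev + frac * (x1 - x_prev)
        var = sigma ** 2 * (t - t_prev) * (t1 - t) / (t1 - t_prev)
        x_prev = rng.gauss(mean, math.sqrt(var))
        t_prev = t
        path.append((t, x_prev))
    return path


# Hypothetical gap: a fish is detected at position 0.0 at time 0 and at
# position 5.0 at time 10; fill in nine intermediate positions.
imputed = brownian_bridge(0.0, 0.0, 10.0, 5.0, times=[float(t) for t in range(1, 10)])
```

Setting `sigma=0.0` reduces the bridge to straight-line interpolation, which is a convenient check of the conditional mean.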


Citations: 0

Disclosure risk assessment with Bayesian non-parametric hierarchical modelling

arXiv - STAT - Computation

Pub Date : 2024-08-22 DOI: arxiv-2408.12521

Marco Battiston, Lorenzo Rimella

Micro and survey datasets often contain private information about individuals, like their health status, income or political preferences. Previous studies have shown that, even after data anonymization, a malicious intruder could still be able to identify individuals in the dataset by matching their variables to external information. Disclosure risk measures are statistical measures meant to quantify how big such a risk is for a specific dataset. One of the most common measures is the number of sample unique values that are also population-unique. \cite{Man12} have shown how mixed membership models can provide very accurate estimates of this measure. A limitation of that approach is that the number of extreme profiles has to be chosen by the modeller. In this article, we propose a non-parametric version of the model, based on the Hierarchical Dirichlet Process (HDP). The proposed approach does not require any tuning parameter or model selection step and provides accurate estimates of the disclosure risk measure, even with samples as small as 1% of the population size. Moreover, a data augmentation scheme to address the presence of structural zeros is presented. The proposed methodology is tested on a real dataset from the New York census.
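The risk measure named above, the number of sample-unique records that are also population-unique, is easy to compute when the full population is available. The sketch below does so for a tiny made-up dataset. In practice the population is not observed, which is why a model such as the one proposed in the paper is needed to estimate it. The function name and the records are hypothetical.

```python
from collections import Counter


def count_uniques(population, sample):
    """Count sample-unique records and those that are also population-unique.

    Records are tuples of categorical key variables (made-up data below).
    """
    pop_counts = Counter(population)
    sample_counts = Counter(sample)
    sample_unique = [cell for cell, f in sample_counts.items() if f == 1]
    also_pop_unique = [cell for cell in sample_unique if pop_counts[cell] == 1]
    return len(sample_unique), len(also_pop_unique)


population = [
    ("F", "30-39", "NY"), ("F", "30-39", "NY"), ("M", "40-49", "NY"),
    ("M", "20-29", "NJ"), ("F", "50-59", "NJ"), ("F", "50-59", "NJ"),
]
sample = [population[0], population[2], population[3]]
n_sample_unique, n_risky = count_uniques(population, sample)
```

Here all three sampled records are unique in the sample, but only two of them are unique in the population and therefore at risk of re-identification.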


Citations: 0
