
arXiv - STAT - Methodology: Latest Papers

An efficient heuristic for approximate maximum flow computations
Pub Date : 2024-09-12 DOI: arxiv-2409.08350
Jingyun Qian, Georg Hahn
Several concepts borrowed from graph theory are routinely used to better understand the inner workings of the (human) brain. To this end, a connectivity network of the brain is built first, which then allows one to assess quantities such as information flow and information routing via shortest path and maximum flow computations. Since brain networks typically contain several thousand nodes and edges, computational scaling is a key research area. In this contribution, we focus on approximate maximum flow computations in large brain networks. By combining graph partitioning with maximum flow computations, we propose a new approximation algorithm for the computation of the maximum flow with runtime $O(|V||E|^2/k^2)$ compared to the usual runtime of $O(|V||E|^2)$ for the Edmonds-Karp algorithm, where $V$ is the set of vertices, $E$ is the set of edges, and $k$ is the number of partitions. We assess both accuracy and runtime of the proposed algorithm on simulated graphs as well as on graphs downloaded from the Brain Networks Data Repository (https://networkrepository.com).
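As a point of reference for the runtime comparison in this abstract, the baseline Edmonds-Karp algorithm can be sketched in a few lines of Python. This is only the standard exact algorithm with the $O(|V||E|^2)$ bound, not the authors' partition-based heuristic; the function name `edmonds_karp`, the nested-dictionary graph encoding, and the toy graph are assumptions made for illustration. Taking the stated bounds at face value, $k = 4$ partitions would shrink the bound by a factor of $k^2 = 16$.

```python
from collections import deque


def edmonds_karp(capacity, source, sink):
    """Return the maximum flow value using BFS augmenting paths.

    `capacity` is a hypothetical encoding: {u: {v: capacity of edge u->v}}.
    """
    if source == sink:
        raise ValueError("source and sink must differ")

    # Residual graph: copy the capacities and add zero-capacity reverse edges.
    residual = {u: dict(nbrs) for u, nbrs in capacity.items()}
    for u, nbrs in capacity.items():
        for v in nbrs:
            residual.setdefault(v, {}).setdefault(u, 0)
    residual.setdefault(source, {})
    residual.setdefault(sink, {})

    flow = 0
    while True:
        # Breadth-first search finds a shortest augmenting path.
        parent = {source: None}
        queue = deque([source])
        while queue and sink not in parent:
            u = queue.popleft()
            for v, cap in residual[u].items():
                if cap > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if sink not in parent:
            return flow  # no augmenting path left

        # Bottleneck capacity along the path found by the search.
        bottleneck = float("inf")
        v = sink
        while parent[v] is not None:
            bottleneck = min(bottleneck, residual[parent[v]][v])
            v = parent[v]

        # Push the bottleneck amount along the path.
        v = sink
        while parent[v] is not None:
            u = parent[v]
            residual[u][v] -= bottleneck
            residual[v][u] += bottleneck
            v = u
        flow += bottleneck


# Toy graph (illustrative only): the minimum cut has capacity 5.
toy = {"s": {"a": 3, "b": 2}, "a": {"b": 1, "t": 2}, "b": {"t": 3}, "t": {}}
print(edmonds_karp(toy, "s", "t"))  # prints 5
```

A partition-based approximation would presumably run a routine like this on smaller subgraphs, which is where the $1/k^2$ factor in the stated bound would come from; the details of that step are in the paper, not in this sketch.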
Debiased high-dimensional regression calibration for errors-in-variables log-contrast models
Pub Date : 2024-09-11 DOI: arxiv-2409.07568
Huali Zhao, Tianying Wang
Motivated by the challenges in analyzing gut microbiome and metagenomic data, this work aims to tackle the issue of measurement errors in high-dimensional regression models that involve compositional covariates. This paper marks a pioneering effort in conducting statistical inference on high-dimensional compositional data affected by mismeasured or contaminated data. We introduce a calibration approach tailored for the linear log-contrast model. Under relatively lenient conditions regarding the sparsity level of the parameter, we have established the asymptotic normality of the estimator for inference. Numerical experiments and an application in a microbiome study have demonstrated the efficacy of our high-dimensional calibration strategy in minimizing bias and achieving the expected coverage rates for confidence intervals. Moreover, the potential application of our proposed methodology extends well beyond compositional data, suggesting its adaptability for a wide range of research contexts.
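For readers unfamiliar with the model named in this abstract, the linear log-contrast model is usually written as below. This is the textbook form rather than the authors' calibrated estimator, and the symbols $y_i$, $x_{ij}$, $\beta_j$, $\varepsilon_i$, and $p$ are notation assumed here, not taken from the abstract.

```latex
y_i = \sum_{j=1}^{p} \beta_j \log x_{ij} + \varepsilon_i,
\qquad \text{subject to } \sum_{j=1}^{p} \beta_j = 0,
\qquad x_{ij} > 0, \quad \sum_{j=1}^{p} x_{ij} = 1 .
```

The zero-sum constraint makes the fit invariant to rescaling of the composition. The errors-in-variables setting described in the abstract would correspond to observing a contaminated version of $x_{ij}$ rather than $x_{ij}$ itself.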
Order selection in GARMA models for count time series: a Bayesian perspective
Pub Date : 2024-09-11 DOI: arxiv-2409.07263
Katerine Zuniga Lastra, Guilherme Pumi, Taiane Schaedler Prass
Estimation in GARMA models has traditionally been carried out under the frequentist approach. To date, Bayesian approaches for such estimation have been relatively limited. In the context of GARMA models for count time series, Bayesian estimation achieves satisfactory results in terms of point estimation. Model selection in this context often relies on the use of information criteria. Despite its prominence in the literature, the use of information criteria for model selection in GARMA models for count time series has been shown to present poor performance in simulations, especially in terms of their ability to correctly identify models, even under large sample sizes. In this study, we study the problem of order selection in GARMA models for count time series, adopting a Bayesian perspective through the application of the Reversible Jump Markov Chain Monte Carlo approach. Monte Carlo simulation studies are conducted to assess the finite sample performance of the developed ideas, including point and interval inference, sensitivity analysis, effects of burn-in and thinning, as well as the choice of related priors and hyperparameters. Two real-data applications are presented, one considering automobile production in Brazil and the other considering bus exportation in Brazil before and after the COVID-19 pandemic, showcasing the method's capabilities and further exploring its flexibility.
Spatial Deep Convolutional Neural Networks
Pub Date : 2024-09-11 DOI: arxiv-2409.07559
Qi Wang, Paul A. Parker, Robert B. Lund
Spatial prediction problems often use Gaussian process models, which can be computationally burdensome in high dimensions. Specification of an appropriate covariance function for the model can be challenging when complex non-stationarities exist. Recent work has shown that pre-computed spatial basis functions and a feed-forward neural network can capture complex spatial dependence structures while remaining computationally efficient. This paper builds on this literature by tailoring spatial basis functions for use in convolutional neural networks. Through both simulated and real data, we demonstrate that this approach yields more accurate spatial predictions than existing methods. Uncertainty quantification is also considered.
Local Sequential MCMC for Data Assimilation with Applications in Geoscience
Pub Date : 2024-09-11 DOI: arxiv-2409.07111
Hamza Ruzayqat, Omar Knio
This paper presents a new data assimilation (DA) scheme based on a sequential Markov Chain Monte Carlo (SMCMC) DA technique [Ruzayqat et al. 2024] which is provably convergent and has been recently used for filtering, particularly for high-dimensional non-linear, and potentially, non-Gaussian state-space models. Unlike particle filters, which can be considered exact methods and can be used for filtering non-linear, non-Gaussian models, SMCMC does not assign weights to the samples/particles, and therefore, the method does not suffer from the issue of weight-degeneracy when a relatively small number of samples is used. We design a localization approach within the SMCMC framework that focuses on regions where observations are located and restricts the transition densities included in the filtering distribution of the state to these regions. This results in immensely reducing the effective degrees of freedom and thus improving the efficiency. We test the new technique on high-dimensional ($d \sim 10^4 - 10^5$) linear Gaussian model and non-linear shallow water models with Gaussian noise with real and synthetic observations. For two of the numerical examples, the observations mimic the data generated by the Surface Water and Ocean Topography (SWOT) mission led by NASA, which is a swath of ocean height observations that changes location at every assimilation time step. We also use a set of ocean drifters' real observations in which the drifters are moving according to the ocean kinematics and assumed to have uncertain locations at the time of assimilation. We show that when higher accuracy is required, the proposed algorithm is superior in terms of efficiency and accuracy over competing ensemble methods and the original SMCMC filter.
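The weight-degeneracy problem that this abstract contrasts SMCMC against can be made concrete with the usual effective-sample-size diagnostic for weighted particles. The sketch below is a generic illustration of that diagnostic for particle filters, not part of the authors' method (which assigns no weights); the function name `effective_sample_size` and the log-weight inputs are assumptions.

```python
import math


def effective_sample_size(log_weights):
    """Effective sample size (sum w)^2 / sum(w^2) from unnormalized log-weights."""
    # Subtract the maximum before exponentiating to avoid overflow.
    m = max(log_weights)
    w = [math.exp(lw - m) for lw in log_weights]
    total = sum(w)
    return total * total / sum(x * x for x in w)


# Equal weights: every particle contributes, so the value equals the particle count.
print(effective_sample_size([0.0] * 10))

# One dominant weight: the ensemble collapses to roughly a single useful particle.
print(effective_sample_size([0.0, -1000.0, -1000.0]))
```

A value close to 1 is the degenerate case: nearly all of the mass sits on one particle, which is the situation a weight-free sampler is meant to avoid when few samples are used.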
Clustered Factor Analysis for Multivariate Spatial Data
Pub Date : 2024-09-11 DOI: arxiv-2409.07018
Yanxiu Jin, Tomoya Wakayama, Renhe Jiang, Shonosuke Sugasawa
Factor analysis has been extensively used to reveal the dependence structures among multivariate variables, offering valuable insight in various fields. However, it cannot incorporate the spatial heterogeneity that is typically present in spatial data. To address this issue, we introduce an effective method specifically designed to discover the potential dependence structures in multivariate spatial data. Our approach assumes that spatial locations can be approximately divided into a finite number of clusters, with locations within the same cluster sharing similar dependence structures. By leveraging an iterative algorithm that combines spatial clustering with factor analysis, we simultaneously detect spatial clusters and estimate a unique factor model for each cluster. The proposed method is evaluated through comprehensive simulation studies, demonstrating its flexibility. In addition, we apply the proposed method to a dataset of railway station attributes in the Tokyo metropolitan area, highlighting its practical applicability and effectiveness in uncovering complex spatial dependencies.
Non-parametric estimation of transition intensities in interval censored Markov multi-state models without loops
Pub Date : 2024-09-11 DOI: arxiv-2409.07176
Daniel Gomon, Hein Putter
Panel data arises when transitions between different states are interval-censored in multi-state data. The analysis of such data using non-parametric multi-state models was not possible until recently, but is very desirable as it allows for more flexibility than its parametric counterparts. The single available result to date has some unique drawbacks. We propose a non-parametric estimator of the transition intensities for panel data using an Expectation Maximisation algorithm. The method allows for a mix of interval-censored and right-censored (exactly observed) transitions. A condition to check for the convergence of the algorithm to the non-parametric maximum likelihood estimator is given. A simulation study comparing the proposed estimator to a consistent estimator is performed, and shown to yield near identical estimates at smaller computational cost. A data set on the emergence of teeth in children is analysed. Code to perform the analyses is publicly available.
Dynamic Bayesian Networks, Elicitation and Data Embedding for Secure Environments
Pub Date : 2024-09-11 DOI: arxiv-2409.07389
Kieran Drury, Jim Q. Smith
Serious crime modelling typically needs to be undertaken securely behind a firewall where police knowledge and capabilities can remain undisclosed. Data informing an ongoing incident is often sparse, with a large proportion of relevant data only coming to light after the incident culminates or after police intervene - by which point it is too late to make use of the data to aid real-time decision making for the incident in question. Much of the data that is available to police to support real-time decision making is highly confidential so cannot be shared with academics, and is therefore missing to them. In this paper, we describe the development of a formal protocol where a graphical model is used as a framework for securely translating a model designed by an academic team to a model for use by a police team. We then show, for the first time, how libraries of these models can be built and used for real-time decision support to circumvent the challenges of data missingness and tardiness seen in such a secure environment. The parallel development described by this protocol ensures that any sensitive information collected by police, and missing to academics, remains secured behind a firewall. The protocol nevertheless guides police so that they are able to combine the typically incomplete data streams that are open source with their more sensitive information in a formal and justifiable way. We illustrate the application of this protocol by describing how a new entry - a suspected vehicle attack - can be embedded into such a police library of criminal plots.
Local Effects of Continuous Instruments without Positivity
Pub Date : 2024-09-11 DOI: arxiv-2409.07350
Prabrisha Rakshit, Alexander Levis, Luke Keele
Instrumental variables have become a popular study design for the estimation of treatment effects in the presence of unobserved confounders. In the canonical instrumental variables design, the instrument is a binary variable, and most extant methods are tailored to this context. In many settings, however, the instrument is a continuous measure. Standard estimation methods can be applied with continuous instruments, but they require strong assumptions regarding functional form. Moreover, while some recent work has introduced more flexible approaches for continuous instruments, these methods require an assumption known as positivity that is unlikely to hold in many applications. We derive a novel family of causal estimands using a stochastic dynamic intervention framework that considers a range of intervention distributions that are absolutely continuous with respect to the observed distribution of the instrument. These estimands focus on a specific form of local effect but do not require a positivity assumption. Next, we develop doubly robust estimators for these estimands that allow for estimation of the nuisance functions via nonparametric estimators. We use empirical process theory and sample splitting to derive asymptotic properties of the proposed estimators under weak conditions. In addition, we derive methods for profiling the principal strata as well as a method for sensitivity analysis for assessing robustness to an underlying monotonicity assumption. We evaluate our methods via simulation and demonstrate their feasibility using an application on the effectiveness of surgery for specific emergency conditions.
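For context on the "canonical" binary-instrument design that this abstract starts from, the classical Wald estimand is shown below. It is the standard textbook quantity, not one of the new estimands proposed in the paper, and the symbols $Z$ (instrument), $A$ (treatment), and $Y$ (outcome) are notation assumed here.

```latex
\psi_{\text{Wald}}
  = \frac{E[Y \mid Z = 1] - E[Y \mid Z = 0]}
         {E[A \mid Z = 1] - E[A \mid Z = 0]} .
```

Under the usual instrumental variable assumptions, including monotonicity, this ratio equals the average treatment effect among compliers, which is one example of the kind of local effect that the abstract generalizes to continuous instruments.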
Citations: 0
Determining number of factors under stability considerations
Pub Date : 2024-09-11 DOI: arxiv-2409.07617
Sze Ming Lee, Yunxiao Chen
This paper proposes a novel method for determining the number of factors in linear factor models under stability considerations. An instability measure is proposed based on the principal angle between the estimated loading spaces obtained by data splitting. Based on this measure, criteria for determining the number of factors are proposed and shown to be consistent. This consistency is obtained using results from random matrix theory, especially the complete delocalization of non-outlier eigenvectors. The advantage of the proposed methods over the existing ones is shown via weaker asymptotic requirements for consistency, simulation studies and a real data example.
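The idea of measuring instability through principal angles between loading spaces from split data can be illustrated with a rough sketch. The abstract does not give the exact definition of the measure or the selection criteria, so everything below is an assumption made for illustration: the loading space is estimated by PCA, the data are split once at random into two halves, and the instability is taken to be the sine of the largest principal angle. The function names `top_loading_space`, `instability`, and `simulate` are hypothetical, and the data are simulated.

```python
import numpy as np


def top_loading_space(X, k):
    """Orthonormal basis (p x k) of the top-k principal directions of X (n x p)."""
    Xc = X - X.mean(axis=0)
    # Right singular vectors of the centered data span the PCA loading space.
    _, _, vt = np.linalg.svd(Xc, full_matrices=False)
    return vt[:k].T


def instability(X, k, rng):
    """Sine of the largest principal angle between loading spaces of two halves.

    Assumption: this is an illustrative measure, not necessarily the one
    defined in the paper.
    """
    n = X.shape[0]
    perm = rng.permutation(n)
    half = n // 2
    A = top_loading_space(X[perm[:half]], k)
    B = top_loading_space(X[perm[half:]], k)
    # Singular values of A^T B are the cosines of the principal angles.
    cosines = np.linalg.svd(A.T @ B, compute_uv=False)
    smallest_cos = np.clip(cosines.min(), 0.0, 1.0)
    return float(np.sqrt(1.0 - smallest_cos**2))


def simulate(n=400, p=30, r=3, seed=0):
    """Simulated linear factor model with r strong factors plus unit noise."""
    rng = np.random.default_rng(seed)
    loadings = rng.normal(size=(p, r)) * 3.0
    factors = rng.normal(size=(n, r))
    noise = rng.normal(size=(n, p))
    return factors @ loadings.T + noise


X = simulate()
rng = np.random.default_rng(1)
# Instability for candidate numbers of factors k = 1, ..., 6.
scores = {k: instability(X, k, rng) for k in range(1, 7)}
for k, s in scores.items():
    print(k, round(s, 3))
```

In this simulated setting the measure should stay small at the true number of factors (three here) and jump once a noise direction is included, since a noise direction estimated from one half is essentially unrelated to the one estimated from the other half. A practical criterion would presumably average over several random splits; a single split is used here only to keep the sketch short.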
Citations: 0