Computational Statistics & Data Analysis: Latest Articles

Joint estimation of precision matrices for long-memory time series
IF 1.5; Zone 3 (Mathematics); Q3, COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS; Pub Date: 2025-06-19; DOI: 10.1016/j.csda.2025.108234
Qihu Zhang, Jongik Chung, Cheolwoo Park
Methods are proposed for estimating multiple precision matrices for long-memory time series, with particular emphasis on the analysis of resting-state functional magnetic resonance imaging (fMRI) data obtained from multiple subjects. The objective is to estimate both individual brain networks and a common structure representative of a group. Several approaches employing weighted aggregation are introduced to simultaneously estimate individual and group-level precision matrices. Convergence rates of the estimators are examined under various norms and expectations, and their performance is evaluated under both sub-Gaussian and heavy-tailed distributions. The proposed methods are demonstrated through simulated data and real resting-state fMRI datasets.
Volume 212, Article 108234. Citations: 0
Inference on a stochastic SIR model including growth curves
IF 1.5; Zone 3 (Mathematics); Q3, COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS; Pub Date: 2025-06-16; DOI: 10.1016/j.csda.2025.108231
Giuseppina Albano, Virginia Giorno, Gema Pérez-Romero, Francisco de Asis Torres-Ruiz
A Susceptible-Infected-Removed stochastic model is presented, in which the stochasticity is introduced through two independent Brownian motions in the dynamics of the Susceptible and Infected populations. To account for the natural evolution of the Susceptible population, a growth function is considered in which size is influenced by the birth and death of individuals. Inference for such a model is addressed by means of a Quasi Maximum Likelihood Estimation (QMLE) method. The resulting nonlinear system can be numerically solved by iterative procedures. A technique to obtain the initial solutions usually required by such methods is also provided. Finally, simulation studies are performed for three well-known growth functions, namely Gompertz, Logistic and Bertalanffy curves. The performance of the initial estimates of the involved parameters is assessed, and the goodness of the proposed methodology is evaluated.
Volume 212, Article 108231. Citations: 0
Privacy-preserving communication-efficient spectral clustering for distributed multiple networks
IF 1.5; Zone 3 (Mathematics); Q3, COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS; Pub Date: 2025-06-09; DOI: 10.1016/j.csda.2025.108230
Shanghao Wu, Xiao Guo, Hai Zhang
Multi-layer networks arise naturally in various scientific domains, including the social sciences, biology, and neuroscience. The layers of a given multi-layer network are commonly stored locally and in a distributed fashion because of privacy, ownership, and communication costs. The literature on community detection based on such data is still limited. This paper proposes a new distributed spectral clustering-based algorithm for consensus community detection in locally stored multi-layer networks. The algorithm is based on the power method. It is communication-efficient because it allows multiple local power iterations before aggregation, and privacy-preserving because it incorporates the notion of differential privacy. The convergence rate of the proposed algorithm is studied under the assumption that the multi-layer networks are generated from multi-layer stochastic block models. Numerical studies show the superior performance of the proposed algorithm over competing algorithms.
Volume 212, Article 108230. Citations: 0
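The power-method core mentioned in the abstract above can be illustrated on a single machine. What follows is a minimal sketch and not the authors' algorithm: it runs plain orthogonal (block power) iteration on one toy two-block adjacency matrix and splits the nodes by the sign of the second eigenvector. The distributed aggregation across layers and the differential-privacy noise described in the abstract are omitted, and the function name `orthogonal_iteration` and every parameter value are assumptions made for illustration.

```python
import numpy as np

def orthogonal_iteration(A, k, n_iter=100, seed=0):
    """Approximate the k eigenvectors of symmetric A with largest |eigenvalue|."""
    rng = np.random.default_rng(seed)
    Q, _ = np.linalg.qr(rng.standard_normal((A.shape[0], k)))
    for _ in range(n_iter):
        # One power step followed by re-orthonormalization of the columns.
        Q, _ = np.linalg.qr(A @ Q)
    return Q

# Toy two-block network (hypothetical values): dense within blocks, sparse between.
rng = np.random.default_rng(1)
n = 40
labels_true = np.repeat([0, 1], n // 2)
P = np.where(labels_true[:, None] == labels_true[None, :], 0.8, 0.1)
upper = np.triu(rng.random((n, n)) < P, k=1)
A = (upper | upper.T).astype(float)   # symmetric adjacency, no self-loops

Q = orthogonal_iteration(A, k=2)

# With two communities, the sign pattern of the second eigenvector separates them.
labels_hat = (Q[:, 1] > 0).astype(int)
agreement = np.mean(labels_hat == labels_true)
accuracy = max(agreement, 1.0 - agreement)   # labels are defined only up to swapping
print(f"clustering accuracy: {accuracy:.2f}")
```

In a distributed setting, each site would presumably run several such multiplication steps on its own layer before any exchange, which is the source of the communication savings the abstract refers to.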
Flexible modeling of left-truncated and interval-censored competing risks data with missing event types
IF 1.5; Zone 3 (Mathematics); Q3, COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS; Pub Date: 2025-06-05; DOI: 10.1016/j.csda.2025.108229
Yichen Lou, Yuqing Ma, Liming Xiang, Jianguo Sun
Interval-censored competing risks data arise in many cohort studies in clinical research, where multiple types of events subject to interval censoring are included and the occurrence of the primary event of interest may be censored by the occurrence of other events. The presence of missing event types and left truncation poses challenges to the regression analysis of such data. We propose a new two-stage estimation procedure under a class of semiparametric generalized odds rate transformation models to overcome these challenges. Our method first facilitates the estimation of both the probability of response and the probability of occurrence of each type of event under the missing at random assumption, using either parametric or non-parametric methods. An augmented inverse probability weighting likelihood based on the complete-case likelihood and data from subjects with missing type of event is then maximized for estimating regression parameters. We provide desirable asymptotic properties and construct a concordance index to evaluate the model's discriminative ability. The proposed method is demonstrated through extensive simulations and the analysis of data from the Amsterdam cohort study on HIV infection and AIDS.
Volume 211, Article 108229. Citations: 0
Region detection and image clustering via sparse Kronecker product decomposition
IF 1.5; Zone 3 (Mathematics); Q3, COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS; Pub Date: 2025-06-03; DOI: 10.1016/j.csda.2025.108226
Guang Yang, Long Feng
Image clustering is usually conducted by vectorizing image pixels, treating them as independent, and applying classical clustering approaches to the obtained features. However, as image data is often high-dimensional and contains rich spatial information, such treatment is far from satisfactory. For medical image data, another important characteristic is the region-wise sparseness of signals. That is, only a few unknown regions in a medical image differentiate the images associated with different groups of patients, while the other regions are uninformative. Accurately detecting these informative regions would not only improve clustering accuracy but, more importantly, also help explain why the clusters differ. Motivated by the need to identify significant regions of interest, we propose a general framework named Image Clustering via Sparse Kronecker Product Decomposition (IC-SKPD). This framework aims to simultaneously divide samples into clusters and detect regions that are informative for clustering. Our framework is general in the sense that it provides a unified treatment for matrix- and tensor-valued samples. An iterative hard-thresholded singular value decomposition approach is developed to solve this model. Theoretically, the IC-SKPD enjoys guarantees for clustering accuracy and region detection consistency under mild conditions on the minimum signals. Comprehensive simulations along with real data analysis further validate the superior performance of IC-SKPD on clustering and region detection.
Volume 211, Article 108226. Citations: 0
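The abstract above names three building blocks (a Kronecker product structure, a singular value decomposition, and hard thresholding) without spelling out the IC-SKPD algorithm itself. The sketch below illustrates only those building blocks on a single toy image and should not be read as the authors' method. It assumes the image is a noisy Kronecker product of a sparse location matrix `A` and a within-region pattern `B`, and uses the standard fact that rearranging the blocks of `kron(A, B)` into rows gives a rank-one matrix. The names `rearrange`, `A`, `B`, and all sizes are hypothetical.

```python
import numpy as np

def rearrange(C, d1, d2, p1, p2):
    """Stack the d1*d2 blocks (each p1 x p2) of C as rows of a (d1*d2) x (p1*p2) matrix."""
    return C.reshape(d1, p1, d2, p2).transpose(0, 2, 1, 3).reshape(d1 * d2, p1 * p2)

rng = np.random.default_rng(0)
d1 = d2 = 4   # grid of candidate regions
p1 = p2 = 8   # pixels per region along each axis

A = np.zeros((d1, d2))             # sparse location matrix: which regions carry signal
A[0, 1] = A[2, 3] = 1.0
B = rng.standard_normal((p1, p2))  # signal pattern inside an informative region

# Noisy 32 x 32 image with signal only in the two regions selected by A.
C = np.kron(A, B) + 0.05 * rng.standard_normal((d1 * p1, d2 * p2))

# Without noise, rearrange(C) equals outer(A.ravel(), B.ravel()), so the leading
# singular pair recovers A and B up to scale and sign.
U, S, Vt = np.linalg.svd(rearrange(C, d1, d2, p1, p2), full_matrices=False)
a_hat = S[0] * U[:, 0]

# Hard thresholding: keep the s largest-magnitude entries of the location estimate.
s = 2
keep = np.argsort(np.abs(a_hat))[-s:]
detected = {(int(i) // d2, int(i) % d2) for i in keep}
print(sorted(detected))
```

The surviving indices play the role of detected regions. A clustering procedure would additionally have to share this region structure across many images, which this sketch does not attempt.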
Distributed iterative hard thresholding for variable selection in Tobit models
IF 1.5; Zone 3 (Mathematics); Q3, COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS; Pub Date: 2025-06-03; DOI: 10.1016/j.csda.2025.108227
Changxin Yang, Zhongyi Zhu, Hongmei Lin, Zengyan Fan, Heng Lian
While there is a substantial body of research on high-dimensional regression with left-censored responses, few methods address this problem in a distributed manner. Due to data transmission limitations and privacy concerns, centralizing all data is often impractical, necessitating a method for collaborative learning with distributed data. In this paper, we employ the Iterative Hard Thresholding (IHT) method for the Tobit model to address this challenge, allowing one to directly specify the desired sparsity and offering an alternative estimation and variable selection approach. Theoretical analysis shows that our estimator achieves a nearly minimax-optimal convergence rate using only a few rounds of communication. Its practical performance is evaluated under both the pooled and the distributed setting. The former highlights its competitive estimation efficiency and variable selection performance compared to existing approaches, while the latter demonstrates that the decentralized estimator closely matches the performance of its centralized counterpart. When applied to high-dimensional left-censored HIV viral load data, our method also demonstrates comparable performance.
Volume 211, Article 108227. Citations: 0
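Iterative Hard Thresholding, as referenced in the abstract above, alternates a gradient step with a projection onto vectors that have at most `s` nonzero entries, which is how the desired sparsity is specified directly. The sketch below is a simplified stand-in and not the paper's estimator: it swaps in an ordinary least-squares loss for the Tobit likelihood, runs on a single machine rather than across distributed sites, and uses hypothetical names (`hard_threshold`, `iht_least_squares`) and simulated data.

```python
import numpy as np

def hard_threshold(beta, s):
    """Keep the s largest-magnitude entries of beta and set the rest to zero."""
    out = np.zeros_like(beta)
    keep = np.argsort(np.abs(beta))[-s:]
    out[keep] = beta[keep]
    return out

def iht_least_squares(X, y, s, n_iter=200):
    """IHT for the loss (1 / 2n) * ||y - X b||^2 with at most s nonzero coefficients."""
    n, p = X.shape
    # Step size 1/L, where L = ||X||_2^2 / n is the Lipschitz constant of the gradient.
    step = n / np.linalg.norm(X, 2) ** 2
    beta = np.zeros(p)
    for _ in range(n_iter):
        grad = X.T @ (X @ beta - y) / n
        beta = hard_threshold(beta - step * grad, s)
    return beta

# Simulated data (hypothetical values): 3 active predictors out of 50.
rng = np.random.default_rng(0)
n, p, s = 500, 50, 3
beta_true = np.zeros(p)
beta_true[:3] = [2.0, -1.5, 1.0]
X = rng.standard_normal((n, p))
y = X @ beta_true + 0.1 * rng.standard_normal(n)

beta_hat = iht_least_squares(X, y, s)
support = set(np.flatnonzero(beta_hat).tolist())
print(sorted(support))
```

A Tobit version would replace `grad` with the gradient of the censored-response negative log-likelihood, and a distributed version would have to combine gradient information across sites in each communication round.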
JANE: Just Another latent space NEtwork clustering algorithm
IF 1.5; Zone 3 (Mathematics); Q3, COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS; Pub Date: 2025-06-02; DOI: 10.1016/j.csda.2025.108228
Alan T. Arakkal, Daniel K. Sewell
While latent space network models have been a popular approach for community detection for over 15 years, major computational challenges remain, limiting the ability to scale beyond small networks. The R statistical software package, JANE, introduces a new estimation algorithm with massive speedups derived from: (1) a low dimensional approximation approach to adjust for degree heterogeneity parameters; (2) an approximation of intractable likelihood terms; (3) a fast initialization algorithm; and (4) a novel set of convergence criteria focused on clustering performance. Additionally, the proposed method addresses limitations of current implementations, which rely on a restrictive spherical-shape assumption for the prior distribution on the latent positions; relaxing this constraint allows for greater flexibility across diverse network structures. A simulation study evaluating clustering performance of the proposed approach against state-of-the-art methods shows dramatically improved clustering performance in most scenarios and significant reductions in computational time — up to 45 times faster compared to existing approaches.
Volume 211, Article 108228. Citations: 0
Bayesian forecasting of Italian seismicity using the spatiotemporal RETAS model
IF 1.5; Zone 3 (Mathematics); Q3, COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS; Pub Date: 2025-06-02; DOI: 10.1016/j.csda.2025.108219
Tom Stindl, Zelong Bi, Clara Grazian
Spatiotemporal Renewal Epidemic Type Aftershock Sequence models are self-exciting point processes that model the occurrence time, epicenter, and magnitude of earthquakes in a geographical region. The arrival rate of earthquakes is formulated as the superposition of a main shock renewal process and homogeneous Poisson processes for the aftershocks, motivated by empirical laws in seismology. Existing methods for model fitting rely on maximizing the log-likelihood by either direct numerical optimization or Expectation Maximization algorithms, both of which can suffer from convergence issues and lack adequate quantification of parameter estimation uncertainty. To address these limitations, a Bayesian approach is employed, with posterior inference carried out using a data augmentation strategy within a Markov chain Monte Carlo framework. The branching structure is treated as a latent variable to improve sampling efficiency, and a purpose-built Hamiltonian Monte Carlo sampler is implemented to update the parameters within the Gibbs sampler. This methodology enables parameter uncertainty to be incorporated into forecasts of seismicity. Estimation and forecasting are demonstrated on simulated catalogs and an earthquake catalog from Italy. R code implementing the methods is provided in the Supplementary Materials.
Volume 212, Article 108219. Citations: 0
Small area prediction of counts under machine learning-type mixed models
IF 1.5; Zone 3 (Mathematics); Q3, COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS; Pub Date: 2025-05-30; DOI: 10.1016/j.csda.2025.108218
Nicolas Frink, Timo Schmid
Small area estimation methods are proposed that use generalized tree-based machine learning techniques to improve the estimation of disaggregated means in small areas using discrete survey data. Specifically, two existing approaches based on random forests - the Generalized Mixed Effects Random Forest (GMERF) and a Mixed Effects Random Forest (MERF) - are extended to accommodate count outcomes, addressing key challenges such as overdispersion. Additionally, three bootstrap methodologies designed to assess the reliability of point estimators for area-level means are evaluated. The numerical analysis shows that the MERF, which does not assume a Poisson distribution to model the mean behavior of count data, excels in scenarios of severe overdispersion. Conversely, the GMERF performs best under conditions where Poisson distribution assumptions are moderately met. In a case study using real-world data from the state of Guerrero, Mexico, the proposed methods effectively estimate area-level means while capturing the uncertainty inherent in overdispersed count data. These findings highlight their practical applicability for small area estimation.
Citations: 0
A Frisch-Waugh-Lovell theorem for empirical likelihood
IF 1.5 Zone 3 Mathematics Q3 COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS Pub Date: 2025-05-23 DOI: 10.1016/j.csda.2025.108208
Yichun Song
A Frisch-Waugh-Lovell-type (FWL) theorem for empirical likelihood estimation with instrumental variables is presented. It resembles the standard FWL theorem for ordinary least squares (OLS), but its partitioning procedure employs the empirical likelihood weights at the solution rather than the original sample distribution. This result is leveraged to simplify computation through an iterative algorithm in which exogenous variables are partitioned out using weighted least squares and the weights are updated between iterations. Furthermore, it is demonstrated that the iterations converge locally to the original empirical likelihood estimate at a stochastically super-linear rate. A feasible iterative constrained optimization algorithm for calculating empirical-likelihood-based confidence intervals is provided, along with a discussion of its properties. Monte Carlo simulations indicate that the iterative algorithm is robust and produces results within the numerical tolerance of the original empirical likelihood estimator in finite samples, while significantly improving computation in large-scale problems. Additionally, the algorithm performs effectively in an illustrative application using the returns-to-education framework.
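For readers unfamiliar with the standard FWL theorem that the abstract refers to, the following is a minimal sketch of the ordinary least squares case only. It is not the paper's empirical likelihood algorithm: the data are made up, and the helper names `solve`, `ols` and `residuals` are hypothetical, introduced purely for illustration. It shows that the coefficient on `x1` in the full regression equals the coefficient obtained after partialling the controls out of both `y` and `x1`.

```python
# Standard OLS Frisch-Waugh-Lovell theorem on a tiny made-up data set.
# Illustration only: this is NOT the empirical-likelihood procedure of the paper.

def solve(a, b):
    """Solve the square system a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [v - factor * p for v, p in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]

def ols(columns, y):
    """Least-squares coefficients of y on the regressor columns (normal equations)."""
    xtx = [[sum(u * v for u, v in zip(ci, cj)) for cj in columns] for ci in columns]
    xty = [sum(u * v for u, v in zip(ci, y)) for ci in columns]
    return solve(xtx, xty)

def residuals(columns, y):
    """Residuals of y after projecting it on the regressor columns."""
    beta = ols(columns, y)
    return [yi - sum(b * c[i] for b, c in zip(beta, columns)) for i, yi in enumerate(y)]

# Hypothetical data: intercept, regressor of interest x1, control x2, outcome y.
ones = [1.0] * 6
x1 = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
x2 = [2.0, 1.0, 4.0, 3.0, 6.0, 5.5]
y = [3.1, 3.9, 7.2, 7.8, 11.1, 11.9]

# Full regression of y on (1, x1, x2); beta_full[1] is the coefficient on x1.
beta_full = ols([ones, x1, x2], y)

# FWL route: partial the controls (1, x2) out of y and x1, then regress residual on residual.
controls = [ones, x2]
y_tilde = residuals(controls, y)
x1_tilde = residuals(controls, x1)
beta_fwl = ols([x1_tilde], y_tilde)[0]

print(beta_full[1], beta_fwl)  # the two values agree up to floating-point error
```

According to the abstract, the paper's version replaces the uniform sample weights used here with the empirical likelihood weights at the solution and partitions out the exogenous variables by weighted least squares. That step is not reproduced in this sketch.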
Citations: 0