Statistics and Computing: Latest Articles

Model-based clustering of time-dependent observations with common structural changes.
IF 1.6 | Zone 2 (Mathematics) | Q2 COMPUTER SCIENCE, THEORY & METHODS | Pub Date: 2026-01-01 | Epub Date: 2025-10-28 | DOI: 10.1007/s11222-025-10756-x
Riccardo Corradin, Luca Danese, Wasiur R KhudaBukhsh, Andrea Ongaro

We propose a novel model-based clustering approach for samples of time series. We assume as a unique commonality that two observations belong to the same group if structural changes in their behaviors happen at the same time. We resort to a latent representation of structural changes in each time series, based on random orders, to induce ties among different observations. Such an approach results in a general modeling strategy and can be combined with many time-dependent models already known in the literature. Our studies have been motivated by an epidemiological problem. Specifically, we want to provide clusters of different countries of the European Union where two countries belong to the same cluster if the spreading processes of the COVID-19 virus show structural changes at the same time.

Supplementary information: The online version contains supplementary material available at 10.1007/s11222-025-10756-x.

Statistics and Computing 36(1), 7. PDF (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12568813/pdf/
Citations: 0
A Neural Network Integrated Accelerated Failure Time-Based Mixture Cure Model.
IF 1.6 | Zone 2 (Mathematics) | Q2 COMPUTER SCIENCE, THEORY & METHODS | Pub Date: 2025-10-01 | Epub Date: 2025-06-22 | DOI: 10.1007/s11222-025-10674-y
Wisdom Aselisewine, Suvra Pal

The mixture cure rate model (MCM) is commonly used for analyzing survival data with a cured subgroup. While the prevailing approach to modeling the probability of cure involves a generalized linear model using a known parametric link function, such as the logit link function, it has limitations in capturing the complex effects of covariates on cure probability. This paper introduces a novel MCM employing a neural network-based classifier for cure probability and an accelerated failure time structure for the survival distribution of uncured patients. An expectation maximization algorithm is developed for parameter estimation. Simulation results demonstrate the superior performance of the proposed model in capturing non-linear classification boundaries compared to logit-based and spline-based MCMs, as well as other machine learning algorithms. This enhances the accuracy and precision of cure probability estimates, improving predictive accuracy. The proposed model and estimation method are applied to survival data on leukemia patients, showcasing their effectiveness.
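
For orientation, a mixture cure model with an accelerated failure time component is commonly written as below. The notation is an assumption made for illustration and is not taken from the paper: \pi(z) denotes the probability of being uncured given covariates z (so the cure probability is 1 - \pi(z)), S_u is the survival function of the uncured, and \varepsilon is an error term with a chosen distribution. In a logit-based MCM, \pi(z) is a logistic function of a linear predictor; the abstract describes replacing that component with a neural network classifier.

```latex
% Assumed notation: \pi(z) = P(\text{uncured} \mid z); S_u = survival function of the uncured.
S_{\mathrm{pop}}(t \mid x, z) = 1 - \pi(z) + \pi(z)\, S_u(t \mid x),
\qquad
\log T = x^{\top}\beta + \sigma\,\varepsilon \quad \text{(uncured subjects)}.
```

Because S_u(t | x) tends to 0 as t grows, S_pop(t | x, z) levels off at 1 - \pi(z), which is the plateau that identifies the cured fraction.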

Statistics and Computing 35(5). PDF (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12369597/pdf/
Citations: 0
Bootstrap estimation of the proportion of outliers in robust regression.
IF 1.6 | Zone 2 (Mathematics) | Q2 COMPUTER SCIENCE, THEORY & METHODS | Pub Date: 2025-02-01 | Epub Date: 2024-11-16 | DOI: 10.1007/s11222-024-10526-1
Qiang Heng, Kenneth Lange

This paper presents a nonparametric bootstrap method for estimating the proportions of inliers and outliers in robust regression models. Our approach is based on the concept of stability, providing robustness against distributional assumptions and eliminating the need for pre-specified confidence levels. Through numerical experiments, we demonstrate that this method yields more accurate and stable estimates than existing alternatives. Additionally, the generated instability paths offer a valuable graphical tool for understanding the inlier and outlier distributions within the data. The method naturally extends to generalized linear models, where we find that variance-stabilizing transformations produce residuals that are well-suited for outlier detection. Applications to two real-world datasets further illustrate the practical utility of our approach in identifying outliers.

Statistics and Computing 35(1). PDF (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12077844/pdf/
Citations: 0
Simulation based composite likelihood.
IF 1.6 | Zone 2 (Mathematics) | Q2 COMPUTER SCIENCE, THEORY & METHODS | Pub Date: 2025-01-01 | Epub Date: 2025-02-25 | DOI: 10.1007/s11222-025-10584-z
Lorenzo Rimella, Chris Jewell, Paul Fearnhead

Inference for high-dimensional hidden Markov models is challenging due to the exponential-in-dimension computational cost of calculating the likelihood. To address this issue, we introduce an innovative composite likelihood approach called "Simulation Based Composite Likelihood" (SimBa-CL). With SimBa-CL, we approximate the likelihood by the product of its marginals, which we estimate using Monte Carlo sampling. In a similar vein to approximate Bayesian computation (ABC), SimBa-CL requires multiple simulations from the model, but, in contrast to ABC, it provides a likelihood approximation that guides the optimization of the parameters. Leveraging automatic differentiation libraries, it is simple to calculate gradients and Hessians to not only speed up optimization but also to build approximate confidence sets. We present extensive empirical results which validate our theory and demonstrate its advantage over SMC, and apply SimBa-CL to real-world Aphtovirus data.
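
The idea of approximating a likelihood by a product of marginals, each estimated by simulating from the model, can be sketched on a toy example. This is not the SimBa-CL algorithm itself: the model (independent standard normal latents observed with Gaussian noise), the function names, and all parameter values are assumptions chosen so that the answer can be checked against a closed form.

```python
import numpy as np


def mc_marginal_likelihoods(y, n_sims=100_000, obs_sd=0.5, seed=0):
    """Estimate each marginal p(y_i) by averaging p(y_i | x_i) over simulated latents."""
    rng = np.random.default_rng(seed)
    # Simulate latent states from the (toy) model: x ~ N(0, I), shape (n_sims, d).
    x = rng.standard_normal((n_sims, y.size))
    # Gaussian observation density p(y_i | x_i) for every simulation and coordinate.
    dens = np.exp(-0.5 * ((y - x) / obs_sd) ** 2) / (obs_sd * np.sqrt(2.0 * np.pi))
    # Monte Carlo average over simulations gives one estimate per coordinate.
    return dens.mean(axis=0)


def composite_log_likelihood(y, **kwargs):
    """Log of the product of the estimated marginals."""
    return float(np.sum(np.log(mc_marginal_likelihoods(y, **kwargs))))


y = np.array([0.3, -1.2, 0.8])
approx = composite_log_likelihood(y)

# Closed form for this toy model: y_i ~ N(0, 1 + obs_sd^2), independently.
var = 1.0 + 0.5 ** 2
exact = float(np.sum(-0.5 * np.log(2.0 * np.pi * var) - 0.5 * y ** 2 / var))
print(approx, exact)
```

In this toy model the coordinates are independent, so the product of marginals equals the full likelihood; in a hidden Markov model with interacting components it is only an approximation, which is the setting the abstract addresses.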

Supplementary information: The online version contains supplementary material available at 10.1007/s11222-025-10584-z.

Statistics and Computing 35(3), 58. PDF (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11861035/pdf/
Citations: 0
Sequential Bayesian Registration for Functional Data.
IF 1.6 | Zone 2 (Mathematics) | Q2 COMPUTER SCIENCE, THEORY & METHODS | Pub Date: 2025-01-01 | Epub Date: 2025-05-27 | DOI: 10.1007/s11222-025-10640-8
Yoonji Kim, Oksana A Chkrebtii, Sebastian A Kurtek

In many modern applications, discretely-observed data may be naturally understood as a set of functions. Functional data often exhibit two confounded sources of variability: amplitude (y-axis) and phase (x-axis). The extraction of amplitude and phase, a process known as registration, is essential in exploring the underlying structure of functional data in a variety of areas, from environmental monitoring to medical imaging. Critically, such data are often gathered sequentially with new functional observations arriving over time. Despite this, existing registration procedures do not sequentially update inference based on the new data, requiring model refitting. To address these challenges, we introduce a Bayesian framework for sequential registration of functional data, which updates statistical inference as new sets of functions are assimilated. This Bayesian model-based sequential learning approach utilizes sequential Monte Carlo sampling to recursively update the alignment of observed functions while accounting for associated uncertainty. Distributed computing significantly reduces computational cost relative to refitting the model using an iterative method such as Markov chain Monte Carlo on the full data. Simulation studies and comparisons reveal that the proposed approach performs well even when the target posterior distribution has a challenging structure. We apply the proposed method to three real datasets: (1) functions of annual drought intensity near Kaweah River in California, (2) annual sea surface salinity functions near Null Island, and (3) a sequence of repeated patterns in electrocardiogram signals.
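
The distinction between amplitude and phase variability can be illustrated with a small deterministic sketch. It assumes a known warping function, whereas the paper's Bayesian procedure infers it; the template, the warp gamma(t) = t^2, and all names below are hypothetical choices for illustration.

```python
import numpy as np

t = np.linspace(0.0, 1.0, 1001)


def template(s):
    """Common shape shared by both functions (amplitude component)."""
    return np.sin(2.0 * np.pi * s)


def gamma(s):
    """Assumed warping function: increasing, maps [0, 1] onto [0, 1]."""
    return s ** 2


def gamma_inv(s):
    """Inverse of the assumed warping function."""
    return np.sqrt(s)


f = template(t)          # reference function
g = template(gamma(t))   # same amplitude, different phase

# Registration with a known warp: evaluate g at gamma_inv(t) by linear interpolation,
# which undoes the phase distortion and recovers the reference shape.
g_registered = np.interp(gamma_inv(t), t, g)

misalignment_before = float(np.max(np.abs(g - f)))
misalignment_after = float(np.max(np.abs(g_registered - f)))
print(misalignment_before, misalignment_after)
```

Before registration the two curves differ substantially even though they share the same shape; after composing with the inverse warp the remaining difference is only interpolation error.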

Statistics and Computing 35(4), 108. PDF (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12116714/pdf/
Citations: 0
Outcome-guided spike-and-slab Lasso Biclustering: A Novel Approach for Enhancing Biclustering Techniques for Gene Expression Analysis.
IF 1.6 | Zone 2 (Mathematics) | Q2 COMPUTER SCIENCE, THEORY & METHODS | Pub Date: 2025-01-01 | Epub Date: 2025-08-28 | DOI: 10.1007/s11222-025-10709-4
Luis A Vargas-Mieles, Paul D W Kirk, Chris Wallace

Biclustering has gained interest in gene expression data analysis due to its ability to identify groups of samples that exhibit similar behaviour in specific subsets of genes (or vice versa), in contrast to traditional clustering methods that classify samples based on all genes. Despite advances, biclustering remains a challenging problem, even with cutting-edge methodologies. This paper introduces an extension of the recently proposed Spike-and-Slab Lasso Biclustering (SSLB) algorithm, termed Outcome-Guided SSLB (OG-SSLB), aimed at enhancing the identification of biclusters in gene expression analysis. Our proposed approach integrates disease outcomes into the biclustering framework through Bayesian profile regression. By leveraging additional clinical information, OG-SSLB improves the interpretability and relevance of the resulting biclusters. Comprehensive simulations and numerical experiments demonstrate that OG-SSLB achieves superior performance, with improved accuracy in estimating the number of clusters and higher consensus scores compared to the original SSLB method. Furthermore, OG-SSLB effectively identifies meaningful patterns and associations between gene expression profiles and disease states. These promising results demonstrate the effectiveness of OG-SSLB in advancing biclustering techniques, providing a powerful tool for uncovering biologically relevant insights. The OGSSLB software can be found as an R/C++ package at https://github.com/luisvargasmieles/OGSSLB.

Statistics and Computing 35(6), 179. PDF (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12394340/pdf/
Citations: 0
Extended fiducial inference for individual treatment effects via deep neural networks.
IF 1.6 | Zone 2 (Mathematics) | Q2 COMPUTER SCIENCE, THEORY & METHODS | Pub Date: 2025-01-01 | Epub Date: 2025-05-17 | DOI: 10.1007/s11222-025-10624-8
Sehwan Kim, Faming Liang

Individual treatment effect estimation has gained significant attention in recent data science literature. This work introduces the Double Neural Network (Double-NN) method to address this problem within the framework of extended fiducial inference (EFI). In the proposed method, deep neural networks are used to model the treatment and control effect functions, while an additional neural network is employed to estimate their parameters. The universal approximation capability of deep neural networks ensures the broad applicability of this method. Numerical results highlight the superior performance of the proposed Double-NN method compared to the conformal quantile regression (CQR) method in individual treatment effect estimation. From the perspective of statistical inference, this work advances the theory and methodology for statistical inference of large models. Specifically, it is theoretically proven that the proposed method permits the model size to increase with the sample size $n$ at a rate of $O(n^{\zeta})$ for some $0 \le \zeta < 1$, while still maintaining proper quantification of uncertainty in the model parameters. This result marks a significant improvement compared to the range $0 \le \zeta < \frac{1}{2}$ required by the classical central limit theorem. Furthermore, this work provides a rigorous framework for quantifying the uncertainty of deep neural networks under the neural scaling law, representing a substantial contribution to the statistical understanding of large-scale neural network models.

Supplementary information: The online version contains supplementary material available at 10.1007/s11222-025-10624-8.

Statistics and Computing 35(4), 97. PDF (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12085359/pdf/
Citations: 0
Bayesian shared parameter joint models for heterogeneous populations.
IF 1.6 | Zone 2 (Mathematics) | Q2 COMPUTER SCIENCE, THEORY & METHODS | Pub Date: 2025-01-01 | Epub Date: 2025-06-12 | DOI: 10.1007/s11222-025-10647-1
Sida Chen, Danilo Alvares, Marco Palma, Jessica K Barrett

Joint models (JMs) for longitudinal and time-to-event data are an important class of biostatistical models in health and medical research. When the study population consists of heterogeneous subgroups, standard JMs may be inadequate, leading to misleading results or loss of information. Joint latent class models (JLCMs) and their variants have been proposed to incorporate latent class structures into JMs. JLCMs are useful for identifying latent subgroups, uncovering deeper insights into relationships between the outcomes, and improving prediction performance. We consider the problem of Bayesian inference for the generic form of JLCMs, which poses significant computational challenges due to the complex nature of the posterior distribution. We propose a new Bayesian inference framework to tackle these challenges. Our approach leverages state-of-the-art Markov chain Monte Carlo techniques and parallel computing for parameter estimation and model selection regarding the number of latent classes. Through a simulation study, we demonstrate the feasibility and superiority of our proposed method over the existing approach. Additionally, we provide practical guidance on model and prior specification, which has received little attention, to facilitate the implementation of such complex models. We illustrate our method using data from the PAQUID prospective cohort study, where the outcomes of interest include a longitudinal measurement of cognitive performance and time to dementia diagnosis. Our analysis provides deeper insights into the latent class characteristics underlying the study population.

Supplementary information: The online version contains supplementary material available at 10.1007/s11222-025-10647-1.

纵向和事件时间数据联合模型(JMs)是卫生和医学研究中一类重要的生物统计模型。当研究人群由异质亚组组成时,标准JMs可能不充分,导致误导性结果或信息丢失。联合潜在类模型(jlcm)及其变体被提出将潜在类结构纳入JMs。jlcm对于识别潜在的子组、揭示对结果之间关系的更深入的了解以及提高预测性能非常有用。我们考虑了jlcm一般形式的贝叶斯推理问题,由于后验分布的复杂性,该问题带来了重大的计算挑战。我们提出了一个新的贝叶斯推理框架来解决这些挑战。我们的方法利用最先进的马尔可夫链蒙特卡罗技术和并行计算进行参数估计和关于潜在类别数量的模型选择。通过仿真研究,我们证明了该方法的可行性和优越性。此外,我们还提供了关于模型和先验规范的实用指导,这一点很少受到关注,以促进此类复杂模型的实现。我们使用来自PAQUID前瞻性队列研究的数据来说明我们的方法,其中感兴趣的结果包括认知表现和痴呆诊断时间的纵向测量。我们的分析为研究人群潜在的阶级特征提供了更深入的见解。补充资料:在线版本包含补充资料,下载地址:10.1007/s11222-025-10647-1。
Statistics and Computing 35(5), 125 (2025). PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12162714/pdf/
Citations: 0
Online Bayesian changepoint detection for network Poisson processes with community structure.
IF 1.6 Zone 2 (Mathematics) Q2 COMPUTER SCIENCE, THEORY & METHODS Pub Date: 2025-01-01 Epub Date: 2025-04-03 DOI: 10.1007/s11222-025-10606-w
Joshua Corneck, Edward A K Cohen, James S Martin, Francesco Sanna Passino

Network point processes often exhibit latent structure that governs the behaviour of the sub-processes. It is not always reasonable to assume that this latent structure is static, and detecting when and how this driving structure changes is often of interest. In this paper, we introduce a novel online methodology for detecting changes within the latent structure of a network point process. We focus on block-homogeneous Poisson processes, where latent node memberships determine the rates of the edge processes. We propose a scalable variational procedure which can be applied to large networks in an online fashion via a Bayesian forgetting factor applied to sequential variational approximations to the posterior distribution. The proposed framework is tested on simulated and real-world data, and it rapidly and accurately detects changes to the latent edge process rates, and to the latent node group memberships, both in an online manner. In particular, in an application on the Santander Cycles bike-sharing network in central London, we detect changes within the network related to holiday periods and lockdown restrictions between 2019 and 2020.
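
One way to picture the forgetting-factor idea in isolation is a single Poisson rate with a conjugate Gamma belief, where the previous posterior is discounted before each new batch of counts is absorbed. The sketch below is an illustration under that simplifying assumption, not the paper's variational procedure: the names `GammaPosterior`, `update_with_forgetting`, `run_stream` and the discount `delta` are hypothetical, and no network or community structure is modelled.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence


@dataclass
class GammaPosterior:
    """Gamma(shape, rate) belief about an unknown Poisson rate."""
    shape: float
    rate: float

    @property
    def mean(self) -> float:
        return self.shape / self.rate


def update_with_forgetting(post: GammaPosterior, count: float,
                           exposure: float, delta: float) -> GammaPosterior:
    """Discount the old belief by delta in (0, 1], then add the new evidence.

    delta = 1 is the usual conjugate Gamma-Poisson update; delta < 1
    down-weights past observations geometrically (an assumed, simplified
    form of forgetting).
    """
    if not 0.0 < delta <= 1.0:
        raise ValueError("delta must lie in (0, 1]")
    return GammaPosterior(shape=delta * post.shape + count,
                          rate=delta * post.rate + exposure)


def run_stream(counts: Sequence[float], exposure: float = 1.0,
               delta: float = 0.8,
               prior: Optional[GammaPosterior] = None) -> List[float]:
    """Return the posterior mean rate after each observation in the stream."""
    post = GammaPosterior(1.0, 1.0) if prior is None else prior
    means = []
    for c in counts:
        post = update_with_forgetting(post, c, exposure, delta)
        means.append(post.mean)
    return means
```

With `delta = 1` old data never loses weight, so the estimate drifts only slowly after a rate change; smaller values let the posterior mean follow the new rate within a few steps, at the cost of a noisier estimate. For example, `run_stream([2] * 50 + [10] * 50, delta=0.8)` can be compared with the same call using `delta=1.0`.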

Statistics and Computing 35(3), 75 (2025). PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11968509/pdf/
Citations: 0
Efficient Likelihood-Based Temporal Changepoint Detection in Spatio-Temporal Processes.
IF 1.6 Zone 2 (Mathematics) Q2 COMPUTER SCIENCE, THEORY & METHODS Pub Date: 2025-01-01 Epub Date: 2025-10-17 DOI: 10.1007/s11222-025-10745-0
Gaurav Agarwal, Idris A Eckley, Paul Fearnhead

The rapid advancements of scalable methodologies have opened new avenues for analyzing complex spatio-temporal data, which is crucial in understanding dynamic environmental phenomena. This paper introduces a likelihood-based methodology for detecting abrupt changes in time in spatio-temporal processes, a field where traditional time series methods fall short. Unlike recent approaches, we do not make the unrealistic assumption that data is independent across changepoints. Instead, we use a recently proposed family of covariance models that allows nonstationarity in time, and we propose a Markov approximation to reduce the computational burden of calculating likelihoods under this model. We apply our method to two years of daily wind speed data from various synoptic weather stations in Ireland, identifying a significant changepoint on July 24, 2021, which aligns with a major shift in weather patterns. This application not only demonstrates the method's utility in handling spatio-temporal datasets but also showcases its potential in broader environmental and climatic studies, offering a scalable solution for analyzing changing patterns in spatial data over time.
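
For orientation, the simplest likelihood-based changepoint search scans every candidate split of a univariate series and keeps the one that maximises the Gaussian log-likelihood. The sketch below is that deliberately naive baseline: it assumes independent observations with a common variance and a single mean shift, which is exactly the independence assumption the abstract says the paper avoids, and it has no spatial component or Markov approximation. The function names and the `min_seg` parameter are hypothetical.

```python
import math
from typing import Sequence, Tuple


def _rss(values: Sequence[float]) -> float:
    """Residual sum of squares of a segment around its own mean."""
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values)


def _gauss_max_loglik(rss: float, n: int) -> float:
    """Gaussian log-likelihood maximised over the variance, given pooled RSS.

    Requires rss > 0, so perfectly noise-free data is not handled here.
    """
    sigma2 = rss / n
    return -0.5 * n * (math.log(2.0 * math.pi * sigma2) + 1.0)


def best_single_changepoint(x: Sequence[float],
                            min_seg: int = 2) -> Tuple[int, float]:
    """Return (tau, gain): x[:tau] and x[tau:] are the two fitted segments,
    and gain is the log-likelihood improvement over the no-change model."""
    n = len(x)
    if n < 2 * min_seg:
        raise ValueError("series too short for the requested min_seg")
    ll_null = _gauss_max_loglik(_rss(x), n)
    best_tau, best_ll = min_seg, -math.inf
    for tau in range(min_seg, n - min_seg + 1):
        ll = _gauss_max_loglik(_rss(x[:tau]) + _rss(x[tau:]), n)
        if ll > best_ll:
            best_tau, best_ll = tau, ll
    return best_tau, best_ll - ll_null
```

In practice the gain would be compared against a penalty or threshold before declaring a change. Each candidate split here costs O(n) work, so the scan is quadratic overall, and the cost grows further once dependence across time and space has to be modelled, which is the burden the abstract's Markov approximation is aimed at.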

Statistics and Computing 35(6), 213 (2025). PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12534301/pdf/
Citations: 0