
Statistics and Computing: Latest Articles

Simulation based composite likelihood.
IF 1.6 | Zone 2 (Mathematics) | Q2 COMPUTER SCIENCE, THEORY & METHODS | Pub Date: 2025-01-01 | Epub Date: 2025-02-25 | DOI: 10.1007/s11222-025-10584-z
Lorenzo Rimella, Chris Jewell, Paul Fearnhead

Inference for high-dimensional hidden Markov models is challenging due to the exponential-in-dimension computational cost of calculating the likelihood. To address this issue, we introduce an innovative composite likelihood approach called "Simulation Based Composite Likelihood" (SimBa-CL). With SimBa-CL, we approximate the likelihood by the product of its marginals, which we estimate using Monte Carlo sampling. In a similar vein to approximate Bayesian computation (ABC), SimBa-CL requires multiple simulations from the model, but, in contrast to ABC, it provides a likelihood approximation that guides the optimization of the parameters. Leveraging automatic differentiation libraries, it is simple to calculate gradients and Hessians to not only speed up optimization but also to build approximate confidence sets. We present extensive empirical results which validate our theory and demonstrate its advantage over SMC, and apply SimBa-CL to real-world Aphtovirus data.

Supplementary information: The online version contains supplementary material available at 10.1007/s11222-025-10584-z.
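
The idea of approximating a likelihood by a product of Monte Carlo estimated marginals can be illustrated with a deliberately tiny toy model. The sketch below is not the authors' algorithm: it covers a single time step only, and the simulator `simulate_latent`, the Bernoulli emission model, and the parameters `theta`, `q0`, `q1` are all hypothetical choices made for illustration.

```python
import math
import random

def simulate_latent(theta, dim, rng):
    # Hypothetical toy simulator: each latent component is 1 with probability theta.
    return [1 if rng.random() < theta else 0 for _ in range(dim)]

def emission_prob(y, x, q0, q1):
    # P(y | x) for a Bernoulli observation whose success probability depends on x.
    p = q1 if x == 1 else q0
    return p if y == 1 else 1.0 - p

def composite_loglik(y, theta, q0, q1, n_sims=1000, seed=0):
    # Estimate each marginal p(y_i) by averaging p(y_i | x_i) over simulated
    # latent states, then sum the log-marginals (log of the product of marginals).
    rng = random.Random(seed)
    dim = len(y)
    sums = [0.0] * dim
    for _ in range(n_sims):
        x = simulate_latent(theta, dim, rng)
        for i in range(dim):
            sums[i] += emission_prob(y[i], x[i], q0, q1)
    return sum(math.log(s / n_sims) for s in sums)

if __name__ == "__main__":
    print(composite_loglik([1, 0, 1, 1], theta=0.4, q0=0.1, q1=0.8))
```

In this toy the latent components are independent, so the product of marginals loses nothing; the approximation only becomes a genuine simplification when components interact, as in the high-dimensional hidden Markov models the abstract describes.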

Citations: 0
Online Bayesian changepoint detection for network Poisson processes with community structure.
IF 1.6 | Zone 2 (Mathematics) | Q2 COMPUTER SCIENCE, THEORY & METHODS | Pub Date: 2025-01-01 | Epub Date: 2025-04-03 | DOI: 10.1007/s11222-025-10606-w
Joshua Corneck, Edward A K Cohen, James S Martin, Francesco Sanna Passino

Network point processes often exhibit latent structure that governs the behaviour of the sub-processes. It is not always reasonable to assume that this latent structure is static, and detecting when and how this driving structure changes is often of interest. In this paper, we introduce a novel online methodology for detecting changes within the latent structure of a network point process. We focus on block-homogeneous Poisson processes, where latent node memberships determine the rates of the edge processes. We propose a scalable variational procedure which can be applied on large networks in an online fashion via a Bayesian forgetting factor applied to sequential variational approximations to the posterior distribution. The proposed framework is tested on simulated and real-world data, and it rapidly and accurately detects changes to the latent edge process rates, and to the latent node group memberships, both in an online manner. In particular, in an application on the Santander Cycles bike-sharing network in central London, we detect changes within the network related to holiday periods and lockdown restrictions between 2019 and 2020.
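
To make the phrase "latent node memberships determine the rates of the edge processes" concrete, here is a minimal simulation sketch of a block-homogeneous Poisson network. It only generates data and does not attempt the paper's variational changepoint procedure. The function names, the use of directed edges, and the example rate matrix are assumptions introduced for illustration; all rates are assumed strictly positive.

```python
import random

def simulate_edge_events(rate, horizon, rng):
    # Event times of a homogeneous Poisson process on [0, horizon),
    # built from exponential inter-arrival times with the given rate.
    times = []
    t = rng.expovariate(rate)
    while t < horizon:
        times.append(t)
        t += rng.expovariate(rate)
    return times

def simulate_network(groups, rates, horizon, seed=0):
    # groups[i] is the latent block of node i; rates[a][b] is the event rate
    # on any directed edge from a node in block a to a node in block b.
    rng = random.Random(seed)
    n = len(groups)
    events = {}
    for i in range(n):
        for j in range(n):
            if i != j:
                rate = rates[groups[i]][groups[j]]
                events[(i, j)] = simulate_edge_events(rate, horizon, rng)
    return events

if __name__ == "__main__":
    groups = [0, 0, 1]                    # hypothetical memberships
    rates = [[2.0, 0.5], [0.5, 3.0]]      # hypothetical block rate matrix
    for edge, times in simulate_network(groups, rates, horizon=5.0).items():
        print(edge, len(times))
```

A changepoint in this setting would correspond to either `rates` or `groups` changing at some time, which is the kind of change the abstract says the method detects.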

Citations: 0
Using prior-data conflict to tune Bayesian regularized regression models.
IF 1.6 | Zone 2 (Mathematics) | Q2 COMPUTER SCIENCE, THEORY & METHODS | Pub Date: 2025-01-01 | Epub Date: 2025-02-20 | DOI: 10.1007/s11222-025-10582-1
Timofei Biziaev, Karen Kopciuk, Thierry Chekouo

In high-dimensional regression models, variable selection becomes challenging from a computational and theoretical perspective. Bayesian regularized regression via shrinkage priors like the Laplace or spike-and-slab prior is an effective method for variable selection in p > n scenarios, provided the shrinkage priors are configured adequately. We propose an empirical Bayes configuration using checks for prior-data conflict: tests that assess whether there is disagreement in parameter information provided by the prior and data. We apply our proposed method to the Bayesian LASSO and spike-and-slab shrinkage priors in the linear regression model and assess the variable selection performance of our prior configurations through a high-dimensional simulation study. Additionally, we apply our method to proteomic data collected from patients admitted to the Albany Medical Center in Albany, NY, in April 2020 with COVID-like respiratory issues. Simulation results suggest our proposed configurations may outperform competing models when the true regression effects are small.

Supplementary information: The online version contains supplementary material available at 10.1007/s11222-025-10582-1.

Citations: 0
Estimation and model selection for finite mixtures of Tukey's g-&-h distributions.
IF 1.6 | Zone 2 (Mathematics) | Q2 COMPUTER SCIENCE, THEORY & METHODS | Pub Date: 2025-01-01 | Epub Date: 2025-03-15 | DOI: 10.1007/s11222-025-10596-9
Tingting Zhan, Misung Yi, Amy R Peck, Hallgeir Rui, Inna Chervoneva

A finite mixture of distributions is a popular statistical model, which is especially meaningful when the population of interest may include distinct subpopulations. This work is motivated by analysis of protein expression levels quantified using immunofluorescence immunohistochemistry assays of human tissues. The distributions of cellular protein expression levels in a tissue often exhibit multimodality, skewness and heavy tails, but there is substantial variability between distributions in different tissues from different subjects, while some of these mixture distributions include components consistent with the assumption of a normal distribution. To accommodate such diversity, we propose a mixture of 4-parameter Tukey's g-&-h distributions for fitting finite mixtures with both Gaussian and non-Gaussian components. Tukey's g-&-h distribution is a flexible model that allows a variable degree of skewness and kurtosis in mixture components, including the normal distribution as a particular case. Since the likelihood of the Tukey's g-&-h mixtures does not have a closed analytical form, we propose a quantile least Mahalanobis distance (QLMD) estimator for parameters of such mixtures. QLMD is an indirect estimator minimizing the Mahalanobis distance between the sample and model-based quantiles, and its asymptotic properties follow from the general theory of indirect estimation. We have developed a stepwise algorithm to select a parsimonious Tukey's g-&-h mixture model and implemented all proposed methods in the R package QuantileGH available on CRAN. A simulation study was conducted to evaluate the performance of the Tukey's g-&-h mixtures and compare it to the performance of mixtures of skew-normal or skew-t distributions. The Tukey's g-&-h mixtures were applied to model cellular expressions of Cyclin D1 protein in breast cancer tissues, and the resulting parameter estimates were evaluated as predictors of progression-free survival.
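
Because the g-&-h family is defined through a transformation of a standard normal variable, its quantiles are available in closed form even though its density is not, which is why quantile-based estimation is natural here. The sketch below evaluates the standard g-&-h quantile function for a single component; it is written in Python for illustration and is not the API of the QuantileGH package. The parameter names `A` (location), `B` (scale), `g` (skewness) and `h` (tail heaviness) follow the usual textbook convention and are assumptions about notation.

```python
import math
from statistics import NormalDist

def tukey_gh_quantile(p, A=0.0, B=1.0, g=0.0, h=0.0):
    # Quantile of a single Tukey g-&-h component at probability p:
    #   A + B * ((exp(g z) - 1) / g) * exp(h z^2 / 2),  z = standard normal quantile.
    # The factor (exp(g z) - 1) / g is replaced by its limit z when g == 0.
    z = NormalDist().inv_cdf(p)
    skew_part = z if g == 0 else math.expm1(g * z) / g
    return A + B * skew_part * math.exp(h * z * z / 2.0)

if __name__ == "__main__":
    for p in (0.05, 0.5, 0.95):
        print(p, tukey_gh_quantile(p, A=1.0, B=2.0, g=0.5, h=0.1))
```

Setting `g = 0` and `h = 0` recovers the normal distribution with mean `A` and standard deviation `B`, which is the special case the abstract mentions. A quantile-distance estimator would compare such model quantiles with sample quantiles at a grid of probabilities.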

Citations: 0
A new p-value based multiple testing procedure for generalized linear models.
IF 1.6 | Zone 2 (Mathematics) | Q2 COMPUTER SCIENCE, THEORY & METHODS | Pub Date: 2025-01-01 | Epub Date: 2025-03-16 | DOI: 10.1007/s11222-025-10600-2
Joseph Rilling, Cheng Yong Tang

This study introduces a novel p-value-based multiple testing approach tailored for generalized linear models. Despite the crucial role of generalized linear models in statistics, existing methodologies face obstacles arising from the heterogeneous variance of response variables and complex dependencies among estimated parameters. Our aim is to address the challenge of controlling the false discovery rate (FDR) amidst arbitrarily dependent test statistics. Through the development of efficient computational algorithms, we present a versatile statistical framework for multiple testing. The proposed framework accommodates a range of tools developed for constructing a new model matrix in regression-type analysis, including random row permutations and Model-X knockoffs. We devise efficient computing techniques to solve the encountered non-trivial quadratic matrix equations, enabling the construction of paired p-values suitable for the two-step multiple testing procedure proposed by Sarkar and Tang (Biometrika 109(4): 1149-1155, 2022). Theoretical analysis affirms the properties of our approach, demonstrating its capability to control the FDR at a given level. Empirical evaluations further substantiate its promising performance across diverse simulation settings.

Supplementary information: The online version contains supplementary material available at 10.1007/s11222-025-10600-2.

Citations: 0
funBIalign: a hierachical algorithm for functional motif discovery based on mean squared residue scores.
IF 1.6 | Zone 2 (Mathematics) | Q2 COMPUTER SCIENCE, THEORY & METHODS | Pub Date: 2025-01-01 | Epub Date: 2024-12-10 | DOI: 10.1007/s11222-024-10537-y
Jacopo Di Iorio, Marzia A Cremona, Francesca Chiaromonte

Motif discovery is gaining increasing attention in the domain of functional data analysis. Functional motifs are typical "shapes" or "patterns" that recur multiple times in different portions of a single curve and/or in misaligned portions of multiple curves. In this paper, we define functional motifs using an additive model and we propose funBIalign for their discovery and evaluation. Inspired by clustering and biclustering techniques, funBIalign is a multi-step procedure which uses agglomerative hierarchical clustering with complete linkage and a functional distance based on mean squared residue scores to discover functional motifs, both in a single curve (e.g., time series) and in a set of curves. We assess its performance and compare it to other recent methods through extensive simulations. Moreover, we use funBIalign for discovering motifs in two real-data case studies; one on food price inflation and one on temperature changes.

Supplementary information: The online version contains supplementary material available at 10.1007/s11222-024-10537-y.
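
For readers unfamiliar with the mean squared residue score, the sketch below computes the classical biclustering version on a small matrix whose rows can be read as curve portions sampled on a common grid. This is an assumption-laden illustration: the paper works with a functional adaptation of the score, which may differ in detail, and the function name and example data are hypothetical.

```python
def mean_squared_residue(rows):
    # rows: equal-length lists, e.g. curve portions evaluated on a shared grid.
    # The residue of entry (i, j) is a_ij - row_mean_i - col_mean_j + overall_mean;
    # the score is the average squared residue over all entries.
    n, m = len(rows), len(rows[0])
    row_means = [sum(r) / m for r in rows]
    col_means = [sum(r[j] for r in rows) / n for j in range(m)]
    overall = sum(row_means) / n
    total = 0.0
    for i, r in enumerate(rows):
        for j, a in enumerate(r):
            total += (a - row_means[i] - col_means[j] + overall) ** 2
    return total / (n * m)

if __name__ == "__main__":
    same_shape = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]   # second row = first row + 3
    different = [[1.0, 2.0], [2.0, 1.0]]
    print(mean_squared_residue(same_shape), mean_squared_residue(different))
```

The score is zero exactly when the rows differ only by vertical shifts, which matches the additive model of a motif: portions with the same shape at different levels are treated as occurrences of the same pattern.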

Citations: 0
Hidden Markov models for multivariate panel data
IF 2.2 | Zone 2 (Mathematics) | Q2 COMPUTER SCIENCE, THEORY & METHODS | Pub Date: 2024-09-18 | DOI: 10.1007/s11222-024-10462-0
Mackenzie R. Neal, Alexa A. Sochaniwsky, Paul D. McNicholas

While advances continue to be made in model-based clustering, challenges persist in modeling various data types such as panel data. Multivariate panel data present difficulties for clustering algorithms because they are often plagued by missing data and dropouts, which cause problems for estimation algorithms. This research presents a family of hidden Markov models that compensate for the issues that arise in panel data. A modified expectation–maximization algorithm capable of handling missing-not-at-random data and dropout is presented and used to perform model estimation.

Citations: 0
Accelerated failure time models with error-prone response and nonlinear covariates
IF 2.2 | Zone 2 (Mathematics) | Q2 COMPUTER SCIENCE, THEORY & METHODS | Pub Date: 2024-09-18 | DOI: 10.1007/s11222-024-10491-9
Li-Pang Chen

As a specific application of survival analysis, a main interest in medical studies is to analyze the survival time of patients with a specific cancer. Typically, gene expressions are treated as covariates to characterize the survival time. In the framework of survival analysis, the accelerated failure time model in parametric form is perhaps a common approach. However, the effects of gene expressions are possibly nonlinear, and the survival time as well as the censoring status are subject to measurement error. In this paper, we aim to tackle those complex features simultaneously. We first correct for measurement error in survival time and censoring status, and use the corrected quantities to develop a corrected Buckley–James estimator. After that, we use the boosting algorithm with the cubic spline estimation method to iteratively recover the nonlinear relationship between covariates and survival time. Theoretically, we justify the validity of the measurement error correction and the estimation procedure. Numerical studies show that the proposed method improves the performance of estimation and is able to capture informative covariates. The methodology is primarily used to analyze the breast cancer data provided by the Netherlands Cancer Institute for research.

Citations: 0
Sequential model identification with reversible jump ensemble data assimilation method
IF 2.2 | Zone 2 (Mathematics) | Q2 COMPUTER SCIENCE, THEORY & METHODS | Pub Date: 2024-09-18 | DOI: 10.1007/s11222-024-10499-1
Yue Huan, Hai Xiang Lin

In data assimilation (DA) schemes, the form representing the processes in the evolution model is predetermined, except for some parameters to be estimated. In some applications, such as the contaminant solute transport model and the gas reservoir model, the modes in the equations within the evolution model cannot be predetermined from the outset and may change over time. We propose a sequential DA framework named Reversible Jump Ensemble Filter (RJEnF) to identify the governing modes of the evolution model over time. The main idea is to introduce the Reversible Jump Markov Chain Monte Carlo (RJMCMC) method into DA schemes to handle the situation where the modes of the evolution model are unknown and the dimension of the parameters is changing. Our framework allows us to identify the modes in the evolution model and their changes, as well as estimate the parameters and states of the dynamic system. Numerical experiments are conducted and the results show that our framework can effectively identify the underlying evolution models and increase the predictive accuracy of DA methods.

Citations: 0
Shrinkage for extreme partial least-squares
IF 2.2 Zone 2 Mathematics Q2 COMPUTER SCIENCE, THEORY & METHODS Pub Date: 2024-09-17 DOI: 10.1007/s11222-024-10490-w
Julyan Arbel, Stéphane Girard, Hadrien Lorenzo

This work focuses on dimension-reduction techniques for modelling conditional extreme values. Specifically, we investigate the idea that extreme values of a response variable can be explained by nonlinear functions derived from linear projections of an input random vector. In this context, the estimation of projection directions is examined through the extreme partial least squares (EPLS) method, an adaptation of the original partial least squares (PLS) method tailored to the extreme-value framework. Further, a novel interpretation of EPLS directions as maximum likelihood estimators is introduced, utilizing the von Mises–Fisher distribution applied to hyperballs. The dimension-reduction process is enhanced through the Bayesian paradigm, enabling the incorporation of prior information into the projection direction estimation. The maximum a posteriori estimator is derived in two specific cases, which shows that it acts as a regularization or shrinkage of the EPLS estimator. We also establish its asymptotic behavior as the sample size approaches infinity. A simulation study assesses the practical utility of the proposed method and demonstrates its effectiveness even in moderate data problems within high-dimensional settings. Furthermore, we provide an illustrative example of the method’s applicability using French farm income data, highlighting its efficacy in real-world scenarios.
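
As a rough illustration of why a maximum a posteriori direction can be read as a shrinkage of a maximum likelihood direction, the sketch below uses the standard conjugacy of the von Mises–Fisher distribution for a mean direction with known concentration. It is not the EPLS estimator from the paper: the function names, the known concentration `kappa`, and the prior parameters `mu0` and `kappa0` are all assumptions introduced for this example.

```python
import numpy as np


def unit(v):
    """Return v scaled to unit Euclidean norm."""
    v = np.asarray(v, dtype=float)
    norm = np.linalg.norm(v)
    if norm == 0.0:
        raise ValueError("cannot normalize the zero vector")
    return v / norm


def vmf_mle_direction(x):
    """MLE of the vMF mean direction: the normalized sum of the unit vectors.

    x is an (n, d) array whose rows are unit vectors.
    """
    return unit(np.asarray(x, dtype=float).sum(axis=0))


def vmf_map_direction(x, kappa, mu0, kappa0):
    """MAP of the vMF mean direction under a vMF(mu0, kappa0) prior.

    Assumes the likelihood concentration kappa is known (hypothetical setup).
    The posterior is vMF with natural parameter kappa * sum(x) + kappa0 * mu0,
    so its mode is that vector normalized: the MLE pulled toward mu0.
    """
    resultant = np.asarray(x, dtype=float).sum(axis=0)
    return unit(kappa * resultant + kappa0 * unit(mu0))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Toy directions scattered around the first coordinate axis.
    raw = rng.normal(loc=[3.0, 0.0, 0.0], scale=1.0, size=(20, 3))
    x = raw / np.linalg.norm(raw, axis=1, keepdims=True)
    prior_direction = np.array([0.0, 1.0, 0.0])
    print("MLE:", vmf_mle_direction(x))
    print("MAP:", vmf_map_direction(x, kappa=1.0, mu0=prior_direction, kappa0=5.0))
```

In this toy setup, setting `kappa0 = 0` recovers the maximum likelihood direction, while increasing `kappa0` moves the estimate toward the prior direction `mu0`, which is the sense in which the prior shrinks the estimator.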

Citations: 0