Pub Date: 2024-05-31 | DOI: 10.1016/j.csda.2024.107995
Matthieu Bulté , Helle Sørensen
An adaptation of the random forest algorithm for Fréchet regression is revisited, addressing the challenge of regression with random objects in metric spaces. To overcome the limitations of previous approaches, a new splitting rule is introduced, substituting the computationally expensive Fréchet means with a medoid-based approach. The asymptotic equivalence of this method to Fréchet mean-based procedures is demonstrated, along with the consistency of the associated regression estimator. This approach provides a sound theoretical framework and a more efficient computational solution to Fréchet regression, broadening its application to non-standard data types and complex use cases.
Title: "Medoid splits for efficient random forests in metric spaces" (Computational Statistics & Data Analysis, Vol. 198, Article 107995; open access).
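To make the medoid idea concrete, here is a minimal sketch of a medoid-based split criterion for a regression tree over metric-space responses. This is an illustrative assumption, not the paper's implementation: the function names (`medoid`, `split_cost`, `best_split`), the axis-aligned threshold search, and the Euclidean toy metric are all hypothetical. The point it demonstrates is that scoring a split only requires distances to a candidate sample point (the medoid), never an explicit Fréchet mean.

```python
import numpy as np

def medoid(points, dist):
    """Index of the medoid: the sample point minimizing the sum of
    distances to all points in the node (a cheap stand-in for the
    Fréchet mean, which would require solving an optimization problem)."""
    n = len(points)
    costs = [sum(dist(points[i], points[j]) for j in range(n)) for i in range(n)]
    return int(np.argmin(costs))

def split_cost(points, dist):
    """Within-node cost: summed distance to the node's medoid."""
    m = medoid(points, dist)
    return sum(dist(p, points[m]) for p in points)

def best_split(x, points, dist):
    """Scan thresholds on a scalar covariate x and return the split
    whose two children have the smallest total medoid cost."""
    order = np.argsort(x)
    best = (np.inf, None)
    for k in range(1, len(x)):
        left = [points[i] for i in order[:k]]
        right = [points[i] for i in order[k:]]
        cost = split_cost(left, dist) + split_cost(right, dist)
        if cost < best[0]:
            best = (cost, 0.5 * (x[order[k - 1]] + x[order[k]]))
    return best

# Toy example: responses are points in R^2 with the Euclidean metric,
# and the covariate x has two well-separated groups.
rng = np.random.default_rng(0)
x = np.concatenate([rng.normal(-2, 0.3, 20), rng.normal(2, 0.3, 20)])
y = np.stack([x, x ** 2], axis=1)  # responses depend on x
cost, threshold = best_split(x, list(y), lambda a, b: float(np.linalg.norm(a - b)))
```

The medoid search is quadratic in the node size but involves only pairwise distances, which is what makes it attractive when Fréchet means are expensive or unavailable in closed form.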
Pub Date: 2024-05-27 | DOI: 10.1016/j.csda.2024.107993
Jiarong Ouyang, Xuan Cao
Spike and slab priors have emerged as effective and computationally scalable tools for Bayesian variable selection in high-dimensional linear regression. However, the crucial issues of model selection consistency and efficient computation with spike and slab priors in probit regression have rarely been investigated. A hierarchical probit model with continuous spike and slab priors over regression coefficients is considered, and a highly scalable Gibbs sampler with a computational complexity that grows only linearly in the dimension of predictors is proposed. Specifically, the "Skinny Gibbs" algorithm is adapted to the setting of probit and negative binomial regression, and model selection consistency for the proposed method under the probit model is established when the number of covariates is allowed to grow much larger than the sample size. Through simulation studies, the method is shown to achieve superior empirical performance compared with other state-of-the-art methods. Gene expression data from 51 asthmatic and 44 non-asthmatic samples are analyzed, and the performance for predicting asthma using the proposed approach is compared with existing approaches.
Title: "Consistent skinny Gibbs in probit regression" (Computational Statistics & Data Analysis, Vol. 198, Article 107993).
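The sketch below is not the Skinny Gibbs algorithm itself; it shows the classical Albert-Chib data-augmentation Gibbs step for Bayesian probit regression on which such samplers build, with a plain Gaussian prior standing in for the spike-and-slab structure. The function name and toy data are hypothetical; Skinny Gibbs additionally introduces sparse-slab indicator variables to get the linear-in-p complexity described above.

```python
import numpy as np
from scipy.stats import truncnorm

def probit_gibbs(X, y, n_iter=500, tau2=100.0, seed=1):
    """Albert-Chib Gibbs sampler for Bayesian probit regression with a
    N(0, tau2 I) prior on beta. Each sweep draws latent normals z
    truncated according to y, then draws beta from its conjugate
    Gaussian conditional."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    beta = np.zeros(p)
    V = np.linalg.inv(X.T @ X + np.eye(p) / tau2)  # posterior covariance of beta | z
    L = np.linalg.cholesky(V)
    draws = []
    for _ in range(n_iter):
        mu = X @ beta
        # z_i | beta, y_i ~ N(mu_i, 1) truncated to (0, inf) if y_i = 1, else (-inf, 0)
        lo = np.where(y == 1, -mu, -np.inf)
        hi = np.where(y == 1, np.inf, -mu)
        z = mu + truncnorm.rvs(lo, hi, random_state=rng)
        # beta | z ~ N(V X'z, V)
        beta = V @ (X.T @ z) + L @ rng.standard_normal(p)
        draws.append(beta)
    return np.array(draws)

# Toy data: true probit coefficients (2, 0).
rng = np.random.default_rng(0)
X = rng.standard_normal((200, 2))
y = (X @ np.array([2.0, 0.0]) + rng.standard_normal(200) > 0).astype(float)
draws = probit_gibbs(X, y)
post_mean = draws[100:].mean(axis=0)
```

Note that the per-sweep cost here is dominated by the fixed p x p solve; the "skinny" construction avoids exactly this bottleneck when p is large.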
Pub Date: 2024-05-23 | DOI: 10.1016/j.csda.2024.107992
Guanghui Cheng , Qiang Xiong , Ruitao Lin
In real-world applications, the geometric median is a natural quantity for robust inference about location or central tendency, particularly when dealing with non-standard or irregular data distributions. An innovative online bootstrap inference algorithm, based on the averaged nonlinear stochastic gradient algorithm, is proposed for statistical inference about the geometric median from massive datasets. The method is computationally fast and memory-friendly, and it is easy to update as new data arrive sequentially. The validity of the proposed online bootstrap inference is theoretically justified. Simulation studies under a variety of scenarios demonstrate its effectiveness and efficiency in terms of computation speed and memory usage. Additionally, the online inference procedure is applied to a large publicly available dataset for skin segmentation.
Title: "Online bootstrap inference for the geometric median" (Computational Statistics & Data Analysis, Vol. 197, Article 107992).
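A minimal sketch of the averaged (Polyak) stochastic gradient recursion for the geometric median, the building block underlying the online procedure. The bootstrap layer is omitted, and the step-size constants `gamma` and `alpha` and the toy data are illustrative assumptions. Each observation moves the iterate a bounded step toward itself, which is what makes the recursion robust to heavy tails.

```python
import numpy as np

def online_geometric_median(stream, gamma=1.0, alpha=0.75):
    """Averaged stochastic gradient estimate of the geometric median.
    Each new observation moves the iterate a step of size gamma * n^(-alpha)
    along the unit vector pointing toward it; the running average of the
    iterates (Polyak averaging) is the returned estimate."""
    it = iter(stream)
    m = np.asarray(next(it), dtype=float)   # current iterate
    avg = m.copy()                          # Polyak average of the iterates
    for n, x in enumerate(it, start=1):
        diff = np.asarray(x, dtype=float) - m
        norm = np.linalg.norm(diff)
        if norm > 0:
            m = m + gamma * n ** (-alpha) * diff / norm
        avg = avg + (m - avg) / (n + 1)     # online running mean
    return avg

rng = np.random.default_rng(0)
# Heavy-tailed data centred at (1, -1): the geometric median stays near the centre.
data = np.array([1.0, -1.0]) + rng.standard_t(df=2, size=(20000, 2))
est = online_geometric_median(data)
```

Only the current iterate and the running average are stored, so memory use is constant in the stream length, matching the memory-friendly claim above.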
Pub Date: 2024-05-23 | DOI: 10.1016/j.csda.2024.107987
Wenqing Su , Xiao Guo , Xiangyu Chang , Ying Yang
Modern network analysis often involves multi-layer network data in which the nodes are aligned and the edges on each layer represent one of multiple relations among the nodes. The current literature on multi-layer network data is mostly limited to undirected relations. However, directed relations are more common and may carry extra information. This study focuses on community detection (or clustering) in multi-layer directed networks. To account for the asymmetry, a novel spectral-co-clustering-based algorithm is developed to detect co-clusters, which capture the sending patterns and receiving patterns of nodes, respectively. Specifically, the eigendecomposition of the debiased sum of Gram matrices over the layer-wise adjacency matrices is computed, followed by k-means, where the sum of Gram matrices is used to avoid possible cancellation of clusters caused by direct summation. Theoretical analysis of the algorithm under the multi-layer stochastic co-block model is provided, where the common assumption that the number of clusters is coupled with the rank of the model is relaxed. After a systematic analysis of the eigenvectors of the population version of the algorithm, misclassification rates are derived, showing that multiple layers benefit clustering performance. The experimental results on simulated data corroborate the theoretical predictions, and the analysis of a real-world trade network dataset yields interpretable results.
Title: "Spectral co-clustering in multi-layer directed networks" (Computational Statistics & Data Analysis, Vol. 198, Article 107987).
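A sketch of the sum-of-Gram-matrices pipeline described above, assuming the simplest possible debiasing (dropping the diagonal; the paper's exact correction may differ) and using rows of the adjacency matrices, which yields the sending clusters. Applying the same steps to the transposed layers would give the receiving clusters. The function name and toy block model are hypothetical.

```python
import numpy as np
from scipy.cluster.vq import kmeans2

def co_cluster(adj_layers, k, seed=0):
    """Spectral co-clustering sketch for multi-layer directed networks:
    sum the Gram matrices A_l A_l' over layers, subtract the diagonal as a
    crude debiasing step, then run k-means on the top-k eigenvectors."""
    S = sum(A @ A.T for A in adj_layers)
    S = S - np.diag(np.diag(S))          # crude debias: drop the diagonal
    vals, vecs = np.linalg.eigh(S)
    U = vecs[:, np.argsort(vals)[-k:]]   # top-k eigenvectors
    _, labels = kmeans2(U, k, minit='++', seed=seed)
    return labels

# Toy example: two sending communities shared by three noisy directed layers.
rng = np.random.default_rng(0)
z = np.repeat([0, 1], 30)                    # ground-truth communities
P = np.array([[0.9, 0.1], [0.1, 0.9]])       # block connection probabilities
layers = [(rng.random((60, 60)) < P[z][:, z]).astype(float) for _ in range(3)]
labels = co_cluster(layers, 2)
```

Summing A_l A_l' rather than the A_l themselves is what prevents layers with opposite block signs from cancelling each other, as the abstract notes.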
Pub Date: 2024-05-23 | DOI: 10.1016/j.csda.2024.107991
Mingyue Du , Xingqiu Zhao
This paper discusses regression analysis of case K interval-censored failure time data, a general type of failure time data, in the presence of informative censoring, with a focus on simultaneous variable selection and estimation. Although many authors have considered the challenging variable selection problem for interval-censored data, most existing methods assume independent or non-informative censoring. More importantly, the existing methods that allow for informative censoring are frailty model-based approaches that, among other shortcomings, cannot directly assess the degree of informative censoring. To address these issues, we propose a conditional approach and develop a penalized sieve maximum likelihood procedure for simultaneous variable selection and estimation of covariate effects. Furthermore, we establish the oracle property of the proposed method and illustrate its appropriateness and usefulness in a simulation study. Finally, we apply the proposed method to a set of real data on Alzheimer's disease and provide some new insights.
Title: "A conditional approach for regression analysis of case K interval-censored failure time data with informative censoring" (Computational Statistics & Data Analysis, Vol. 198, Article 107991).
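To show the penalized-likelihood ingredient in its simplest form, here is a drastically simplified toy: a fully parametric exponential failure-time model, non-informative inspection-grid censoring, and an L1 penalty optimized with L-BFGS-B. All of these are illustrative assumptions; the paper's penalized sieve MLE additionally models informative censoring through a conditional approach and estimates a nonparametric baseline.

```python
import numpy as np
from scipy.optimize import minimize

def penalized_negloglik(beta, X, left, right, lam):
    """Toy penalized likelihood for interval-censored data: T_i follows an
    Exponential distribution with rate exp(x_i' beta), and T_i is only known
    to lie in (left_i, right_i]; an L1 penalty shrinks small coefficients."""
    rate = np.exp(X @ beta)
    # P(left < T <= right) = exp(-rate*left) - exp(-rate*right); right may be inf.
    prob = np.exp(-rate * left) - np.exp(-rate * right)
    return -np.sum(np.log(prob + 1e-12)) + lam * np.sum(np.abs(beta))

rng = np.random.default_rng(0)
n, p = 400, 4
X = rng.standard_normal((n, p))
true_beta = np.array([1.0, -1.0, 0.0, 0.0])
T = rng.exponential(1.0 / np.exp(X @ true_beta))
# Observe T only through a fixed inspection grid; beyond it, right-censor.
grid = np.arange(0.0, 8.0, 0.25)
idx = np.minimum(np.searchsorted(grid, T, side="right") - 1, len(grid) - 1)
left = grid[idx]
right = np.where(idx == len(grid) - 1, np.inf, grid[idx] + 0.25)
fit = minimize(penalized_negloglik, np.zeros(p), args=(X, left, right, 2.0),
               method="L-BFGS-B")
beta_hat = fit.x
```

Even in this toy setting the likelihood only involves interval probabilities F(R) - F(L), never exact event times, which is the defining feature of interval censoring.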
Pub Date: 2024-05-21 | DOI: 10.1016/j.csda.2024.107989
Kipoong Kim , Jaesung Park , Sungkyu Jung
Recent advances in DNA sequencing technology have led to a growing interest in microbiome data. Since the data are often high-dimensional, there is a clear need for dimensionality reduction. However, the compositional nature and zero-inflation of microbiome data present many challenges in developing new methodologies. New PCA methods for zero-inflated compositional data are presented, based on a novel framework called principal compositional subspace. These methods aim to identify both the principal compositional subspace and the corresponding principal scores that best approximate the given data, ensuring that their reconstruction remains within the compositional simplex. To this end, the constrained optimization problems are established and alternating minimization algorithms are provided to solve the problems. The theoretical properties of the principal compositional subspace, particularly focusing on its existence and consistency, are further investigated. Simulation studies have demonstrated that the methods achieve lower reconstruction errors than the existing log-ratio PCA in the presence of a linear pattern and have shown comparable performance in a curved pattern. The methods have been applied to four microbiome compositional datasets with excessive zeros, successfully recovering the underlying low-rank structure.
Title: "Principal component analysis for zero-inflated compositional data" (Computational Statistics & Data Analysis, Vol. 198, Article 107989).
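For orientation, here is a sketch of the log-ratio PCA baseline that the proposed methods are compared against, assuming the usual pseudocount workaround for zeros (the function name, pseudocount value, and toy data are hypothetical). The paper's principal-compositional-subspace methods avoid the pseudocount and keep reconstructions inside the simplex, which this baseline does not.

```python
import numpy as np

def logratio_pca(X, n_comp=2, pseudo=0.5):
    """Baseline log-ratio PCA for compositional data: replace zeros with a
    pseudocount, renormalise rows to the simplex, apply the centred
    log-ratio (clr) transform, and take an SVD of the centred clr matrix."""
    Z = X + pseudo                                  # zero replacement
    Z = Z / Z.sum(axis=1, keepdims=True)            # back to the simplex
    clr = np.log(Z) - np.log(Z).mean(axis=1, keepdims=True)
    clr_c = clr - clr.mean(axis=0)
    U, s, Vt = np.linalg.svd(clr_c, full_matrices=False)
    scores = U[:, :n_comp] * s[:n_comp]
    return scores, Vt[:n_comp]

# Toy zero-inflated counts on 5 parts with a single dominant gradient t.
rng = np.random.default_rng(0)
t = rng.uniform(-1, 1, 100)
mean = np.exp(np.outer(t, [2.0, 1.0, 0.0, -1.0, -2.0]))
counts = rng.poisson(50 * mean / mean.sum(axis=1, keepdims=True))
scores, loadings = logratio_pca(counts)
```

With a strong one-dimensional gradient, the first score recovers t; the abstract's point is that when zeros are abundant, the pseudocount distorts this baseline while the simplex-constrained methods do not.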
Pub Date: 2024-05-20 | DOI: 10.1016/j.csda.2024.107990
Olha Bodnar , Taras Bodnar
Bayesian inference procedures for the parameters of the multivariate random effects model are derived under the assumption of an elliptically contoured distribution, when the Berger and Bernardo reference prior and the Jeffreys prior are assigned to the model parameters. A new numerical algorithm for drawing samples from the posterior distribution, based on a hybrid Gibbs sampler, is developed. The new approach is compared to two Metropolis-Hastings algorithms previously derived in the literature via an extensive simulation study. The findings are applied to a Bayesian multivariate meta-analysis, conducted using the results of ten studies on the effectiveness of a treatment for hypertension. The analysis investigates the treatment effects on systolic and diastolic blood pressure. The second empirical illustration deals with measurement data from the CCAUV.V-K1 key comparison, aiming to compare measurement results of sinusoidal linear accelerometers at four frequencies.
Title: "Gibbs sampler approach for objective Bayesian inference in elliptical multivariate meta-analysis random effects model" (Computational Statistics & Data Analysis, Vol. 197, Article 107990; open access).
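The sketch below shows a Gibbs sampler for the univariate normal-normal special case of the random effects model, with a flat prior on the overall mean and an inverse-gamma prior on the heterogeneity variance. These priors and the Gaussian restriction are simplifying assumptions for illustration; the paper works with the multivariate elliptical model under Berger-Bernardo reference and Jeffreys priors, where the conditionals are less standard.

```python
import numpy as np

def meta_gibbs(y, s2, n_iter=4000, a=1.0, b=1.0, seed=0):
    """Gibbs sampler for the random effects meta-analysis model
    y_i ~ N(theta_i, s2_i), theta_i ~ N(mu, tau2), with a flat prior on mu
    and an IG(a, b) prior on tau2. Cycles through the three standard
    full conditionals: study effects, overall mean, heterogeneity."""
    rng = np.random.default_rng(seed)
    k = len(y)
    mu, tau2 = y.mean(), y.var() + 1e-6
    theta = y.copy()
    keep = []
    for _ in range(n_iter):
        prec = 1.0 / s2 + 1.0 / tau2
        theta = rng.normal((y / s2 + mu / tau2) / prec, np.sqrt(1.0 / prec))
        mu = rng.normal(theta.mean(), np.sqrt(tau2 / k))
        tau2 = 1.0 / rng.gamma(a + k / 2, 1.0 / (b + 0.5 * np.sum((theta - mu) ** 2)))
        keep.append((mu, tau2))
    return np.array(keep)

# Ten simulated studies with true overall effect 0.5 and heterogeneity sd 0.2.
rng = np.random.default_rng(1)
s2 = rng.uniform(0.01, 0.05, 10)
theta_true = rng.normal(0.5, 0.2, 10)
y = rng.normal(theta_true, np.sqrt(s2))
chain = meta_gibbs(y, s2)
mu_hat = chain[1000:, 0].mean()
```

In the multivariate elliptical setting the same cycling structure survives, but the heterogeneity update involves a covariance matrix and the reference-prior conditionals are non-conjugate, which motivates the hybrid Gibbs sampler of the paper.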
Pub Date: 2024-05-16 | DOI: 10.1016/j.csda.2024.107988
S.G. Meintanis , B. Milošević , M.D. Jiménez–Gamero
Tests of fit for classes of distributions that include the Weibull, the Pareto and the Fréchet families are proposed. The new tests employ the novel tool of the min–characteristic function and are based on an L2-type weighted distance between this function and its empirical counterpart applied to suitably standardized data. If data standardization is performed using the MLE of the distributional parameters, then the method reduces to testing for the standard member of the family, with parameter values known and set equal to one. Asymptotic properties of the tests are investigated. A Monte Carlo study is presented that includes the new procedure as well as competitors for the purpose of specification testing with three extreme value distributions. The new tests are also applied to a few real-data sets.
Title: "Goodness-of-fit tests based on the min-characteristic function" (Computational Statistics & Data Analysis, Vol. 197, Article 107988; open access).
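The general shape of such a statistic is a weighted L2 distance between an empirical transform and its null counterpart. Since the min-characteristic function is specialised to this paper, the sketch below illustrates the same construction with the ordinary empirical characteristic function and a standard exponential null, all of which are stand-in assumptions; the weight function and grid quadrature are likewise illustrative choices.

```python
import numpy as np

def l2_cf_statistic(x, t_grid, weights):
    """Weighted L2-type distance between the empirical characteristic
    function of the data and the CF of the standard exponential law,
    1/(1 - it), approximated by quadrature on t_grid. This mirrors the
    structure of the paper's statistic, which uses the min-characteristic
    function in place of the ordinary CF."""
    n = len(x)
    ecf = np.exp(1j * np.outer(t_grid, x)).mean(axis=1)   # empirical CF on the grid
    cf0 = 1.0 / (1.0 - 1j * t_grid)                       # standard exponential CF
    return n * np.sum(np.abs(ecf - cf0) ** 2 * weights)

# Grid quadrature with a Gaussian weight function.
t = np.linspace(-6, 6, 241)
w = np.exp(-t ** 2) * (t[1] - t[0])

rng = np.random.default_rng(0)
stat_null = l2_cf_statistic(rng.exponential(1.0, 500), t, w)   # data from H0
stat_alt = l2_cf_statistic(rng.uniform(0, 2, 500), t, w)       # same mean, wrong law
```

Under the null the statistic stays of order one, while under a fixed alternative it grows linearly with n, which is the usual source of consistency for this family of tests.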
Pub Date: 2024-05-13 | DOI: 10.1016/j.csda.2024.107978
Ke Yu, Shan Luo
High-dimensional accelerated failure time (AFT) models are commonly used regression models in survival analysis. The feature selection problem in high-dimensional AFT models is addressed, considering scenarios involving solely main effects or encompassing both main and interaction effects. A rank-based sequential feature selection (RankSFS) method is proposed, its selection consistency is established, and the method is compared with existing approaches through extensive numerical simulations. The results show that RankSFS achieves a higher Positive Discovery Rate (PDR) and a lower False Discovery Rate (FDR). Additionally, RankSFS is applied to the Breast Cancer Relapse data. With remarkably short computational time, RankSFS successfully identifies two crucial genes.
Title: "Rank-based sequential feature selection for high-dimensional accelerated failure time models with main and interaction effects" (Computational Statistics & Data Analysis, Vol. 197, Article 107978).
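A toy sketch of rank-based sequential (forward) selection, assuming uncensored responses and plain Kendall correlation as the rank criterion. The function name, the greedy marginal criterion, and the data are illustrative; the paper's RankSFS uses a rank-based loss tailored to censored AFT data and also handles interaction effects.

```python
import numpy as np
from scipy.stats import kendalltau

def rank_forward_select(X, y, n_select):
    """Greedy rank-based forward selection sketch: at each step, add the
    remaining feature whose Kendall tau with the response is largest in
    absolute value. Rank statistics make the criterion invariant to
    monotone transformations of y, the property AFT rank methods exploit."""
    selected, remaining = [], list(range(X.shape[1]))
    for _ in range(n_select):
        taus = [abs(kendalltau(X[:, j], y)[0]) for j in remaining]
        best = remaining[int(np.argmax(taus))]
        selected.append(best)
        remaining.remove(best)
    return selected

rng = np.random.default_rng(0)
X = rng.standard_normal((300, 20))
# Log failure times driven by features 3 and 7 (an AFT-style linear model).
y = 2.0 * X[:, 3] - 1.5 * X[:, 7] + 0.5 * rng.standard_normal(300)
picked = rank_forward_select(X, y, 2)
```

Because only ranks of y enter the criterion, the same selection would be obtained after any monotone transformation of the failure times, e.g. exponentiating the log times.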
Pub Date: 2024-05-09 | DOI: 10.1016/j.csda.2024.107977
Jingxue Feng, Liangliang Wang
The effective control of infectious diseases relies on accurate assessment of the impact of interventions, which is often hindered by the complex dynamics of the spread of disease. A Beta-Dirichlet switching state-space transmission model is proposed to track underlying dynamics of disease and evaluate the effectiveness of interventions simultaneously. As time evolves, the switching mechanism introduced in the susceptible-exposed-infected-recovered (SEIR) model is able to capture the timing and magnitude of changes in the transmission rate due to the effectiveness of control measures. The implementation of this model is based on a particle Markov Chain Monte Carlo algorithm, which can estimate the time evolution of SEIR states, switching states, and high-dimensional parameters efficiently. The efficacy of the proposed model and estimation procedure are demonstrated through simulation studies. With a real-world application to British Columbia's COVID-19 outbreak, the proposed switching state-space transmission model quantifies the reduction of transmission rate following interventions. The proposed model provides a promising tool to inform public health policies aimed at studying the underlying dynamics and evaluating the effectiveness of interventions during the spread of the disease.
Title: "A switching state-space transmission model for tracking epidemics and assessing interventions" (Computational Statistics & Data Analysis, Vol. 197, Article 107977; open access).
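To illustrate the dynamics the model is designed to detect, here is a minimal deterministic discrete-time SEIR simulation in which the transmission rate switches when an intervention starts. The parameter values, switch day, and function name are illustrative assumptions; the paper embeds such dynamics in a stochastic Beta-Dirichlet state-space model and infers the switch from data via particle MCMC.

```python
import numpy as np

def seir_switching(beta_schedule, sigma=0.2, gamma=0.1, n_days=200, i0=1e-6):
    """Discrete-time deterministic SEIR path with a time-varying
    transmission rate beta_schedule(t). Compartments are population
    fractions; sigma is the incubation rate, gamma the recovery rate."""
    S, E, I, R = 1.0 - i0, 0.0, i0, 0.0
    path = []
    for t in range(n_days):
        beta = beta_schedule(t)
        new_exposed = beta * S * I
        new_infectious = sigma * E
        new_recovered = gamma * I
        S -= new_exposed
        E += new_exposed - new_infectious
        I += new_infectious - new_recovered
        R += new_recovered
        path.append(I)
    return np.array(path)

# Transmission rate drops from 0.4 to 0.08 on day 60 (the intervention).
infected = seir_switching(lambda t: 0.4 if t < 60 else 0.08)
peak_day = int(np.argmax(infected))
```

Because the exposed compartment keeps feeding new infections for a few days after the switch, prevalence peaks shortly after the intervention rather than on the switch day itself, which is exactly the lag a switching state-space model must disentangle when estimating intervention timing.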