Pub Date: 2025-06-01. Epub Date: 2025-01-19. DOI: 10.1111/sjos.12765
Tommaso Rigon, Sonia Petrone, Bruno Scarpa
Bayesian nonparametrics has evolved into a broad area encompassing flexible methods for Bayesian inference, combinatorial structures, tools for complex data reduction, and more. Discrete prior laws play an important role in these developments, and various choices are available nowadays. However, many existing priors, such as the Dirichlet process, have limitations if data require nested clustering structures. Thus, we introduce a discrete nonparametric prior, termed the enriched Pitman-Yor process, which offers higher flexibility in modeling such elaborate partition structures. We investigate the theoretical properties of this novel prior and establish its formal connection with the enriched Dirichlet process and normalized random measures. Additionally, we present a square-breaking representation and derive closed-form expressions for the posterior law and associated urn schemes. Furthermore, we demonstrate that several established models, including Dirichlet processes with a spike-and-slab base measure and mixture of mixtures models, emerge as special instances of the enriched Pitman-Yor process, which therefore serves as a unified probabilistic framework for various Bayesian nonparametric priors. To illustrate its practical utility, we employ the enriched Pitman-Yor process for a species-sampling ecological problem.
Title: Enriched Pitman-Yor processes. Journal: Scandinavian Journal of Statistics, volume 52, issue 2, pages 631-657. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12338310/pdf/
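The urn schemes mentioned in the abstract above generalize the predictive rule of the standard Pitman-Yor process. As a point of reference only, here is a minimal sketch of that standard (non-enriched) urn. It is not the enriched construction from the paper; the function name `pitman_yor_partition` and the parameter names `sigma` (discount) and `theta` (strength) are illustrative choices.

```python
import random


def pitman_yor_partition(n, sigma, theta, seed=0):
    """Sample cluster sizes for n items from the standard Pitman-Yor urn.

    Assumption: this is the classical two-parameter urn, not the
    enriched process introduced in the paper.
    """
    if not (0 <= sigma < 1 and theta > -sigma):
        raise ValueError("need 0 <= sigma < 1 and theta > -sigma")
    rng = random.Random(seed)
    sizes = []  # sizes[j] = number of items currently in cluster j
    for i in range(n):  # i items have been allocated so far
        if i == 0:
            sizes.append(1)  # the first item always opens a cluster
            continue
        k = len(sizes)
        # Join cluster j with probability (sizes[j] - sigma) / (theta + i);
        # open a new cluster with probability (theta + sigma * k) / (theta + i).
        weights = [s - sigma for s in sizes] + [theta + sigma * k]
        j = rng.choices(range(k + 1), weights=weights)[0]
        if j == k:
            sizes.append(1)
        else:
            sizes[j] += 1
    return sizes


sizes = pitman_yor_partition(n=200, sigma=0.5, theta=1.0)
print(len(sizes), "clusters; largest has", max(sizes), "items")
```

Setting `sigma = 0` recovers the Dirichlet process urn (the Chinese restaurant process), which is the baseline the abstract describes as too rigid for nested clustering structures.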
Pub Date: 2025-06-01. Epub Date: 2025-02-05. DOI: 10.1111/sjos.12768
Jianrui Zhang, Chenxi Li, Haolei Weng
We develop a post-selection inference method for the Cox proportional hazards model with interval-censored data, which provides asymptotically valid p-values and confidence intervals conditional on the model selected by lasso. The method is based on a pivotal quantity that is shown to converge to a uniform distribution under local parameters. Our method involves estimation of the efficient information matrix, for which several approaches are proposed with proof of their consistency. Thorough simulation studies show that our method has satisfactory performance in samples of modest sizes. The utility of the method is illustrated via an application to an Alzheimer's disease study.
Title: Post-selection inference for the Cox model with interval-censored data. Journal: Scandinavian Journal of Statistics, volume 52, issue 2, pages 710-735. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12347693/pdf/
Pub Date: 2025-06-01. Epub Date: 2025-02-09. DOI: 10.1111/sjos.12770
Tzu-Jung Huang, Zhonghua Liu, Ian W McKeague
It is of substantial scientific interest to detect mediators that lie in the causal pathway from an exposure to a survival outcome. However, with high-dimensional mediators, as often encountered in modern genomic data settings, there is a lack of powerful methods that can provide valid post-selection inference for the identified marginal mediation effect. To resolve this challenge, we develop a post-selection inference procedure for the maximally selected natural indirect effect using a semiparametric efficient influence function approach. To this end, we establish the asymptotic normality of a stabilized one-step estimator that takes the selection of the mediator into account. Simulation studies show that our proposed method has good empirical performance. We further apply our proposed approach to a lung cancer dataset and find multiple DNA methylation CpG sites that might mediate the effect of cigarette smoking on lung cancer survival.
Title: Post-selection inference for high-dimensional mediation analysis with survival outcomes. Journal: Scandinavian Journal of Statistics, volume 52, issue 2, pages 756-776. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12369553/pdf/
In linear least squares regression there exists a simple decomposition of the effect of an exposure on an outcome into two parts in the presence of an intermediate variable. This decomposition is described and then analogous decompositions for other models are examined, namely for logistic regression and proportional hazards models.
Title: Some approximations to the path formula for some nonlinear models. Authors: Christiana Kartsonaki. Journal: Scandinavian Journal of Statistics. Pub Date: 2024-09-18. DOI: 10.1111/sjos.12753
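The linear least squares decomposition referred to in the abstract above can be checked numerically. The sketch below assumes the usual reading of that decomposition: the coefficient of the exposure in the regression without the intermediate variable equals the direct coefficient plus the product of the intermediate variable's coefficient and the slope of the intermediate variable on the exposure. The simulated data, the coefficient values, and the helper `ols` are illustrative and are not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500
x = rng.normal(size=n)                      # exposure
m = 0.8 * x + rng.normal(size=n)            # intermediate variable
y = 0.5 * x + 1.2 * m + rng.normal(size=n)  # outcome


def ols(response, *predictors):
    """Least squares coefficients, intercept first."""
    design = np.column_stack([np.ones(len(response)), *predictors])
    coef, *_ = np.linalg.lstsq(design, response, rcond=None)
    return coef


total = ols(y, x)[1]            # y on x alone
_, direct, b_m = ols(y, x, m)   # y on x and m
gamma = ols(m, x)[1]            # m on x
indirect = b_m * gamma

print("total effect:     ", total)
print("direct + indirect:", direct + indirect)
```

For fitted least squares coefficients the two printed numbers agree up to floating-point error, not just approximately. Analogous decompositions for logistic regression and proportional hazards models are only approximate, which is the subject of the paper.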
We propose a model to address the overlooked problem of node clustering in simple hypergraphs. Simple hypergraphs are suitable when a node may not appear multiple times in the same hyperedge, such as in co‐authorship datasets. Our model generalizes the stochastic blockmodel for graphs and assumes that latent node groups exist and that hyperedges are conditionally independent given these groups. We first establish the generic identifiability of the model parameters. We then develop a variational approximation Expectation‐Maximization algorithm for parameter inference and node clustering, and derive a statistical criterion for model selection. To illustrate the performance of our R package HyperSBM, we compare it with other node clustering methods using synthetic data generated from the model, as well as from a line clustering experiment and a co‐authorship dataset.
Title: Model‐based clustering in simple hypergraphs through a stochastic blockmodel. Authors: Luca Brusa, Catherine Matias. Journal: Scandinavian Journal of Statistics. Pub Date: 2024-09-18. DOI: 10.1111/sjos.12754
Several models for count time series have been developed during the last decades, often inspired by traditional autoregressive moving average (ARMA) models for real‐valued time series, including integer‐valued ARMA (INARMA) and integer‐valued generalized autoregressive conditional heteroscedasticity (INGARCH) models. Both INARMA and INGARCH models exhibit an ARMA‐like autocorrelation function (ACF). To achieve negative ACF values within the class of INGARCH models, log and softplus link functions are suggested in the literature, where the softplus approach leads to conditional linearity in good approximation. However, the softplus approach is limited to the INGARCH family for unbounded counts, that is, it can neither be used for bounded counts, nor for count processes from the INARMA family. In this paper, we present an alternative solution, named the Tobit approach, for achieving approximate linearity together with negative ACF values, which is more generally applicable than the softplus approach. A Skellam–Tobit INGARCH model for unbounded counts is studied in detail, including stationarity, approximate computation of moments, maximum likelihood and censored least absolute deviations estimation for unknown parameters and corresponding simulations. Extensions of the Tobit approach to other situations are also discussed, including underlying discrete distributions, INAR models, and bounded counts. Three real‐data examples are considered to illustrate the usefulness of the new approach.
Title: Tobit models for count time series. Authors: Christian H. Weiß, Fukang Zhu. Journal: Scandinavian Journal of Statistics. Pub Date: 2024-09-13. DOI: 10.1111/sjos.12751
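To make the contrast in the abstract above concrete, the sketch below compares a softplus response with a Tobit-type censoring map, and draws counts by censoring a Skellam variable at zero. It is an illustration under stated assumptions, not the model specification from the paper: `softplus`, `tobit`, `censored_skellam`, and the rates `lam_plus` and `lam_minus` are hypothetical names, and the Skellam variable is generated as the difference of two independent Poisson variables.

```python
import numpy as np


def softplus(x):
    """Smooth map log(1 + exp(x)); close to x for large positive x."""
    return np.logaddexp(0.0, x)


def tobit(x):
    """Tobit-type censoring at zero: max(0, x); exactly linear for x >= 0."""
    return np.maximum(0.0, x)


def censored_skellam(lam_plus, lam_minus, size, seed=0):
    """Draw max(0, N1 - N2) with N1, N2 independent Poisson counts.

    Assumption: an illustrative construction of zero-censored Skellam
    draws, not the paper's conditional model.
    """
    rng = np.random.default_rng(seed)
    latent = rng.poisson(lam_plus, size) - rng.poisson(lam_minus, size)
    return np.maximum(0, latent)


grid = np.array([-4.0, -1.0, 0.0, 1.0, 4.0])
print("softplus:", softplus(grid))
print("tobit:   ", tobit(grid))

counts = censored_skellam(3.0, 1.0, size=1000)
print("share of zeros:", np.mean(counts == 0))
```

Both maps return nonnegative values and accept negative arguments, which is what allows negative autocorrelation while keeping the response close to linear on the positive side.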
Sir David Cox published four papers in the Scandinavian Journal of Statistics and two in the Scandinavian Actuarial Journal. This note provides some brief summaries of these papers.
Title: On some publications of Sir David Cox. Authors: Nancy Reid. Journal: Scandinavian Journal of Statistics. Pub Date: 2024-09-12. DOI: 10.1111/sjos.12752
Statistician C. R. Rao made many contributions to multivariate analysis over the span of his career. Some of his earliest contributions continue to be used and built upon almost 80 years later, while his more recent contributions spur new avenues of research. The present article discusses these contributions, how they helped shape multivariate analysis as we see it today, and what we may learn from reviewing his works. Topics include his extension of linear discriminant analysis, Rao's perimeter test, Rao's U statistic, his asymptotic expansion of Wilks' statistic, canonical factor analysis, functional principal component analysis, redundancy analysis, canonical coordinates, and correspondence analysis. The examination of his works shows that interdisciplinary collaboration and the utilization of real datasets were crucial in almost all of Rao's impactful contributions.
Title: Looking back: Selected contributions by C. R. Rao to multivariate analysis. Authors: Dianna Smith. Journal: Scandinavian Journal of Statistics. Pub Date: 2024-08-26. DOI: 10.1111/sjos.12749
In the analysis of clustered failure time data, Cox frailty models have been extensively studied by incorporating frailty with a prespecified distribution to address potential correlation of data within clusters. In this paper, we propose a frailty proportional mean residual life regression model to analyze such data. A novel conditional quasi‐likelihood inference procedure is developed, utilizing a stochastic process and the inverse probability of censoring weighting (IPCW) to form estimating equations for regression parameters. Our proposal employs conditional inference based on a penalized quasi‐likelihood to address within‐cluster correlation without the need to specify the frailty distribution, bringing the method closer to what suffices for real‐world applications. By adopting the Buckley–James estimator in the IPCW, the method further allows for dependent censoring. We establish asymptotic properties of the proposed estimator and evaluate its finite sample performance via simulation studies. An application to the data from a multi‐institutional breast cancer study is presented for illustration.
Title: Conditional quasi‐likelihood inference for mean residual life regression with clustered failure time data. Authors: Rui Huang, Liuquan Sun, Liming Xiang. Journal: Scandinavian Journal of Statistics. Pub Date: 2024-08-22. DOI: 10.1111/sjos.12746
We analyze the convergence rates for a family of auto‐regressive Markov chains on Euclidean space depending on a parameter n, where at each step a randomly chosen coordinate is replaced by a noisy damped weighted average of the others. The interest in the model comes from the connection with a certain Bayesian scheme introduced by de Finetti in the analysis of partially exchangeable data. Our main result shows that, when n gets large (corresponding to a vanishing noise), a cutoff phenomenon occurs.
Title: Cutoff for a class of auto‐regressive models with vanishing additive noise. Authors: Balázs Gerencsér, Andrea Ottolini. Journal: Scandinavian Journal of Statistics. Pub Date: 2024-08-22. DOI: 10.1111/sjos.12748