In linear least squares regression there exists a simple decomposition of the effect of an exposure on an outcome into two parts in the presence of an intermediate variable. This decomposition is described and then analogous decompositions for other models are examined, namely for logistic regression and proportional hazards models.
{"title":"Some approximations to the path formula for some nonlinear models","authors":"Christiana Kartsonaki","doi":"10.1111/sjos.12753","DOIUrl":"https://doi.org/10.1111/sjos.12753","url":null,"abstract":"In linear least squares regression there exists a simple decomposition of the effect of an exposure on an outcome into two parts in the presence of an intermediate variable. This decomposition is described and then analogous decompositions for other models are examined, namely for logistic regression and proportional hazards models.","PeriodicalId":49567,"journal":{"name":"Scandinavian Journal of Statistics","volume":null,"pages":null},"PeriodicalIF":1.0,"publicationDate":"2024-09-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142253099","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
We propose a model to address the overlooked problem of node clustering in simple hypergraphs. Simple hypergraphs are suitable when a node may not appear multiple times in the same hyperedge, such as in co‐authorship datasets. Our model generalizes the stochastic blockmodel for graphs and assumes the existence of latent node groups and hyperedges are conditionally independent given these groups. We first establish the generic identifiability of the model parameters. We then develop a variational approximation Expectation‐Maximization algorithm for parameter inference and node clustering, and derive a statistical criterion for model selection. To illustrate the performance of our R package HyperSBM, we compare it with other node clustering methods using synthetic data generated from the model, as well as from a line clustering experiment and a co‐authorship dataset.
{"title":"Model‐based clustering in simple hypergraphs through a stochastic blockmodel","authors":"Luca Brusa, Catherine Matias","doi":"10.1111/sjos.12754","DOIUrl":"https://doi.org/10.1111/sjos.12754","url":null,"abstract":"We propose a model to address the overlooked problem of node clustering in simple hypergraphs. Simple hypergraphs are suitable when a node may not appear multiple times in the same hyperedge, such as in co‐authorship datasets. Our model generalizes the stochastic blockmodel for graphs and assumes the existence of latent node groups and hyperedges are conditionally independent given these groups. We first establish the generic identifiability of the model parameters. We then develop a variational approximation Expectation‐Maximization algorithm for parameter inference and node clustering, and derive a statistical criterion for model selection. To illustrate the performance of our <jats:styled-content>R</jats:styled-content> package <jats:styled-content>HyperSBM</jats:styled-content>, we compare it with other node clustering methods using synthetic data generated from the model, as well as from a line clustering experiment and a co‐authorship dataset.","PeriodicalId":49567,"journal":{"name":"Scandinavian Journal of Statistics","volume":null,"pages":null},"PeriodicalIF":1.0,"publicationDate":"2024-09-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142253095","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Several models for count time series have been developed during the last decades, often inspired by traditional autoregressive moving average (ARMA) models for real‐valued time series, including integer‐valued ARMA (INARMA) and integer‐valued generalized autoregressive conditional heteroscedasticity (INGARCH) models. Both INARMA and INGARCH models exhibit an ARMA‐like autocorrelation function (ACF). To achieve negative ACF values within the class of INGARCH models, log and softplus link functions are suggested in the literature, where the softplus approach leads to conditional linearity in good approximation. However, the softplus approach is limited to the INGARCH family for unbounded counts, that is, it can neither be used for bounded counts, nor for count processes from the INARMA family. In this paper, we present an alternative solution, named the Tobit approach, for achieving approximate linearity together with negative ACF values, which is more generally applicable than the softplus approach. A Skellam–Tobit INGARCH model for unbounded counts is studied in detail, including stationarity, approximate computation of moments, maximum likelihood and censored least absolute deviations estimation for unknown parameters and corresponding simulations. Extensions of the Tobit approach to other situations are also discussed, including underlying discrete distributions, INAR models, and bounded counts. Three real‐data examples are considered to illustrate the usefulness of the new approach.
{"title":"Tobit models for count time series","authors":"Christian H. Weiß, Fukang Zhu","doi":"10.1111/sjos.12751","DOIUrl":"https://doi.org/10.1111/sjos.12751","url":null,"abstract":"Several models for count time series have been developed during the last decades, often inspired by traditional autoregressive moving average (ARMA) models for real‐valued time series, including integer‐valued ARMA (INARMA) and integer‐valued generalized autoregressive conditional heteroscedasticity (INGARCH) models. Both INARMA and INGARCH models exhibit an ARMA‐like autocorrelation function (ACF). To achieve negative ACF values within the class of INGARCH models, log and softplus link functions are suggested in the literature, where the softplus approach leads to conditional linearity in good approximation. However, the softplus approach is limited to the INGARCH family for unbounded counts, that is, it can neither be used for bounded counts, nor for count processes from the INARMA family. In this paper, we present an alternative solution, named the Tobit approach, for achieving approximate linearity together with negative ACF values, which is more generally applicable than the softplus approach. A Skellam–Tobit INGARCH model for unbounded counts is studied in detail, including stationarity, approximate computation of moments, maximum likelihood and censored least absolute deviations estimation for unknown parameters and corresponding simulations. Extensions of the Tobit approach to other situations are also discussed, including underlying discrete distributions, INAR models, and bounded counts. 
Three real‐data examples are considered to illustrate the usefulness of the new approach.","PeriodicalId":49567,"journal":{"name":"Scandinavian Journal of Statistics","volume":null,"pages":null},"PeriodicalIF":1.0,"publicationDate":"2024-09-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142253096","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
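To make the phrase "conditional linearity in good approximation" from the abstract above more concrete, the following sketch compares a softplus map with plain censoring at zero, the classical Tobit-type operation. It is only an illustration: the scale parameter `c`, the function names and the evaluation grid are assumptions, and the abstract does not give the exact construction of the Skellam–Tobit model, so nothing here should be read as the authors' specification.

```python
import numpy as np

def softplus(x, c=1.0):
    """Scaled softplus c * log(1 + exp(x / c)), evaluated in a numerically stable way."""
    return c * np.logaddexp(0.0, np.asarray(x, dtype=float) / c)

def censor_at_zero(x):
    """Tobit-type censoring: negative values are replaced by zero."""
    return np.maximum(0.0, np.asarray(x, dtype=float))

grid = np.array([-10.0, -1.0, 0.0, 1.0, 10.0])
print(softplus(grid))        # strictly positive, close to the identity for large arguments
print(censor_at_zero(grid))  # exactly zero below zero, exactly the identity above
```

Both maps send a possibly negative linear predictor to a nonnegative value while staying close to the identity away from zero, which is what allows negative autoregressive coefficients, and hence negative ACF values, without giving up approximate linearity.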
Sir David Cox published four papers in the Scandinavian Journal of Statistics and two in the Scandinavian Actuarial Journal. This note provides some brief summaries of these papers.
{"title":"On some publications of Sir David Cox","authors":"Nancy Reid","doi":"10.1111/sjos.12752","DOIUrl":"https://doi.org/10.1111/sjos.12752","url":null,"abstract":"Sir David Cox published four papers in the <jats:italic>Scandinavian Journal of Statistics</jats:italic> and two in the <jats:italic>Scandinavian Actuarial Journal</jats:italic>. This note provides some brief summaries of these papers.","PeriodicalId":49567,"journal":{"name":"Scandinavian Journal of Statistics","volume":null,"pages":null},"PeriodicalIF":1.0,"publicationDate":"2024-09-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142186964","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Statistician C. R. Rao made many contributions to multivariate analysis over the span of his career. Some of his earliest contributions continue to be used and built upon almost 80 years later, while his more recent contributions spur new avenues of research. The present article discusses these contributions, how they helped shape multivariate analysis as we see it today, and what we may learn from reviewing his works. Topics include his extension of linear discriminant analysis, Rao's perimeter test, Rao's U statistic, his asymptotic expansion of Wilks' statistic, canonical factor analysis, functional principal component analysis, redundancy analysis, canonical coordinates, and correspondence analysis. The examination of his works shows that interdisciplinary collaboration and the utilization of real datasets were crucial in almost all of Rao's impactful contributions.
{"title":"Looking back: Selected contributions by C. R. Rao to multivariate analysis","authors":"Dianna Smith","doi":"10.1111/sjos.12749","DOIUrl":"https://doi.org/10.1111/sjos.12749","url":null,"abstract":"Statistician C. R. Rao made many contributions to multivariate analysis over the span of his career. Some of his earliest contributions continue to be used and built upon almost 80 years later, while his more recent contributions spur new avenues of research. The present article discusses these contributions, how they helped shape multivariate analysis as we see it today, and what we may learn from reviewing his works. Topics include his extension of linear discriminant analysis, Rao's perimeter test, Rao's U statistic, his asymptotic expansion of Wilks' statistic, canonical factor analysis, functional principal component analysis, redundancy analysis, canonical coordinates, and correspondence analysis. The examination of his works shows that interdisciplinary collaboration and the utilization of real datasets were crucial in almost all of Rao's impactful contributions.","PeriodicalId":49567,"journal":{"name":"Scandinavian Journal of Statistics","volume":null,"pages":null},"PeriodicalIF":1.0,"publicationDate":"2024-08-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142186965","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
We analyze the convergence rates for a family of auto‐regressive Markov chains on Euclidean space depending on a parameter n, where at each step a randomly chosen coordinate is replaced by a noisy damped weighted average of the others. The interest in the model comes from the connection with a certain Bayesian scheme introduced by de Finetti in the analysis of partially exchangeable data. Our main result shows that, when n gets large (corresponding to a vanishing noise), a cutoff phenomenon occurs.
{"title":"Cutoff for a class of auto‐regressive models with vanishing additive noise","authors":"Balázs Gerencsér, Andrea Ottolini","doi":"10.1111/sjos.12748","DOIUrl":"https://doi.org/10.1111/sjos.12748","url":null,"abstract":"We analyze the convergence rates for a family of auto‐regressive Markov chains on Euclidean space depending on a parameter , where at each step a randomly chosen coordinate is replaced by a noisy damped weighted average of the others. The interest in the model comes from the connection with a certain Bayesian scheme introduced by de Finetti in the analysis of partially exchangeable data. Our main result shows that, when <jats:italic>n</jats:italic> gets large (corresponding to a vanishing noise), a cutoff phenomenon occurs.","PeriodicalId":49567,"journal":{"name":"Scandinavian Journal of Statistics","volume":null,"pages":null},"PeriodicalIF":1.0,"publicationDate":"2024-08-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142186968","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
In the analysis of clustered failure time data, Cox frailty models have been extensively studied by incorporating frailty with a prespecified distribution to address potential correlation of data within clusters. In this paper, we propose a frailty proportional mean residual life regression model to analyze such data. A novel conditional quasi‐likelihood inference procedure is developed, utilizing a stochastic process and the inverse probability of censoring weighting (IPCW) to form estimating equations for regression parameters. Our proposal employs conditional inference based on a penalized quasi‐likelihood to address within‐cluster correlation without need to specify the frailty distribution, bringing the method closer to what suffices for real‐world applications. By adopting the Buckley–James estimator in the IPCW, the method further allows for dependent censoring. We establish asymptotic properties of the proposed estimator and evaluate its finite sample performance via simulation studies. An application to the data from a multi‐institutional breast cancer study is presented for illustration.
{"title":"Conditional quasi‐likelihood inference for mean residual life regression with clustered failure time data","authors":"Rui Huang, Liuquan Sun, Liming Xiang","doi":"10.1111/sjos.12746","DOIUrl":"https://doi.org/10.1111/sjos.12746","url":null,"abstract":"In the analysis of clustered failure time data, Cox frailty models have been extensively studied by incorporating frailty with a prespecified distribution to address potential correlation of data within clusters. In this paper, we propose a frailty proportional mean residual life regression model to analyze such data. A novel conditional quasi‐likelihood inference procedure is developed, utilizing a stochastic process and the inverse probability of censoring weighting (IPCW) to form estimating equations for regression parameters. Our proposal employs conditional inference based on a penalized quasi‐likelihood to address within‐cluster correlation without need to specify the frailty distribution, bringing the method closer to what suffices for real‐world applications. By adopting the Buckley–James estimator in the IPCW, the method further allows for dependent censoring. We establish asymptotic properties of the proposed estimator and evaluate its finite sample performance via simulation studies. An application to the data from a multi‐institutional breast cancer study is presented for illustration.","PeriodicalId":49567,"journal":{"name":"Scandinavian Journal of Statistics","volume":null,"pages":null},"PeriodicalIF":1.0,"publicationDate":"2024-08-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142186966","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Tomasz Ca̧kała, Błażej Miasojedow, Wojciech Rejchel, Maryia Shpak
Continuous time Bayesian networks (CTBNs) represent a class of stochastic processes, which can be used to model complex phenomena, for instance, they can describe interactions occurring in living processes, social science models or medicine. The literature on this topic is usually focused on a case when a dependence structure of a system is known and we are to determine conditional transition intensities (parameters of a network). In the paper, we study a structure learning problem, which is a more challenging task and the existing research on this topic is limited. The approach, which we propose, is based on a penalized likelihood method. We prove that our algorithm, under mild regularity conditions, recognizes a dependence structure of a graph with high probability. We also investigate properties of the procedure in numerical studies.
{"title":"Structure learning for continuous time Bayesian networks via penalized likelihood","authors":"Tomasz Ca̧kała, Błażej Miasojedow, Wojciech Rejchel, Maryia Shpak","doi":"10.1111/sjos.12747","DOIUrl":"https://doi.org/10.1111/sjos.12747","url":null,"abstract":"Continuous time Bayesian networks (CTBNs) represent a class of stochastic processes, which can be used to model complex phenomena, for instance, they can describe interactions occurring in living processes, social science models or medicine. The literature on this topic is usually focused on a case when a dependence structure of a system is known and we are to determine conditional transition intensities (parameters of a network). In the paper, we study a structure learning problem, which is a more challenging task and the existing research on this topic is limited. The approach, which we propose, is based on a penalized likelihood method. We prove that our algorithm, under mild regularity conditions, recognizes a dependence structure of a graph with high probability. We also investigate properties of the procedure in numerical studies.","PeriodicalId":49567,"journal":{"name":"Scandinavian Journal of Statistics","volume":null,"pages":null},"PeriodicalIF":1.0,"publicationDate":"2024-08-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142186967","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Kristian Gundersen, Timothée Bacri, J. Bulla, S. Hølleland, A. Maruotti, Bård Støve
This paper examines nonlinear and time‐varying dependence structures between a pair of stochastic variables, using a novel approach which combines regime‐switching models and local Gaussian correlation (LGC). We propose an LGC‐based bootstrap test for examining whether the dependence structure between two variables is equal across different regimes. We examine this test in a Monte Carlo study, where it shows good level and power properties. We argue that this approach is more intuitive than competing approaches, typically combining regime‐switching models with copula theory. Furthermore, LGC is a semi‐parametric approach, hence avoids any parametric specification of the dependence structure. We illustrate our approach using financial returns from the US–UK stock markets and the US stock and government bond markets, and provide detailed insight into their dependence structures.
{"title":"Testing for time‐varying nonlinear dependence structures: Regime‐switching and local Gaussian correlation","authors":"Kristian Gundersen, Timothée Bacri, J. Bulla, S. Hølleland, A. Maruotti, Bård Støve","doi":"10.1111/sjos.12744","DOIUrl":"https://doi.org/10.1111/sjos.12744","url":null,"abstract":"This paper examines nonlinear and time‐varying dependence structures between a pair of stochastic variables, using a novel approach which combines regime‐switching models and local Gaussian correlation (LGC). We propose an LGC‐based bootstrap test for examining whether the dependence structure between two variables is equal across different regimes. We examine this test in a Monte Carlo study, where it shows good level and power properties. We argue that this approach is more intuitive than competing approaches, typically combining regime‐switching models with copula theory. Furthermore, LGC is a semi‐parametric approach, hence avoids any parametric specification of the dependence structure. We illustrate our approach using financial returns from the US–UK stock markets and the US stock and government bond markets, and provide detailed insight into their dependence structures.","PeriodicalId":49567,"journal":{"name":"Scandinavian Journal of Statistics","volume":null,"pages":null},"PeriodicalIF":0.8,"publicationDate":"2024-07-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141796437","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Network or matrix reconstruction is a general problem that occurs if the row‐ and column sums of a matrix are given, and the matrix entries need to be predicted conditional on the aggregated information. In this paper, we show that the predictions obtained from the iterative proportional fitting procedure (IPFP) or equivalently maximum entropy (ME) can be obtained by restricted maximum likelihood estimation relying on augmented Lagrangian optimization. Based on this equivalence, we extend the framework of network reconstruction, conditional on row and column sums, toward regression, which allows the inclusion of exogenous covariates and bootstrap‐based uncertainty quantification. More specifically, the mean of the regression model leads to the observed row and column margins. To exemplify the approach, we provide a simulation study and investigate interbank lending data, provided by the Bank for International Settlements. This dataset provides full knowledge of the real network and is, therefore, suitable to evaluate the predictions of our approach. It is shown that the inclusion of exogenous information leads to predictions with smaller errors.
{"title":"Regression‐based network‐flow and inner‐matrix reconstruction","authors":"Michael Lebacher, Göran Kauermann","doi":"10.1111/sjos.12742","DOIUrl":"https://doi.org/10.1111/sjos.12742","url":null,"abstract":"Network or matrix reconstruction is a general problem that occurs if the row‐ and column sums of a matrix are given, and the matrix entries need to be predicted conditional on the aggregated information. In this paper, we show that the predictions obtained from the iterative proportional fitting procedure (IPFP) or equivalently maximum entropy (ME) can be obtained by restricted maximum likelihood estimation relying on augmented Lagrangian optimization. Based on this equivalence, we extend the framework of network reconstruction, conditional on row and column sums, toward regression, which allows the inclusion of exogenous covariates and bootstrap‐based uncertainty quantification. More specifically, the mean of the regression model leads to the observed row and column margins. To exemplify the approach, we provide a simulation study and investigate interbank lending data, provided by the Bank for International Settlement. This dataset provides full knowledge of the real network and is, therefore, suitable to evaluate the predictions of our approach. It is shown that the inclusion of exogenous information leads to superior predictions in terms of and errors.","PeriodicalId":49567,"journal":{"name":"Scandinavian Journal of Statistics","volume":null,"pages":null},"PeriodicalIF":1.0,"publicationDate":"2024-07-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141780377","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}