
Australian & New Zealand Journal of Statistics: latest articles

PanIC: Consistent information criteria for general model selection problems
IF 0.8, Zone 4 (Mathematics), Q3 Statistics & Probability, Pub Date: 2024-10-31, DOI: 10.1111/anzs.12426
Hien Duy Nguyen

Model selection is a ubiquitous problem that arises in the application of many statistical and machine learning methods. In the likelihood and related settings, it is typical to use the method of information criteria (ICs) to choose the most parsimonious among competing models by penalizing the likelihood-based objective function. Theorems guaranteeing the consistency of ICs can be difficult to verify and are often specific and bespoke. We present a set of results that guarantee consistency for a class of ICs, which we call PanIC (from the Greek root ‘pan’, meaning ‘of everything’), with easily verifiable regularity conditions. PanICs are applicable in any loss-based learning problem and are not exclusive to likelihood problems. We illustrate the verification of regularity conditions for model selection problems regarding finite mixture models, least absolute deviation and support vector regression, and principal component analysis, and demonstrate the effectiveness of PanICs for such problems via numerical simulations. Furthermore, we present new sufficient conditions for the consistency of BIC-like estimators and provide comparisons of the BIC with PanIC.
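
The idea of penalising a loss-based objective can be illustrated with a minimal sketch. The abstract does not state the PanIC penalty, so the code below uses a BIC-like penalty of log(n)/(2n) per parameter on the average negative log-likelihood instead. The function names and the two toy Gaussian-mean models are hypothetical and are not taken from the paper.

```python
import math


def select_model(avg_losses, dims, n, penalty):
    """Return (index, scores) for the model minimising avg_loss + dim * penalty(n)."""
    scores = [loss + d * penalty(n) for loss, d in zip(avg_losses, dims)]
    best = min(range(len(scores)), key=scores.__getitem__)
    return best, scores


def bic_penalty(n):
    # BIC = -2 * loglik + k * log(n); dividing by 2n gives
    # average negative log-likelihood + k * log(n) / (2n).
    return math.log(n) / (2 * n)


def gaussian_avg_nll(y, mean, var=1.0):
    """Average negative log-likelihood of y under N(mean, var)."""
    return sum(
        0.5 * math.log(2 * math.pi * var) + (v - mean) ** 2 / (2 * var) for v in y
    ) / len(y)


def choose_mean_model(y):
    """Compare 'mean fixed at 0' (0 parameters) with 'mean estimated' (1 parameter)."""
    n = len(y)
    fitted_mean = sum(y) / n
    losses = [gaussian_avg_nll(y, 0.0), gaussian_avg_nll(y, fitted_mean)]
    return select_model(losses, dims=[0, 1], n=n, penalty=bic_penalty)
```

Any other penalty function of n can be passed to `select_model` in place of `bic_penalty`, which is the only part of the sketch that a different criterion would change.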

Volume 66, Issue 4, pp. 441-466. PDF: https://onlinelibrary.wiley.com/doi/epdf/10.1111/anzs.12426
Citations: 0
Prediction de-correlated inference: A safe approach for post-prediction inference
IF 0.8, Zone 4 (Mathematics), Q3 Statistics & Probability, Pub Date: 2024-10-24, DOI: 10.1111/anzs.12429
Feng Gan, Wanfeng Liang, Changliang Zou

In modern data analysis, it is common to use machine learning methods to predict outcomes on unlabelled datasets and then use these pseudo-outcomes in subsequent statistical inference. Inference in this setting is often called post-prediction inference. We propose a novel assumption-lean framework for statistical inference in the post-prediction setting, called prediction de-correlated inference (PDC). Our approach is safe, in the sense that PDC can automatically adapt to any black-box machine-learning model and consistently outperform the supervised counterparts. The PDC framework is also easily extended to accommodate multiple predictive models. Both numerical results and real-world data analysis demonstrate the superiority of PDC over state-of-the-art methods.

Volume 66, Issue 4, pp. 417-440.
Citations: 0
Telling Stories with Data: With Application in R. By Rohan Alexander. CRC Press. 2023. 622 pages. AU$129.60 (hardback). ISBN: 978-1-0321-3477-2.
IF 0.8, Zone 4 (Mathematics), Q3 Statistics & Probability, Pub Date: 2024-09-30, DOI: 10.1111/anzs.12428
Emi Tanaka
Volume 66, Issue 4, pp. 467-470.
Citations: 0
Full Bayesian analysis of triple seasonal autoregressive models
IF 0.8, Zone 4 (Mathematics), Q3 Statistics & Probability, Pub Date: 2024-09-27, DOI: 10.1111/anzs.12427
Ayman A. Amin

Seasonal autoregressive (SAR) time series models have been extended to fit time series exhibiting multiple seasonalities. However, hardly any research in the Bayesian literature has addressed the modelling of multiple seasonalities. In this article, we propose a full Bayesian analysis of triple SAR (TSAR) models for time series with triple seasonality, covering identification, estimation and prediction. We assume the model errors to be normally distributed and the model order to be a random variable with a known maximum value, and we employ the g prior for the model coefficients and variance. Accordingly, we first derive the posterior mass function of the TSAR order in closed form, which enables us to identify the best order of the TSAR model as the order value with the highest posterior probability. In addition, we derive the conditional posteriors, which are multivariate normal for the TSAR coefficients and inverse gamma for the TSAR variance; we also derive the conditional predictive distribution for future observations, which is multivariate normal. Since these conditional distributions are available in closed form, we use the Gibbs sampler to carry out the Bayesian analysis of TSAR models and to easily produce multiple-step-ahead predictions. Using the Julia programming language, we conduct an extensive simulation study to evaluate the accuracy of the proposed full Bayesian analysis. In addition, we apply our approach to time series of hourly electricity load in some European countries.

Volume 66, Issue 4, pp. 389-416.
Citations: 0
Examining collinearities
IF 0.8, Zone 4 (Mathematics), Q3 Statistics & Probability, Pub Date: 2024-08-29, DOI: 10.1111/anzs.12425
Zillur R. Shabuz, Paul H. Garthwaite

The cos-max method is a little-known method of identifying collinearities. It is based on the cos-max transformation, which makes minimal adjustment to a set of vectors to create orthogonal components with a one-to-one correspondence between the original vectors and the components. The aim of the transformation is that each vector should be close to the orthogonal component with which it is paired. Vectors involved in a collinearity must be adjusted substantially in order to create orthogonal components, while other vectors will typically be adjusted far less. The cos-max method uses the size of adjustments to identify collinearities. It gives a coherent relationship between collinear sets of variables and variance inflation factors (VIFs) and identifies collinear sets using more information than traditional methods. In this paper we describe these features of the method and examine its performance in examples, comparing it with alternative methods. In each example, the collinearities identified by the cos-max method only contained variables with high VIFs and contained all variables with high VIFs. The collinearities identified by other methods did not have such a close link to VIFs. Also, the collinearities identified by the cos-max method were as simple as or simpler than those given by other methods, with less overlap between collinearities in the variables that they contained.
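
The variance inflation factors that the abstract refers to can be computed as the diagonal of the inverse correlation matrix of the predictors. The sketch below shows only that standard VIF calculation, not the cos-max transformation itself, whose details the abstract does not give. The helper names and the small Gauss-Jordan inversion routine are illustrative choices made to keep the example free of external libraries.

```python
import math


def invert(matrix):
    """Invert a square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(matrix)
    aug = [list(row) + [float(i == j) for j in range(n)] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular: exact collinearity")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [a - f * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def correlation_matrix(columns):
    """Correlation matrix of a list of equal-length predictor columns."""
    n = len(columns[0])
    unit = []
    for col in columns:
        mean = sum(col) / n
        dev = [v - mean for v in col]
        norm = math.sqrt(sum(v * v for v in dev))
        unit.append([v / norm for v in dev])
    return [[sum(a * b for a, b in zip(ci, cj)) for cj in unit] for ci in unit]


def vifs(columns):
    """VIF of predictor j is the j-th diagonal element of the inverse correlation matrix."""
    inv = invert(correlation_matrix(columns))
    return [inv[j][j] for j in range(len(columns))]


# Toy data: x2 is almost a copy of x1, while x3 is uncorrelated with both.
x1 = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
x2 = [1.1, 1.9, 3.1, 3.9, 5.1, 5.9]
x3 = [1.0, -1.0, 0.0, 0.0, -1.0, 1.0]
print(vifs([x1, x2, x3]))
```

In this toy data the first two predictors form the collinear set, so they receive large VIFs while the third stays near 1.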

Volume 66, Issue 3, pp. 367-388.
Citations: 0
Exact samples sizes for clinical trials subject to size and power constraints
IF 0.8, Zone 4 (Mathematics), Q3 Statistics & Probability, Pub Date: 2024-08-29, DOI: 10.1111/anzs.12424
Chris J. Lloyd

This paper first describes the difficulties in providing the required sample sizes for clinical trials that guarantee type 1 and type 2 error control. The required sample sizes obviously depend on the test employed, and in this study we use the so-called E-test, which is known to have extremely favourable size properties and higher power than alternatives. Computing exact powers for this test in real time is not currently feasible, so a corpus of pre-computed exact powers (and sizes) was created, covering sample sizes up to 500. When there are no solutions within the corpus, a novel extrapolation technique is used. Exact size can be computed after the sample sizes have been extracted; however, for the E-test the exact size is virtually always very close to the nominal target. All the code has been converted into an R package, which is available on CRAN and illustrated.
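
To make the notion of an exact rejection probability concrete, here is a minimal sketch for a two-arm trial with binary outcomes. It enumerates every possible pair of success counts and sums the binomial probabilities of those outcomes for which a test rejects. The test used here is a one-sided pooled z-test chosen for simplicity; it is not the E-test of the paper, and all function names and default values are assumptions made for illustration.

```python
import math


def binom_pmf(k, n, p):
    """Probability of k successes in n Bernoulli(p) trials."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)


def rejects(x0, n0, x1, n1, z_crit):
    """One-sided pooled z-test: reject when the treatment rate is significantly higher."""
    pooled = (x0 + x1) / (n0 + n1)
    if pooled in (0.0, 1.0):
        return False  # no variation in the data, so the statistic is undefined
    se = math.sqrt(pooled * (1 - pooled) * (1 / n0 + 1 / n1))
    return (x1 / n1 - x0 / n0) / se > z_crit


def exact_rejection_prob(n0, n1, p0, p1, z_crit=1.6448536269514722):
    """Exact probability of rejection when the true success rates are p0 and p1."""
    return sum(
        binom_pmf(x0, n0, p0) * binom_pmf(x1, n1, p1)
        for x0 in range(n0 + 1)
        for x1 in range(n1 + 1)
        if rejects(x0, n0, x1, n1, z_crit)
    )


def smallest_n(p0, p1, target_power, max_n=500):
    """First equal per-arm sample size whose exact power reaches the target."""
    for n in range(2, max_n + 1):
        if exact_rejection_prob(n, n, p0, p1) >= target_power:
            return n
    return None
```

Calling `exact_rejection_prob` with `p0 == p1` gives the rejection probability at a single null point rather than the size, which is the maximum over all null values. Because exact power need not increase monotonically with n, `smallest_n` returns the first sample size that meets the target, not a guarantee for every larger one.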

Volume 66, Issue 3, pp. 297-305. PDF: https://onlinelibrary.wiley.com/doi/epdf/10.1111/anzs.12424
Citations: 0
Bayesian analysis of multivariate mixed longitudinal ordinal and continuous data
IF 0.8, Zone 4 (Mathematics), Q3 Statistics & Probability, Pub Date: 2024-08-13, DOI: 10.1111/anzs.12421
Xiao Zhang

Multivariate longitudinal ordinal and continuous data arise in many scientific fields. However, jointly analysing them is a challenging task because of the complicated correlation structures of such mixed data and the lack of a suitable multivariate distribution. The multivariate probit model, which assumes a multivariate normal latent variable underlying each multivariate ordinal observation, is a natural modelling choice for longitudinal ordinal data, especially when they are analysed jointly with longitudinal continuous data. However, the identifiable multivariate probit model requires the variances of the latent normal variables to be fixed at 1, so the joint covariance matrix of the latent variables and the continuous multivariate normal variables is restricted at some of its diagonal elements. This restriction complicates the development of both classical and Bayesian methods for analysing mixed ordinal and continuous data. In this investigation, we propose three Markov chain Monte Carlo (MCMC) methods: a Metropolis–Hastings within Gibbs algorithm based on the identifiable model, and a Gibbs sampling algorithm and a parameter-expanded data augmentation algorithm based on a constructed non-identifiable model. Through simulation studies and a real data application, we illustrate the performance of these three methods and offer observations on using a non-identifiable model to develop MCMC sampling methods.

Volume 66, Issue 3, pp. 325-346. PDF: https://onlinelibrary.wiley.com/doi/epdf/10.1111/anzs.12421
Citations: 0
Distributional modelling of positively skewed data via the flexible Weibull extension distribution
IF 0.8, Zone 4 (Mathematics), Q3 Statistics & Probability, Pub Date: 2024-08-11, DOI: 10.1111/anzs.12423
Freddy Hernández-Barajas, Olga Usuga-Manco, Carmen Patino-Rodríguez, Fernando Marmolejo-Ramos

The time until an event occurs is often known to have a skewed distribution. To model this, a statistical distribution called the two-parameter flexible Weibull extension (FWE) has been proposed. In this paper, the FWE distribution is used to model datasets through the use of generalised additive models for location, scale and shape (GAMLSS) distributional regression. GAMLSS is the only regression technique that can examine the effects of both categorical and numeric predictors on all the parameters of the distribution used to fit the dependent variable. To make it easier to use the FWE distribution through GAMLSS, the RelDists R package is proposed. A simulation study shows that FWE modelling through GAMLSS provides reliable parameter estimates even in the presence of factors that affect the distribution.
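
For orientation, the two-parameter flexible Weibull extension is commonly written with distribution function F(x) = 1 - exp(-exp(αx - β/x)) for x > 0. The abstract does not state the formula, so the sketch below assumes that form. The parameter names `alpha` and `beta` are chosen here for illustration and may differ from the argument names used in the RelDists package.

```python
import math


def fwe_cdf(x, alpha, beta):
    """Distribution function F(x) = 1 - exp(-exp(alpha*x - beta/x)), x > 0."""
    return 1.0 - math.exp(-math.exp(alpha * x - beta / x))


def fwe_pdf(x, alpha, beta):
    """Density obtained by differentiating fwe_cdf with respect to x."""
    z = alpha * x - beta / x
    return (alpha + beta / x**2) * math.exp(z - math.exp(z))


def fwe_quantile(u, alpha, beta):
    """Closed-form quantile: solve alpha*x - beta/x = log(-log(1 - u)) for x > 0."""
    w = math.log(-math.log(1.0 - u))
    # alpha*x**2 - w*x - beta = 0 has exactly one positive root.
    return (w + math.sqrt(w * w + 4.0 * alpha * beta)) / (2.0 * alpha)
```

Because the quantile function has a closed form, random draws can be generated by applying `fwe_quantile` to uniform random numbers on (0, 1).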

Volume 66, Issue 3, pp. 306-324. PDF: https://onlinelibrary.wiley.com/doi/epdf/10.1111/anzs.12423
Citations: 0
Spline linear mixed-effects models for causal mediation analysis with longitudinal data
IF 0.8, Zone 4 (Mathematics), Q3 Statistics & Probability, Pub Date: 2024-07-26, DOI: 10.1111/anzs.12422
Jeffrey M. Albert, Hongxu Zhu, Tanujit Dey, Jiayang Sun, Wojbor A. Woyczynski, Gregory Powers, Meeyoung Min

Often, causal mediation analysis is of interest when both the mediator and the final outcome are repeatedly measured, but limited work has been done for this situation (as opposed to where only the mediator is repeatedly measured). Available methods are primarily based on parametric models and tend to be sensitive to model assumptions. This article presents semiparametric, continuous-time models to provide a flexible and robust approach to causal mediation analysis for longitudinal data, which allows these data to be unbalanced or irregular. Specifically, the method uses spline linear mixed-effects models for the mediator and for the final outcome, with a two-step approach to model-fitting in which a predicted mediator is used as a covariate in the final outcome model. The models allow flexible functions for both the mean and individual response functions for each outcome. We derive estimated natural direct and indirect effects as a function of time using an extended mediation formula and sequential ignorability assumption. In simulation studies, we compare properties of estimated direct and indirect effects, and a delta method estimate of the standard error of the latter, under alternative approaches for predicting the mediator. The approach is illustrated using harmonised data from two cohort studies to examine attention as a mediator of the effect of prenatal tobacco exposure on externalising behaviour in children.

Volume 66, Issue 3, pp. 347-366.
Citations: 0
A new robust covariance matrix estimation for high-dimensional microbiome data
IF 1.1, Zone 4 (Mathematics), Q3 Statistics & Probability, Pub Date: 2024-05-28, DOI: 10.1111/anzs.12415
Jiyang Wang, Wanfeng Liang, Lijie Li, Yue Wu, Xiaoyan Ma
Microbiome data typically lie in a high-dimensional simplex. One of the key questions in metagenomic analysis is how to exploit the covariance structure of this kind of data. In this paper, a framework called approximate-estimate-threshold (AET) is developed for robust basis covariance estimation for high-dimensional microbiome data. Specifically, we first construct a proxy matrix $\boldsymbol{\Gamma}$, which is almost indistinguishable from the real basis covariance matrix $\boldsymbol{\Sigma}$. Then, any estimator $\hat{\boldsymbol{\Gamma}}$ satisfying some conditions can be used to estimate $\boldsymbol{\Gamma}$. Finally, we impose a thresholding step on $\hat{\boldsymbol{\Gamma}}$ to obtain the final estimator $\hat{\boldsymbol{\Sigma}}$. In particular, this paper applies a Huber-type estimator $\hat{\boldsymbol{\Gamma}}$, and achieves robustness by only requiring the boundedness of the $2+\epsilon$ moments of some ... We derive the convergence rate under the spectral norm and provide theoretical guarantees on support recovery. Extensive simulations and a real example are used to illustrate the empirical performance of our method.
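The three AET steps named in the abstract (approximate, estimate, threshold) can be illustrated with a minimal sketch. Everything specific here is an assumption and is not taken from the paper: the proxy is taken to be the covariance of centered log-ratio (clr) transformed compositions, a simple clipping estimator stands in for the Huber-type estimator, and hard thresholding is used with hand-picked tuning values `tau` and `lam`. The function names are hypothetical.

```python
import numpy as np

def clr(x):
    """Centered log-ratio transform; rows of x are strictly positive compositions."""
    logx = np.log(x)
    return logx - logx.mean(axis=1, keepdims=True)

def truncated_cov(z, tau):
    """Element-wise robust covariance: average of products clipped to [-tau, tau].

    Assumption: this clipping estimator is only a stand-in for a Huber-type
    estimator; columns are centered by their medians.
    """
    zc = z - np.median(z, axis=0)
    prods = zc[:, :, None] * zc[:, None, :]  # shape (n, p, p)
    return np.clip(prods, -tau, tau).mean(axis=0)

def hard_threshold(g, lam):
    """Zero out off-diagonal entries smaller than lam in absolute value."""
    out = np.where(np.abs(g) >= lam, g, 0.0)
    np.fill_diagonal(out, np.diag(g))  # the diagonal is never thresholded
    return out

def aet_sketch(x, tau, lam):
    """Approximate (clr proxy), estimate (clipped products), threshold."""
    return hard_threshold(truncated_cov(clr(x), tau), lam)

# Synthetic compositions: positive draws normalized so each row sums to 1.
rng = np.random.default_rng(0)
w = np.exp(rng.standard_normal((200, 5)))
x = w / w.sum(axis=1, keepdims=True)
sigma_hat = aet_sketch(x, tau=5.0, lam=0.1)
```

In this sketch `tau` and `lam` are fixed by hand; a rate-based or data-driven choice would be needed in practice.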
{"title":"A new robust covariance matrix estimation for high-dimensional microbiome data","authors":"Jiyang Wang, Wanfeng Liang, Lijie Li, Yue Wu, Xiaoyan Ma","doi":"10.1111/anzs.12415","DOIUrl":"10.1111/anzs.12415","url":null,"abstract":"Microbiome data typically lie in a high-dimensional simplex. One of the key questions in metagenomic analysis is to exploit the covariance structure for this kind of data. In this paper, a framework called approximate-estimate-threshold (AET) is developed for the robust basis covariance estimation for high-dimensional microbiome data. To be specific, we first construct a proxy matrix $\\boldsymbol{\\Gamma}$, which is almost indistinguishable from the real basis covariance matrix $\\boldsymbol{\\Sigma}$. Then, any estimator $\\hat{\\boldsymbol{\\Gamma}}$ satisfying some conditions can be used to estimate $\\boldsymbol{\\Gamma}$. Finally, we impose a thresholding step on $\\hat{\\boldsymbol{\\Gamma}}$ to obtain the final estimator $\\hat{\\boldsymbol{\\Sigma}}$. In particular, this paper applies a Huber-type estimator $\\hat{\\boldsymbol{\\Gamma}}$, and achieves robustness by only requiring the boundedness of $2+\\epsilon$ …","PeriodicalId":55428,"journal":{"name":"Australian & New Zealand Journal of Statistics","volume":"66 2","pages":"281-295"},"PeriodicalIF":1.1,"publicationDate":"2024-05-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141190779","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Journal
Australian & New Zealand Journal of Statistics