
Australian & New Zealand Journal of Statistics: Latest Articles

PanIC: Consistent information criteria for general model selection problems
IF 0.8 Zone 4 Mathematics Q3 STATISTICS & PROBABILITY Pub Date: 2024-10-31 DOI: 10.1111/anzs.12426
Hien Duy Nguyen

Model selection is a ubiquitous problem that arises in the application of many statistical and machine learning methods. In the likelihood and related settings, it is typical to use the method of information criteria (ICs) to choose the most parsimonious among competing models by penalizing the likelihood-based objective function. Theorems guaranteeing the consistency of ICs can often be difficult to verify and are often specific and bespoke. We present a set of results that guarantee consistency for a class of ICs, which we call PanIC (from the Greek root ‘pan’, meaning ‘of everything’), with easily verifiable regularity conditions. PanICs are applicable in any loss-based learning problem and are not exclusive to likelihood problems. We illustrate the verification of regularity conditions for model selection problems regarding finite mixture models, least absolute deviation and support vector regression and principal component analysis, and demonstrate the effectiveness of PanICs for such problems via numerical simulations. Furthermore, we present new sufficient conditions for the consistency of BIC-like estimators and provide comparisons of the BIC with PanIC.
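
The penalised-loss idea described in this abstract can be illustrated with a small sketch. It is not the PanIC criterion itself, whose penalty the abstract does not state. It assumes a generic BIC-style penalty and a least-squares loss, and the function name `select_order` and the simulated data are hypothetical. The sketch chooses a polynomial degree by minimising loss plus penalty.

```python
import numpy as np

def select_order(x, y, max_degree=6):
    """Pick a polynomial degree by minimising a loss-plus-penalty score."""
    n = len(y)
    scores = {}
    for degree in range(max_degree + 1):
        # Fit by least squares; the loss is the mean squared residual.
        coeffs = np.polyfit(x, y, degree)
        resid = y - np.polyval(coeffs, x)
        loss = np.mean(resid ** 2)
        # BIC-style score under Gaussian errors: n*log(loss) + k*log(n),
        # where k counts the fitted coefficients (assumed penalty form).
        k = degree + 1
        scores[degree] = n * np.log(loss) + k * np.log(n)
    best = min(scores, key=scores.get)
    return best, scores

# Simulated data from a quadratic trend with small Gaussian noise.
rng = np.random.default_rng(0)
x = np.linspace(-1.0, 1.0, 400)
y = 1.0 - 2.0 * x + 3.0 * x ** 2 + rng.normal(scale=0.1, size=x.size)
best_degree, scores = select_order(x, y)
print(best_degree)
```

Degrees below 2 underfit the simulated trend and score much worse, while the log(n) penalty discourages degrees above 2. Any other loss could be substituted for the squared residual, which is the sense in which such criteria are loss-based rather than likelihood-specific.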

Australian & New Zealand Journal of Statistics, vol. 66, no. 4, pp. 441-466. https://doi.org/10.1111/anzs.12426 (open-access PDF: https://onlinelibrary.wiley.com/doi/epdf/10.1111/anzs.12426)
Citations: 0
Prediction de-correlated inference: A safe approach for post-prediction inference
IF 0.8 Zone 4 Mathematics Q3 STATISTICS & PROBABILITY Pub Date: 2024-10-24 DOI: 10.1111/anzs.12429
Feng Gan, Wanfeng Liang, Changliang Zou

In modern data analysis, it is common to use machine learning methods to predict outcomes on unlabelled datasets and then use these pseudo-outcomes in subsequent statistical inference. Inference in this setting is often called post-prediction inference. We propose a novel assumption-lean framework for statistical inference under post-prediction setting, called prediction de-correlated inference (PDC). Our approach is safe, in the sense that PDC can automatically adapt to any black-box machine-learning model and consistently outperform the supervised counterparts. The PDC framework also offers easy extensibility for accommodating multiple predictive models. Both numerical results and real-world data analysis demonstrate the superiority of PDC over the state-of-the-art methods.

Australian & New Zealand Journal of Statistics, vol. 66, no. 4, pp. 417-440. https://doi.org/10.1111/anzs.12429
Citations: 0
Telling Stories with Data: With Application in R. By Rohan Alexander. CRC Press. 2023. 622 pages. AU$129.60 (hardback). ISBN: 978-1-0321-3477-2.
IF 0.8 Zone 4 Mathematics Q3 STATISTICS & PROBABILITY Pub Date: 2024-09-30 DOI: 10.1111/anzs.12428
Emi Tanaka
Australian & New Zealand Journal of Statistics, vol. 66, no. 4, pp. 467-470. https://doi.org/10.1111/anzs.12428
Citations: 0
Full Bayesian analysis of triple seasonal autoregressive models
IF 0.8 Zone 4 Mathematics Q3 STATISTICS & PROBABILITY Pub Date: 2024-09-27 DOI: 10.1111/anzs.12427
Ayman A. Amin

Seasonal autoregressive (SAR) time series models have been extended to fit time series exhibiting multiple seasonalities. However, hardly any research in Bayesian literature has been done on modelling multiple seasonalities. In this article, we propose a full Bayesian analysis of triple SAR (TSAR) models for time series with triple seasonality, considering identification, estimation and prediction for these TSAR models. In this Bayesian analysis of TSAR models, we assume the model errors to be normally distributed and the model order to be a random variable with a known maximum value, and we employ the g prior for the model coefficients and variance. Accordingly, we first derive the posterior mass function of the TSAR order in closed form, which then enables us to identify the best order of TSAR model as the order value with the highest posterior probability. In addition, we derive the conditional posteriors to be a multivariate normal for the TSAR coefficients and to be an inverse gamma for the TSAR variance; also, we derive the conditional predictive distribution to be a multivariate normal for future observations. Since these derived conditional distributions are in closed forms, we introduce the Gibbs sampler to present the Bayesian analysis of TSAR models and to easily produce multiple-step-ahead predictions. Using Julia programming language, we conduct an extensive simulation study, aiming to evaluate the accuracy of our proposed full Bayesian analysis for TSAR models. In addition, we apply our work on time series to hourly electricity load in some European countries.

Australian & New Zealand Journal of Statistics, vol. 66, no. 4, pp. 389-416. https://doi.org/10.1111/anzs.12427
Citations: 0
Examining collinearities
IF 0.8 Zone 4 Mathematics Q3 STATISTICS & PROBABILITY Pub Date: 2024-08-29 DOI: 10.1111/anzs.12425
Zillur R. Shabuz, Paul H. Garthwaite

The cos-max method is a little-known method of identifying collinearities. It is based on the cos-max transformation, which makes minimal adjustment to a set of vectors to create orthogonal components with a one-to-one correspondence between the original vectors and the components. The aim of the transformation is that each vector should be close to the orthogonal component with which it is paired. Vectors involved in a collinearity must be adjusted substantially in order to create orthogonal components, while other vectors will typically be adjusted far less. The cos-max method uses the size of adjustments to identify collinearities. It gives a coherent relationship between collinear sets of variables and variance inflation factors (VIFs) and identifies collinear sets using more information than traditional methods. In this paper we describe these features of the method and examine its performance in examples, comparing it with alternative methods. In each example, the collinearities identified by the cos-max method only contained variables with high VIFs and contained all variables with high VIFs. The collinearities identified by other methods did not have such a close link to VIFs. Also, the collinearities identified by the cos-max method were as simple as or simpler than those given by other methods, with less overlap between collinearities in the variables that they contained.
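
As background for the variance inflation factors mentioned in this abstract, here is a minimal sketch. It assumes the standard definition VIF_j = 1 / (1 - R_j^2), where R_j^2 comes from regressing predictor j on the other predictors. It illustrates VIFs only and does not implement the cos-max transformation. The helper name and the simulated predictors are hypothetical.

```python
import numpy as np

def variance_inflation_factors(X):
    """Return one VIF per column of X (predictors only, no intercept column)."""
    # VIF_j = 1 / (1 - R_j^2) equals the j-th diagonal entry of the inverse
    # of the predictors' correlation matrix.
    corr = np.corrcoef(X, rowvar=False)
    return np.diag(np.linalg.inv(corr))

rng = np.random.default_rng(1)
n = 500
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
x3 = x1 + x2 + rng.normal(scale=0.05, size=n)  # nearly collinear with x1 and x2
x4 = rng.normal(size=n)                         # unrelated to the others
vifs = variance_inflation_factors(np.column_stack([x1, x2, x3, x4]))
print(np.round(vifs, 1))
```

In this simulated design the three variables in the near-exact relation x3 ≈ x1 + x2 all receive large VIFs, while x4 stays close to 1. VIFs alone flag which variables are affected but not which collinear set each belongs to, which is the gap that methods for identifying collinearities aim to fill.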

Australian & New Zealand Journal of Statistics, vol. 66, no. 3, pp. 367-388. https://doi.org/10.1111/anzs.12425
Citations: 0
Exact samples sizes for clinical trials subject to size and power constraints
IF 0.8 Zone 4 Mathematics Q3 STATISTICS & PROBABILITY Pub Date: 2024-08-29 DOI: 10.1111/anzs.12424
Chris J. Lloyd

This paper first describes the difficulties in providing the required sample sizes for clinical trials that guarantee type 1 and type 2 error control. The required sample sizes obviously depend on the test employed, and in this study we use the so-called E-test, which is known to have extremely favourable size properties and higher power than alternatives. To compute exact powers for this test in real time is not currently feasible, so a corpus of pre-computed exact powers (and sizes) was created, covering sample sizes up to 500. When there are no solutions within the corpus, a novel extrapolation technique is used. Exact size can be computed after the sample sizes have been extracted; however, for the E-test the exact size is virtually always very close to the nominal target. All the code has been converted into an R-package, which is available on CRAN and illustrated.
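
The notion of an exact rejection probability can be sketched by brute-force enumeration for a two-arm binomial trial. This is only an illustration: it uses a two-sided pooled z-test as a stand-in for the E-test, which the abstract does not specify, and every function name and design value below is hypothetical. The type 1 error rate is evaluated at a single null value of the common success probability, whereas the exact size of a test is the maximum over all null values.

```python
from math import comb, sqrt

def binom_pmf(k, n, p):
    """Binomial probability of k successes in n trials."""
    return comb(n, k) * p ** k * (1.0 - p) ** (n - k)

def rejects(x1, n1, x2, n2, z_crit=1.959964):
    """Two-sided pooled z-test for equal proportions (stand-in for the E-test)."""
    pooled = (x1 + x2) / (n1 + n2)
    if pooled in (0.0, 1.0):
        return False  # no variation observed, so the statistic is undefined
    se = sqrt(pooled * (1.0 - pooled) * (1.0 / n1 + 1.0 / n2))
    return abs(x1 / n1 - x2 / n2) / se > z_crit

def rejection_probability(n1, n2, p1, p2):
    """Exact probability of rejecting, summed over every possible outcome."""
    return sum(
        binom_pmf(x1, n1, p1) * binom_pmf(x2, n2, p2)
        for x1 in range(n1 + 1)
        for x2 in range(n2 + 1)
        if rejects(x1, n1, x2, n2)
    )

# Type 1 error rate at one null value, and power at one alternative.
error_at_null = rejection_probability(30, 30, 0.3, 0.3)
power_at_alt = rejection_probability(30, 30, 0.3, 0.7)
print(error_at_null, power_at_alt)
```

The enumeration visits (n1 + 1)(n2 + 1) outcomes per evaluation, so repeating it over many candidate sample sizes and parameter values becomes expensive, which is consistent with the abstract's motivation for pre-computing a corpus of exact powers.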

Australian & New Zealand Journal of Statistics, vol. 66, no. 3, pp. 297-305. https://doi.org/10.1111/anzs.12424 (open-access PDF: https://onlinelibrary.wiley.com/doi/epdf/10.1111/anzs.12424)
Citations: 0
Bayesian analysis of multivariate mixed longitudinal ordinal and continuous data
IF 0.8 Zone 4 Mathematics Q3 STATISTICS & PROBABILITY Pub Date: 2024-08-13 DOI: 10.1111/anzs.12421
Xiao Zhang

Multivariate longitudinal ordinal and continuous data exist in many scientific fields. However, analysing them jointly is a demanding task because of the complicated correlation structure of such mixed data and the lack of a multivariate distribution for them. The multivariate probit model, which assumes a multivariate normal latent variable behind each multivariate ordinal observation, is a natural modelling choice for longitudinal ordinal data, especially when they are analysed jointly with longitudinal continuous data. However, the identifiable multivariate probit model requires the variances of the latent normal variables to be fixed at 1, so some diagonal elements of the joint covariance matrix of the latent variables and the continuous multivariate normal variables are restricted. This restriction complicates the development of both classical and Bayesian methods for analysing mixed ordinal and continuous data. In this investigation, we propose three Markov chain Monte Carlo (MCMC) methods: a Metropolis–Hastings within Gibbs algorithm based on the identifiable model, and a Gibbs sampling algorithm and a parameter-expanded data augmentation algorithm based on a constructed non-identifiable model. Through simulation studies and a real data application, we illustrate the performance of these three methods and offer observations on using a non-identifiable model to develop MCMC sampling methods.

Australian & New Zealand Journal of Statistics, vol. 66, no. 3, pp. 325-346. https://doi.org/10.1111/anzs.12421 (open-access PDF: https://onlinelibrary.wiley.com/doi/epdf/10.1111/anzs.12421)
Citations: 0
Distributional modelling of positively skewed data via the flexible Weibull extension distribution
IF 0.8 Zone 4 Mathematics Q3 STATISTICS & PROBABILITY Pub Date: 2024-08-11 DOI: 10.1111/anzs.12423
Freddy Hernández-Barajas, Olga Usuga-Manco, Carmen Patino-Rodríguez, Fernando Marmolejo-Ramos

The time until an event occurs is often known to have a skewed distribution. To model this, a statistical distribution called the two-parameter flexible Weibull extension (FWE) has been proposed. In this paper, the FWE distribution is used to model datasets through the use of generalised additive models for location, scale and shape (GAMLSS) distributional regression. GAMLSS is the only regression technique that can examine the effects of both categorical and numeric predictors on all the parameters of the distribution used to fit the dependent variable. To make it easier to use the FWE distribution through GAMLSS, the RelDists R package is proposed. A simulation study shows that FWE modelling through GAMLSS provides reliable parameter estimates even in the presence of factors that affect the distribution.
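
To make the distribution concrete, the following sketch assumes the commonly cited two-parameter form of the flexible Weibull extension, with CDF F(x) = 1 - exp(-exp(alpha*x - beta/x)) for x > 0 and alpha, beta > 0. The abstract does not state this parameterisation, so it should be read as an assumption. The function names are hypothetical and are not the RelDists API, and no GAMLSS regression is fitted here.

```python
import numpy as np

def fwe_cdf(x, alpha, beta):
    """CDF of the assumed form: 1 - exp(-exp(alpha*x - beta/x)), for x > 0."""
    x = np.asarray(x, dtype=float)
    return 1.0 - np.exp(-np.exp(alpha * x - beta / x))

def fwe_pdf(x, alpha, beta):
    """Density obtained by differentiating the CDF above."""
    x = np.asarray(x, dtype=float)
    w = alpha * x - beta / x
    return (alpha + beta / x ** 2) * np.exp(w - np.exp(w))

def fwe_sample(size, alpha, beta, rng):
    """Inverse-CDF sampling: solve alpha*x - beta/x = log(-log(1 - u)) for x > 0."""
    # Keep u strictly inside (0, 1) so both logarithms stay finite.
    u = np.clip(rng.uniform(size=size), 1e-12, 1.0 - 1e-12)
    w = np.log(-np.log1p(-u))
    # alpha*x**2 - w*x - beta = 0 has exactly one positive root.
    return (w + np.sqrt(w ** 2 + 4.0 * alpha * beta)) / (2.0 * alpha)

rng = np.random.default_rng(2)
alpha, beta = 0.5, 1.0
samples = fwe_sample(20000, alpha, beta, rng)
print(np.mean(samples <= 1.0), fwe_cdf(1.0, alpha, beta))
```

The two printed values should agree up to sampling error, since the first is the empirical proportion of draws below 1 and the second is the assumed CDF at 1. In a distributional regression, alpha and beta would each be linked to predictors rather than held fixed as they are in this sketch.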

Australian & New Zealand Journal of Statistics, vol. 66, no. 3, pp. 306-324. https://doi.org/10.1111/anzs.12423 (open-access PDF: https://onlinelibrary.wiley.com/doi/epdf/10.1111/anzs.12423)
Citations: 0
Spline linear mixed-effects models for causal mediation analysis with longitudinal data
IF 0.8 Zone 4 Mathematics Q3 STATISTICS & PROBABILITY Pub Date: 2024-07-26 DOI: 10.1111/anzs.12422
Jeffrey M. Albert, Hongxu Zhu, Tanujit Dey, Jiayang Sun, Wojbor A. Woyczynski, Gregory Powers, Meeyoung Min

Often, causal mediation analysis is of interest when both the mediator and the final outcome are repeatedly measured, but limited work has been done for this situation (as opposed to where only the mediator is repeatedly measured). Available methods are primarily based on parametric models and tend to be sensitive to model assumptions. This article presents semiparametric, continuous-time models to provide a flexible and robust approach to causal mediation analysis for longitudinal data, which allows these data to be unbalanced or irregular. Specifically, the method uses spline linear mixed-effects models for the mediator and for the final outcome, with a two-step approach to model-fitting in which a predicted mediator is used as a covariate in the final outcome model. The models allow flexible functions for both the mean and individual response functions for each outcome. We derive estimated natural direct and indirect effects as a function of time using an extended mediation formula and sequential ignorability assumption. In simulation studies, we compare properties of estimated direct and indirect effects, and a delta method estimate of the standard error of the latter, under alternative approaches for predicting the mediator. The approach is illustrated using harmonised data from two cohort studies to examine attention as a mediator of the effect of prenatal tobacco exposure on externalising behaviour in children.

摘要通常情况下,当中介因子和最终结果都被重复测量时,因果中介分析就会引起人们的兴趣,但针对这种情况(与只重复测量中介因子的情况相反)所做的工作很有限。现有方法主要基于参数模型,往往对模型假设很敏感。本文提出了半参数连续时间模型,为纵向数据的因果中介分析提供了一种灵活稳健的方法,允许这些数据是不平衡或不规则的。具体来说,该方法对中介因子和最终结果使用样条线性混合效应模型,采用两步法进行模型拟合,其中预测的中介因子在最终结果模型中用作协变量。这些模型允许对每种结果的平均值和个体反应函数使用灵活的函数。我们利用扩展中介公式和顺序无知假设,得出作为时间函数的估计自然直接效应和间接效应。在模拟研究中,我们比较了估算的直接效应和间接效应的特性,以及在其他中介预测方法下,后者标准误差的德尔塔法估算值。我们使用了两项队列研究的统一数据来说明这种方法,以研究注意力作为产前烟草暴露对儿童外化行为影响的中介因素。
{"title":"Spline linear mixed-effects models for causal mediation analysis with longitudinal data","authors":"Jeffrey M. Albert,&nbsp;Hongxu Zhu,&nbsp;Tanujit Dey,&nbsp;Jiayang Sun,&nbsp;Wojbor A. Woyczynski,&nbsp;Gregory Powers,&nbsp;Meeyoung Min","doi":"10.1111/anzs.12422","DOIUrl":"10.1111/anzs.12422","url":null,"abstract":"<div>\u0000 \u0000 <p>Often, causal mediation analysis is of interest when both the mediator and the final outcome are repeatedly measured, but limited work has been done for this situation (as opposed to where only the mediator is repeatedly measured). Available methods are primarily based on parametric models and tend to be sensitive to model assumptions. This article presents semiparametric, continuous-time models to provide a flexible and robust approach to causal mediation analysis for longitudinal data, which allows these data to be unbalanced or irregular. Specifically, the method uses spline linear mixed-effects models for the mediator and for the final outcome, with a two-step approach to model-fitting in which a predicted mediator is used as a covariate in the final outcome model. The models allow flexible functions for both the mean and individual response functions for each outcome. We derive estimated natural direct and indirect effects as a function of time using an extended mediation formula and sequential ignorability assumption. In simulation studies, we compare properties of estimated direct and indirect effects, and a delta method estimate of the standard error of the latter, under alternative approaches for predicting the mediator. 
The approach is illustrated using harmonised data from two cohort studies to examine attention as a mediator of the effect of prenatal tobacco exposure on externalising behaviour in children.</p>\u0000 </div>","PeriodicalId":55428,"journal":{"name":"Australian & New Zealand Journal of Statistics","volume":"66 3","pages":"347-366"},"PeriodicalIF":0.8,"publicationDate":"2024-07-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141779821","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
A new robust covariance matrix estimation for high-dimensional microbiome data
IF 1.1 Zone 4 Mathematics Q3 STATISTICS & PROBABILITY Pub Date : 2024-05-28 DOI: 10.1111/anzs.12415
Jiyang Wang, Wanfeng Liang, Lijie Li, Yue Wu, Xiaoyan Ma
Microbiome data typically lie in a high-dimensional simplex. One of the key questions in metagenomic analysis is how to exploit the covariance structure of this kind of data. In this paper, a framework called approximate-estimate-threshold (AET) is developed for robust basis covariance estimation for high-dimensional microbiome data. To be specific, we first construct a proxy matrix $\boldsymbol{\Gamma}$, which is almost indistinguishable from the real basis covariance matrix $\boldsymbol{\Sigma}$. Then, any estimator $\hat{\boldsymbol{\Gamma}}$ satisfying some conditions can be used to estimate $\boldsymbol{\Gamma}$. Finally, we impose a thresholding step on $\hat{\boldsymbol{\Gamma}}$ to obtain the final estimator $\hat{\boldsymbol{\Sigma}}$. In particular, this paper applies a Huber-type estimator $\hat{\boldsymbol{\Gamma}}$, and achieves robustness by only requiring the boundedness of 2+$\epsilon$ moments of some … We derive the convergence rate under the spectral norm and provide theoretical guarantees on support recovery. Extensive simulations and a real example illustrate the empirical performance of our method.
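The three AET steps named in the abstract (approximate, estimate, threshold) can be pictured with a small sketch. This is not the authors' implementation: the abstract does not say how the proxy matrix is built, which Huber-type estimator is used, or which thresholding rule is applied. The sketch below assumes, purely for illustration, a centred log-ratio transform as the proxy, an entrywise truncated mean as the Huber-type estimator, and hard thresholding of off-diagonal entries. The names `clr`, `huber_mean`, `aet_sketch`, `tau` and `lam` are all hypothetical.

```python
import math


def clr(composition):
    """Centred log-ratio transform of one strictly positive composition."""
    logs = [math.log(x) for x in composition]
    mean_log = sum(logs) / len(logs)
    return [v - mean_log for v in logs]


def huber_mean(values, tau):
    """Truncated (Huber-type) mean: clip each value to [-tau, tau], then average."""
    return sum(max(-tau, min(tau, v)) for v in values) / len(values)


def aet_sketch(compositions, tau, lam):
    """Illustrative approximate-estimate-threshold pipeline (assumptions only)."""
    # Step 1 (approximate): use clr-transformed data as a stand-in for the proxy.
    z = [clr(row) for row in compositions]
    n, p = len(z), len(z[0])

    # Step 2 (estimate): robust first and second moments, entry by entry.
    # Using one truncation level tau for both is a simplification.
    mu = [huber_mean([z[i][j] for i in range(n)], tau) for j in range(p)]
    gamma = [
        [
            huber_mean([z[i][j] * z[i][k] for i in range(n)], tau) - mu[j] * mu[k]
            for k in range(p)
        ]
        for j in range(p)
    ]

    # Step 3 (threshold): keep the diagonal, zero out small off-diagonal entries.
    return [
        [gamma[j][k] if j == k or abs(gamma[j][k]) > lam else 0.0 for k in range(p)]
        for j in range(p)
    ]


# Toy compositions: each row is positive and sums to one.
data = [
    [0.20, 0.30, 0.50],
    [0.10, 0.60, 0.30],
    [0.40, 0.40, 0.20],
    [0.25, 0.25, 0.50],
]
sigma_hat = aet_sketch(data, tau=10.0, lam=0.05)
```

Raising `lam` makes the estimate sparser, and lowering `tau` clips extreme observations more aggressively. In practice both would be tuned to the sample size and dimension, which the abstract does not detail.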
Australian & New Zealand Journal of Statistics, vol. 66, no. 2, pp. 281-295. Published 2024-05-28.
Citations: 0