Australian & New Zealand Journal of Statistics: latest articles, page 6

A new robust covariance matrix estimation for high-dimensional microbiome data

IF 1.1 | Zone 4 (Mathematics) | Q3 STATISTICS & PROBABILITY

Australian & New Zealand Journal of Statistics

Pub Date : 2024-05-28 DOI: 10.1111/anzs.12415

Jiyang Wang, Wanfeng Liang, Lijie Li, Yue Wu, Xiaoyan Ma

Microbiome data typically lie in a high-dimensional simplex. One of the key questions in metagenomic analysis is how to exploit the covariance structure of this kind of data. In this paper, a framework called approximate-estimate-threshold (AET) is developed for robust basis covariance estimation for high-dimensional microbiome data. Specifically, we first construct a proxy matrix $\boldsymbol{\Gamma}$, which is almost indistinguishable from the real basis covariance matrix $\boldsymbol{\Sigma}$. Then, any estimator $\hat{\boldsymbol{\Gamma}}$ satisfying some conditions can be used to estimate $\boldsymbol{\Gamma}$. Finally, we impose a thresholding step on $\hat{\boldsymbol{\Gamma}}$ to obtain the final estimator $\hat{\boldsymbol{\Sigma}}$. In particular, this paper applies a Huber-type estimator $\hat{\boldsymbol{\Gamma}}$, and achieves robustness by only requiring the boundedness of $2+\epsilon$ moments for some ... We derive the convergence rate under the spectral norm and provide theoretical guarantees on support recovery. Extensive simulations and a real example are used to illustrate the empirical performance of our method.
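The thresholding step of AET can be illustrated with a minimal sketch. It assumes entrywise hard thresholding of the off-diagonal entries at a user-chosen level; the function name `hard_threshold`, the level `tau` and the toy matrix are hypothetical, and neither the paper's threshold rule nor its Huber-type estimator is reproduced here.

```python
import numpy as np

def hard_threshold(gamma_hat, tau):
    """Zero out off-diagonal entries whose absolute value is below tau (illustrative)."""
    gamma_hat = np.asarray(gamma_hat, dtype=float)
    keep = np.abs(gamma_hat) >= tau
    np.fill_diagonal(keep, True)  # the diagonal (variances) is never thresholded
    return np.where(keep, gamma_hat, 0.0)

# Toy estimate of the proxy matrix (made-up numbers, not from the paper)
gamma_hat = np.array([[1.0, 0.05, 0.4],
                      [0.05, 2.0, -0.02],
                      [0.4, -0.02, 1.5]])
sigma_hat = hard_threshold(gamma_hat, tau=0.1)
```

Small off-diagonal entries are set to zero while large ones and the diagonal are kept, so the result stays symmetric and sparse.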

Volume 66, Issue 2, pp. 281-295

Citations: 0

Testing multiple dispersion effects from unreplicated order-of-addition experiments

IF 1.1 | Zone 4 (Mathematics) | Q3 STATISTICS & PROBABILITY

Australian & New Zealand Journal of Statistics

Pub Date : 2024-05-23 DOI: 10.1111/anzs.12416

Shin-Fu Tsai, Shan-Syue He

Optimal addition orders of several components can be determined systematically to address order-of-addition problems when active location and dispersion effects are both taken into account. Based on the concept of fiducial generalised pivotal quantities, a new testing procedure is proposed in this paper to identify active dispersion effects from unreplicated order-of-addition experiments. Because the proposed method is free of all nuisance parameters indexed by the requirement set, it is capable of testing multiple dispersion effects. Simulation results show that the proposed method can maintain the empirical sizes close to the nominal level. A paint viscosity study is used to show that the proposed method can be practical. In addition, testable requirement sets are characterised when an order-of-addition orthogonal array is used to design an experiment.

Citations: 0

A calibrated data-driven approach for small area estimation using big data

IF 1.1 | Zone 4 (Mathematics) | Q3 STATISTICS & PROBABILITY

Australian & New Zealand Journal of Statistics

Pub Date : 2024-05-14 DOI: 10.1111/anzs.12414

Siu-Ming Tam, Shaila Sharmeen

Where the response variable in a big dataset is consistent with the variable of interest for small area estimation, the big data by itself can provide the estimates for small areas. These estimates are often subject to the coverage and measurement error bias inherited from the big data. However, if a probability survey of the same variable of interest is available, the survey data can be used as a training dataset to develop an algorithm to impute for the data missed by the big data and adjust for measurement errors. In this paper, we outline a methodology for such imputations based on a k-nearest neighbours (kNN) algorithm calibrated to an asymptotically design-unbiased estimate of the national total, and illustrate the use of a training dataset to estimate the imputation bias and the “fixed-k asymptotic” bootstrap to estimate the variance of the small area hybrid estimator. We illustrate the methodology of this paper using a public-use dataset and use it to compare the accuracy and precision of our hybrid estimator with the Fay–Herriot (FH) estimator. Finally, we also examine numerically the accuracy and precision of the FH estimator when the auxiliary variables used in the linking models are subject to undercoverage errors.
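The idea of kNN imputation followed by calibration to a known total can be sketched as follows. This is a minimal illustration under stated assumptions: plain Euclidean kNN averaging and a simple ratio adjustment stand in for the paper's calibrated algorithm, and the names `knn_impute` and `calibrate` as well as all numbers are hypothetical.

```python
import numpy as np

def knn_impute(x_train, y_train, x_missing, k=3):
    """Impute y for each row of x_missing as the mean y of its k nearest training rows."""
    x_train = np.asarray(x_train, dtype=float)
    y_train = np.asarray(y_train, dtype=float)
    x_missing = np.asarray(x_missing, dtype=float)
    # Pairwise Euclidean distances, shape (n_missing, n_train)
    dist = np.linalg.norm(x_missing[:, None, :] - x_train[None, :, :], axis=2)
    nearest = np.argsort(dist, axis=1)[:, :k]
    return y_train[nearest].mean(axis=1)

def calibrate(imputed, observed_total, target_total):
    """Rescale imputed values so that observed + imputed totals match target_total."""
    factor = (target_total - observed_total) / imputed.sum()
    return imputed * factor

# Survey data act as the training set (toy values)
x_train = np.array([[0.0], [1.0], [2.0], [3.0]])
y_train = np.array([10.0, 12.0, 14.0, 16.0])
# Units missed by the big data, with known auxiliary values
x_missing = np.array([[0.4], [2.6]])

imputed = knn_impute(x_train, y_train, x_missing, k=2)
calibrated = calibrate(imputed, observed_total=100.0, target_total=130.0)
```

After calibration, the total covered by the big data plus the imputed total reproduces the assumed estimate of the national total.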

Volume 66, Issue 2, pp. 125-145

Citations: 0

Approximate inferences for Bayesian hierarchical generalised linear regression models

IF 1.1 | Zone 4 (Mathematics) | Q3 STATISTICS & PROBABILITY

Australian & New Zealand Journal of Statistics

Pub Date : 2024-05-08 DOI: 10.1111/anzs.12412

Brandon Berman, Wesley O. Johnson, Weining Shen

Generalised linear mixed regression models are fundamental in statistics. Modelling random effects that are shared by individuals allows for correlation among those individuals. There are many methods and statistical packages available for analysing data using these models. Most require some form of numerical or analytic approximation because the likelihood function generally involves intractable integrals over the latents. The Bayesian approach avoids this issue by iteratively sampling the full conditional distributions for various blocks of parameters and latent random effects. Depending on the choice of the prior, some full conditionals are recognisable while others are not. In this paper we develop a novel normal approximation for the random effects full conditional, establish its asymptotic correctness and evaluate how well it performs. We make the case for hierarchical binomial and Poisson regression models with canonical link functions, for hierarchical gamma regression models with log link and for other cases. We also develop what we term a sufficient reduction (SR) approach to the Markov chain Monte Carlo algorithm that allows for making inferences about all model parameters by replacing the full conditional for the latent variables with a considerably reduced dimensional function of the latents. We expect that this approximation could be quite useful in situations where there are a very large number of latent effects, which may be occurring in an increasingly ‘Big Data’ world. In the sequel, we compare our methods with INLA, which is a particularly popular method and which has been shown to be excellent in terms of speed and accuracy across a variety of settings. Our methods appear to be comparable to INLA in terms of accuracy, while INLA was faster, for the settings we considered. In addition, we note that our methods and those of others that involve Gibbs sampling trivially handle parameters that are functions of multiple parameters, while INLA approximations do not. Our primary illustration is for a three-level hierarchical binomial regression model for data on health outcomes for patients who are clustered within physicians who are clustered within particular hospitals or hospital systems.

Volume 66, Issue 2, pp. 163-203

Citations: 0

Semi-supervised Gaussian mixture modelling with a missing-data mechanism in R

IF 1.1 | Zone 4 (Mathematics) | Q3 STATISTICS & PROBABILITY

Australian & New Zealand Journal of Statistics

Pub Date : 2024-05-05 DOI: 10.1111/anzs.12413

Ziyang Lyu, Daniel Ahfock, Ryan Thompson, Geoffrey J. McLachlan

Semi-supervised learning is being extensively applied to estimate classifiers from training data in which not all the labels of the feature vectors are available. We present gmmsslm, an R package for estimating the Bayes' classifier from such partially classified data in the case where the feature vector has a multivariate Gaussian (normal) distribution in each of the pre-defined classes. Our package implements a recently proposed Gaussian mixture modelling framework that incorporates a missingness mechanism for the missing labels in which the probability of a missing label is represented via a logistic model with covariates that depend on the entropy of the feature vector. Under this framework, it has been shown that the Bayes' classifier formed from the Gaussian mixture model fitted to the partially classified training data can even have a lower error rate than if it were estimated from a completely classified sample. This result was established in the particular case of two Gaussian classes with a common covariance matrix. Here we focus on the effective implementation of an algorithm for multiple Gaussian classes with arbitrary covariance matrices. A strategy for initialising the algorithm is discussed and illustrated. The new package is demonstrated on some real data.
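The ingredients named in this abstract (class posterior probabilities under a Gaussian mixture, the entropy of a feature vector, and a logistic model for the probability of a missing label) can be sketched in a few lines. This is a Python illustration with NumPy only, not the gmmsslm API; the function names, the coefficients `xi0` and `xi1`, and the use of the raw entropy as the covariate are all assumptions, and the form used in the paper may differ.

```python
import numpy as np

def gaussian_logpdf(x, mean, cov):
    """Log density of a multivariate normal evaluated at each row of x."""
    d = x.shape[1]
    diff = x - mean
    sol = np.linalg.solve(cov, diff.T).T
    maha = np.sum(diff * sol, axis=1)  # squared Mahalanobis distances
    _, logdet = np.linalg.slogdet(cov)
    return -0.5 * (d * np.log(2 * np.pi) + logdet + maha)

def posterior_probs(x, weights, means, covs):
    """Posterior class probabilities under a Gaussian mixture (Bayes' rule)."""
    log_joint = np.column_stack([
        np.log(w) + gaussian_logpdf(x, m, c)
        for w, m, c in zip(weights, means, covs)
    ])
    log_joint -= log_joint.max(axis=1, keepdims=True)  # numerical stability
    p = np.exp(log_joint)
    return p / p.sum(axis=1, keepdims=True)

def entropy(post):
    """Shannon entropy of each row of posterior probabilities."""
    safe = np.clip(post, 1e-300, 1.0)
    return -np.sum(post * np.log(safe), axis=1)

def missing_prob(ent, xi0, xi1):
    """Hypothetical logistic model for P(label missing) as a function of entropy."""
    return 1.0 / (1.0 + np.exp(-(xi0 + xi1 * ent)))

weights = [0.5, 0.5]
means = [np.array([-2.0, 0.0]), np.array([2.0, 0.0])]
covs = [np.eye(2), np.eye(2)]
x = np.array([[0.0, 0.0], [3.0, 0.0]])  # one ambiguous point, one clear point

post = posterior_probs(x, weights, means, covs)
ent = entropy(post)
labels = post.argmax(axis=1)  # Bayes' classifier: most probable class
p_miss = missing_prob(ent, xi0=-1.0, xi1=2.0)
```

With a positive `xi1`, the point midway between the two classes has the highest entropy and therefore the highest modelled probability of having its label missing.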

Volume 66, Issue 2, pp. 146-162

Open access PDF: https://onlinelibrary.wiley.com/doi/epdf/10.1111/anzs.12413

Citations: 0

A class of kth-order dependence-driven random coefficient mixed thinning integer-valued autoregressive process to analyse epileptic seizure data and COVID-19 data

IF 1.1 | Zone 4 (Mathematics) | Q3 STATISTICS & PROBABILITY

Australian & New Zealand Journal of Statistics

Pub Date : 2024-04-08 DOI: 10.1111/anzs.12411

Xiufang Liu, Dehui Wang, Huaping Chen, Lifang Zhao, Liang Liu

Data related to the counting of elements of variable character are frequently encountered in time series studies. This paper brings forward a new class of $k$th-order dependence-driven random coefficient mixed thinning integer-valued autoregressive time series model (DDRCMTINAR($k$)) to deal with such data. Stationarity and ergodicity properties of the proposed model are derived in detail. The unknown parameters are estimated by conditional least squares and modified quasi-likelihood, and asymptotic normality of the obtained parameter estimators is established. The performance of the adopted estimation methods is checked via simulations, which show that the modified quasi-likelihood estimators perform better than the conditional least squares estimators in terms of the proportion of within-$\Omega$ estimates in certain regions of the parameter space. The validity and practical utility of the model are investigated using epileptic seizure data and COVID-19 data of suspected cases in China.
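Thinning-based count models and conditional least squares (CLS) can be illustrated on the simplest member of the family. The sketch below assumes a first-order INAR model with a fixed binomial thinning coefficient and Poisson innovations, which is much simpler than the DDRCMTINAR($k$) model of the paper; the function names and parameter values are hypothetical.

```python
import numpy as np

def simulate_inar1(n, alpha, lam, rng):
    """Simulate X_t = alpha o X_{t-1} + e_t with binomial thinning and Poisson(lam) innovations."""
    x = np.zeros(n, dtype=int)
    x[0] = rng.poisson(lam / (1.0 - alpha))  # start near the stationary mean
    for t in range(1, n):
        survivors = rng.binomial(x[t - 1], alpha)  # binomial thinning of the previous count
        x[t] = survivors + rng.poisson(lam)
    return x

def cls_inar1(x):
    """CLS: regress X_t on X_{t-1}; the slope estimates alpha, the intercept estimates lam."""
    prev = x[:-1].astype(float)
    curr = x[1:].astype(float)
    design = np.column_stack([prev, np.ones_like(prev)])
    (alpha_hat, lam_hat), *_ = np.linalg.lstsq(design, curr, rcond=None)
    return alpha_hat, lam_hat

rng = np.random.default_rng(0)
x = simulate_inar1(5000, alpha=0.5, lam=2.0, rng=rng)
alpha_hat, lam_hat = cls_inar1(x)
```

CLS works here because the conditional mean of X_t given X_{t-1} is linear, alpha * X_{t-1} + lam, so ordinary least squares on lagged counts recovers both parameters.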

Volume 66, Issue 2, pp. 249-280

Citations: 0

Bayesian hypothesis tests with diffuse priors: Can we have our cake and eat it too?

IF 1.1 | Zone 4 (Mathematics) | Q3 STATISTICS & PROBABILITY

Australian & New Zealand Journal of Statistics

Pub Date : 2024-03-19 DOI: 10.1111/anzs.12410

J. T. Ormerod, M. Stewart, W. Yu, S. E. Romanes

We propose a new class of priors for Bayesian hypothesis testing, which we name ‘cake priors’. These priors circumvent the Jeffreys–Lindley paradox (also called Bartlett's paradox), a problem associated with the use of diffuse priors leading to nonsensical statistical inferences. Cake priors allow the use of diffuse priors (having one's cake) while achieving theoretically justified inferences (eating it too). We demonstrate this methodology for Bayesian hypothesis tests for various common scenarios. The resulting Bayesian test statistic takes the form of a penalised likelihood ratio test statistic. Under typical regularity conditions, we show that Bayesian hypothesis tests based on cake priors are Chernoff consistent, that is, achieve zero type I and II error probabilities asymptotically. We also discuss Lindley's paradox and argue that the paradox occurs with small and vanishing probability as sample size increases.
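The problem with diffuse priors can be seen in a textbook setting. The sketch below assumes unit-variance normal data, a point null H0: mu = 0 and an alternative prior mu ~ N(0, tau^2), for which the Bayes factor in favour of H0 has a closed form in the z-statistic; it illustrates the paradox only and does not implement cake priors. The function name and the chosen values of `z`, `n` and `tau` are hypothetical.

```python
import math

def bf01_normal_mean(z, n, tau):
    """Bayes factor for H0: mu = 0 against H1: mu ~ N(0, tau^2).

    Assumes n observations with unit variance and z = sqrt(n) * sample mean.
    """
    r = n * tau ** 2
    return math.sqrt(1.0 + r) * math.exp(-0.5 * z ** 2 * r / (1.0 + r))

z, n = 2.5, 100
taus = (1.0, 10.0, 100.0, 1000.0)
bfs = [bf01_normal_mean(z, n, tau) for tau in taus]
```

For a fixed z-statistic the Bayes factor grows without bound as `tau` increases, so a sufficiently diffuse prior favours the null hypothesis regardless of the data.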

Citations: 0

Asymptotics for the conditional self-weighted $M$ estimator of GRCA($p$) models and its statistical inference

IF 1.1 | Zone 4 (Mathematics) | Q3 STATISTICS & PROBABILITY

Australian & New Zealand Journal of Statistics

Pub Date : 2024-02-21 DOI: 10.1111/anzs.12408

Chi Yao, Wei Yu, Xuejun Wang

Under the $p$-order generalised random coefficient autoregressive (GRCA($p$)) model with random coefficients $\boldsymbol{\Phi}_t$, we propose a conditional self-weighted $M$ estimator of $\mathrm{E}\boldsymbol{\Phi}_t$. We investigate the asymptotic normality of this estimator with possibly heavy-tailed random variables. Furthermore, a Wald test statistic is constructed for the linear restriction on the parameters. In addition, simulation experiments are carried out to assess the finite sample performance of the theoretical results. Finally, a real data analysis about the increase (%) in the number of construction projects this year over the same period of last year is provided.

Volume 66, Issue 1, pp. 103-124

Citations: 0

Unified robust estimation

IF 1.1 | Zone 4 (Mathematics) | Q3 STATISTICS & PROBABILITY

Australian & New Zealand Journal of Statistics

Pub Date : 2024-02-20 DOI: 10.1111/anzs.12409

Zhu Wang

Robust estimation is primarily concerned with providing reliable parameter estimates in the presence of outliers. Numerous robust loss functions have been proposed in regression and classification, along with various computing algorithms. In modern penalised generalised linear models (GLMs), however, there is limited research on robust estimation that can provide weights to determine the outlier status of the observations. This article proposes a unified framework based on a large family of loss functions, a composite of concave and convex functions (CC-family). Properties of the CC-family are investigated, and CC-estimation is innovatively conducted via the iteratively reweighted convex optimisation (IRCO), which is a generalisation of the iteratively reweighted least squares in robust linear regression. For robust GLM, the IRCO becomes the iteratively reweighted GLM. The unified framework contains penalised estimation and robust support vector machine (SVM) and is demonstrated with a variety of data applications.
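The abstract describes IRCO as a generalisation of iteratively reweighted least squares (IRLS) in robust linear regression. A minimal sketch of that classical special case is given below, assuming the Huber loss with a MAD scale estimate; it is not the IRCO algorithm or the CC-family, and the function name `huber_irls`, the tuning constant and the toy data are hypothetical.

```python
import numpy as np

def huber_irls(X, y, delta=1.345, n_iter=50):
    """Huber regression via IRLS; returns coefficients and the last set of weights."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    beta = np.linalg.lstsq(X, y, rcond=None)[0]  # ordinary least squares start
    weights = np.ones_like(y)
    for _ in range(n_iter):
        resid = y - X @ beta
        # Robust scale from the median absolute deviation (MAD)
        scale = np.median(np.abs(resid - np.median(resid))) / 0.6745
        scale = max(scale, 1e-12)
        u = np.abs(resid) / scale
        # Huber weights: 1 for small residuals, delta / |u| for large ones
        weights = np.where(u <= delta, 1.0, delta / np.maximum(u, 1e-12))
        sw = np.sqrt(weights)
        beta = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)[0]
    return beta, weights

rng = np.random.default_rng(1)
x = np.arange(20, dtype=float)
X = np.column_stack([np.ones_like(x), x])
y = 1.0 + 2.0 * x + rng.normal(scale=0.5, size=20)
y[5] += 50.0  # a single gross outlier

beta, weights = huber_irls(X, y)
```

The returned weights play the role the abstract assigns to them: observations with weights far below one are flagged as likely outliers, while the coefficient estimates stay close to the line fitted by the clean points.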

Citations: 0

Latent heterogeneity in COVID-19 hospitalisations: a cluster-weighted approach to analyse mortality

IF 1.1 | Zone 4 (Mathematics) | Q3 STATISTICS & PROBABILITY

Australian & New Zealand Journal of Statistics

Pub Date : 2024-02-13 DOI: 10.1111/anzs.12407

Paolo Berta, Salvatore Ingrassia, Giorgio Vittadini, Daniele Spinelli

The COVID-19 pandemic caused an unprecedented excess mortality. Since 2020, many studies have focussed on the characteristics of COVID-19 patients who did not survive. From the statistical point of view, what seems to dominate is the large heterogeneity of the populations affected by COVID-19 and the extreme difficulty in identifying subpopulations who died affected by a plurality of contemporary characteristics. In this paper, we propose an extremely flexible approach based on a cluster-weighted model, which allows us to identify latent groups of patients sharing similar characteristics at the moment of hospitalisation as well as a similar mortality. We focus on one of the hardest hit areas in Italy and study the heterogeneity in the population of patients affected by COVID-19 using administrative data on hospitalisations in the first wave of the pandemic. Results highlighted that a model-based clustering approach is essential to understand the complexity of the COVID-19 patients treated by hospitals and who die during hospitalisation.

Volume 66, Issue 1, pp. 1-20

Open access PDF: https://onlinelibrary.wiley.com/doi/epdf/10.1111/anzs.12407

Citations: 0