Australian & New Zealand Journal of Statistics: Latest Articles

Testing multiple dispersion effects from unreplicated order-of-addition experiments
IF 1.1; Zone 4, Mathematics; Q3 Statistics & Probability; Pub Date: 2024-05-23; DOI: 10.1111/anzs.12416
Shin-Fu Tsai, Shan-Syue He

Optimal addition orders of several components can be determined systematically to address order-of-addition problems when active location and dispersion effects are both taken into account. Based on the concept of fiducial generalised pivotal quantities, a new testing procedure is proposed in this paper to identify active dispersion effects from unreplicated order-of-addition experiments. Because the proposed method is free of all nuisance parameters indexed by the requirement set, it is capable of testing multiple dispersion effects. Simulation results show that the proposed method can maintain the empirical sizes close to the nominal level. A paint viscosity study is used to show that the proposed method can be practical. In addition, testable requirement sets are characterised when an order-of-addition orthogonal array is used to design an experiment.

Australian & New Zealand Journal of Statistics, vol. 66, no. 2, pp. 228-248. PDF: https://onlinelibrary.wiley.com/doi/epdf/10.1111/anzs.12416
Citations: 0
A calibrated data-driven approach for small area estimation using big data
IF 1.1; Zone 4, Mathematics; Q3 Statistics & Probability; Pub Date: 2024-05-14; DOI: 10.1111/anzs.12414
Siu-Ming Tam, Shaila Sharmeen

Where the response variable in a big dataset is consistent with the variable of interest for small area estimation, the big data by itself can provide the estimates for small areas. These estimates are often subject to the coverage and measurement error bias inherited from the big data. However, if a probability survey of the same variable of interest is available, the survey data can be used as a training dataset to develop an algorithm to impute for the data missed by the big data and adjust for measurement errors. In this paper, we outline a methodology for such imputations based on a k-nearest neighbours (kNN) algorithm calibrated to an asymptotically design-unbiased estimate of the national total, and illustrate the use of a training dataset to estimate the imputation bias and the “fixed-k asymptotic” bootstrap to estimate the variance of the small area hybrid estimator. We illustrate the methodology of this paper using a public-use dataset and use it to compare the accuracy and precision of our hybrid estimator with the Fay–Herriot (FH) estimator. Finally, we also examine numerically the accuracy and precision of the FH estimator when the auxiliary variables used in the linking models are subject to undercoverage errors.

Australian & New Zealand Journal of Statistics, vol. 66, no. 2, pp. 125-145.
Citations: 0
Approximate inferences for Bayesian hierarchical generalised linear regression models
IF 1.1; Zone 4, Mathematics; Q3 Statistics & Probability; Pub Date: 2024-05-08; DOI: 10.1111/anzs.12412
Brandon Berman, Wesley O. Johnson, Weining Shen

Generalised linear mixed regression models are fundamental in statistics. Modelling random effects that are shared by individuals allows for correlation among those individuals. There are many methods and statistical packages available for analysing data using these models. Most require some form of numerical or analytic approximation because the likelihood function generally involves intractable integrals over the latents. The Bayesian approach avoids this issue by iteratively sampling the full conditional distributions for various blocks of parameters and latent random effects. Depending on the choice of the prior, some full conditionals are recognisable while others are not. In this paper we develop a novel normal approximation for the random effects full conditional, establish its asymptotic correctness and evaluate how well it performs. We make the case for hierarchical binomial and Poisson regression models with canonical link functions, for hierarchical gamma regression models with log link and for other cases. We also develop what we term a sufficient reduction (SR) approach to the Markov Chain Monte Carlo algorithm that allows for making inferences about all model parameters by replacing the full conditional for the latent variables with a considerably reduced dimensional function of the latents. We expect that this approximation could be quite useful in situations where there are a very large number of latent effects, which may be occurring in an increasingly ‘Big Data’ world. In the sequel, we compare our methods with INLA, which is a particularly popular method and which has been shown to be excellent in terms of speed and accuracy across a variety of settings. Our methods appear to be comparable to theirs in terms of accuracy, while INLA was faster, for the settings we considered. 
In addition, we note that our methods and those of others that involve Gibbs sampling trivially handle parameters that are functions of multiple parameters, while INLA approximations do not. Our primary illustration is for a three-level hierarchical binomial regression model for data on health outcomes for patients who are clustered within physicians who are clustered within particular hospitals or hospital systems.

Australian & New Zealand Journal of Statistics, vol. 66, no. 2, pp. 163-203.
Citations: 0
Semi-supervised Gaussian mixture modelling with a missing-data mechanism in R
IF 1.1; Zone 4, Mathematics; Q3 Statistics & Probability; Pub Date: 2024-05-05; DOI: 10.1111/anzs.12413
Ziyang Lyu, Daniel Ahfock, Ryan Thompson, Geoffrey J. McLachlan

Semi-supervised learning is being extensively applied to estimate classifiers from training data in which not all the labels of the feature vectors are available. We present gmmsslm, an R package for estimating the Bayes' classifier from such partially classified data in the case where the feature vector has a multivariate Gaussian (normal) distribution in each of the pre-defined classes. Our package implements a recently proposed Gaussian mixture modelling framework that incorporates a missingness mechanism for the missing labels in which the probability of a missing label is represented via a logistic model with covariates that depend on the entropy of the feature vector. Under this framework, it has been shown that the Bayes' classifier formed from the Gaussian mixture model fitted to the partially classified training data can even have a lower error rate than if it were estimated from a completely classified sample. This result was established in the particular case of two Gaussian classes with a common covariance matrix. Here we focus on the effective implementation of an algorithm for multiple Gaussian classes with arbitrary covariance matrices. A strategy for initialising the algorithm is discussed and illustrated. The new package is demonstrated on some real data.

Australian & New Zealand Journal of Statistics, vol. 66, no. 2, pp. 146-162. PDF: https://onlinelibrary.wiley.com/doi/epdf/10.1111/anzs.12413
Citations: 0
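The missingness mechanism described in the gmmsslm abstract above can be illustrated with a minimal Python sketch. It is written in Python rather than R and does not use or mirror the package's API. It assumes two univariate Gaussian classes, takes the Shannon entropy of the posterior class probabilities as the single covariate, and uses hypothetical logistic coefficients `xi0` and `xi1`; all function names are invented for illustration.

```python
import math


def normal_pdf(y, mean, sd):
    """Density of a univariate normal distribution at y."""
    z = (y - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def posterior_probs(y, priors, means, sds):
    """Posterior class probabilities for y under a Gaussian mixture."""
    weighted = [p * normal_pdf(y, m, s) for p, m, s in zip(priors, means, sds)]
    total = sum(weighted)
    return [w / total for w in weighted]


def entropy(probs):
    """Shannon entropy (natural log) of a probability vector."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def missing_label_prob(y, priors, means, sds, xi0, xi1):
    """Logistic model for the probability that the label of y is missing.

    Assumption: the only covariate is the entropy of the posterior
    class probabilities; xi0 and xi1 are hypothetical coefficients.
    """
    e = entropy(posterior_probs(y, priors, means, sds))
    return 1.0 / (1.0 + math.exp(-(xi0 + xi1 * e)))


if __name__ == "__main__":
    priors, means, sds = [0.5, 0.5], [-1.0, 1.0], [1.0, 1.0]
    for y in (0.0, 1.0, 6.0):
        q = missing_label_prob(y, priors, means, sds, xi0=-2.0, xi1=3.0)
        print(f"y={y:4.1f}  P(label missing)={q:.3f}")
```

With `xi1 > 0`, a feature value near the class boundary (high entropy) is more likely to have a missing label than one deep inside a class, which is the kind of informative missingness the abstract refers to.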
A class of kth-order dependence-driven random coefficient mixed thinning integer-valued autoregressive process to analyse epileptic seizure data and COVID-19 data
IF 1.1; Zone 4, Mathematics; Q3 Statistics & Probability; Pub Date: 2024-04-08; DOI: 10.1111/anzs.12411
Xiufang Liu, Dehui Wang, Huaping Chen, Lifang Zhao, Liang Liu

Data related to the counting of elements of variable character are frequently encountered in time series studies. This paper brings forward a new class of $k$th-order dependence-driven random coefficient mixed thinning integer-valued autoregressive time series models (DDRCMTINAR($k$)) to deal with such data. Stationarity and ergodicity properties of the proposed model are derived in detail. The unknown parameters are estimated by conditional least squares and modified quasi-likelihood, and asymptotic normality of the resulting parameter estimators is established. The performance of the adopted estimation methods is checked via simulations, which show that the modified quasi-likelihood estimators perform better than the conditional least squares estimators in terms of the proportion of within-$\Omega$ estimates in certain regions of the parameter space. The validity and practical utility of the model are investigated using epileptic seizure data and COVID-19 data on suspected cases in China.

Australian & New Zealand Journal of Statistics, vol. 66, no. 2, pp. 249-280.
Citations: 0
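The thinning operation behind integer-valued autoregressive models such as the one in the abstract above can be sketched in a few lines of Python. This is a deliberately simplified first-order INAR(1) process with a fixed thinning probability `alpha` and Poisson innovations with rate `lam`, not the kth-order random coefficient mixed thinning model of the paper; the function names and parameter values are assumptions made for illustration.

```python
import math
import random


def binomial_thinning(alpha, x, rng):
    """Binomial thinning: count successes among x Bernoulli(alpha) trials."""
    return sum(1 for _ in range(x) if rng.random() < alpha)


def poisson(lam, rng):
    """Draw a Poisson(lam) variate by Knuth's method (adequate for small lam)."""
    threshold = math.exp(-lam)
    k, prod = 0, rng.random()
    while prod > threshold:
        k += 1
        prod *= rng.random()
    return k


def simulate_inar1(n, alpha, lam, x0=0, seed=0):
    """Simulate n values of X_t = (alpha thinned X_{t-1}) + Poisson(lam) innovation."""
    rng = random.Random(seed)
    path = [x0]
    for _ in range(n - 1):
        survivors = binomial_thinning(alpha, path[-1], rng)
        path.append(survivors + poisson(lam, rng))
    return path


if __name__ == "__main__":
    counts = simulate_inar1(n=20, alpha=0.5, lam=2.0, seed=42)
    print(counts)
```

Every simulated value is a non-negative integer by construction, which is why thinning replaces the scalar multiplication of an ordinary autoregression. For this simplified model the stationary mean is `lam / (1 - alpha)`.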
Bayesian hypothesis tests with diffuse priors: Can we have our cake and eat it too?
IF 1.1; Zone 4, Mathematics; Q3 Statistics & Probability; Pub Date: 2024-03-19; DOI: 10.1111/anzs.12410
J. T. Ormerod, M. Stewart, W. Yu, S. E. Romanes

We propose a new class of priors for Bayesian hypothesis testing, which we name ‘cake priors’. These priors circumvent the Jeffreys–Lindley paradox (also called Bartlett's paradox), a problem associated with the use of diffuse priors leading to nonsensical statistical inferences. Cake priors allow the use of diffuse priors (having one's cake) while achieving theoretically justified inferences (eating it too). We demonstrate this methodology for Bayesian hypothesis tests in various common scenarios. The resulting Bayesian test statistic takes the form of a penalised likelihood ratio test statistic. Under typical regularity conditions, we show that Bayesian hypothesis tests based on cake priors are Chernoff consistent, that is, achieve zero type I and II error probabilities asymptotically. We also discuss Lindley's paradox and argue that the paradox occurs with small and vanishing probability as sample size increases.

Australian & New Zealand Journal of Statistics, vol. 66, no. 2, pp. 204-227. PDF: https://onlinelibrary.wiley.com/doi/epdf/10.1111/anzs.12410
Citations: 0
Asymptotics for the conditional self-weighted $M$ estimator of GRCA($p$) models and its statistical inference
IF 1.1; Zone 4, Mathematics; Q3 Statistics & Probability; Pub Date: 2024-02-21; DOI: 10.1111/anzs.12408
Chi Yao, Wei Yu, Xuejun Wang

Under the $p$-order generalised random coefficient autoregressive (GRCA($p$)) model with random coefficients $\boldsymbol{\Phi}_t$, we propose a conditional self-weighted $M$ estimator of $\mathrm{E}\boldsymbol{\Phi}_t$. We investigate the asymptotic normality of this estimator with possibly heavy-tailed random variables. Furthermore, a Wald test statistic is constructed for the linear restriction on the parameters. In addition, the simulation experiments are carried out to assess the finite sample performance of theoretical results. Finally, a real data analysis about the increase (%) in the number of construction projects this year over the same period of last year is provided.

Australian & New Zealand Journal of Statistics, vol. 66, no. 1, pp. 103-124.
Citations: 0
Unified robust estimation
IF 1.1; Zone 4, Mathematics; Q3 Statistics & Probability; Pub Date: 2024-02-20; DOI: 10.1111/anzs.12409
Zhu Wang

Robust estimation is primarily concerned with providing reliable parameter estimates in the presence of outliers. Numerous robust loss functions have been proposed in regression and classification, along with various computing algorithms. In modern penalised generalised linear models (GLMs), however, there is limited research on robust estimation that can provide weights to determine the outlier status of the observations. This article proposes a unified framework based on a large family of loss functions, a composite of concave and convex functions (CC-family). Properties of the CC-family are investigated, and CC-estimation is innovatively conducted via the iteratively reweighted convex optimisation (IRCO), which is a generalisation of the iteratively reweighted least squares in robust linear regression. For robust GLM, the IRCO becomes the iteratively reweighted GLM. The unified framework contains penalised estimation and robust support vector machine (SVM) and is demonstrated with a variety of data applications.

Australian & New Zealand Journal of Statistics, vol. 66, no. 1, pp. 77-102.
Citations: 0
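The iteratively reweighted least squares idea that, according to the abstract above, IRCO generalises can be sketched for a straight-line fit. The example uses Huber weights with a fixed threshold `delta` and no residual scale estimate, an assumption made to keep it short. It is not the CC-family or the IRCO algorithm itself, and the function names are hypothetical.

```python
def weighted_line_fit(x, y, w):
    """Weighted least squares fit of y = a + b * x; returns (a, b)."""
    sw = sum(w)
    mx = sum(wi * xi for wi, xi in zip(w, x)) / sw
    my = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y))
    b = sxy / sxx
    return my - b * mx, b


def huber_irls(x, y, delta=1.0, n_iter=50):
    """Robust line fit by iteratively reweighted least squares.

    Each pass fits a weighted line, then sets the weight of every
    observation to min(1, delta / |residual|), so large residuals
    are down-weighted. Returns (a, b, weights).
    """
    w = [1.0] * len(x)
    for _ in range(n_iter):
        a, b = weighted_line_fit(x, y, w)
        resid = [yi - (a + b * xi) for xi, yi in zip(x, y)]
        w = [1.0 if abs(r) <= delta else delta / abs(r) for r in resid]
    return a, b, w


if __name__ == "__main__":
    x = [float(i) for i in range(10)]
    y = [2.0 + 3.0 * xi for xi in x]
    y[5] += 50.0  # a single gross outlier
    a, b, w = huber_irls(x, y)
    print("intercept", round(a, 3), "slope", round(b, 3))
    print("weights", [round(wi, 3) for wi in w])
```

The final weights double as an outlier diagnostic: observations that keep a weight of 1 are treated as regular, while a weight close to 0 flags a point the fit has largely discounted, which is the role of the weights mentioned in the abstract.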
Latent heterogeneity in COVID-19 hospitalisations: a cluster-weighted approach to analyse mortality COVID-19 住院病例的潜在异质性:采用聚类加权法分析死亡率
IF 1.1 Zone 4 (Mathematics) Q3 STATISTICS & PROBABILITY Pub Date: 2024-02-13 DOI: 10.1111/anzs.12407
Paolo Berta, Salvatore Ingrassia, Giorgio Vittadini, Daniele Spinelli

The COVID-19 pandemic caused unprecedented excess mortality. Since 2020, many studies have focussed on the characteristics of COVID-19 patients who did not survive. From a statistical point of view, the dominant features are the large heterogeneity of the populations affected by COVID-19 and the extreme difficulty of identifying subpopulations of patients who died while presenting several characteristics at the same time. In this paper, we propose an extremely flexible approach based on a cluster-weighted model, which allows us to identify latent groups of patients sharing similar characteristics at the moment of hospitalisation as well as similar mortality. We focus on one of the hardest-hit areas in Italy and study the heterogeneity in the population of patients affected by COVID-19 using administrative data on hospitalisations in the first wave of the pandemic. The results highlight that a model-based clustering approach is essential to understand the complexity of the COVID-19 patients who are treated in hospital and who die during hospitalisation.

Australian & New Zealand Journal of Statistics, 66(1), pp. 1-20. Open access PDF: https://onlinelibrary.wiley.com/doi/epdf/10.1111/anzs.12407
Citations: 0
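For orientation, the cluster-weighted model named in the abstract of "Latent heterogeneity in COVID-19 hospitalisations" can be written in its generic textbook form as below. This is a sketch of the general model class, not the exact specification used in the paper; the symbols $G$, $\pi_g$, $\boldsymbol{\xi}_g$, $\boldsymbol{\psi}_g$ and $\tau_{ig}$ are notation assumed here, and a Bernoulli response with a logit link is one natural choice for a binary outcome such as in-hospital mortality.

```latex
% Joint density of covariates x and response y over G latent groups
p(\mathbf{x}, y; \boldsymbol{\theta})
  = \sum_{g=1}^{G} \pi_g \,
    p(y \mid \mathbf{x}; \boldsymbol{\xi}_g) \,
    p(\mathbf{x}; \boldsymbol{\psi}_g),
\qquad \pi_g > 0, \quad \sum_{g=1}^{G} \pi_g = 1 .

% Posterior probability that patient i belongs to group g
\tau_{ig}
  = \frac{\pi_g \, p(y_i \mid \mathbf{x}_i; \boldsymbol{\xi}_g) \, p(\mathbf{x}_i; \boldsymbol{\psi}_g)}
         {\sum_{h=1}^{G} \pi_h \, p(y_i \mid \mathbf{x}_i; \boldsymbol{\xi}_h) \, p(\mathbf{x}_i; \boldsymbol{\psi}_h)} .
```

Because each group has its own covariate distribution and its own response model, groups can differ both in patient characteristics at admission and in mortality, which matches the two kinds of similarity the abstract mentions.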
A novel response model and target selection method with applications to marketing
IF 1.1 Zone 4 (Mathematics) Q3 STATISTICS & PROBABILITY Pub Date: 2024-01-18 DOI: 10.1111/anzs.12406
Y. Cai

Response models used in marketing are not always constructed with later marketing optimisation in mind, which often leads to unsatisfactory target selection for future marketing activities. To solve this problem, we develop a new binary response model and a new marketing target selection method. The proposed model can predict multiple propensity scores per customer through customer-specific propensity score distributions, which is not possible with existing response models, filling a gap in the literature. The target selection method can determine the best propensity scores from those predicted by the proposed model and use them to select customers for further marketing activities. Our simulation results and application to real marketing data confirm that the performance of the proposed model in target selection is significantly better than that of the existing models, including some popular machine learning methods, which indicates that our method can be very useful in practice.

Australian & New Zealand Journal of Statistics, 66(1), pp. 48-76. Open access PDF: https://onlinelibrary.wiley.com/doi/epdf/10.1111/anzs.12406
Citations: 0