Optimal addition orders of several components can be determined systematically to address order-of-addition problems when active location and dispersion effects are both taken into account. Based on the concept of fiducial generalised pivotal quantities, a new testing procedure is proposed in this paper to identify active dispersion effects from unreplicated order-of-addition experiments. Because the proposed method is free of all nuisance parameters indexed by the requirement set, it is capable of testing multiple dispersion effects. Simulation results show that the proposed method maintains empirical sizes close to the nominal level. A paint viscosity study is used to demonstrate the practicality of the proposed method. In addition, testable requirement sets are characterised when an order-of-addition orthogonal array is used to design an experiment.
Shin-Fu Tsai & Shan-Syue He (2024). Testing multiple dispersion effects from unreplicated order-of-addition experiments. Australian & New Zealand Journal of Statistics 66(2), 228–248. doi:10.1111/anzs.12416
Where the response variable in a big dataset is consistent with the variable of interest for small area estimation, the big data by itself can provide the estimates for small areas. These estimates are often subject to the coverage and measurement error bias inherited from the big data. However, if a probability survey of the same variable of interest is available, the survey data can be used as a training dataset to develop an algorithm to impute for the data missed by the big data and adjust for measurement errors. In this paper, we outline a methodology for such imputations based on a k-nearest neighbours (kNN) algorithm calibrated to an asymptotically design-unbiased estimate of the national total, and illustrate the use of a training dataset to estimate the imputation bias and the “fixed-k asymptotic” bootstrap to estimate the variance of the small area hybrid estimator. We illustrate the methodology of this paper using a public-use dataset and use it to compare the accuracy and precision of our hybrid estimator with the Fay–Herriot (FH) estimator. Finally, we also examine numerically the accuracy and precision of the FH estimator when the auxiliary variables used in the linking models are subject to undercoverage errors.
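The calibrated-imputation idea in this abstract can be sketched in a few lines of Python: a hand-rolled kNN imputes the outcome for every unit in the frame, and a single ratio calibration forces the imputed total to match a design-unbiased (Horvitz–Thompson) survey estimate. All data, weights and the choice of a ratio adjustment are invented for illustration; the paper's method additionally corrects measurement error and estimates variance with a fixed-k bootstrap, which this sketch omits.

```python
def knn_predict(train_X, train_y, x, k=3):
    """Mean of the outcomes of the k nearest training points (Euclidean)."""
    order = sorted(range(len(train_X)),
                   key=lambda i: sum((a - b) ** 2 for a, b in zip(train_X[i], x)))
    return sum(train_y[i] for i in order[:k]) / k

# Probability survey of 5 units drawn from a 10-unit frame, equal design
# weights of 2 (all numbers hypothetical).
survey_X = [[1.0], [2.0], [3.0], [4.0], [5.0]]
survey_y = [2.1, 3.9, 6.2, 8.0, 9.8]
weights = [2.0] * 5

# Asymptotically design-unbiased (Horvitz-Thompson) estimate of the total.
t_ht = sum(w * y for w, y in zip(weights, survey_y))

# Impute the outcome for every frame unit with kNN trained on the survey.
frame_X = [[0.5 * j] for j in range(1, 11)]
imputed = [knn_predict(survey_X, survey_y, x) for x in frame_X]

# Calibrate: scale the imputations so their total matches the HT estimate.
scale = t_ht / sum(imputed)
calibrated = [scale * v for v in imputed]
```

By construction the calibrated small-area values now add up, over the whole frame, to the design-unbiased national total.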
Siu-Ming Tam & Shaila Sharmeen (2024). A calibrated data-driven approach for small area estimation using big data. Australian & New Zealand Journal of Statistics 66(2), 125–145. doi:10.1111/anzs.12414
Generalised linear mixed regression models are fundamental in statistics. Modelling random effects that are shared by individuals allows for correlation among those individuals. There are many methods and statistical packages available for analysing data using these models. Most require some form of numerical or analytic approximation because the likelihood function generally involves intractable integrals over the latents. The Bayesian approach avoids this issue by iteratively sampling the full conditional distributions for various blocks of parameters and latent random effects. Depending on the choice of the prior, some full conditionals are recognisable while others are not. In this paper we develop a novel normal approximation for the random-effects full conditional, establish its asymptotic correctness and evaluate how well it performs. We develop the approximation for hierarchical binomial and Poisson regression models with canonical link functions, for hierarchical gamma regression models with log link, and for other cases. We also develop what we term a sufficient reduction (SR) approach to the Markov chain Monte Carlo algorithm that allows for making inferences about all model parameters by replacing the full conditional for the latent variables with a function of the latents of considerably reduced dimension. We expect this approximation to be particularly useful when there are very many latent effects, as is increasingly common in a 'Big Data' world. We compare our methods with INLA, a particularly popular method that has been shown to be excellent in terms of speed and accuracy across a variety of settings. Our methods appear to be comparable to INLA in accuracy, while INLA was faster for the settings we considered.
In addition, we note that our methods and those of others that involve Gibbs sampling trivially handle parameters that are functions of multiple parameters, while INLA approximations do not. Our primary illustration is for a three-level hierarchical binomial regression model for data on health outcomes for patients who are clustered within physicians who are clustered within particular hospitals or hospital systems.
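The flavour of a normal approximation to a random-effects full conditional can be conveyed by the simplest case: a single Poisson count with log link and one Gaussian random intercept. The sketch below is a generic Laplace-style approximation (Newton's method for the mode, minus the inverse Hessian for the variance); it illustrates the idea rather than reproducing the authors' construction, and the values y = 4 and tau2 = 1 are arbitrary.

```python
import math

def laplace_rand_effect(y, tau2, iters=25):
    """Normal approximation to the full conditional of a Poisson random
    intercept u, where p(u | y) is proportional to
    exp(y*u - exp(u)) * exp(-u**2 / (2*tau2)).
    Newton's method locates the mode; minus the inverse Hessian at the
    mode gives the approximating variance."""
    u = 0.0
    hess = -1.0
    for _ in range(iters):
        grad = y - math.exp(u) - u / tau2   # d/du of the log full conditional
        hess = -math.exp(u) - 1.0 / tau2    # second derivative (always < 0)
        u -= grad / hess                    # Newton step
    return u, -1.0 / hess                   # approximating N(mean, variance)

mean, var = laplace_rand_effect(y=4, tau2=1.0)
```

Within a Gibbs sampler, a draw from N(mean, var) would replace the intractable full-conditional draw for this latent effect.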
Brandon Berman, Wesley O. Johnson & Weining Shen (2024). Approximate inferences for Bayesian hierarchical generalised linear regression models. Australian & New Zealand Journal of Statistics 66(2), 163–203. doi:10.1111/anzs.12412
Semi-supervised learning is being extensively applied to estimate classifiers from training data in which not all the labels of the feature vectors are available. We present gmmsslm, an R package for estimating the Bayes' classifier from such partially classified data in the case where the feature vector has a multivariate Gaussian (normal) distribution in each of the pre-defined classes. Our package implements a recently proposed Gaussian mixture modelling framework that incorporates a missingness mechanism for the missing labels, in which the probability of a missing label is represented via a logistic model with covariates that depend on the entropy of the feature vector. Under this framework, it has been shown that the Bayes' classifier formed from the Gaussian mixture model fitted to the partially classified training data can even have a lower error rate than one estimated from a completely classified sample. This result was established in the particular case of two Gaussian classes with a common covariance matrix. Here we focus on the effective implementation of an algorithm for multiple Gaussian classes with arbitrary covariance matrices. A strategy for initialising the algorithm is discussed and illustrated. The new package is demonstrated on some real data.
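The entropy-dependent missingness mechanism can be illustrated for a hypothetical univariate two-class mixture: the nearer a feature is to the decision boundary, the higher the entropy of its class posteriors and, through a logistic model, the higher the chance its label is missing. The coefficients xi0 and xi1 below are invented, and the logistic-in-entropy form is a simplification of the mechanism implemented in gmmsslm.

```python
import math

def class_posteriors(x, mus, sigma2, pis):
    """Posterior class probabilities under a univariate Gaussian mixture
    with common variance sigma2."""
    dens = [p * math.exp(-(x - m) ** 2 / (2 * sigma2)) for m, p in zip(mus, pis)]
    s = sum(dens)
    return [d / s for d in dens]

def entropy(probs):
    """Shannon entropy of a probability vector."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def prob_label_missing(x, mus, sigma2, pis, xi0=-1.0, xi1=2.0):
    """Missing-label probability: logistic in the entropy of the class
    posteriors (xi0 and xi1 are hypothetical coefficients)."""
    e = entropy(class_posteriors(x, mus, sigma2, pis))
    return 1.0 / (1.0 + math.exp(-(xi0 + xi1 * e)))

# A point on the decision boundary versus a clear-cut point.
p_boundary = prob_label_missing(0.0, [-2.0, 2.0], 1.0, [0.5, 0.5])
p_clearcut = prob_label_missing(3.0, [-2.0, 2.0], 1.0, [0.5, 0.5])
```

Ambiguous feature vectors (high entropy) are the ones most likely to be left unlabelled, which is exactly what makes the unlabelled data informative.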
Ziyang Lyu, Daniel Ahfock, Ryan Thompson & Geoffrey J. McLachlan (2024). Semi-supervised Gaussian mixture modelling with a missing-data mechanism in R. Australian & New Zealand Journal of Statistics 66(2), 146–162. doi:10.1111/anzs.12413
Data related to the counting of elements of variable character are frequently encountered in time series studies. This paper brings forward a new class of kth-order dependence-driven random coefficient mixed thinning integer-valued autoregressive time series models (DDRCMTINAR(k)) to deal with such data. Stationarity and ergodicity properties of the proposed model are derived in detail. The unknown parameters are estimated by conditional least squares and modified quasi-likelihood, and the asymptotic normality of the resulting estimators is established. The performance of the two estimation methods is checked via simulations, which show that the modified quasi-likelihood estimators perform better than the conditional least squares estimators in terms of the proportion of within-Ω estimates in certain regions of the parameter space. The validity and practical utility of the model are investigated using epileptic seizure data and COVID-19 data on suspected cases in China.
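The binomial-thinning operator at the heart of such models is easy to simulate. The sketch below generates a first-order random coefficient INAR path with a Beta-distributed thinning probability and Poisson innovations; it is a deliberately simplified relative of the kth-order mixed thinning model, with all distributional choices hypothetical.

```python
import random

def simulate_rcinar1(n, lam=2.0, a=2.0, b=5.0, seed=1):
    """Simulate X_t = phi_t o X_{t-1} + eps_t, where 'o' is binomial
    thinning, phi_t ~ Beta(a, b) is the random coefficient and
    eps_t ~ Poisson(lam) is the innovation."""
    rng = random.Random(seed)

    def poisson(lmbda):
        # Knuth's multiplication algorithm for a Poisson draw.
        L, k, p = pow(2.718281828459045, -lmbda), 0, 1.0
        while True:
            p *= rng.random()
            if p <= L:
                return k
            k += 1

    x, path = 0, []
    for _ in range(n):
        phi = rng.betavariate(a, b)
        # Binomial thinning: each of the x current counts survives w.p. phi.
        survivors = sum(rng.random() < phi for _ in range(x))
        x = survivors + poisson(lam)
        path.append(x)
    return path

series = simulate_rcinar1(200)
```

Thinning keeps the state integer-valued, which is why these models suit epileptic seizure counts and daily case counts.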
Xiufang Liu, Dehui Wang, Huaping Chen, Lifang Zhao & Liang Liu (2024). A class of kth-order dependence-driven random coefficient mixed thinning integer-valued autoregressive process to analyse epileptic seizure data and COVID-19 data. Australian & New Zealand Journal of Statistics 66(2), 249–280. doi:10.1111/anzs.12411
We propose a new class of priors for Bayesian hypothesis testing, which we name ‘cake priors’. These priors circumvent the Jeffreys–Lindley paradox (also called Bartlett's paradox), a problem associated with the use of diffuse priors that leads to nonsensical statistical inferences. Cake priors allow the use of diffuse priors (having one's cake) while achieving theoretically justified inferences (eating it too). We demonstrate this methodology for Bayesian hypothesis tests in various common scenarios. The resulting Bayesian test statistic takes the form of a penalised likelihood ratio test statistic. Under typical regularity conditions, we show that Bayesian hypothesis tests based on cake priors are Chernoff consistent, that is, they achieve zero type I and type II error probabilities asymptotically. We also discuss Lindley's paradox and argue that the paradox occurs with small and vanishing probability as sample size increases.
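The penalised likelihood ratio form of the test can be mimicked with a BIC-style penalty for the toy problem of testing a normal mean: the penalty grows like log n, so the type I error vanishes asymptotically, while under the alternative the likelihood ratio grows linearly in n, giving power tending to one. This mirrors the Chernoff-consistency argument but is not the cake-prior statistic itself.

```python
import math

def penalised_lrt(xs, sigma2=1.0):
    """BIC-style penalised likelihood ratio test of H0: mu = 0 versus
    H1: mu free, for N(mu, sigma2) data with sigma2 known. H0 is rejected
    when twice the log likelihood ratio exceeds a log(n) penalty for the
    one extra free parameter."""
    n = len(xs)
    xbar = sum(xs) / n
    two_log_lr = n * xbar ** 2 / sigma2   # 2*(loglik(muhat) - loglik(0))
    penalty = math.log(n)                 # grows with n: type I error -> 0
    return two_log_lr - penalty, two_log_lr > penalty

# Null-like data (mean 0) versus alternative-like data (mean 1).
stat0, reject0 = penalised_lrt([0.05 * (-1) ** i for i in range(100)])
stat1, reject1 = penalised_lrt([1.0 + 0.05 * (-1) ** i for i in range(100)])
```

A fixed critical value would keep the type I error bounded away from zero; the growing penalty is what delivers Chernoff consistency.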
J. T. Ormerod, M. Stewart, W. Yu & S. E. Romanes (2024). Bayesian hypothesis tests with diffuse priors: Can we have our cake and eat it too? Australian & New Zealand Journal of Statistics 66(2), 204–227. doi:10.1111/anzs.12410
Under the p-order generalised random coefficient autoregressive (GRCA(p)) model with random coefficients Φ_t, we propose a conditional self-weighted M estimator of EΦ_t. We investigate the asymptotic normality of this estimator under possibly heavy-tailed random variables. Furthermore, a Wald test statistic is constructed for linear restrictions on the parameters. In addition, simulation experiments are carried out to assess the finite-sample performance of the theoretical results. Finally, a real data analysis of the year-on-year percentage increase in the number of construction projects is provided.
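Self-weighting can be demonstrated on a one-parameter toy model: fit phi by weighted least absolute deviations, with weights 1/(1 + x_{t-1}^2) that cap the influence of extreme lagged observations. This is a sketch of the self-weighting idea only, not the paper's GRCA(p) M estimator; the data-generating settings below are hypothetical.

```python
import random

def self_weighted_lad(series, grid=None):
    """Self-weighted least-absolute-deviation fit of phi in
    x_t ~ phi * x_{t-1}: minimise sum_t |x_t - phi*x_{t-1}| / (1 + x_{t-1}**2).
    The self-weights bound the influence of heavy-tailed lagged values."""
    grid = grid if grid is not None else [i / 1000 for i in range(-1000, 1001)]

    def loss(phi):
        return sum(abs(x1 - phi * x0) / (1.0 + x0 * x0)
                   for x0, x1 in zip(series, series[1:]))

    return min(grid, key=loss)

# AR(1)-style data with phi = 0.5, uniform noise and one gross outlier.
rng = random.Random(7)
data = [1.0]
for _ in range(199):
    data.append(0.5 * data[-1] + rng.uniform(-1.0, 1.0))
data[100] = 200.0  # heavy-tailed shock
phi_hat = self_weighted_lad(data)
```

Without the self-weights, the single shock at t = 100 would dominate the absolute-deviation criterion; with them, its two affected terms carry weight of order 1/x², leaving the fit near the true coefficient.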
Chi Yao, Wei Yu & Xuejun Wang (2024). Asymptotics for the conditional self-weighted M estimator of GRCA(p) models and its statistical inference. Australian & New Zealand Journal of Statistics 66(1), 103–124. doi:10.1111/anzs.12408
Robust estimation is primarily concerned with providing reliable parameter estimates in the presence of outliers. Numerous robust loss functions have been proposed in regression and classification, along with various computing algorithms. In modern penalised generalised linear models (GLMs), however, there is limited research on robust estimation that can provide weights to determine the outlier status of the observations. This article proposes a unified framework based on a large family of loss functions, a composite of concave and convex functions (CC-family). Properties of the CC-family are investigated, and CC-estimation is innovatively conducted via the iteratively reweighted convex optimisation (IRCO), which is a generalisation of the iteratively reweighted least squares in robust linear regression. For robust GLM, the IRCO becomes the iteratively reweighted GLM. The unified framework contains penalised estimation and robust support vector machine (SVM) and is demonstrated with a variety of data applications.
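The IRCO scheme generalises the familiar iteratively reweighted least squares of robust linear regression, which is easy to exhibit directly. The sketch below fits a straight line with Huber weights, the classical special case; the data and tuning constant are illustrative.

```python
def huber_weight(r, c=1.345):
    """Huber weight psi(r)/r: 1 inside [-c, c], c/|r| outside."""
    a = abs(r)
    return 1.0 if a <= c else c / a

def irls_line(xs, ys, iters=50, c=1.345):
    """Robust fit of y = b0 + b1*x by iteratively reweighted least squares
    with Huber weights: at each step, reweight by the current residuals,
    then solve the weighted least squares problem in closed form."""
    b0, b1 = 0.0, 0.0
    for _ in range(iters):
        w = [huber_weight(y - b0 - b1 * x, c) for x, y in zip(xs, ys)]
        sw = sum(w)
        mx = sum(wi * x for wi, x in zip(w, xs)) / sw
        my = sum(wi * y for wi, y in zip(w, ys)) / sw
        sxx = sum(wi * (x - mx) ** 2 for wi, x in zip(w, xs))
        sxy = sum(wi * (x - mx) * (y - my) for wi, x, y in zip(w, xs, ys))
        b1 = sxy / sxx
        b0 = my - b1 * mx
    return b0, b1

xs = list(range(10))
ys = [2.0 * x + 1.0 for x in xs]
ys[9] = 100.0  # gross outlier
b0, b1 = irls_line(xs, ys)
```

The final weights double as outlier diagnostics: the contaminated point ends up with a weight near zero while the clean points keep weight one, which is the weighting role the CC-framework formalises for penalised GLMs.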
Zhu Wang (2024). Unified robust estimation. Australian & New Zealand Journal of Statistics 66(1), 77–102. doi:10.1111/anzs.12409
The COVID-19 pandemic caused unprecedented excess mortality. Since 2020, many studies have focussed on the characteristics of COVID-19 patients who did not survive. From the statistical point of view, what seems to dominate is the large heterogeneity of the populations affected by COVID-19 and the extreme difficulty of identifying subpopulations of non-survivors defined by many co-occurring characteristics. In this paper, we propose an extremely flexible approach based on a cluster-weighted model, which allows us to identify latent groups of patients sharing similar characteristics at the moment of hospitalisation as well as similar mortality. We focus on one of the hardest-hit areas in Italy and study the heterogeneity of the population of COVID-19 patients using administrative data on hospitalisations in the first wave of the pandemic. Results highlight that a model-based clustering approach is essential to understanding the complexity of COVID-19 patients who are treated in hospital and die during hospitalisation.
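A cluster-weighted model assigns each patient to latent groups through responsibilities proportional to pi_g * p(x | g) * p(y | x, g). The E-step computation below uses a hypothetical two-component univariate model with invented parameters, purely to show the mechanics.

```python
import math

def normal_pdf(v, mu, s2):
    """Univariate normal density."""
    return math.exp(-(v - mu) ** 2 / (2 * s2)) / math.sqrt(2 * math.pi * s2)

def responsibilities(x, y, params):
    """E-step of a two-component cluster-weighted model: each component g
    has a weight pi_g, a Gaussian covariate density p(x|g) and a Gaussian
    regression p(y|x, g). Responsibilities are proportional to
    pi_g * p(x|g) * p(y|x, g)."""
    terms = []
    for pi_g, (mx, sx2), (b0, b1, se2) in params:
        terms.append(pi_g * normal_pdf(x, mx, sx2)
                          * normal_pdf(y, b0 + b1 * x, se2))
    s = sum(terms)
    return [t / s for t in terms]

# Two hypothetical patient groups: a low-risk group with a flat outcome
# profile and a high-risk group whose outcome rises steeply with the
# (standardised) covariate. All parameter values are invented.
params = [
    (0.7, (0.0, 1.0), (0.0, 0.2, 0.5)),   # group 1: low risk
    (0.3, (1.5, 1.0), (1.0, 1.5, 0.5)),   # group 2: high risk
]
r = responsibilities(2.0, 4.0, params)
```

Unlike ordinary mixture clustering, both the covariate profile and the covariate-outcome relationship drive the grouping, which is what lets the model separate patients with similar admission characteristics but different mortality.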
Paolo Berta, Salvatore Ingrassia, Giorgio Vittadini & Daniele Spinelli (2024). Latent heterogeneity in COVID-19 hospitalisations: a cluster-weighted approach to analyse mortality. Australian & New Zealand Journal of Statistics 66(1), 1–20. doi:10.1111/anzs.12407
Response models used in marketing are not always constructed with later marketing optimisation in mind, which often leads to unsatisfactory target selection for future marketing activities. To solve this problem, we develop a new binary response model and a new marketing target selection method. The proposed model can predict multiple propensity scores per customer through customer-specific propensity score distributions, which is not possible with existing response models, filling a gap in the literature. The target selection method can determine the best propensity scores from those predicted by the proposed model and use them to select customers for further marketing activities. Our simulation results and an application to real marketing data confirm that the performance of the proposed model in target selection is significantly better than that of existing models, including some popular machine learning methods, which indicates that our method can be very useful in practice.
Y. Cai (2024). A novel response model and target selection method with applications to marketing. Australian & New Zealand Journal of Statistics 66(1), 48–76. doi:10.1111/anzs.12406