
Australian & New Zealand Journal of Statistics: latest articles

Proportional inverse Gaussian distribution: A new tool for analysing continuous proportional data
IF 1.1 | CAS Tier 4 (Mathematics) | Q3 STATISTICS & PROBABILITY | Pub Date: 2021-11-23 | DOI: 10.1111/anzs.12345
Pengyi Liu, Guo-Liang Tian, Kam Chuen Yuen, Chi Zhang, Man-Lai Tang

Outcomes in the form of rates, fractions, proportions and percentages often appear in various fields. Existing beta and simplex distributions frequently fail to fit such continuous data satisfactorily. This paper aims to develop the normalised inverse Gaussian (N-IG) distribution proposed by Lijoi, Mena & Prünster (2005, Journal of the American Statistical Association, 100, 1278–1291) as a new tool for analysing continuous proportional data in (0,1), and renames the N-IG the proportional inverse Gaussian (PIG) distribution. Our main contributions include: (i) to overcome the difficulty of an integral in the PIG density function, we propose a novel minorisation–maximisation (MM) algorithm, via the continuous version of Jensen's inequality, to calculate the maximum likelihood estimates of the parameters of the PIG distribution; (ii) we also develop an MM algorithm aided by gradient descent for the PIG regression model, which allows us to explore the relationship between a set of covariates and the mean parameter; (iii) both the comparative studies and the real data analyses show that the PIG distribution outperforms the beta and simplex distributions in terms of the AIC, the Cramér–von Mises and the Kolmogorov–Smirnov tests. In addition, bootstrap confidence intervals and hypothesis tests on the symmetry of the PIG density are presented. Simulation studies are conducted, and the hospital stay data of Barcelona in 1988 and 1990 are analysed to illustrate the proposed methods.
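The PIG density itself is not available in standard statistical libraries, but the comparison workflow the abstract describes (maximum-likelihood fits assessed via AIC and a Kolmogorov–Smirnov check) can be sketched. A minimal Python illustration on synthetic proportions, with the beta distribution standing in for the PIG candidate; everything here is illustrative rather than the authors' code.

```python
# A minimal sketch, not the authors' code: fit a candidate distribution to synthetic
# proportions by maximum likelihood and assess it via AIC and a Kolmogorov-Smirnov
# check. The beta distribution stands in for the PIG density, which is not available
# in standard libraries.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
y = rng.beta(2.0, 5.0, size=500)              # synthetic proportions in (0, 1)

# Maximum likelihood fit with the support fixed to (0, 1).
a_hat, b_hat, _, _ = stats.beta.fit(y, floc=0, fscale=1)
loglik = np.sum(stats.beta.logpdf(y, a_hat, b_hat))
aic = 2 * 2 - 2 * loglik                      # two free shape parameters

# Goodness of fit against the fitted CDF.
ks = stats.kstest(y, lambda t: stats.beta.cdf(t, a_hat, b_hat))
print(f"AIC = {aic:.1f}, KS statistic = {ks.statistic:.3f}, p-value = {ks.pvalue:.3f}")
```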

Volume 63(4), pages 579–605.
Citations: 1
BNPdensity: Bayesian nonparametric mixture modelling in R
IF 1.1 | CAS Tier 4 (Mathematics) | Q3 STATISTICS & PROBABILITY | Pub Date: 2021-11-17 | DOI: 10.1111/anzs.12342
J. Arbel, G. Kon Kam King, A. Lijoi, L. Nieto-Barajas, I. Prünster

Robust statistical data modelling under potential model mis-specification often requires leaving the parametric world for the nonparametric. In the latter, parameters are infinite dimensional objects such as functions, probability distributions or infinite vectors. In the Bayesian nonparametric approach, prior distributions are designed for these parameters, which provide a handle to manage the complexity of nonparametric models in practice. However, most modern Bayesian nonparametric models often seem out of reach to practitioners, as inference algorithms need careful design to deal with the infinite number of parameters. The aim of this work is to facilitate the journey by providing computational tools for Bayesian nonparametric inference. The article describes a set of functions available in the R package BNPdensity to carry out density estimation with an infinite mixture model, including all types of censored data. The package provides access to a large class of such models based on normalised random measures, which represent a generalisation of the popular Dirichlet process mixture. One striking advantage of this generalisation is that it offers much more robust priors on the number of clusters than the Dirichlet. Another crucial advantage is the complete flexibility in specifying the prior for the scale and location parameters of the clusters, because conjugacy is not required. Inference is performed using a theoretically grounded approximate sampling methodology known as the Ferguson & Klass algorithm. The package also offers several goodness-of-fit diagnostics, such as QQ plots, as well as a cross-validation criterion, the conditional predictive ordinate. The proposed methodology is illustrated on a classical ecological risk assessment method called the species sensitivity distribution problem, showcasing the benefits of the Bayesian nonparametric framework.
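BNPdensity is an R package, so the sketch below is not its API; it is a rough Python analogue of Dirichlet-process mixture density estimation, using scikit-learn's truncated variational implementation to convey the same idea of an infinite mixture with a data-driven number of occupied clusters.

```python
# A rough Python analogue, not the BNPdensity R package: density estimation with a
# truncated Dirichlet-process Gaussian mixture via scikit-learn's variational
# implementation.
import numpy as np
from sklearn.mixture import BayesianGaussianMixture

rng = np.random.default_rng(1)
x = np.concatenate([rng.normal(-2, 0.5, 300), rng.normal(3, 1.0, 200)]).reshape(-1, 1)

dpm = BayesianGaussianMixture(
    n_components=20,                                   # truncation level, not the true K
    weight_concentration_prior_type="dirichlet_process",
    max_iter=500,
    random_state=0,
).fit(x)

grid = np.linspace(-5, 7, 200).reshape(-1, 1)
density = np.exp(dpm.score_samples(grid))              # estimated density on a grid
print("occupied clusters:", int(np.sum(dpm.weights_ > 1e-2)))
```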

Volume 63(3), pages 542–564.
Citations: 1
Experimental design in practice: The importance of blocking and treatment structures
IF 1.1 | CAS Tier 4 (Mathematics) | Q3 STATISTICS & PROBABILITY | Pub Date: 2021-11-08 | DOI: 10.1111/anzs.12343
E.R. Williams, C.G. Forde, J. Imaki, K. Oelkers

Experimental design and analysis have evolved substantially over the last 100 years, driven to a large extent by the power and availability of the computer. To demonstrate this development and encourage the use of experimental design in practice, three experiments from different research areas are presented. In these examples multiple blocking factors have been employed, and they show how extraneous variation can be accommodated and interpreted. The examples are used to discuss the importance of blocking and treatment structures in the conduct of designed experiments.
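A minimal sketch of the kind of blocked analysis the paper advocates: a randomised complete block layout on simulated data, with the block factor absorbing extraneous variation before the treatment structure is assessed. The factor names and effect sizes below are invented for illustration.

```python
# A minimal sketch on simulated data: a randomised complete block analysis in which
# the block factor absorbs extraneous variation before treatments are assessed.
import numpy as np
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf

rng = np.random.default_rng(2)
n_blocks, n_treatments = 6, 4
df = pd.DataFrame(
    [(b, t) for b in range(n_blocks) for t in range(n_treatments)],
    columns=["block", "treatment"],
)
block_eff = rng.normal(0, 2, n_blocks)            # extraneous block-to-block variation
treat_eff = np.array([0.0, 1.0, 1.5, 3.0])        # treatment effects of interest
df["y"] = block_eff[df["block"]] + treat_eff[df["treatment"]] + rng.normal(0, 1, len(df))

# Blocking enters the linear model as a nuisance factor alongside the treatment structure.
fit = smf.ols("y ~ C(block) + C(treatment)", data=df).fit()
print(sm.stats.anova_lm(fit, typ=2))
```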

Volume 63(3), pages 455–467.
Citations: 1
Accelerating adaptation in the adaptive Metropolis–Hastings random walk algorithm
IF 1.1 | CAS Tier 4 (Mathematics) | Q3 STATISTICS & PROBABILITY | Pub Date: 2021-11-03 | DOI: 10.1111/anzs.12344
Simon E.F. Spencer

The Metropolis–Hastings random walk algorithm remains popular with practitioners due to the wide variety of situations in which it can be successfully applied and the extreme ease with which it can be implemented. Adaptive versions of the algorithm use information from the early iterations of the Markov chain to improve the efficiency of the proposal. The aim of this paper is to reduce the number of iterations needed to adapt the proposal to the target, which is particularly important when the likelihood is time-consuming to evaluate. First, the accelerated shaping algorithm is a generalisation of both the adaptive proposal and adaptive Metropolis algorithms. It is designed to remove, from the estimate of the covariance matrix of the target, misleading information from the start of the chain. Second, the accelerated scaling algorithm rapidly changes the scale of the proposal to achieve a target acceptance rate. The usefulness of these approaches is illustrated with a range of examples.
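A minimal sketch of the baseline the paper builds on: a plain adaptive Metropolis random walk whose proposal covariance is estimated from the chain history (Haario-style adaptation). The paper's accelerated shaping and scaling algorithms are not reproduced here; the target and tuning constants are illustrative.

```python
# A minimal sketch of the baseline being accelerated: an adaptive Metropolis random
# walk whose proposal covariance is estimated from the chain history.
import numpy as np

def log_target(x):
    # Example target: a strongly correlated bivariate normal (up to a constant).
    cov = np.array([[1.0, 0.9], [0.9, 1.0]])
    return -0.5 * x @ np.linalg.solve(cov, x)

rng = np.random.default_rng(3)
d, n_iter = 2, 20000
x = np.zeros(d)
samples = np.empty((n_iter, d))
prop_cov = np.eye(d)                         # initial proposal covariance
scale = 2.38**2 / d                          # classic random-walk scaling constant

for i in range(n_iter):
    prop = rng.multivariate_normal(x, scale * prop_cov)
    if np.log(rng.uniform()) < log_target(prop) - log_target(x):
        x = prop
    samples[i] = x
    if i >= 1000 and i % 100 == 0:           # adapt using the chain so far
        prop_cov = np.cov(samples[:i].T) + 1e-6 * np.eye(d)

print("posterior mean estimate:", samples[n_iter // 2:].mean(axis=0))
```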

Volume 63(3), pages 468–484. Open access PDF: https://onlinelibrary.wiley.com/doi/epdf/10.1111/anzs.12344
Citations: 6
Variable selection using penalised likelihoods for point patterns on a linear network
IF 1.1 | CAS Tier 4 (Mathematics) | Q3 STATISTICS & PROBABILITY | Pub Date: 2021-10-18 | DOI: 10.1111/anzs.12341
Suman Rakshit, Greg McSwiggan, Gopalan Nair, Adrian Baddeley

Motivated by the analysis of a comprehensive database of road traffic accidents, we investigate methods of variable selection for spatial point process models on a linear network. The original data may include explanatory spatial covariates, such as road curvature, and 'mark' variables attributed to individual accidents, such as accident severity. The treatment of mark variables is new. Variable selection is applied to the canonical covariates, which may include spatial covariate effects, mark effects and mark–covariate interactions. We approximate the likelihood of the point process model by that of a generalised linear model, in such a way that spatial covariates and marks are both associated with canonical covariates. We impose a convex penalty on the log-likelihood, principally the elastic-net penalty, and maximise the penalised log-likelihood by cyclic coordinate ascent. A simulation study compares the lasso, ridge regression and elastic-net methods of variable selection in terms of their ability to select variables correctly, and in terms of their bias and standard error. Standard techniques for selecting the regularisation parameter γ often yielded unsatisfactory results. We propose two new rules for selecting γ which are designed to have better performance. The methods are tested on a small dataset on crimes in a Chicago neighbourhood, and applied to a large dataset of road traffic accidents in Western Australia.
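As a rough illustration of the selection step only, the sketch below runs elastic-net variable selection on simulated covariates with scikit-learn. It uses a Gaussian working likelihood as a generic analogue; the paper instead penalises a point-process likelihood approximated by a weighted Poisson regression and proposes its own rules for choosing the regularisation parameter.

```python
# A generic analogue, not the paper's method: elastic-net variable selection with a
# Gaussian working likelihood via scikit-learn.
import numpy as np
from sklearn.linear_model import ElasticNetCV

rng = np.random.default_rng(4)
n, p = 500, 20
X = rng.normal(size=(n, p))
beta = np.zeros(p)
beta[:3] = [1.5, -1.0, 0.8]                   # only the first three covariates matter
y = X @ beta + rng.normal(scale=0.5, size=n)

# l1_ratio mixes lasso (1.0) and ridge (0.0); cross-validation picks the penalty weight.
enet = ElasticNetCV(l1_ratio=[0.2, 0.5, 0.8, 1.0], cv=5).fit(X, y)
selected = np.flatnonzero(enet.coef_ != 0)
print("selected covariates:", selected, "| chosen penalty:", round(enet.alpha_, 4))
```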

Volume 63(3), pages 417–454.
Citations: 4
ECM algorithm for estimating vector ARMA model with variance gamma distribution and possible unbounded density
IF 1.1 | CAS Tier 4 (Mathematics) | Q3 STATISTICS & PROBABILITY | Pub Date: 2021-10-18 | DOI: 10.1111/anzs.12340
Thanakorn Nitithumbundit, Jennifer S.K. Chan

The simultaneous analysis of several financial time series is salient in portfolio settings and risk management. This paper proposes a novel alternating expectation conditional maximisation (AECM) algorithm to estimate the vector autoregressive moving average (VARMA) model with variance gamma (VG) error distribution in the multivariate skewed setting. We explain why the VARMA-VG model is suitable for high-frequency returns (HFRs): the VG distribution provides thick tails to capture the high kurtosis in the data, and its unbounded central density further captures the majority of near-zero HFRs. The distribution can also be expressed as a normal-mean-variance mixture to facilitate model implementation using the Bayesian or expectation maximisation (EM) approach. We adopt the EM approach to avoid time-consuming Markov chain Monte Carlo sampling and to solve the unbounded density problem in classical maximum likelihood estimation. We conduct extensive simulation studies to evaluate the accuracy of the proposed AECM estimator and apply the models to analyse the dependency between two HFR series from time zones that differ by only one hour.
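The normal-mean-variance mixture representation mentioned in the abstract is easy to illustrate: a variance gamma draw can be generated as X = μ + θW + σ√W·Z with a unit-mean gamma mixing variable W. The parameter values in the sketch below are arbitrary.

```python
# A small illustration of the normal-mean-variance mixture representation of the
# variance gamma distribution: X = mu + theta*W + sigma*sqrt(W)*Z, W ~ Gamma with mean 1.
import numpy as np

rng = np.random.default_rng(5)
n = 100_000
mu, theta, sigma, nu = 0.0, -0.1, 0.2, 0.5    # location, skew, scale, mixing shape

W = rng.gamma(shape=1.0 / nu, scale=nu, size=n)    # E[W] = 1
Z = rng.normal(size=n)
X = mu + theta * W + sigma * np.sqrt(W) * Z        # variance gamma draws

# Thick tails show up as excess kurtosis relative to the normal value of 3.
kurtosis = np.mean((X - X.mean()) ** 4) / np.var(X) ** 2
print(f"sample kurtosis: {kurtosis:.2f} (normal = 3)")
```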

Volume 63(3), pages 485–516.
Citations: 0
The Inverse G-Wishart distribution and variational message passing
IF 1.1 | CAS Tier 4 (Mathematics) | Q3 STATISTICS & PROBABILITY | Pub Date: 2021-10-07 | DOI: 10.1111/anzs.12339
Luca Maestrini, Matt P. Wand

Message passing on a factor graph is a powerful paradigm for the coding of approximate inference algorithms for arbitrarily large graphical models. The notion of a factor graph fragment allows for compartmentalisation of algebra and computer code. We show that the Inverse G-Wishart family of distributions enables fundamental variational message passing factor graph fragments to be expressed elegantly and succinctly. Such fragments arise in models for which approximate inference concerning covariance matrix or variance parameters is made, and are ubiquitous in contemporary statistics and machine learning.

Volume 63(3), pages 517–541.
Citations: 5
An adequacy approach for deciding the number of clusters for OTRIMLE robust Gaussian mixture-based clustering
IF 1.1 | CAS Tier 4 (Mathematics) | Q3 STATISTICS & PROBABILITY | Pub Date: 2021-09-03 | DOI: 10.1111/anzs.12338
Christian Hennig, Pietro Coretto

We introduce a new approach to deciding the number of clusters. The approach is applied to Optimally Tuned Robust Improper Maximum Likelihood Estimation (OTRIMLE; Coretto & Hennig, Journal of the American Statistical Association 111, 1648–1659) of a Gaussian mixture model allowing for observations to be classified as 'noise', but it can be applied to other clustering methods as well. The quality of a clustering is assessed by a statistic Q that measures how close the within-cluster distributions are to elliptical unimodal distributions whose only mode is at the mean. This non-parametric measure allows for non-Gaussian clusters as long as they have a good quality according to Q. The simplicity of a model is assessed by a measure S that prefers a smaller number of clusters unless additional clusters can reduce the estimated noise proportion substantially. The simplest model that is adequate for the data is then chosen, in the sense that its observed value of Q is not significantly larger than what would be expected for data truly generated from the fitted model, as assessed by parametric bootstrap. The approach is compared with model-based clustering using the Bayesian information criterion (BIC) and the integrated complete likelihood (ICL) in a simulation study and on two real data sets.
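A minimal sketch of the parametric-bootstrap adequacy idea, using a plain Gaussian mixture and a Kolmogorov–Smirnov statistic in place of OTRIMLE and the paper's Q and S measures: keep the smallest number of components whose observed statistic is unexceptional under data simulated from the fitted model. Everything below is illustrative rather than the authors' procedure.

```python
# A minimal sketch of a parametric-bootstrap adequacy check for the number of mixture
# components, with a KS statistic standing in for the paper's Q measure.
import numpy as np
from scipy import stats
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(6)
x = np.concatenate([rng.normal(-3, 1, 200), rng.normal(3, 1, 200)])

def fit_and_stat(data, k):
    """Fit a k-component mixture; return the KS distance to its CDF and the fit."""
    gm = GaussianMixture(n_components=k, random_state=0).fit(data.reshape(-1, 1))
    w, mu, sd = gm.weights_, gm.means_.ravel(), np.sqrt(gm.covariances_.ravel())

    def cdf(t):
        t = np.atleast_1d(t)
        return np.sum(w * stats.norm.cdf((t[:, None] - mu) / sd), axis=1)

    return stats.kstest(data, cdf).statistic, gm

for k in (1, 2, 3):
    observed, gm = fit_and_stat(x, k)
    # Parametric bootstrap: simulate from the fitted model and recompute the statistic.
    boot = [fit_and_stat(gm.sample(len(x))[0].ravel(), k)[0] for _ in range(50)]
    p_value = np.mean(np.array(boot) >= observed)
    print(f"k = {k}: KS = {observed:.3f}, bootstrap p = {p_value:.2f}")
```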

Volume 64(2), pages 230–254.
Citations: 3
What is the effective sample size of a spatial point process?
IF 1.1 | CAS Tier 4 (Mathematics) | Q3 STATISTICS & PROBABILITY | Pub Date: 2021-07-21 | DOI: 10.1111/anzs.12337
Ian W. Renner, David I. Warton, Francis K.C. Hui

Point process models are a natural approach for modelling data that arise as point events. In the case of Poisson counts, these may be fitted easily as a weighted Poisson regression. Point processes lack the notion of sample size. This is problematic for model selection, because various classical criteria such as the Bayesian information criterion (BIC) are a function of the sample size, n, and are derived in an asymptotic framework where n tends to infinity. In this paper, we develop an asymptotic result for Poisson point process models in which the observed number of point events, m, plays the role that sample size does in the classical regression context. Following from this result, we derive a version of BIC for point process models, and when fitted via penalised likelihood, conditions for the LASSO penalty that ensure consistency in estimation and the oracle property. We discuss challenges extending these results to the wider class of Gibbs models, of which the Poisson point process model is a special case.
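A minimal sketch of the paper's central point, namely that the observed number of events m plays the role of sample size: fit a one-dimensional Poisson process with log-linear intensity by maximum likelihood and weight the BIC penalty by log(m). The intensity form and window below are invented for illustration.

```python
# A minimal sketch: for a Poisson point process, the observed number of events m
# replaces the notional sample size in the BIC penalty.
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(7)
a_true, b_true = 4.0, 1.5
lam_max = np.exp(a_true + b_true)                       # bound for rejection thinning
n_prop = rng.poisson(lam_max)
props = rng.uniform(0, 1, n_prop)
keep = rng.uniform(0, 1, n_prop) < np.exp(a_true + b_true * props) / lam_max
pts = props[keep]                                       # observed point pattern on [0, 1]
m = len(pts)

def neg_loglik(par):
    a, b = par
    integral = (np.exp(a + b) - np.exp(a)) / b if abs(b) > 1e-8 else np.exp(a)
    return -(np.sum(a + b * pts) - integral)            # sum log-intensity minus integral

fit = minimize(neg_loglik, x0=[0.0, 0.0])
n_par = 2
bic = 2 * neg_loglik(fit.x) + n_par * np.log(m)         # penalty uses m, not a notional n
print(f"m = {m}, (a, b) = {np.round(fit.x, 2)}, BIC = {bic:.1f}")
```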

Volume 63(1), pages 144–158.
Citations: 4
Anna Karenina and the two envelopes problem
IF 1.1 | CAS Tier 4 (Mathematics) | Q3 STATISTICS & PROBABILITY | Pub Date: 2021-07-21 | DOI: 10.1111/anzs.12329
R. D. Gill

The Anna Karenina principle is named after the opening sentence in the eponymous novel: Happy families are all alike; every unhappy family is unhappy in its own way. The two envelopes problem (TEP) is a much-studied paradox in probability theory, mathematical economics, logic and philosophy. Time and again a new analysis is published in which an author claims finally to explain what actually goes wrong in this paradox. Each author (the present author included) emphasises what is new in their approach and concludes that earlier approaches did not get to the root of the matter. We observe that though a logical argument is only correct if every step is correct, an apparently logical argument which goes astray can be thought of as going astray at different places. This leads to a comparison between the literature on TEP and a successful movie franchise: it generates a succession of sequels, and even prequels, each with a different director who approaches the same basic premise in a personal way. We survey resolutions in the literature with a view to synthesis, correct common errors, and give a new theorem on order properties of an exchangeable pair of random variables, at the heart of most TEP variants and interpretations. A theorem on asymptotic independence between the amount in your envelope and the question whether it is smaller or larger shows that the pathological situation of improper priors or infinite expectation values has consequences as we merely approach such a situation.
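A small simulation makes the underlying point concrete: once the pair of amounts has a proper prior, the probability that the envelope you hold is the smaller one depends on the amount you observe, which the naive "always switch" computation ignores. The geometric prior below is only an example, not the paper's construction.

```python
# A small simulation with an arbitrary proper prior: P(holding the smaller envelope)
# conditional on the observed amount is not 1/2 in general.
import numpy as np

rng = np.random.default_rng(8)
n = 1_000_000
smaller = 2.0 ** rng.geometric(p=0.5, size=n)     # smaller amount is 2^K, K geometric
pair = np.stack([smaller, 2 * smaller], axis=1)   # the two envelopes: (x, 2x)
pick = rng.integers(0, 2, size=n)                 # you receive one of them at random
amount = pair[np.arange(n), pick]
holds_smaller = pick == 0

for a in (2.0, 8.0, 32.0):
    prob = holds_smaller[amount == a].mean()
    print(f"P(holding the smaller envelope | amount = {a:g}) ≈ {prob:.3f}")
```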

Volume 63(1), pages 201–218.
Citations: 1