
Australian & New Zealand Journal of Statistics: Latest Articles

Smooth tests of goodness of fit for the distributional assumption of regression models
IF 1.1 | Zone 4 (Mathematics) | Q3 Mathematics | Pub Date: 2022-04-18 | DOI: 10.1111/anzs.12361
J. C. W. Rayner, Paul Rippon, Thomas Suesse, Olivier Thas

We focus on regression models that consist of (i) a model for the conditional mean of the outcome and (ii) a distributional assumption about the distribution of the outcome, both conditional on the regressors. Generalised linear models form a well-known example. The choice of the outcome distribution is often motivated by prior or background knowledge of the researcher, or it is simply chosen for convenience. We propose smooth goodness of fit tests for testing the distributional assumption in regression models. The tests arise from embedding the regression model in a smooth family of alternatives, and constructing appropriate score tests that correctly account for nuisance parameter estimation. The tests are customised, focussed and comprehensive. We present several examples to illustrate the wide applicability of our method. A small simulation study demonstrates that our tests have power to detect important deviations from the hypothesised model.
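
As a rough illustration of the smooth-test idea, the sketch below computes Neyman's classical smooth statistic for a fully specified null distribution. It is a simplification and not the authors' procedure: there are no regressors and no nuisance-parameter adjustment, and the names `legendre_components`, `neyman_smooth_statistic` and `null_cdf` are hypothetical. Under the null, the probability-integral transforms are uniform on [0, 1], each component is approximately standard normal, and the statistic is approximately chi-squared with `order` degrees of freedom.

```python
import math


def legendre_components(u):
    """First four orthonormal shifted Legendre polynomials at u in [0, 1]."""
    return [
        math.sqrt(3) * (2 * u - 1),
        math.sqrt(5) * (6 * u * u - 6 * u + 1),
        math.sqrt(7) * (20 * u ** 3 - 30 * u * u + 12 * u - 1),
        3 * (70 * u ** 4 - 140 * u ** 3 + 90 * u * u - 20 * u + 1),
    ]


def neyman_smooth_statistic(y, null_cdf, order=4):
    """Return (statistic, components) for a fully specified null CDF.

    Each component is n**-0.5 times the sum of one polynomial over the
    transformed data; the statistic is the sum of squared components.
    """
    if not 1 <= order <= 4:
        raise ValueError("this sketch supports order 1 to 4 only")
    n = len(y)
    sums = [0.0] * order
    for value in y:
        h = legendre_components(null_cdf(value))
        for j in range(order):
            sums[j] += h[j]
    components = [s / math.sqrt(n) for s in sums]
    return sum(c * c for c in components), components
```

Large individual components point to the kind of departure involved (first component: location, second: spread, and so on), which is one reading of the "focussed" property mentioned in the abstract.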

Citations: 1
Modal clustering on PPGMMGA projection subspace
IF 1.1 | Zone 4 (Mathematics) | Q3 Mathematics | Pub Date: 2022-04-14 | DOI: 10.1111/anzs.12360
Luca Scrucca

PPGMMGA is a projection pursuit (PP) algorithm aimed at detecting and visualising clustering structures in multivariate data. The algorithm uses the negentropy as PP index obtained by fitting Gaussian mixture models (GMMs) for density estimation and, then, exploits genetic algorithms (GAs) for its optimisation. Since the PPGMMGA algorithm is a dimension reduction technique specifically introduced for visualisation purposes, cluster memberships are not explicitly provided. In this paper a modal clustering approach is proposed for estimating clusters of projected data points. In particular, a modal EM algorithm is employed to estimate the modes corresponding to the local maxima in the projection subspace of the underlying density estimated using parsimonious GMMs. Data points are then clustered according to the domain of attraction of the identified modes. Simulated and real data are discussed to illustrate the proposed method and evaluate the clustering performance.
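
The modal EM step can be sketched in one dimension as follows. This is a toy version under stated assumptions, not the paper's implementation: the mixture parameters are taken as already fitted, the data are univariate rather than points in a projection subspace, and the function names are hypothetical. Each point is moved uphill on the mixture density until it reaches a mode, and points that reach the same mode share a cluster label.

```python
import math


def normal_pdf(x, mu, sigma):
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2 * math.pi))


def modal_em(x, weights, means, sds, tol=1e-10, max_iter=1000):
    """Climb from x to a local mode of a univariate Gaussian mixture."""
    for _ in range(max_iter):
        # E-step: posterior probability of each component at the current x.
        dens = [w * normal_pdf(x, m, s) for w, m, s in zip(weights, means, sds)]
        total = sum(dens)  # assumes x is not so far out that this underflows
        post = [d / total for d in dens]
        # M-step: maximise the posterior-weighted sum of component log-densities.
        num = sum(p * m / (s * s) for p, m, s in zip(post, means, sds))
        den = sum(p / (s * s) for p, s in zip(post, sds))
        x_new = num / den
        if abs(x_new - x) < tol:
            return x_new
        x = x_new
    return x


def modal_clusters(points, weights, means, sds, merge_tol=1e-4):
    """Label each point by the mode it converges to (its domain of attraction)."""
    modes, labels = [], []
    for p in points:
        mode = modal_em(p, weights, means, sds)
        for idx, known in enumerate(modes):
            if abs(mode - known) < merge_tol:
                labels.append(idx)
                break
        else:
            modes.append(mode)
            labels.append(len(modes) - 1)
    return modes, labels
```

Because the number of modes can be smaller than the number of mixture components, the number of clusters found this way need not equal the number of components.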

Citations: 1
MPS: An R package for modelling shifted families of distributions
IF 1.1 | Zone 4 (Mathematics) | Q3 Mathematics | Pub Date: 2022-04-14 | DOI: 10.1111/anzs.12359
Mahdi Teimouri, Saralees Nadarajah

Generalised statistical distributions have been widely used over the last decades for modelling phenomena in different fields. The generalisations have been made to produce distributions with more flexibility and lead to more accurate modelling in practice. Statistical analysis of the generalised distributions requires new statistical packages. The Newdistns package due to Nadarajah and Rocha provides R routines with functionality to compute probability density function (PDF), cumulative distribution function (CDF), quantile function, random numbers and parameter estimates of 19 families of distributions with applications in survival analysis. Here, we introduce an R package, called MPS, for computing PDF, CDF, quantile function, random numbers, Q–Q plots and parameter estimates for 24 shifted new families of distributions. By considering an extra location parameter, each family will be defined on the whole real line and so covers a broader range of applicability. We adopt the well-known maximum product spacing approach to estimate parameters of the families because under some situations the maximum likelihood (ML) estimators fail to exist. We demonstrate MPS by analysing two well-known real data sets. For the first data set, the ML estimators break down, but MPS works well. For the second set, adding a location parameter results in a reasonable model while the absence of the location parameter makes the model quite inappropriate. The MPS is available from CRAN at https://cran.r-project.org/package=MPS.
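
To make the maximum product spacing idea concrete, here is a minimal Python sketch; it does not use or imitate the MPS package's interface. It estimates the location of a unit-rate shifted exponential distribution by maximising the mean log-spacing of the fitted CDF at the ordered data. The distribution, the toy data and all function names are assumptions chosen for illustration, and the optimiser is a plain golden-section search.

```python
import math


def log_spacing_mean(sample, cdf):
    """Mean log-spacing of cdf values at the ordered sample, padded with 0 and 1."""
    xs = sorted(sample)
    probs = [0.0] + [cdf(x) for x in xs] + [1.0]
    total = 0.0
    for lo, hi in zip(probs, probs[1:]):
        gap = hi - lo
        if gap <= 0.0:  # ties or an infeasible parameter value
            return -math.inf
        total += math.log(gap)
    return total / (len(xs) + 1)


def golden_section_max(f, lo, hi, tol=1e-9):
    """Maximise a unimodal function f on [lo, hi]."""
    inv_phi = (math.sqrt(5) - 1) / 2
    a, b = lo, hi
    c = b - inv_phi * (b - a)
    d = a + inv_phi * (b - a)
    fc, fd = f(c), f(d)
    while b - a > tol:
        if fc > fd:
            b, d, fd = d, c, fc
            c = b - inv_phi * (b - a)
            fc = f(c)
        else:
            a, c, fc = c, d, fd
            d = a + inv_phi * (b - a)
            fd = f(d)
    return (a + b) / 2


def shifted_exp_cdf(mu):
    """CDF of a unit-rate exponential shifted to start at mu."""
    return lambda x: 1.0 - math.exp(-(x - mu)) if x > mu else 0.0


data = [2.31, 2.05, 3.4, 2.77, 4.1, 2.12]  # made-up values
mu_hat = golden_section_max(
    lambda mu: log_spacing_mean(data, shifted_exp_cdf(mu)),
    min(data) - 5.0,
    min(data),
)
```

For this toy model the maximiser has the closed form min(data) - log((n + 1) / n), so the estimate sits strictly below the smallest observation, whereas the ML estimate of the location is the smallest observation itself.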

Citations: 2
Fast and efficient algorithms for sparse semiparametric bifunctional regression
IF 1.1 | Zone 4 (Mathematics) | Q3 Mathematics | Pub Date: 2022-03-08 | DOI: 10.1111/anzs.12355
Silvia Novo, Philippe Vieu, Germán Aneiros

A new sparse semiparametric model is proposed, which incorporates the influence of two functional random variables in a scalar response in a flexible and interpretable manner. One of the functional covariates is included through a single-index structure, while the other is included linearly through the high-dimensional vector formed by its discretised observations. For this model, two new algorithms are presented for selecting relevant variables in the linear part and estimating the model. Both procedures utilise the functional origin of linear covariates. Finite sample experiments demonstrated the scope of application of both algorithms: the first method is a fast algorithm that provides a solution (without loss in predictive ability) for the significant computational time required by standard variable selection methods for estimating this model, and the second algorithm completes the set of relevant linear covariates provided by the first, thus improving its predictive efficiency. Some asymptotic results theoretically support both procedures. A real data application demonstrated the applicability of the presented methodology from a predictive perspective in terms of the interpretability of outputs and low computational cost.

Citations: 0
Bessel regression and bbreg package to analyse bounded data
IF 1.1 | Zone 4 (Mathematics) | Q3 Mathematics | Pub Date: 2022-02-12 | DOI: 10.1111/anzs.12354
Wagner Barreto-Souza, Vinícius D. Mayrink, Alexandre B. Simas

Beta regression has been extensively used by statisticians and practitioners to model bounded continuous data without a strong competitor having the same main features. A class of normalised inverse-Gaussian (N-IG) processes was introduced in the literature and has been explored in the Bayesian context as a powerful alternative to the Dirichlet process. Until now, no attention has been paid to the univariate N-IG distribution in classical inference. In this paper, we propose the bessel regression based on the univariate N-IG distribution, which is an alternative to the beta model. The estimation of the parameters is done through an expectation–maximisation (EM) algorithm and the paper discusses how to perform inference. A useful and practical discrimination procedure is proposed for model selection between bessel and beta regressions. A new R package called bbreg is developed for fitting both bessel and beta regression models based on the EM algorithm and further providing graphical tools for model adequacy and model selection. Proper documentation for this package is available. The performances of the models are evaluated under misspecification in a simulation study. An empirical illustration is explored to compare results from bessel and beta regressions by using the new R package bbreg.

Citations: 1
Modelling students’ career indicators via mixtures of parsimonious matrix-normal distributions
IF 1.1 | Zone 4 (Mathematics) | Q3 Mathematics | Pub Date: 2022-02-10 | DOI: 10.1111/anzs.12351
Salvatore D. Tomarchio, Salvatore Ingrassia, Volodymyr Melnykov

The evaluation of teaching efficiency, from different points of view, is an important aspect of the university system because it helps managers to keep improving the quality of education and helps students to achieve strong professional skills. In this framework, students’ careers as well as teachers’ qualification and quantity adequacy indicators are analysed based on data sets provided by the Italian National Agency for the Evaluation of Universities and Research Institutes (ANVUR) according to a mixture model approach. In particular, parsimonious mixtures of matrix-normal distributions are used to detect underlying grouping structures. The results show that the data present an underlying group structure of courses having different traits, thus providing useful information for the university policy makers.

Citations: 2
Spying on the prior of the number of data clusters and the partition distribution in Bayesian cluster analysis
IF 1.1 | Zone 4 (Mathematics) | Q3 Mathematics | Pub Date: 2022-02-10 | DOI: 10.1111/anzs.12350
Jan Greve, Bettina Grün, Gertraud Malsiner-Walli, Sylvia Frühwirth-Schnatter

Cluster analysis aims at partitioning data into groups or clusters. In applications, it is common to deal with problems where the number of clusters is unknown. Bayesian mixture models employed in such applications usually specify a flexible prior that takes into account the uncertainty with respect to the number of clusters. However, a major empirical challenge involving the use of these models is in the characterisation of the induced prior on the partitions. This work introduces an approach to compute descriptive statistics of the prior on the partitions for three selected Bayesian mixture models developed in the areas of Bayesian finite mixtures and Bayesian nonparametrics. The proposed methodology involves computationally efficient enumeration of the prior on the number of clusters in-sample (termed as ‘data clusters’) and determining the first two prior moments of symmetric additive statistics characterising the partitions. The accompanying reference implementation is made available in the R package fipp. Finally, we illustrate the proposed methodology through comparisons and also discuss the implications for prior elicitation in applications.
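
For the simplest nonparametric case, a Dirichlet process mixture with concentration parameter alpha, the induced prior on the number of data clusters can be enumerated exactly with the Chinese restaurant process recursion. The sketch below covers only that one model, is not the fipp implementation, and uses a hypothetical function name.

```python
def dp_prior_num_clusters(n_obs, alpha):
    """Prior P(K = k), k = 1..n_obs, for the number of data clusters among
    n_obs observations under a Dirichlet process with concentration alpha.

    Returns a list probs with probs[k - 1] = P(K = k).
    """
    probs = [1.0]  # a single observation always forms exactly one cluster
    for n in range(1, n_obs):
        new = [0.0] * (len(probs) + 1)
        for k, p in enumerate(probs):
            new[k] += p * n / (alpha + n)          # next point joins an existing cluster
            new[k + 1] += p * alpha / (alpha + n)  # next point opens a new cluster
        probs = new
    return probs


prior = dp_prior_num_clusters(100, 0.5)
prior_mean = sum(k * p for k, p in enumerate(prior, start=1))
```

The recursion costs on the order of n_obs squared operations and avoids evaluating Stirling numbers directly, which grow too quickly for plain floating-point arithmetic.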

Citations: 11
Variable selection and debiased estimation for single-index expectile model
IF 1.1 | Zone 4 (Mathematics) | Q3 Mathematics | Pub Date: 2022-02-02 | DOI: 10.1111/anzs.12348
Rong Jiang, Yexun Peng, Yufei Deng

This article develops a penalised asymmetric least squares estimator for the single-index expectile model. The oracle property of the proposed estimator is established. Moreover, the debiasing technique is used to construct an estimator that is asymptotically normal, which enables the construction of valid confidence intervals and hypothesis testing. Simulation studies and one real data application are conducted to illustrate the finite sample performance of the proposed methods.
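
Asymmetric least squares is easiest to see without covariates. The sketch below computes a sample expectile by iteratively reweighted averaging; it leaves out the single-index structure, the penalty and the debiasing step described in the abstract, and the function name is hypothetical.

```python
def expectile(sample, tau, tol=1e-12, max_iter=1000):
    """tau-expectile of a sample: the minimiser over m of
    sum_i |tau - 1{y_i <= m}| * (y_i - m)**2.
    """
    if not 0.0 < tau < 1.0:
        raise ValueError("tau must lie strictly between 0 and 1")
    m = sum(sample) / len(sample)  # the 0.5-expectile is the mean
    for _ in range(max_iter):
        # Observations above m get weight tau, the others 1 - tau.
        weights = [tau if y > m else 1.0 - tau for y in sample]
        new_m = sum(w * y for w, y in zip(weights, sample)) / sum(weights)
        if abs(new_m - m) < tol:
            return new_m
        m = new_m
    return m
```

For example, `expectile([0.0, 1.0], 0.75)` balances 0.75 * (1 - m) against 0.25 * m and returns 0.75.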

Citations: 0
Efficient estimation of partially linear tail index models using B-splines
IF 1.1 | Zone 4 (Mathematics) | Q3 Mathematics | Pub Date: 2022-02-02 | DOI: 10.1111/anzs.12357
Yaolan Ma, Bo Wei

The tail index is an important parameter in extreme value theory. In this paper, we consider a simple yet flexible spline estimation method for partially linear tail index models. We approximate the unknown function by B-splines and construct an approximate log-likelihood function to estimate the coefficients of the linear covariates and the B-spline basis functions. Consistency and asymptotic normality of the estimators are established. Subsequently, the proposed method is illustrated by using simulations and applications to the Fremantle annual maximum sea levels data and Chicago air pollution data.
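
For orientation, the covariate-free counterpart of a tail index estimate is the classical Hill estimator, sketched below. It is swapped in here as a simpler technique and is not the paper's spline-based estimator: there are no linear covariates or B-spline terms, the threshold is supplied by the caller, and the function name is hypothetical. It maximises the approximate Pareto log-likelihood of the exceedances over the threshold.

```python
import math


def hill_tail_index(sample, threshold):
    """Hill-type estimate of the tail index alpha from values above threshold.

    For exceedances y > threshold, the approximate log-likelihood
    sum(log(alpha) - alpha * log(y / threshold)) is maximised at
    alpha = k / sum(log(y / threshold)), where k is the number of exceedances.
    """
    if threshold <= 0:
        raise ValueError("threshold must be positive")
    logs = [math.log(y / threshold) for y in sample if y > threshold]
    if not logs:
        raise ValueError("no observations exceed the threshold")
    return len(logs) / sum(logs)
```

A smaller estimated index corresponds to a heavier tail; in practice the result can be sensitive to the choice of threshold.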

Citations: 0
Properties of the affine-invariant ensemble sampler's ‘stretch move’ in high dimensions
IF 1.1 | Zone 4 (Mathematics) | Q3 Mathematics | Pub Date: 2022-02-02 | DOI: 10.1111/anzs.12358
David Huijser, Jesse Goodman, Brendon J. Brewer

We present theoretical and practical properties of the affine-invariant ensemble sampler Markov Chain Monte Carlo method. In high dimensions, the sampler's ‘stretch move’ has unusual and undesirable properties. We demonstrate this with an n-dimensional correlated Gaussian toy problem with a known mean and covariance structure, and a multivariate version of the Rosenbrock problem. Visual inspection of trace plots suggests the burn-in period is short. Upon closer inspection, we discover the mean and the variance of the target distribution do not match the known values, and the chain takes a very long time to converge. This problem becomes severe as n increases beyond 50. We also applied different diagnostics adapted to ensemble methods to determine any lack of convergence. The diagnostics include the Gelman–Rubin method, the Heidelberger–Welch test, the integrated autocorrelation and the acceptance rate. The trace plot of individual walkers appears to be useful as well. We therefore conclude that the stretch move should be used with caution in moderate to high dimensions. We also present some heuristic results explaining this behaviour.
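
A minimal pure-Python sketch of the stretch move is given below for readers who have not met it. It is a simplified illustration rather than the code used in the paper: walkers are updated one at a time against the current ensemble, the stretch parameter a = 2 is a commonly used value taken here as an assumption, and the function names are hypothetical.

```python
import math
import random


def sample_stretch(a, rng):
    """Draw Z on [1/a, a] with density proportional to 1/sqrt(z) (inverse CDF)."""
    return ((a - 1.0) * rng.random() + 1.0) ** 2 / a


def stretch_sweep(walkers, log_prob, a, rng):
    """Update every walker once, in place; return the number of accepted moves."""
    dim = len(walkers[0])
    accepted = 0
    for k in range(len(walkers)):
        # Pick a partner walker j != k uniformly from the rest of the ensemble.
        j = rng.randrange(len(walkers) - 1)
        if j >= k:
            j += 1
        z = sample_stretch(a, rng)
        # Propose a point on the line through walker j and walker k.
        proposal = [xj + z * (xk - xj) for xk, xj in zip(walkers[k], walkers[j])]
        # The factor z**(dim - 1) is needed for detailed balance.
        log_ratio = (dim - 1) * math.log(z) + log_prob(proposal) - log_prob(walkers[k])
        if math.log(1.0 - rng.random()) < log_ratio:
            walkers[k] = proposal
            accepted += 1
    return accepted


def run_sampler(log_prob, dim, n_walkers=20, n_sweeps=2000, a=2.0, seed=0):
    """Run the ensemble; return all visited positions and the acceptance rate."""
    rng = random.Random(seed)
    walkers = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_walkers)]
    draws, accepted = [], 0
    for _ in range(n_sweeps):
        accepted += stretch_sweep(walkers, log_prob, a, rng)
        draws.extend(list(w) for w in walkers)
    return draws, accepted / (n_sweeps * n_walkers)
```

The `(dim - 1) * math.log(z)` term in the acceptance ratio grows with the dimension, so it is a natural place to look when studying how the move behaves as the dimension increases.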

Citations: 6