Australian & New Zealand Journal of Statistics: latest articles

Permutation entropy and its variants for measuring temporal dependence
IF 1.1 Zone 4 (Mathematics) Q3 Mathematics Pub Date: 2022-12-08 DOI: 10.1111/anzs.12376
Xin Huang, Han Lin Shang, David Pitt

Permutation entropy (PE) is an ordinal-based non-parametric complexity measure for studying the temporal dependence structure in a linear or non-linear time series. Based on the PE, we propose a new measure, namely permutation dependence (PD), to quantify the strength of the temporal dependence in a univariate time series and remedy the major drawbacks of PE. We demonstrate that the PE and PD are viable and useful alternatives to conventional temporal dependence measures, such as the autocorrelation function (ACF) and mutual information (MI). Compared to the ACF, the PE and PD are not restricted to detecting linear or quasi-linear serial correlation in an autoregressive model. Instead, they can be viewed as non-parametric and non-linear alternatives since they do not require any prior knowledge or assumptions about the underlying structure. Compared to MI estimated by k-nearest neighbour, PE and PD show added sensitivity to structures of relatively weak strength. We compare the finite-sample performance of the PE and PD with the ACF and the MI estimated by k-nearest neighbour in a number of simulation studies to showcase their respective strengths and weaknesses. Moreover, their performance under non-stationarity is also investigated. Using high-frequency EUR/USD exchange rate returns data, we apply the PE and PD to study the temporal dependence structure in intraday foreign exchange.
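
As a rough illustration of the ordinal idea behind PE, the sketch below counts the rank patterns of short sliding windows and takes the Shannon entropy of their relative frequencies. It follows the commonly used definition of permutation entropy and is not taken from the paper; the function names, the `order` and `delay` parameters, and the normalisation by log(order!) are assumptions made for this example. PD is not sketched because the abstract does not define it.

```python
import math
from collections import Counter


def ordinal_pattern(window):
    """Return the index permutation that sorts the window (ties broken by position)."""
    return tuple(sorted(range(len(window)), key=lambda i: (window[i], i)))


def permutation_entropy(series, order=3, delay=1, normalise=True):
    """Shannon entropy of the ordinal-pattern distribution of `series`.

    `order` is the window length and `delay` the spacing between the points
    inside a window; both names are illustrative choices for this sketch.
    """
    span = (order - 1) * delay
    n_windows = len(series) - span
    if n_windows <= 0:
        raise ValueError("series too short for this order and delay")
    counts = Counter(
        ordinal_pattern(series[start:start + span + 1:delay])
        for start in range(n_windows)
    )
    # Sum of p * log(1 / p) over the observed patterns.
    entropy = sum((c / n_windows) * math.log(n_windows / c) for c in counts.values())
    if normalise:
        # Divide by the maximum possible entropy, log(order!), to land in [0, 1].
        entropy /= math.log(math.factorial(order))
    return entropy
```

A strictly increasing series produces a single pattern and hence zero entropy, whereas a series whose patterns are all equally frequent reaches the maximum of 1 after normalisation. Values close to 1 are usually read as weak temporal dependence.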

Citations: 2
The place of probability distributions in statistical learning. A commented book review of ‘Distributions for modeling location, scale, and shape using GAMLSS in R’ by Rigby et al. (2021)
IF 1.1 Zone 4 (Mathematics) Q3 Mathematics Pub Date: 2022-09-23 DOI: 10.1111/anzs.12374
Fernando Marmolejo-Ramos, Raydonal Ospina, Freddy Hernández-Barajas

Generalised additive models for location, scale and shape (GAMLSS) is a type of distributional regression framework that enables modelling numeric dependent variables via probability distributions other than those of the exponential family. While the cogs behind GAMLSS are provided in the 2017 book by Stasinopoulos et al., ‘Flexible regression and smoothing using GAMLSS in R’, the new book by Rigby et al. considers the distributions implemented in the R software that are usable for GAMLSS modelling. A commented summary of that second book is provided in a supplementary file. Unlike traditional book reviews, two topics in this new book are briefly elaborated on: robustness (Chapter 12) and shape (Chapters 14–16). It is concluded that despite GAMLSS being a powerful and flexible framework for supervised statistical learning, striving for interpretable GAMLSS models is essential.

Citations: 0
Penalised, post-pretest, and post-shrinkage strategies in nonlinear growth models
IF 1.1 Zone 4 (Mathematics) Q3 Mathematics Pub Date: 2022-09-04 DOI: 10.1111/anzs.12373
Janjira Piladaeng, S. Ejaz Ahmed, Supranee Lisawadi

In nonlinear growth models, we considered the parameter estimation under subspace information for low-dimensional and high-dimensional data. We proposed novel estimators based on pretest and shrinkage strategies to improve the estimation efficiency and to establish asymptotic properties. We used simulation studies and a real data example to confirm the theoretical results. We also applied two well-known penalised methods—least absolute shrinkage and selection operator (LASSO) and adaptive LASSO (aLASSO)—for the dimensional reduction of the predictor variables. The results demonstrated that the pretest and shrinkage estimation strategies performed well in parameter estimations when the subspace information was incorrect for both low- and high-dimensional regimes.

Citations: 1
Robust subtractive stability measures for fast and exhaustive feature importance ranking and selection in generalised linear models
IF 1.1 Zone 4 (Mathematics) Q3 Mathematics Pub Date: 2022-09-02 DOI: 10.1111/anzs.12375
Connor Smith, Boris Guennewig, Samuel Muller

We introduce the relatively new concept of subtractive lack-of-fit measures in the context of robust regression, in particular in generalised linear models. We devise a fast and robust feature selection framework for regression that empirically enjoys better performance than other selection methods while remaining computationally feasible when fully exhaustive methods are not. Our method builds on the concepts of model stability, subtractive lack-of-fit measures and repeated model identification. We demonstrate how the multiple implementations add value in a robust regression type context, in particular through utilising a combination of robust regression coefficient and scale estimates. Through resampling, we construct a robust stability matrix, which contains multiple measures of feature importance for each variable. By constructing this stability matrix and using it to rank features based on importance, we are able to reduce the candidate model space and then perform an exhaustive search on the remaining models. We also introduce two different visualisations to better convey information held within the stability matrix: a subtractive Mosaic Probability Plot and a subtractive Variable Inclusion Plot. We demonstrate how these graphics allow for a better understanding of how variable importance changes under small alterations to the underlying data. Our framework is made available in R through the RobStabR package.

Citations: 1
Multivariate Kruskal-Wallis tests based on principal component score and latent source of independent component analysis
IF 1.1 Zone 4 (Mathematics) Q3 Mathematics Pub Date: 2022-08-04 DOI: 10.1111/anzs.12371
Amitava Mukherjee, Hidetoshi Murakami

Analysing multivariate and high-dimensional multi-sample data is essential in many scientific fields. One of the most crucial and popular topics in modern nonparametric statistics is multi-sample comparison problems for such multivariate and high-dimensional data. The Kruskal-Wallis test is widely used in the multi-sample problem. For multivariate or high-dimensional data, it is imperative to specify how to determine the ranks of individual vector-valued observations in terms of various distance metrics. Alternatively, one can combine the concept of principal component scores or independent component scores with the Kruskal-Wallis test. A simple but powerful Kruskal-Wallis test based on the principal component scores is discussed in this paper for the multivariate and high-dimensional data. Another type of Kruskal-Wallis test based on latent sources of independent component analysis is constructed as a competitor. These tests are suitable for testing the difference in the location vector, scale matrix or both and can be used with equal and unequal sample sizes. These tests' power performances are thoroughly compared with traditional distance-based Kruskal-Wallis tests for multivariate data using simulation based on Monte Carlo for various population distributions. We include an illustration of the proposed tests using real data. The paper concludes with some remarks and directions for future research.
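
To make the idea of ranking component scores concrete, the following sketch projects multivariate observations onto their first principal component and computes the usual Kruskal-Wallis H statistic on the resulting one-dimensional scores. It is only an illustration under stated assumptions, not the authors' procedure: the abstract does not say which or how many components are used, and the helper names `first_pc_scores` and `kruskal_wallis_h`, the power-iteration step and the iteration count are all hypothetical choices for this example.

```python
import math


def first_pc_scores(rows, iterations=200):
    """Scores on the leading principal component, found by power iteration.

    Assumes the covariance matrix is non-zero and that the all-ones start
    vector is not orthogonal to the leading eigenvector.
    """
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    centred = [[r[j] - means[j] for j in range(p)] for r in rows]
    cov = [[sum(c[a] * c[b] for c in centred) / (n - 1) for b in range(p)]
           for a in range(p)]
    v = [1.0] * p
    for _ in range(iterations):
        w = [sum(cov[a][b] * v[b] for b in range(p)) for a in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return [sum(c[j] * v[j] for j in range(p)) for c in centred]


def kruskal_wallis_h(scores, labels):
    """Kruskal-Wallis H statistic with midranks and the standard tie correction."""
    n = len(scores)
    order = sorted(range(n), key=lambda i: scores[i])
    ranks = [0.0] * n
    tie_term = 0
    i = 0
    while i < n:
        j = i
        while j + 1 < n and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        midrank = (i + j) / 2 + 1  # average of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = midrank
        t = j - i + 1
        tie_term += t ** 3 - t
        i = j + 1
    if tie_term == n ** 3 - n:
        raise ValueError("all scores are tied; H is undefined")
    rank_sums, sizes = {}, {}
    for r, g in zip(ranks, labels):
        rank_sums[g] = rank_sums.get(g, 0.0) + r
        sizes[g] = sizes.get(g, 0) + 1
    h = 12 / (n * (n + 1)) * sum(rank_sums[g] ** 2 / sizes[g] for g in sizes)
    h -= 3 * (n + 1)
    return h / (1 - tie_term / (n ** 3 - n))
```

The sign of a principal component is arbitrary, but reversing all ranks leaves H unchanged, so the statistic does not depend on that choice. Under the null hypothesis H is conventionally compared with a chi-squared distribution on (number of groups minus 1) degrees of freedom; that step is omitted here.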

Citations: 2
A Festschrift for Geoff McLachlan
IF 1.1 Zone 4 (Mathematics) Q3 Mathematics Pub Date: 2022-08-01 DOI: 10.1111/anzs.12372
Hien Nguyen, Sharon Lee, Florence Forbes

This article introduces a special issue of the Australian and New Zealand Journal of Statistics, dedicated as a Festschrift for Geoff McLachlan on the occasion of his 75th birthday.

Citations: 0
Bayesian hierarchical mixture models for detecting non-normal clusters applied to noisy genomic and environmental datasets
IF 1.1 Zone 4 (Mathematics) Q3 Mathematics Pub Date: 2022-08-01 DOI: 10.1111/anzs.12370
Huizi Zhang, Ben Swallow, Mayetri Gupta

Clustering to find subgroups with common features is often a necessary first step in the statistical modelling and analysis of large and complex datasets. Although follow-up analyses often make use of complex statistical models that are appropriate for the specific application, most popular clustering approaches are either nonparametric, or based on Gaussian mixture models and their variants, often for reasons of computational efficiency. Certain characteristics in the data, such as the presence of outliers, or non-ellipsoidal cluster shapes, that are common in modern scientific datasets, often lead these methods to fail to detect the cluster components accurately. In this article, we present two efficient and robust Bayesian clustering approaches that seek to overcome these limitations—a model-based ‘tight’ clustering approach to cluster points in the presence of outliers, and a hierarchical Laplace mixture-based approach to cluster heavy-tailed and otherwise non-normal cluster components—and illustrate their power and accuracy in detecting meaningful clusters in datasets from genomics, imaging and the environmental sciences.

Citations: 1
Bayesian non-parametric spatial prior for traffic crash risk mapping: A case study of Victoria, Australia
IF 1.1 Zone 4 (Mathematics) Q3 Mathematics Pub Date: 2022-07-06 DOI: 10.1111/anzs.12369
J.-B. Durand, F. Forbes, C.D. Phan, L. Truong, H.D. Nguyen, F. Dama

We develop a Bayesian non-parametric (BNP) model coupled with Markov random fields (MRFs) for risk mapping, to infer homogeneous spatial regions in terms of risks. In contrast to most existing methods, the proposed approach does not require an arbitrary commitment to a specified number of risk classes and determines their risk levels automatically. We consider settings in which the relevant information consists of counts and propose a so-called BNP hidden MRF (BNP-HMRF) model that is able to handle such data. The model inference is carried out using a variational Bayes expectation–maximisation algorithm and the approach is illustrated on traffic crash data in the state of Victoria, Australia. The obtained results agree well with the traffic safety literature. More generally, the model presented here for risk mapping offers an effective, convenient and fast way to partition spatially localised count data.

Citations: 1
Visualising the pattern of long-term genotype performance by leveraging a genomic prediction model
IF 1.1 Zone 4 (Mathematics) Q3 Mathematics Pub Date: 2022-05-03 DOI: 10.1111/anzs.12362
Vivi N. Arief, Ian H. DeLacy, Thomas Payne, Kaye E. Basford

Historical data from plant breeding programs provide valuable resources to study the response of genotypes to the changing environment (i.e. genotype-by-environment interaction). Such data have been used to evaluate the pattern of genotype performance across regions or locations, but its use to evaluate the long-term pattern of genotype performance across environments (i.e. locations-by-years) has been hampered by the lack of common genotypes across years. This lack of common genotypes is due to the structure of the breeding program, especially for annual crops, where only a proportion of selected genotypes are tested in subsequent years. This has resulted in a sparse prediction of the performance of genotypes across years (i.e. a genotype-by-year table). A genomic prediction method that fitted both a relationship matrix among genotypes and a relationship matrix among environments (i.e. years) could overcome this limitation and produce a dense genotype-by-year table, thereby enabling some evaluation of long-term genotype performance. In this paper, we applied the genomic prediction model to the yield data from CIMMYT's Elite Spring Wheat Yield Trials (ESWYT) to visualise the pattern of genotype performance over 25 years.

Citations: 1
Functional dimension reduction based on fuzzy partition and transformation
IF 1.1 Zone 4 (Mathematics) Q3 Mathematics Pub Date: 2022-04-25 DOI: 10.1111/anzs.12363
Beiting Liang, Taoxuan Gao, Defa Bai, Guochang Wang

Functional sliced inverse regression (FSIR) is among the most popular methods for functional dimension reduction. However, FSIR has two evident shortcomings. On the one hand, the number of samples in each slice must not be too small, so selecting a suitable number of slices S is difficult, particularly for data with a small sample size. On the other hand, FSIR and its related methods are well known to perform poorly when the link function is even (i.e. the dependency is symmetric). To solve these two problems, we propose three new types of estimation methods. First, we propose the functional fuzzy inverse regression (FFIR) method based on a fuzzy partition. Whereas FSIR uses a hard partition, the fuzzy partition uses all samples, with different weights, to estimate the mean in each slice. Therefore, FFIR performs well even for data with a small sample size. Second, we suggest two transformation approaches, namely FSIRR and FSIRP, that avoid the symmetric dependency between the response and the predictor. FSIRR eliminates the symmetric dependency by transforming the response variable, while FSIRP overcomes it by transforming the functional predictor. Third, we propose the FFIRR and FFIRP methods, which combine the advantages of FFIR and the two transformation methods: FFIRR and FFIRP apply FFIR, in place of FSIR, to the transformed data. Simulations and real data analysis show that the three types of proposed methods outperform FSIR in terms of estimation accuracy and stability.
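
The contrast between a hard and a fuzzy partition can be illustrated with a minimal Python sketch. This is not the authors' FFIR estimator: the abstract does not specify the membership function, so the Gaussian weights, the equally spaced slice centres, the `bandwidth` parameter, and the function names `hard_slice_means` and `fuzzy_slice_means` are all assumptions made for illustration. Each functional predictor is represented simply as a list of values on a common grid.

```python
import math

def hard_slice_means(xs, ys, n_slices):
    """Hard partition: sort by response; each sample belongs to exactly one slice."""
    order = sorted(range(len(ys)), key=lambda i: ys[i])
    size = math.ceil(len(ys) / n_slices)
    grid = range(len(xs[0]))
    means = []
    for s in range(n_slices):
        idx = order[s * size:(s + 1) * size]
        if not idx:  # can happen when n_slices does not divide the sample evenly
            continue
        means.append([sum(xs[i][t] for i in idx) / len(idx) for t in grid])
    return means

def fuzzy_slice_means(xs, ys, n_slices, bandwidth):
    """Fuzzy partition: every sample contributes to every slice mean with a weight.

    Assumption (not from the paper): Gaussian membership weights centred on
    equally spaced points of the response range.
    """
    lo, hi = min(ys), max(ys)
    centres = [lo + (s + 0.5) * (hi - lo) / n_slices for s in range(n_slices)]
    grid = range(len(xs[0]))
    means = []
    for c in centres:
        w = [math.exp(-0.5 * ((y - c) / bandwidth) ** 2) for y in ys]
        total = sum(w)
        means.append([sum(wi * x[t] for wi, x in zip(w, xs)) / total for t in grid])
    return means

# Toy data: four "curves" observed at a single grid point, with responses 1..4.
xs = [[1.0], [2.0], [3.0], [4.0]]
ys = [1.0, 2.0, 3.0, 4.0]
hard = hard_slice_means(xs, ys, n_slices=2)
fuzzy = fuzzy_slice_means(xs, ys, n_slices=2, bandwidth=1.0)
```

In this sketch each hard slice mean is computed from only two samples, whereas each fuzzy slice mean is a weighted average of all four, which is the property the abstract credits for FFIR's behaviour with small samples. As `bandwidth` grows, the fuzzy slice means shrink towards the overall mean, so the bandwidth plays a role similar to the choice of S in the hard partition.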

Citations: 0