
Australian & New Zealand Journal of Statistics: latest articles

Penalised, post-pretest, and post-shrinkage strategies in nonlinear growth models
IF 1.1 | Zone 4 (Mathematics) | Q3 STATISTICS & PROBABILITY | Pub Date: 2022-09-04 | DOI: 10.1111/anzs.12373
Janjira Piladaeng, S. Ejaz Ahmed, Supranee Lisawadi

In nonlinear growth models, we considered the parameter estimation under subspace information for low-dimensional and high-dimensional data. We proposed novel estimators based on pretest and shrinkage strategies to improve the estimation efficiency and to establish asymptotic properties. We used simulation studies and a real data example to confirm the theoretical results. We also applied two well-known penalised methods—least absolute shrinkage and selection operator (LASSO) and adaptive LASSO (aLASSO)—for the dimensional reduction of the predictor variables. The results demonstrated that the pretest and shrinkage estimation strategies performed well in parameter estimations when the subspace information was incorrect for both low- and high-dimensional regimes.
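
The pretest and shrinkage strategies named in the abstract can be illustrated with their usual textbook forms. The sketch below is an assumption about the general shape of such estimators, not the paper's exact definitions: `beta_u` is an unrestricted estimate, `beta_r` is the estimate restricted by the subspace information, `t_stat` is a test statistic for the restriction, `crit` is its critical value, and `k` is the number of restrictions. All names and numbers are hypothetical.

```python
def pretest_estimate(beta_u, beta_r, t_stat, crit):
    """Keep the restricted estimate unless the test rejects the restriction."""
    return list(beta_r) if t_stat <= crit else list(beta_u)


def shrinkage_estimate(beta_u, beta_r, t_stat, k, positive_part=True):
    """Stein-type shrinkage of beta_u towards beta_r (requires t_stat > 0)."""
    weight = 1.0 - (k - 2) / t_stat
    if positive_part:
        # The positive-part version never shrinks past the restricted estimate.
        weight = max(0.0, weight)
    return [r + weight * (u - r) for u, r in zip(beta_u, beta_r)]


# Made-up numbers, for illustration only.
beta_u = [2.0, 4.0, -1.0, 0.5]   # unrestricted estimate
beta_r = [0.0, 0.0, 0.0, 0.0]    # restricted estimate
print(pretest_estimate(beta_u, beta_r, t_stat=4.0, crit=9.49))
print(shrinkage_estimate(beta_u, beta_r, t_stat=4.0, k=4))
```

The pretest rule makes an all-or-nothing choice between the two estimates, while the shrinkage rule moves smoothly between them as the evidence against the restriction grows.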

Australian & New Zealand Journal of Statistics, vol. 64, no. 3, pp. 381-405.
Citations: 1
Robust subtractive stability measures for fast and exhaustive feature importance ranking and selection in generalised linear models
IF 1.1 | Zone 4 (Mathematics) | Q3 STATISTICS & PROBABILITY | Pub Date: 2022-09-02 | DOI: 10.1111/anzs.12375
Connor Smith, Boris Guennewig, Samuel Muller

We introduce the relatively new concept of subtractive lack-of-fit measures in the context of robust regression, in particular in generalised linear models. We devise a fast and robust feature selection framework for regression that empirically enjoys better performance than other selection methods while remaining computationally feasible when fully exhaustive methods are not. Our method builds on the concepts of model stability, subtractive lack-of-fit measures and repeated model identification. We demonstrate how the multiple implementations add value in a robust regression type context, in particular through utilizing a combination of robust regression coefficient and scale estimates. Through resampling, we construct a robust stability matrix, which contains multiple measures of feature importance for each variable. By constructing this stability matrix and using it to rank features based on importance, we are able to reduce the candidate model space and then perform an exhaustive search on the remaining models. We also introduce two different visualisations to better convey information held within the stability matrix; a subtractive Mosaic Probability Plot and a subtractive Variable Inclusion Plot. We demonstrate how these graphics allow for a better understanding of how variable importance changes under small alterations to the underlying data. Our framework is made available in R through the RobStabR package.

Australian & New Zealand Journal of Statistics, vol. 64, no. 3, pp. 339-355.
Citations: 1
Multivariate Kruskal-Wallis tests based on principal component score and latent source of independent component analysis
IF 1.1 | Zone 4 (Mathematics) | Q3 STATISTICS & PROBABILITY | Pub Date: 2022-08-04 | DOI: 10.1111/anzs.12371
Amitava Mukherjee, Hidetoshi Murakami

Analysing multivariate and high-dimensional multi-sample data is essential in many scientific fields. One of the most crucial and popular topics in modern nonparametric statistics is multi-sample comparison problems for such multivariate and high-dimensional data. The Kruskal-Wallis test is widely used in the multi-sample problem. For multivariate or high-dimensional data, it is imperative to specify how to determine the ranks of individual vector-valued observations in terms of various distance metrics. Alternatively, one can combine the concept of principal component scores or independent component scores with the Kruskal-Wallis test. A simple but powerful Kruskal-Wallis test based on the principal component scores is discussed in this paper for the multivariate and high-dimensional data. Another type of Kruskal-Wallis test based on latent sources of independent component analysis is constructed as a competitor. These tests are suitable for testing the difference in the location vector, scale matrix or both and can be used with equal and unequal sample sizes. These tests' power performances are thoroughly compared with traditional distance-based Kruskal-Wallis tests for multivariate data using simulation based on Monte Carlo for various population distributions. We include an illustration of the proposed tests using real data. The paper concludes with some remarks and directions for future research.
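
The idea of combining principal component scores with the Kruskal-Wallis test can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' procedure: only the first principal component is used, it is found by plain power iteration, the H statistic omits the tie correction, and the function names and data are hypothetical.

```python
from itertools import chain


def first_pc_scores(rows, iters=200):
    """Scores on the first principal component, found by power iteration.

    Assumes the start vector is not orthogonal to the leading eigenvector
    and that the sample covariance matrix is not identically zero.
    """
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    centred = [[r[j] - means[j] for j in range(p)] for r in rows]
    cov = [[sum(c[a] * c[b] for c in centred) / (n - 1) for b in range(p)]
           for a in range(p)]
    v = [1.0] * p
    for _ in range(iters):
        w = [sum(cov[a][b] * v[b] for b in range(p)) for a in range(p)]
        norm = sum(x * x for x in w) ** 0.5
        v = [x / norm for x in w]
    return [sum(c[j] * v[j] for j in range(p)) for c in centred]


def kruskal_wallis_h(groups):
    """Kruskal-Wallis H statistic with midranks (no tie correction)."""
    pooled = sorted(chain.from_iterable(groups))
    n = len(pooled)
    rank_of = {}
    i = 0
    while i < n:
        j = i
        while j < n and pooled[j] == pooled[i]:
            j += 1
        rank_of[pooled[i]] = (i + 1 + j) / 2  # average of ranks i+1, ..., j
        i = j
    rank_term = sum(sum(rank_of[x] for x in g) ** 2 / len(g) for g in groups)
    return 12.0 / (n * (n + 1)) * rank_term - 3 * (n + 1)


# Made-up bivariate data: three groups of three observations each.
rows = [(1, 1.1), (2, 1.9), (3, 3.2), (4, 4.1), (5, 4.8), (6, 6.1),
        (7, 7.2), (8, 7.9), (9, 9.1)]
labels = [0, 0, 0, 1, 1, 1, 2, 2, 2]
scores = first_pc_scores(rows)
groups = [[s for s, g in zip(scores, labels) if g == k] for k in range(3)]
print(kruskal_wallis_h(groups))
```

Because H depends on the ranks only through squared deviations of the group mean ranks from the overall mean rank, the arbitrary sign of a principal component does not change the statistic.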

Australian & New Zealand Journal of Statistics, vol. 64, no. 3, pp. 356-380.
Citations: 2
A Festschrift for Geoff McLachlan
IF 1.1 | Zone 4 (Mathematics) | Q3 STATISTICS & PROBABILITY | Pub Date: 2022-08-01 | DOI: 10.1111/anzs.12372
Hien Nguyen, Sharon Lee, Florence Forbes

This article introduces a special issue of the Australian and New Zealand Journal of Statistics, dedicated as a Festschrift for Geoff McLachlan on the occasion of his 75th birthday.

Australian & New Zealand Journal of Statistics, vol. 64, no. 2, pp. 111-116. Open access PDF: https://onlinelibrary.wiley.com/doi/epdf/10.1111/anzs.12372
Citations: 0
Bayesian hierarchical mixture models for detecting non-normal clusters applied to noisy genomic and environmental datasets
IF 1.1 | Zone 4 (Mathematics) | Q3 STATISTICS & PROBABILITY | Pub Date: 2022-08-01 | DOI: 10.1111/anzs.12370
Huizi Zhang, Ben Swallow, Mayetri Gupta

Clustering to find subgroups with common features is often a necessary first step in the statistical modelling and analysis of large and complex datasets. Although follow-up analyses often make use of complex statistical models that are appropriate for the specific application, most popular clustering approaches are either nonparametric, or based on Gaussian mixture models and their variants, often for reasons of computational efficiency. Certain characteristics in the data, such as the presence of outliers, or non-ellipsoidal cluster shapes, that are common in modern scientific datasets, often lead these methods to fail to detect the cluster components accurately. In this article, we present two efficient and robust Bayesian clustering approaches that seek to overcome these limitations—a model-based ‘tight’ clustering approach to cluster points in the presence of outliers, and a hierarchical Laplace mixture-based approach to cluster heavy-tailed and otherwise non-normal cluster components—and illustrate their power and accuracy in detecting meaningful clusters in datasets from genomics, imaging and the environmental sciences.

Australian & New Zealand Journal of Statistics, vol. 64, no. 2, pp. 313-337. Open access PDF: https://onlinelibrary.wiley.com/doi/epdf/10.1111/anzs.12370
Citations: 1
Bayesian non-parametric spatial prior for traffic crash risk mapping: A case study of Victoria, Australia
IF 1.1 | Zone 4 (Mathematics) | Q3 STATISTICS & PROBABILITY | Pub Date: 2022-07-06 | DOI: 10.1111/anzs.12369
J.-B. Durand, F. Forbes, C.D. Phan, L. Truong, H.D. Nguyen, F. Dama

We develop a Bayesian non-parametric (BNP) model coupled with Markov random fields (MRFs) for risk mapping, to infer homogeneous spatial regions in terms of risks. In contrast to most existing methods, the proposed approach does not require an arbitrary commitment to a specified number of risk classes and determines their risk levels automatically. We consider settings in which the relevant information are counts and propose a so-called BNP hidden MRF (BNP-HMRF) model that is able to handle such data. The model inference is carried out using a variational Bayes expectation–maximisation algorithm and the approach is illustrated on traffic crash data in the state of Victoria, Australia. The obtained results corroborate well with the traffic safety literature. More generally, the model presented here for risk mapping offers an effective, convenient and fast way to conduct partition of spatially localised count data.

Australian & New Zealand Journal of Statistics, vol. 64, no. 2, pp. 171-204. Open access PDF: https://onlinelibrary.wiley.com/doi/epdf/10.1111/anzs.12369
Citations: 1
Visualising the pattern of long-term genotype performance by leveraging a genomic prediction model
IF 1.1 | Zone 4 (Mathematics) | Q3 STATISTICS & PROBABILITY | Pub Date: 2022-05-03 | DOI: 10.1111/anzs.12362
Vivi N. Arief, Ian H. DeLacy, Thomas Payne, Kaye E. Basford

Historical data from plant breeding programs provide valuable resources to study the response of genotypes to the changing environment (i.e. genotype-by-environment interaction). Such data have been used to evaluate the pattern of genotype performance across regions or locations, but its use to evaluate the long-term pattern of genotype performance across environments (i.e. locations-by-years) has been hampered by the lack of common genotypes across years. This lack of common genotypes is due to the structure of the breeding program, especially for annual crops, where only a proportion of selected genotypes are tested in subsequent years. This has resulted in a sparse prediction of the performance of genotypes across years (i.e. a genotype-by-year table). A genomic prediction method that fitted both a relationship matrix among genotypes and a relationship matrix among environments (i.e. years) could overcome this limitation and produce a dense genotype-by-year table, thereby enabling some evaluation of long-term genotype performance. In this paper, we applied the genomic prediction model to the yield data from CIMMYT's Elite Spring Wheat Yield Trials (ESWYT) to visualise the pattern of genotype performance over 25 years.

Australian & New Zealand Journal of Statistics, vol. 64, no. 2, pp. 297-312. Open access PDF: https://onlinelibrary.wiley.com/doi/epdf/10.1111/anzs.12362
Citations: 1
Functional dimension reduction based on fuzzy partition and transformation
IF 1.1 | Zone 4 (Mathematics) | Q3 STATISTICS & PROBABILITY | Pub Date: 2022-04-25 | DOI: 10.1111/anzs.12363
Beiting Liang, Taoxuan Gao, Defa Bai, Guochang Wang

Functional sliced inverse regression (FSIR) is among the most popular methods for functional dimension reduction. However, FSIR has two evident shortcomings. On the one hand, the number of samples in each slice must not be too small and selecting a suitable S is difficult, particularly for data with small sample size, where S indicates the number of slices. On the other hand, FSIR and its related methods are well known for their poor performance when the link function is an even (or symmetric) dependency. To solve these two problems, we propose three new types of estimation methods. First, we propose the functional fuzzy inverse regression (FFIR) method based on a fuzzy partition. Compared with FSIR, which uses a hard partition, the fuzzy partition uses all samples with different weights to estimate the mean in each slice. Therefore, FFIR exhibits good performance even for data with small sample size. Second, we suggest two transformation approaches, namely, FSIRR and FSIRP, which avoid the symmetric dependency between the response and the predictor. FSIRR eliminates the symmetric dependency by transforming the response variable, while FSIRP overcomes the symmetric dependency by transforming the functional predictor. Third, we propose the FFIRR and FFIRP methods by combining the advantages of FFIR and the two transformation methods. FFIRR and FFIRP replace the FSIR method on the transformed data with FFIR. Simulation and real data analysis show that the three types of proposed methods exhibit better performance than FSIR in terms of estimation accuracy and stability.
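
The weakness under symmetric dependency mentioned in the abstract can be seen in a scalar, non-functional analogue of sliced inverse regression. The sketch below is a toy illustration, not the FSIR estimator itself: it sorts the observations by the response, cuts them into equal slices, and measures how much the slice means of the predictor vary. The function name and data are hypothetical.

```python
def between_slice_variance(x, y, n_slices):
    """Weighted variance of the slice means of x, slicing on sorted y.

    Assumes len(x) is divisible by n_slices. Ties in y keep input order
    because Python's sort is stable.
    """
    n = len(x)
    order = sorted(range(n), key=lambda i: y[i])
    size = n // n_slices
    overall = sum(x) / n
    total = 0.0
    for s in range(n_slices):
        idx = order[s * size:(s + 1) * size]
        slice_mean = sum(x[i] for i in idx) / len(idx)
        total += len(idx) * (slice_mean - overall) ** 2
    return total / n


x = [-4, -3, -2, -1, 1, 2, 3, 4]
monotone = between_slice_variance(x, x, n_slices=2)                    # y = x
symmetric = between_slice_variance(x, [v * v for v in x], n_slices=2)  # y = x^2
print(monotone, symmetric)  # 6.25 0.0
```

With the even link y = x^2, each slice contains matching positive and negative predictor values, so every slice mean equals the overall mean and the inverse regression curve carries no information about the direction. This is the situation that transformations of the response or the predictor aim to avoid.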

Australian & New Zealand Journal of Statistics, vol. 64, no. 1, pp. 45-66.
Citations: 0
Smooth tests of goodness of fit for the distributional assumption of regression models
IF 1.1 | Zone 4 (Mathematics) | Q3 STATISTICS & PROBABILITY | Pub Date: 2022-04-18 | DOI: 10.1111/anzs.12361
J. C. W. Rayner, Paul Rippon, Thomas Suesse, Olivier Thas

We focus on regression models that consist of (i) a model for the conditional mean of the outcome and (ii) a distributional assumption about the distribution of the outcome, both conditional on the regressors. Generalised linear models form a well-known example. The choice of the outcome distribution is often motivated by prior or background knowledge of the researcher, or it is simply chosen for convenience. We propose smooth goodness of fit tests for testing the distributional assumption in regression models. The tests arise from embedding the regression model in a smooth family of alternatives, and constructing appropriate score tests that correctly account for nuisance parameter estimation. The tests are customised, focussed and comprehensive. We present several examples to illustrate the wide applicability of our method. A small simulation study demonstrates that our tests have power to detect important deviations from the hypothesised model.
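
For orientation, the classical Neyman smooth statistic for a fully specified null distribution can be sketched as below. This is a simplified assumption-laden illustration, not the tests proposed in the paper: it takes values `u` that would be uniform on [0, 1] under the hypothesised model (for example, probability integral transforms of the outcomes), uses the first two orthonormal shifted Legendre polynomials, and ignores the adjustment for nuisance parameter estimation that the abstract says the proposed score tests account for.

```python
import math


def neyman_smooth_statistic(u):
    """Order-2 Neyman smooth statistic for testing uniformity on [0, 1]."""
    n = len(u)
    # Orthonormal shifted Legendre polynomials of degree 1 and 2.
    h1 = [math.sqrt(3) * (2 * v - 1) for v in u]
    h2 = [math.sqrt(5) * (6 * v * v - 6 * v + 1) for v in u]
    # Sum of squared, root-n scaled component sums.
    return sum((sum(h) / math.sqrt(n)) ** 2 for h in (h1, h2))


# Made-up values: all mass at the centre, i.e. far too little spread.
print(neyman_smooth_statistic([0.5, 0.5, 0.5, 0.5]))
```

In this toy input the first component is zero (the location is right) and the whole statistic comes from the second component, which reacts to the wrong spread. This per-component reading is what makes smooth tests informative about the kind of departure from the model.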

我们关注的回归模型由(i)结果的条件均值模型和(ii)结果分布的分布假设组成,两者都以回归量为条件。广义线性模型就是一个众所周知的例子。结果分布的选择通常是由研究人员的先验或背景知识驱动的,或者只是为了方便而选择。我们提出了平滑拟合优度检验来检验回归模型中的分布假设。测试产生于将回归模型嵌入到平滑的备选方案家族中,并构建正确考虑干扰参数估计的适当分数测试。这些测试是定制的、重点突出的、全面的。我们举几个例子来说明我们的方法的广泛适用性。一项小型模拟研究表明,我们的测试有能力检测出与假设模型的重要偏差。
{"title":"Smooth tests of goodness of fit for the distributional assumption of regression models","authors":"J. C. W. Rayner,&nbsp;Paul Rippon,&nbsp;Thomas Suesse,&nbsp;Olivier Thas","doi":"10.1111/anzs.12361","DOIUrl":"10.1111/anzs.12361","url":null,"abstract":"<div>\u0000 \u0000 <p>We focus on regression models that consist of (i) a model for the conditional mean of the outcome and (ii) a distributional assumption about the distribution of the outcome, both conditional on the regressors. Generalised linear models form a well-known example. The choice of the outcome distribution is often motivated by prior or background knowledge of the researcher, or it is simply chosen for convenience. We propose smooth goodness of fit tests for testing the distributional assumption in regression models. The tests arise from embedding the regression model in a smooth family of alternatives, and constructing appropriate score tests that correctly account for nuisance parameter estimation. The tests are customised, focussed and comprehensive. We present several examples to illustrate the wide applicability of our method. A small simulation study demonstrates that our tests have power to detect important deviations from the hypothesised model.</p>\u0000 </div>","PeriodicalId":55428,"journal":{"name":"Australian & New Zealand Journal of Statistics","volume":"64 1","pages":"67-85"},"PeriodicalIF":1.1,"publicationDate":"2022-04-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"79975652","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 1
Modal clustering on PPGMMGA projection subspace
IF 1.1 · Zone 4 (Mathematics) · Q3 STATISTICS & PROBABILITY · Pub Date: 2022-04-14 · DOI: 10.1111/anzs.12360
Luca Scrucca

PPGMMGA is a projection pursuit (PP) algorithm aimed at detecting and visualising clustering structures in multivariate data. The algorithm uses the negentropy as PP index obtained by fitting Gaussian mixture models (GMMs) for density estimation and, then, exploits genetic algorithms (GAs) for its optimisation. Since the PPGMMGA algorithm is a dimension reduction technique specifically introduced for visualisation purposes, cluster memberships are not explicitly provided. In this paper a modal clustering approach is proposed for estimating clusters of projected data points. In particular, a modal EM algorithm is employed to estimate the modes corresponding to the local maxima in the projection subspace of the underlying density estimated using parsimonious GMMs. Data points are then clustered according to the domain of attraction of the identified modes. Simulated and real data are discussed to illustrate the proposed method and evaluate the clustering performance.
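The modal EM step mentioned in the abstract can be illustrated with a minimal sketch. It assumes the Gaussian mixture has already been fitted in the projection subspace, so the weights, means and covariances below are made-up values, and the function names (`gmm_posteriors`, `modal_em`, `modal_clusters`) are hypothetical. This is not the paper's implementation: it omits the projection pursuit stage and the parsimonious covariance structures, and only shows the fixed-point iteration that moves a point uphill to a local mode and then groups points that reach the same mode.

```python
import numpy as np


def gmm_posteriors(x, weights, means, covs):
    """Posterior probability of each mixture component at point x."""
    log_terms = []
    for w, m, S in zip(weights, means, covs):
        diff = x - m
        _, logdet = np.linalg.slogdet(S)
        maha = diff @ np.linalg.solve(S, diff)
        log_terms.append(
            np.log(w) - 0.5 * (logdet + maha + len(x) * np.log(2 * np.pi))
        )
    log_terms = np.array(log_terms)
    log_terms -= log_terms.max()  # stabilise before exponentiating
    p = np.exp(log_terms)
    return p / p.sum()


def modal_em(x0, weights, means, covs, tol=1e-8, max_iter=500):
    """Iterate from x0 to a local maximum of the mixture density."""
    x = np.asarray(x0, dtype=float)
    precisions = [np.linalg.inv(S) for S in covs]
    for _ in range(max_iter):
        # E-step: component posteriors at the current point.
        z = gmm_posteriors(x, weights, means, covs)
        # M-step: maximise the weighted sum of Gaussian log-densities in x,
        # which has a closed-form solution for Gaussian components.
        A = sum(zk * P for zk, P in zip(z, precisions))
        b = sum(zk * P @ m for zk, P, m in zip(z, precisions, means))
        x_new = np.linalg.solve(A, b)
        if np.linalg.norm(x_new - x) < tol:
            return x_new
        x = x_new
    return x


def modal_clusters(points, weights, means, covs, merge_tol=1e-3):
    """Label each point by the mode its modal EM path converges to."""
    modes, labels = [], []
    for p in points:
        m = modal_em(p, weights, means, covs)
        for j, existing in enumerate(modes):
            if np.linalg.norm(m - existing) < merge_tol:
                labels.append(j)
                break
        else:  # no existing mode is close enough: register a new one
            modes.append(m)
            labels.append(len(modes) - 1)
    return np.array(modes), np.array(labels)


# Illustrative (made-up) two-component mixture in a 2-D subspace.
weights = [0.5, 0.5]
means = [np.array([-3.0, 0.0]), np.array([3.0, 0.0])]
covs = [np.eye(2), np.eye(2)]
points = np.array([[-2.5, 0.3], [-3.2, -0.1], [2.8, 0.2], [3.5, -0.4]])

modes, labels = modal_clusters(points, weights, means, covs)
```

Each point is assigned to the domain of attraction of the mode it reaches, so the number of clusters is determined by the number of distinct modes rather than by the number of mixture components. The merging tolerance `merge_tol` is an illustrative choice and would need tuning on real data.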

Luca Scrucca. "Modal clustering on PPGMMGA projection subspace." Australian & New Zealand Journal of Statistics 64(2): 158-170. Published 2022-04-14. DOI: 10.1111/anzs.12360. Open access PDF: https://onlinelibrary.wiley.com/doi/epdf/10.1111/anzs.12360
Citations: 1