
Australian & New Zealand Journal of Statistics: Latest Articles

Bayesian hierarchical mixture models for detecting non-normal clusters applied to noisy genomic and environmental datasets
IF 1.1 | Zone 4 (Mathematics) | Q3 STATISTICS & PROBABILITY | Pub Date: 2022-08-01 | DOI: 10.1111/anzs.12370
Huizi Zhang, Ben Swallow, Mayetri Gupta

Clustering to find subgroups with common features is often a necessary first step in the statistical modelling and analysis of large and complex datasets. Although follow-up analyses often use complex statistical models tailored to the specific application, most popular clustering approaches are either nonparametric or based on Gaussian mixture models and their variants, often for reasons of computational efficiency. Features that are common in modern scientific datasets, such as outliers or non-ellipsoidal cluster shapes, often cause these methods to detect the cluster components inaccurately. In this article, we present two efficient and robust Bayesian clustering approaches that seek to overcome these limitations: a model-based ‘tight’ clustering approach that clusters points in the presence of outliers, and a hierarchical Laplace mixture-based approach that clusters heavy-tailed and otherwise non-normal cluster components. We illustrate their power and accuracy in detecting meaningful clusters in datasets from genomics, imaging and the environmental sciences.
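To see why heavy-tailed components can help, the sketch below compares Gaussian and Laplace component densities inside a simple two-component univariate mixture and computes the E-step responsibilities for each. It is a minimal illustration under assumed parameter values (centres 0 and 4, unit variance); the function names are hypothetical, and it is not the authors' hierarchical Bayesian model or their tight-clustering procedure.

```python
import math

def gaussian_logpdf(x, mu, sd):
    """Log-density of a normal distribution N(mu, sd^2)."""
    return -0.5 * math.log(2 * math.pi * sd ** 2) - 0.5 * ((x - mu) / sd) ** 2

def laplace_logpdf(x, mu, b):
    """Log-density of a Laplace distribution with location mu and scale b."""
    return -math.log(2 * b) - abs(x - mu) / b

def responsibilities(x, weights, logpdfs):
    """E-step: posterior probability that x belongs to each mixture component.

    `logpdfs` holds one callable per component mapping x to a log-density.
    The log-sum-exp shift keeps the computation stable for outlying x.
    """
    logs = [math.log(w) + f(x) for w, f in zip(weights, logpdfs)]
    top = max(logs)
    unnorm = [math.exp(v - top) for v in logs]
    total = sum(unnorm)
    return [u / total for u in unnorm]

# Hypothetical two-component mixture: centres 0 and 4, each with unit variance.
# A Laplace(mu, b) has variance 2 * b**2, so b = 1/sqrt(2) matches unit variance.
b = 1 / math.sqrt(2)
gauss = [lambda x: gaussian_logpdf(x, 0.0, 1.0), lambda x: gaussian_logpdf(x, 4.0, 1.0)]
lap = [lambda x: laplace_logpdf(x, 0.0, b), lambda x: laplace_logpdf(x, 4.0, b)]

# Tail behaviour: log-density 10 units from the centre.
print("Gaussian:", gaussian_logpdf(10.0, 0.0, 1.0), "Laplace:", laplace_logpdf(10.0, 0.0, b))
# Soft assignment of a point lying between the two centres.
print("Gaussian:", responsibilities(2.5, [0.5, 0.5], gauss))
print("Laplace: ", responsibilities(2.5, [0.5, 0.5], lap))
```

The Gaussian log-density falls off quadratically with distance from the centre, whereas the Laplace log-density falls off linearly, so a far-out point is much less improbable under a Laplace component. In this toy setting the Laplace mixture also assigns the in-between point less decisively.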

Volume 64, Issue 2, pp. 313–337
Citations: 1
Bayesian non-parametric spatial prior for traffic crash risk mapping: A case study of Victoria, Australia
IF 1.1 | Zone 4 (Mathematics) | Q3 STATISTICS & PROBABILITY | Pub Date: 2022-07-06 | DOI: 10.1111/anzs.12369
J.-B. Durand, F. Forbes, C.D. Phan, L. Truong, H.D. Nguyen, F. Dama

We develop a Bayesian non-parametric (BNP) model coupled with Markov random fields (MRFs) for risk mapping, to infer spatial regions that are homogeneous in terms of risk. In contrast to most existing methods, the proposed approach does not require an arbitrary choice of the number of risk classes and determines their risk levels automatically. We consider settings in which the relevant data are counts and propose a so-called BNP hidden MRF (BNP-HMRF) model that can handle such data. Inference is carried out using a variational Bayes expectation–maximisation algorithm, and the approach is illustrated on traffic crash data from the state of Victoria, Australia. The results agree well with the traffic safety literature. More generally, the model presented here offers an effective, convenient and fast way to partition spatially localised count data.
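BNP priors avoid fixing the number of classes in advance; one common way to obtain this is a Dirichlet process, whose weights can be generated by stick-breaking. The sketch below draws truncated stick-breaking weights. The concentration `alpha`, truncation level `K` and seed are assumptions for illustration, and this is not the BNP-HMRF model itself, which additionally couples neighbouring regions through an MRF and is fitted by variational EM.

```python
import random

def stick_breaking_weights(alpha, K, rng):
    """Truncated stick-breaking weights for a Dirichlet process prior.

    v_k ~ Beta(1, alpha) and pi_k = v_k * prod_{j<k} (1 - v_j); the final weight
    takes whatever stick is left so that the K weights sum to one.
    """
    weights, remaining = [], 1.0
    for _ in range(K - 1):
        v = rng.betavariate(1.0, alpha)
        weights.append(v * remaining)
        remaining *= 1.0 - v
    weights.append(remaining)  # truncation at K classes
    return weights

rng = random.Random(2022)                      # assumed seed
w = stick_breaking_weights(alpha=2.0, K=20, rng=rng)
print([round(x, 3) for x in w[:6]], "total:", round(sum(w), 6))
```

In expectation the weights decay geometrically, E[pi_k] = (1/(1+alpha)) * (alpha/(1+alpha))^(k-1) for k < K, so with a modest `alpha` only a few classes receive appreciable prior mass and the data determine how many are actually used.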

Volume 64, Issue 2, pp. 171–204
Citations: 1
Visualising the pattern of long-term genotype performance by leveraging a genomic prediction model
IF 1.1 | Zone 4 (Mathematics) | Q3 STATISTICS & PROBABILITY | Pub Date: 2022-05-03 | DOI: 10.1111/anzs.12362
Vivi N. Arief, Ian H. DeLacy, Thomas Payne, Kaye E. Basford

Historical data from plant breeding programs are a valuable resource for studying the response of genotypes to a changing environment (i.e. genotype-by-environment interaction). Such data have been used to evaluate the pattern of genotype performance across regions or locations, but their use to evaluate the long-term pattern of genotype performance across environments (i.e. locations-by-years) has been hampered by the lack of common genotypes across years. This lack arises from the structure of breeding programs, especially for annual crops, where only a proportion of selected genotypes are tested in subsequent years. The result is a sparse table of predicted genotype performance across years (i.e. a sparse genotype-by-year table). A genomic prediction method that fits both a relationship matrix among genotypes and a relationship matrix among environments (i.e. years) could overcome this limitation and produce a dense genotype-by-year table, thereby enabling some evaluation of long-term genotype performance. In this paper, we applied this genomic prediction model to yield data from CIMMYT's Elite Spring Wheat Yield Trials (ESWYT) to visualise the pattern of genotype performance over 25 years.
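One common way to combine a genotype relationship matrix G with an environment (year) relationship matrix E in a mixed model is to give the genotype-by-year effects the covariance G ⊗ E (a Kronecker product). The sketch below builds that product for invented 2×2 matrices; the matrices, the ordering of effects and the helper name are assumptions, and the paper's exact model specification may differ.

```python
def kronecker(A, B):
    """Kronecker product A (x) B of two matrices stored as lists of rows."""
    return [
        [A[i][j] * B[k][l] for j in range(len(A[0])) for l in range(len(B[0]))]
        for i in range(len(A))
        for k in range(len(B))
    ]

# Hypothetical relationship matrices (values invented for illustration).
G = [[1.0, 0.5],   # genomic relationship between genotypes g0 and g1
     [0.5, 1.0]]
E = [[1.0, 0.8],   # correlation between years y0 and y1
     [0.8, 1.0]]

# Rows/columns ordered (g0,y0), (g0,y1), (g1,y0), (g1,y1).
K = kronecker(G, E)
for row in K:
    print(row)
# Covariance between genotype g0 in year y0 and genotype g1 in year y1:
print(K[0][3])  # equals G[0][1] * E[0][1]
```

Under such a covariance, a genotype that was never tested in a given year is still correlated with tested relatives in correlated years, which is what allows predictions to fill the empty cells of a genotype-by-year table.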

Volume 64, Issue 2, pp. 297–312
Citations: 1
Functional dimension reduction based on fuzzy partition and transformation
IF 1.1 | Zone 4 (Mathematics) | Q3 STATISTICS & PROBABILITY | Pub Date: 2022-04-25 | DOI: 10.1111/anzs.12363
Beiting Liang, Taoxuan Gao, Defa Bai, Guochang Wang

Functional sliced inverse regression (FSIR) is among the most popular methods for functional dimension reduction. However, FSIR has two evident shortcomings. On the one hand, the number of samples in each slice must not be too small, so selecting a suitable number of slices S is difficult, particularly for data with a small sample size. On the other hand, FSIR and its related methods are well known to perform poorly when the link function induces an even (symmetric) dependency between the response and the predictor. To solve these two problems, we propose three new types of estimation methods. First, we propose the functional fuzzy inverse regression (FFIR) method based on a fuzzy partition. Whereas FSIR uses a hard partition, the fuzzy partition uses all samples, with different weights, to estimate the mean in each slice. Therefore, FFIR performs well even for data with a small sample size. Second, we suggest two transformation approaches, FSIRR and FSIRP, that avoid the symmetric dependency between the response and the predictor: FSIRR eliminates it by transforming the response variable, while FSIRP overcomes it by transforming the functional predictor. Third, we propose the FFIRR and FFIRP methods, which combine the advantages of FFIR and the two transformation methods by applying FFIR instead of FSIR to the transformed data. Simulations and real data analysis show that all three types of proposed methods outperform FSIR in terms of estimation accuracy and stability.
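The contrast between a hard partition and a fuzzy partition shows up in how slice means are formed. In the sketch below, each predictor curve is stored as its values on a common grid; the hard version averages only the curves whose responses fall in a slice, while the fuzzy version averages all curves with membership weights. The Gaussian-kernel membership function, its bandwidth, the slice edges and centres, and the toy data are all assumptions for illustration; the paper's membership function may differ.

```python
import math

def hard_slice_means(X, y, edges):
    """FSIR-style hard partition: average the curves whose response falls in [edges[s], edges[s+1])."""
    means = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        members = [x for x, yi in zip(X, y) if lo <= yi < hi]
        if not members:
            means.append(None)  # empty slice: no mean can be formed
            continue
        means.append([sum(col) / len(members) for col in zip(*members)])
    return means

def fuzzy_slice_means(X, y, centres, bandwidth):
    """Fuzzy partition (assumed Gaussian-kernel memberships): every curve contributes to every slice."""
    means = []
    for c in centres:
        w = [math.exp(-0.5 * ((yi - c) / bandwidth) ** 2) for yi in y]
        total = sum(w)
        means.append([sum(wi * v for wi, v in zip(w, col)) / total for col in zip(*X)])
    return means

# Hypothetical data: 5 curves, each observed on a 3-point grid, with scalar responses.
X = [[0.1, 0.2, 0.3], [0.4, 0.1, 0.0], [0.9, 0.8, 0.7], [1.0, 1.2, 0.9], [0.5, 0.5, 0.5]]
y = [0.2, 0.4, 1.6, 1.9, 1.0]
print(hard_slice_means(X, y, edges=[0.0, 0.5, 1.5, 2.0, 2.5]))       # last slice is empty
print(fuzzy_slice_means(X, y, centres=[0.25, 1.0, 1.75, 2.25], bandwidth=0.5))
```

In SIR-type estimators these slice means then enter a weighted covariance matrix whose leading eigenvectors estimate the dimension-reduction directions; that step, and the FSIRR/FSIRP transformations, are not shown.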

Volume 64, Issue 1, pp. 45–66
Citations: 0
Smooth tests of goodness of fit for the distributional assumption of regression models
IF 1.1 | Zone 4 (Mathematics) | Q3 STATISTICS & PROBABILITY | Pub Date: 2022-04-18 | DOI: 10.1111/anzs.12361
J. C. W. Rayner, Paul Rippon, Thomas Suesse, Olivier Thas

We focus on regression models that consist of (i) a model for the conditional mean of the outcome and (ii) an assumption about the distribution of the outcome, both conditional on the regressors. Generalised linear models are a well-known example. The choice of outcome distribution is often motivated by the researcher's prior or background knowledge, or is simply made for convenience. We propose smooth goodness of fit tests for testing the distributional assumption in regression models. The tests arise from embedding the regression model in a smooth family of alternatives and constructing appropriate score tests that correctly account for nuisance parameter estimation. The tests are customised, focussed and comprehensive. We present several examples to illustrate the wide applicability of our method. A small simulation study demonstrates that our tests have power to detect important deviations from the hypothesised model.
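A classical Neyman-type smooth test can be sketched for a regression model by passing each outcome through its hypothesised conditional CDF (the probability integral transform) and scoring the transformed values with orthonormal Legendre polynomials on [0, 1]. This minimal sketch assumes a normal linear model with known parameters; it therefore omits the nuisance-parameter adjustment that the paper's score tests make, and with estimated parameters its chi-squared reference distribution would no longer be correct in general. The function names and simulated data are hypothetical.

```python
import math
import random

def normal_cdf(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def legendre_orthonormal(j, u):
    """Legendre polynomial of order j (1..4), shifted and scaled to be orthonormal on [0, 1]."""
    t = 2.0 * u - 1.0
    if j == 1:
        p = t
    elif j == 2:
        p = (3 * t**2 - 1) / 2
    elif j == 3:
        p = (5 * t**3 - 3 * t) / 2
    elif j == 4:
        p = (35 * t**4 - 30 * t**2 + 3) / 8
    else:
        raise ValueError("only orders 1 to 4 are implemented")
    return math.sqrt(2 * j + 1) * p

def smooth_test(y, x, intercept, slope, sigma, k=4):
    """Neyman-type smooth test of a normal linear model with KNOWN parameters.

    Each outcome is mapped to u_i = F(y_i | x_i) (probability integral transform).
    Components U_j = n^{-1/2} * sum_i h_j(u_i); under the model each is roughly N(0, 1),
    so the sum of squares is roughly chi-squared with k degrees of freedom.
    """
    u = [normal_cdf((yi - (intercept + slope * xi)) / sigma) for yi, xi in zip(y, x)]
    n = len(u)
    comps = [sum(legendre_orthonormal(j, ui) for ui in u) / math.sqrt(n) for j in range(1, k + 1)]
    return comps, sum(c * c for c in comps)

# Simulated data that do follow the hypothesised model (all values assumed).
rng = random.Random(1)
x = [rng.uniform(0.0, 10.0) for _ in range(500)]
y = [1.0 + 0.5 * xi + rng.gauss(0.0, 1.0) for xi in x]
comps, stat = smooth_test(y, x, intercept=1.0, slope=0.5, sigma=1.0)
print([round(c, 2) for c in comps], round(stat, 2))
```

Each component targets a different kind of departure: roughly, the first responds to a shift in location, the second to a change in spread, the third to skewness and the fourth to tail weight, which is what makes smooth tests "focussed" as well as comprehensive.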

Volume 64, Issue 1, pp. 67–85
Citations: 1
Modal clustering on PPGMMGA projection subspace
IF 1.1 | Zone 4 (Mathematics) | Q3 STATISTICS & PROBABILITY | Pub Date: 2022-04-14 | DOI: 10.1111/anzs.12360
Luca Scrucca

PPGMMGA is a projection pursuit (PP) algorithm aimed at detecting and visualising clustering structures in multivariate data. It uses as its PP index the negentropy, computed from Gaussian mixture models (GMMs) fitted for density estimation, and then uses genetic algorithms (GAs) to optimise that index. Because PPGMMGA is a dimension reduction technique introduced specifically for visualisation, it does not explicitly provide cluster memberships. In this paper, a modal clustering approach is proposed for estimating clusters of the projected data points. In particular, a modal EM algorithm is employed to estimate the modes, that is, the local maxima, of the underlying density in the projection subspace, where the density is estimated using parsimonious GMMs. Data points are then clustered according to the domain of attraction of the identified modes. Simulated and real data are used to illustrate the proposed method and evaluate its clustering performance.
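For a Gaussian mixture density, a modal EM step has a closed form: compute each component's posterior probability at the current position (E-step), then move the point to the precision-weighted average of the component means (M-step). Repeating this climbs to a mode, and points that reach the same mode form a cluster. The one-dimensional sketch below uses assumed mixture parameters and hypothetical function names; it does not cover the multivariate, parsimonious-covariance GMMs fitted on the PPGMMGA projection subspace.

```python
import math

def log_normal_pdf(x, mu, sd):
    return -0.5 * ((x - mu) / sd) ** 2 - math.log(sd * math.sqrt(2.0 * math.pi))

def modal_em(x, weights, means, sds, tol=1e-10, max_iter=1000):
    """Move x uphill to a local maximum (mode) of a univariate Gaussian mixture density."""
    for _ in range(max_iter):
        # E-step: posterior probability of each component at the current position.
        logs = [math.log(w) + log_normal_pdf(x, m, s) for w, m, s in zip(weights, means, sds)]
        top = max(logs)
        unnorm = [math.exp(v - top) for v in logs]
        total = sum(unnorm)
        post = [u / total for u in unnorm]
        # M-step: precision-weighted average of the component means.
        num = sum(p * m / s**2 for p, m, s in zip(post, means, sds))
        den = sum(p / s**2 for p, s in zip(post, sds))
        x_new = num / den
        if abs(x_new - x) < tol:
            return x_new
        x = x_new
    return x

def cluster_by_mode(points, weights, means, sds, merge_tol=1e-6):
    """Label each point by the mode its modal EM path converges to."""
    modes, labels = [], []
    for p in points:
        m = modal_em(p, weights, means, sds)
        for idx, known in enumerate(modes):
            if abs(m - known) < merge_tol:
                labels.append(idx)
                break
        else:
            modes.append(m)
            labels.append(len(modes) - 1)
    return modes, labels

# Hypothetical fitted mixture on a one-dimensional projection.
weights, means, sds = [0.5, 0.5], [0.0, 5.0], [1.0, 1.0]
points = [-1.0, -0.3, 0.4, 4.2, 5.1, 6.0]
modes, labels = cluster_by_mode(points, weights, means, sds)
print(modes, labels)
```

A point that starts exactly at a stationary point that is not a maximum (here x = 2.5, by symmetry) stays there, so a practical implementation needs to detect and handle such points.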

Volume 64, Issue 2, pp. 158–170
Citations: 1
MPS: An R package for modelling shifted families of distributions
IF 1.1 | Zone 4 (Mathematics) | Q3 STATISTICS & PROBABILITY | Pub Date: 2022-04-14 | DOI: 10.1111/anzs.12359
Mahdi Teimouri, Saralees Nadarajah

Generalised statistical distributions have been widely used over recent decades to model phenomena in many fields. These generalisations aim to produce more flexible distributions that lead to more accurate modelling in practice. Statistical analysis of generalised distributions requires new statistical software. The Newdistns package by Nadarajah and Rocha provides R routines to compute the probability density function (PDF), cumulative distribution function (CDF), quantile function, random numbers and parameter estimates for 19 families of distributions with applications in survival analysis. Here, we introduce an R package called MPS for computing the PDF, CDF, quantile function, random numbers, Q–Q plots and parameter estimates for 24 shifted new families of distributions. Through an extra location parameter, each family is defined on the whole real line and therefore has a broader range of applicability. We adopt the well-known maximum product spacing approach to estimate the parameters of the families because, in some situations, the maximum likelihood (ML) estimators do not exist. We demonstrate MPS by analysing two well-known real data sets. For the first data set, the ML estimators break down, but MPS works well. For the second, adding a location parameter yields a reasonable model, whereas omitting it makes the model quite inappropriate. MPS is available from CRAN at https://cran.r-project.org/package=MPS.
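Maximum product spacing estimation chooses the parameters that maximise the mean of log D_i, where D_i = F(x_(i)) − F(x_(i−1)) are the spacings of the sorted sample under the fitted CDF, with F(x_(0)) = 0 and F(x_(n+1)) = 1. The sketch below applies this to a two-parameter shifted exponential distribution, using a crude grid search in place of a proper optimiser. The distribution, data, grids and function names are assumptions for illustration; the sketch does not use or reproduce the MPS package's functions, and this simple family does not exhibit the ML breakdown described in the abstract.

```python
import math

def shifted_exp_cdf(x, loc, rate):
    """CDF of an exponential distribution shifted to start at `loc`."""
    return 1.0 - math.exp(-rate * (x - loc)) if x > loc else 0.0

def mean_log_spacing(data, loc, rate):
    """MPS objective: mean of log(F(x_(i)) - F(x_(i-1))), with F(x_(0)) = 0, F(x_(n+1)) = 1."""
    xs = sorted(data)
    cdf = [0.0] + [shifted_exp_cdf(v, loc, rate) for v in xs] + [1.0]
    total = 0.0
    for lo, hi in zip(cdf[:-1], cdf[1:]):
        spacing = hi - lo
        if spacing <= 0.0:
            return float("-inf")  # a zero spacing makes the product of spacings zero
        total += math.log(spacing)
    return total / (len(xs) + 1)

def mps_grid_fit(data, loc_grid, rate_grid):
    """Crude grid search for the (loc, rate) pair maximising the MPS objective."""
    best = max((mean_log_spacing(data, loc, rate), loc, rate)
               for loc in loc_grid for rate in rate_grid)
    return best[1], best[2]

data = [1.2, 1.5, 1.9, 2.4, 3.0, 3.8]          # hypothetical sample
loc_grid = [0.01 * i for i in range(120)]        # 0.00 ... 1.19, below min(data)
rate_grid = [0.05 * j for j in range(1, 80)]     # 0.05 ... 3.95
loc_hat, rate_hat = mps_grid_fit(data, loc_grid, rate_grid)
print("loc:", round(loc_hat, 2), "rate:", round(rate_hat, 2))
```

Because a location at or above the sample minimum produces a zero first spacing, the MPS estimate of the location stays strictly below the smallest observation.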

Volume 64, Issue 1, pp. 86–108
Citations: 2
Fast and efficient algorithms for sparse semiparametric bifunctional regression
IF 1.1 | Zone 4 (Mathematics) | Q3 STATISTICS & PROBABILITY | Pub Date: 2022-03-08 | DOI: 10.1111/anzs.12355
Silvia Novo, Philippe Vieu, Germán Aneiros

A new sparse semiparametric model is proposed that incorporates, in a flexible and interpretable manner, the influence of two functional random variables on a scalar response. One functional covariate enters through a single-index structure, while the other enters linearly through the high-dimensional vector formed by its discretised observations. For this model, two new algorithms are presented for selecting the relevant variables in the linear part and estimating the model. Both procedures exploit the functional origin of the linear covariates. Finite-sample experiments demonstrated the scope of application of both algorithms: the first is a fast algorithm that, without loss of predictive ability, avoids the substantial computational time that standard variable selection methods require to estimate this model; the second completes the set of relevant linear covariates provided by the first, thus improving its predictive efficiency. Some asymptotic results provide theoretical support for both procedures. A real data application demonstrated the applicability of the methodology from a predictive perspective, in terms of both the interpretability of its outputs and its low computational cost.
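Written as an equation, a model with the structure described might take the following form. The notation is assumed for illustration rather than taken from the paper: X_1 enters through a single index with direction function θ and an unknown smooth link m; X_2(t_1), ..., X_2(t_p) are the p discretised values of the second functional covariate; and ε is an error term with mean zero given the covariates.

```latex
Y = m\bigl(\langle \theta, X_1 \rangle\bigr) + \sum_{j=1}^{p} \beta_j \, X_2(t_j) + \varepsilon,
\qquad
\langle \theta, X_1 \rangle = \int_{\mathcal{T}} \theta(t)\, X_1(t)\, dt .
```

Sparsity here means that most β_j are zero, so selecting the relevant variables in the linear part amounts to identifying the grid points t_j with nonzero coefficients.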

Volume 63, Issue 4, pp. 606–638
Citations: 0
Bessel regression and bbreg package to analyse bounded data
IF 1.1 | Zone 4 (Mathematics) | Q3 STATISTICS & PROBABILITY | Pub Date: 2022-02-12 | DOI: 10.1111/anzs.12354
Wagner Barreto-Souza, Vinícius D. Mayrink, Alexandre B. Simas

Beta regression has been used extensively by statisticians and practitioners to model bounded continuous data, with no strong competitor sharing its main features. A class of normalised inverse-Gaussian (N-IG) processes was introduced in the literature and has been explored in the Bayesian context as a powerful alternative to the Dirichlet process. To date, however, the univariate N-IG distribution has received no attention in classical inference. In this paper, we propose Bessel regression, based on the univariate N-IG distribution, as an alternative to the beta model. Parameters are estimated through an expectation–maximisation (EM) algorithm, and the paper discusses how to perform inference. A useful and practical discrimination procedure is proposed for choosing between Bessel and beta regressions. A new R package called bbreg is developed for fitting both Bessel and beta regression models with the EM algorithm; it also provides graphical tools for assessing model adequacy and for model selection. Proper documentation for this package is available. The performance of the models under misspecification is evaluated in a simulation study. An empirical illustration uses the bbreg package to compare results from Bessel and beta regressions.
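For reference, the beta regression that Bessel regression is positioned against is usually written in a mean–precision form: y_i ~ Beta(μ_i φ, (1 − μ_i) φ) with logit(μ_i) = x_iᵀβ. The sketch below evaluates that log-likelihood with a constant precision φ. It covers only the beta model (the N-IG density behind Bessel regression is not reproduced), it does not use bbreg's interface, and the data, coefficients and function name are hypothetical.

```python
import math

def beta_regression_loglik(coefs, phi, X, y):
    """Log-likelihood of a beta regression with logit mean link and constant precision phi.

    y_i ~ Beta(mu_i * phi, (1 - mu_i) * phi),  logit(mu_i) = x_i . coefs,  0 < y_i < 1.
    """
    ll = 0.0
    for xi, yi in zip(X, y):
        eta = sum(c * v for c, v in zip(coefs, xi))
        mu = 1.0 / (1.0 + math.exp(-eta))          # inverse logit
        shape1, shape2 = mu * phi, (1.0 - mu) * phi
        ll += (math.lgamma(phi) - math.lgamma(shape1) - math.lgamma(shape2)
               + (shape1 - 1.0) * math.log(yi) + (shape2 - 1.0) * math.log(1.0 - yi))
    return ll

# Hypothetical bounded responses with an intercept and one covariate.
X = [[1.0, 0.0], [1.0, 0.5], [1.0, 1.0], [1.0, 1.5]]
y = [0.2, 0.35, 0.5, 0.7]
print(beta_regression_loglik([-1.4, 1.5], 20.0, X, y))  # coefficients near the logit trend
print(beta_regression_loglik([0.0, 0.0], 20.0, X, y))   # constant mean 0.5
```

Maximising this function over the coefficients and φ gives the usual beta regression fit; comparing such maximised fits with those of a competing model is one ingredient of the model discrimination the abstract mentions.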

Australian & New Zealand Journal of Statistics 63(4): 685-706
Citations: 1
Modelling students’ career indicators via mixtures of parsimonious matrix-normal distributions
IF 1.1 Zone 4 (Mathematics) Q3 STATISTICS & PROBABILITY Pub Date: 2022-02-10 DOI: 10.1111/anzs.12351
Salvatore D. Tomarchio, Salvatore Ingrassia, Volodymyr Melnykov

Evaluating teaching efficiency from different points of view is important for the university system, because it helps managers keep improving the quality of education and helps students acquire strong professional skills. In this framework, indicators of students' careers and of the adequacy of teachers' qualifications and numbers are analysed with a mixture model approach, using data sets provided by the Italian National Agency for the Evaluation of Universities and Research Institutes (ANVUR). In particular, parsimonious mixtures of matrix-normal distributions are used to detect underlying grouping structures. The results reveal an underlying group structure of courses with different traits, providing useful information for university policy makers.
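As a rough illustration of how a mixture of matrix-normal distributions assigns a matrix-valued observation (for example, hypothetically, several indicators recorded over several years for one course) to groups, the sketch below computes the matrix-normal log-density and the E-step membership probabilities. It uses the standard definition X ~ MN(M, U, V), equivalent to vec(X) ~ N(vec(M), V ⊗ U). The component covariances here are unconstrained; the parsimonious models in the paper restrict U and V, and those constraints, the M-step, and model selection are not shown. All function names and the toy data are illustrative assumptions.

```python
import numpy as np

def matrix_normal_logpdf(X, M, U, V):
    """Log-density of an r x p matrix X ~ MN(M, U, V).

    U (r x r) is the row covariance and V (p x p) the column covariance,
    so that vec(X) ~ N(vec(M), V kron U) with column stacking.
    """
    r, p = X.shape
    D = X - M
    # tr(V^{-1} D^T U^{-1} D), using linear solves rather than explicit inverses
    quad = np.trace(np.linalg.solve(V, D.T) @ np.linalg.solve(U, D))
    _, logdet_u = np.linalg.slogdet(U)
    _, logdet_v = np.linalg.slogdet(V)
    return -0.5 * (r * p * np.log(2 * np.pi) + p * logdet_u + r * logdet_v + quad)

def responsibilities(X, weights, params):
    """E-step: posterior probability that X belongs to each mixture component.

    params is a list of (M, U, V) tuples, one per component (unconstrained here;
    parsimonious variants would restrict U and V).
    """
    logs = np.array([np.log(w) + matrix_normal_logpdf(X, M, U, V)
                     for w, (M, U, V) in zip(weights, params)])
    logs -= logs.max()  # log-sum-exp stabilisation before exponentiating
    probs = np.exp(logs)
    return probs / probs.sum()

# Hypothetical toy example: two components, one 3 x 2 observation near the first mean.
rng = np.random.default_rng(0)
U, V = np.eye(3), np.eye(2)
M1, M2 = np.zeros((3, 2)), np.full((3, 2), 5.0)
X = M1 + 0.1 * rng.standard_normal((3, 2))
print(responsibilities(X, [0.5, 0.5], [(M1, U, V), (M2, U, V)]))
```

Working with row and column covariances instead of one (rp x rp) covariance is what keeps matrix-normal mixtures tractable: each component needs r(r+1)/2 + p(p+1)/2 covariance parameters (up to a scale identifiability constraint) rather than rp(rp+1)/2.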

Australian & New Zealand Journal of Statistics 64(2): 117-132
Citations: 2