Australian & New Zealand Journal of Statistics: Latest Articles


On two conjectures about perturbations of the stochastic growth rate

IF 1.1, Zone 4 (Mathematics), Q3 STATISTICS & PROBABILITY

Australian & New Zealand Journal of Statistics

Pub Date : 2023-02-15 DOI: 10.1111/anzs.12382

Stefano Giaimo

The stochastic growth rate describes long-run growth of a population that lives in a fluctuating environment. Perturbation analysis of the stochastic growth rate provides crucial information for population managers, ecologists and evolutionary biologists. This analysis quantifies the response of the stochastic growth rate to changes in demographic parameters. A form of this analysis deals with changes that only occur in some environmental states. Caswell put forth two conjectures about environment-specific perturbations of the stochastic growth rate. The conjectures link the stationary distribution of the stochastic environmental process with the magnitude of some environment-specific perturbations. This note disproves one conjecture and proves the other.
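As a rough illustration of the quantity discussed in this abstract, the stochastic growth rate can be approximated by simulating a long product of random projection matrices and averaging the per-step log growth. The sketch below is not the author's method: it assumes independent and identically distributed environments (a special case of a stationary environmental process), and the function name, parameters and example matrices are hypothetical.

```python
import math
import random


def stochastic_growth_rate(matrices, probs, steps=10_000, seed=0):
    """Estimate the long-run log growth rate by simulation.

    matrices: list of square projection matrices (lists of lists), one per environment.
    probs:    probability of each environment (i.i.d. draws, an assumption of this sketch).
    """
    rng = random.Random(seed)
    k = len(matrices[0])
    n = [1.0 / k] * k          # population vector, kept normalised to sum to 1
    log_growth = 0.0
    for _ in range(steps):
        a = rng.choices(matrices, weights=probs)[0]   # draw this step's environment
        n = [sum(a[i][j] * n[j] for j in range(k)) for i in range(k)]
        total = sum(n)
        log_growth += math.log(total)                 # one-step log growth
        n = [x / total for x in n]                    # renormalise to avoid overflow
    return log_growth / steps


# Hypothetical two-stage population with a "good" and a "bad" environment.
good = [[0.5, 2.0], [0.6, 0.0]]
bad = [[0.2, 0.8], [0.3, 0.0]]
rate = stochastic_growth_rate([good, bad], probs=[0.5, 0.5])
```

An environment-specific perturbation, in this setting, would change entries of only one of the matrices (for example only `good`) and compare the resulting estimates.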


Citations: 0

A Richards growth model to predict fruit weight

IF 1.1, Zone 4 (Mathematics), Q3 STATISTICS & PROBABILITY

Australian & New Zealand Journal of Statistics

Pub Date : 2023-01-05 DOI: 10.1111/anzs.12380

Daniel Gerhard, Elena Moltchanova

The Richards model comprises several popular sigmoidal and monomolecular growth curves. We illustrate fitting of a Bayesian Richards model by splitting the full growth model into several submodels, followed by a model selection procedure. The performance of the methodology is evaluated by Monte Carlo simulations. A double-sigmoidal version of the Richards model is applied to model grape bunch weight based on data from a New Zealand vineyard over a single growing period.

A Bayesian Richards growth model applied to grape size data. Representations of phenological processes are selected through multi-model inference.
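For orientation, one common parameterisation of the Richards curve is sketched below; the abstract does not state which parameterisation the authors use, so the form, the parameter names and the additive double-sigmoidal construction are assumptions. In this form, `shape = 1` gives the logistic curve and `shape` close to 0 approaches the Gompertz curve, which illustrates how the Richards family contains several familiar growth curves.

```python
import math


def richards(t, upper, rate, midpoint, shape):
    """Richards curve: upper / (1 + shape * exp(-rate * (t - midpoint))) ** (1 / shape)."""
    if shape <= 0:
        raise ValueError("shape must be positive in this parameterisation")
    return upper / (1.0 + shape * math.exp(-rate * (t - midpoint))) ** (1.0 / shape)


def double_richards(t, first, second):
    """Sum of two Richards curves, one simple way to get two growth phases (an assumption)."""
    return richards(t, *first) + richards(t, *second)


# Hypothetical parameters: (upper, rate, midpoint, shape) for each growth phase.
weights = [double_richards(day, (60.0, 0.15, 30.0, 1.0), (90.0, 0.10, 90.0, 0.5))
           for day in range(0, 150, 10)]
```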


Citations: 1

Minimum cost-compression risk in principal component analysis

IF 1.1, Zone 4 (Mathematics), Q3 STATISTICS & PROBABILITY

Australian & New Zealand Journal of Statistics

Pub Date : 2022-12-28 DOI: 10.1111/anzs.12378

Bhargab Chattopadhyay, Swarnali Banerjee

Principal Component Analysis (PCA) is a popular multivariate analytic tool which can be used for dimension reduction without losing much information. Data vectors containing a large number of features arriving sequentially may be correlated with each other. An effective algorithm for such situations is online PCA. Existing online PCA research revolves around proposing efficient, scalable updating algorithms that focus on compression loss only. It does not take into account the dataset size at which further arrival of data vectors can be terminated and dimension reduction can be applied. It is well known that the dataset size contributes to reducing the compression loss: the smaller the dataset, the larger the compression loss, and the larger the dataset, the smaller the compression loss. However, reducing the compression loss by increasing the dataset size increases the total data collection cost. In this paper, we move beyond the scalability and updating problems related to online PCA and focus on optimising a cost-compression loss which considers both the compression loss and the data collection cost. We minimise the corresponding risk using a two-stage PCA algorithm. The resulting two-stage algorithm is a fast and efficient alternative to online PCA and is shown to exhibit attractive convergence properties with no assumption on specific data distributions. Experimental studies demonstrate similar results and further illustrations are provided using real data. As an extension, a multi-stage PCA algorithm is discussed as well. Given the time complexity, the two-stage PCA algorithm is emphasised over the multi-stage PCA algorithm for online data.


Volume 64, Issue 4, pp. 422-441.

Citations: 0

A new minification integer-valued autoregressive process driven by explanatory variables

IF 1.1, Zone 4 (Mathematics), Q3 STATISTICS & PROBABILITY

Australian & New Zealand Journal of Statistics

Pub Date : 2022-12-28 DOI: 10.1111/anzs.12379

Lianyong Qian, Fukang Zhu

The discrete minification model based on the modified negative binomial operator, as an extension to the continuous minification model, can be used to describe an extreme value after a few increasing values. To make this model more practical and flexible, a new minification integer-valued autoregressive process driven by explanatory variables is proposed. Ergodicity of the new process is discussed. The estimators of the unknown parameters are obtained via the conditional least squares and conditional maximum likelihood methods, and their asymptotic properties are also established. A testing procedure for checking the existence of the explanatory variables is developed. Monte Carlo simulations are given to illustrate the finite-sample performance of the estimators, under both correct specification and misspecification, and of the test. A real data example is used to illustrate the performance of our model.


Citations: 2

Small area estimation under a semi-parametric covariate measured with error

IF 1.1, Zone 4 (Mathematics), Q3 STATISTICS & PROBABILITY

Australian & New Zealand Journal of Statistics

Pub Date : 2022-12-08 DOI: 10.1111/anzs.12377

Reyhane Sefidkar, Mahmoud Torabi, Amir Kavousi

In recent years, small area estimation has played an important role in statistics as it deals with the problem of obtaining reliable estimates for parameters of interest in areas with small or even zero sample sizes corresponding to population sizes. Nested error linear regression models are often used in small area estimation assuming that the covariates are measured without error and that the relationship between covariates and response variable is linear. Small area models have also been extended to the case in which a linear relationship may not hold, using penalised spline (P-spline) regression, but assuming that the covariates are measured without error. Recently, a nested error regression model using a P-spline regression model, for the fixed part of the model, has been studied assuming the presence of measurement error in the covariate, in the Bayesian framework. In this paper, we propose a frequentist approach to study a semi-parametric nested error regression model using P-splines with a covariate measured with error. In particular, the pseudo-empirical best predictors of small area means and their corresponding mean squared prediction error estimates are studied. Performance of the proposed approach is evaluated through a simulation and also by a real data application.


Volume 64, Issue 4, pp. 495-515.

Citations: 0

Permutation entropy and its variants for measuring temporal dependence

IF 1.1, Zone 4 (Mathematics), Q3 STATISTICS & PROBABILITY

Australian & New Zealand Journal of Statistics

Pub Date : 2022-12-08 DOI: 10.1111/anzs.12376

Xin Huang, Han Lin Shang, David Pitt

Permutation entropy (PE) is an ordinal-based non-parametric complexity measure for studying the temporal dependence structure in a linear or non-linear time series. Based on the PE, we propose a new measure, namely permutation dependence (PD), to quantify the strength of the temporal dependence in a univariate time series and remedy the major drawbacks of PE. We demonstrate that the PE and PD are viable and useful alternatives to conventional temporal dependence measures, such as the autocorrelation function (ACF) and mutual information (MI). Compared to the ACF, the PE and PD are not restricted in detecting the linear or quasi-linear serial correlation in an autoregression model. Instead, they can be viewed as non-parametric and non-linear alternatives since they do not require any prior knowledge or assumptions about the underlying structure. Compared to MI estimated by k-nearest neighbour, PE and PD show added sensitivity to structures of relatively weak strength. We compare the finite-sample performance of the PE and PD with the ACF and the MI estimated by k-nearest neighbour in a number of simulation studies to showcase their respective strengths and weaknesses. Moreover, their performance under non-stationarity is also investigated. Using high-frequency EUR/USD exchange rate returns data, we apply the PE and PD to study the temporal dependence structure in intraday foreign exchange.
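To make the ordinal idea behind PE concrete, the sketch below computes the standard permutation entropy of a series: it slides a window over the data, records the rank order (ordinal pattern) of each window, and takes the Shannon entropy of the pattern frequencies. It does not implement the authors' permutation dependence (PD) measure, whose definition is not given in the abstract; the function name, defaults and tie-breaking rule are assumptions.

```python
import math
from collections import Counter


def permutation_entropy(series, order=3, delay=1, normalize=True):
    """Shannon entropy of the ordinal patterns of length `order` found in `series`."""
    n_patterns = len(series) - (order - 1) * delay
    if order < 2 or n_patterns < 1:
        raise ValueError("series too short for the requested order and delay")
    counts = Counter()
    for start in range(n_patterns):
        window = [series[start + j * delay] for j in range(order)]
        # Ordinal pattern: the index order that sorts the window (ties keep position order).
        pattern = tuple(sorted(range(order), key=lambda j: window[j]))
        counts[pattern] += 1
    entropy = sum((c / n_patterns) * math.log(n_patterns / c) for c in counts.values())
    if normalize:
        entropy /= math.log(math.factorial(order))   # scale to [0, 1]
    return entropy


monotone = list(range(50))   # a single ordinal pattern, so no ordinal complexity
zigzag = [0, 1] * 25         # two alternating ordinal patterns
pe_monotone = permutation_entropy(monotone)
pe_zigzag = permutation_entropy(zigzag)
```

Values near 0 indicate a highly regular ordinal structure, while values near 1 (after normalisation) indicate that all ordinal patterns are about equally likely, as expected for serially independent noise.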


Volume 64, Issue 4, pp. 442-477. Open-access PDF: https://onlinelibrary.wiley.com/doi/epdf/10.1111/anzs.12376

Citations: 2

The place of probability distributions in statistical learning. A commented book review of ‘Distributions for modeling location, scale, and shape using GAMLSS in R’ by Rigby et al. (2021)

IF 1.1, Zone 4 (Mathematics), Q3 STATISTICS & PROBABILITY

Australian & New Zealand Journal of Statistics

Pub Date : 2022-09-23 DOI: 10.1111/anzs.12374

Fernando Marmolejo-Ramos, Raydonal Ospina, Freddy Hernández-Barajas

Generalised additive models for location, scale and shape (GAMLSS) is a type of distributional regression framework that enables modelling numeric dependent variables via probability distributions other than those of the exponential family. While the cogs behind GAMLSS are provided in Stasinopoulos et al. 2017's book ‘Flexible regression and smoothing using GAMLSS in R’, the new book by Rigby et al. considers the distributions implemented in the R software that are usable for GAMLSS modelling. A commented summary of that second book is provided in a supplementary file. Unlike traditional book reviews, two topics in this new book are briefly elaborated on: robustness (Chapter 12) and shape (Chapters 14–16). It is concluded that despite GAMLSS being a powerful and flexible framework for supervised statistical learning, striving for interpretable GAMLSS models is essential.


Citations: 0

Penalised, post-pretest, and post-shrinkage strategies in nonlinear growth models

IF 1.1, Zone 4 (Mathematics), Q3 STATISTICS & PROBABILITY

Australian & New Zealand Journal of Statistics

Pub Date : 2022-09-04 DOI: 10.1111/anzs.12373

Janjira Piladaeng, S. Ejaz Ahmed, Supranee Lisawadi

In nonlinear growth models, we considered the parameter estimation under subspace information for low-dimensional and high-dimensional data. We proposed novel estimators based on pretest and shrinkage strategies to improve the estimation efficiency and to establish asymptotic properties. We used simulation studies and a real data example to confirm the theoretical results. We also applied two well-known penalised methods—least absolute shrinkage and selection operator (LASSO) and adaptive LASSO (aLASSO)—for the dimensional reduction of the predictor variables. The results demonstrated that the pretest and shrinkage estimation strategies performed well in parameter estimations when the subspace information was incorrect for both low- and high-dimensional regimes.


Citations: 1

Robust subtractive stability measures for fast and exhaustive feature importance ranking and selection in generalised linear models

IF 1.1, Zone 4 (Mathematics), Q3 STATISTICS & PROBABILITY

Australian & New Zealand Journal of Statistics

Pub Date : 2022-09-02 DOI: 10.1111/anzs.12375

Connor Smith, Boris Guennewig, Samuel Muller

We introduce the relatively new concept of subtractive lack-of-fit measures in the context of robust regression, in particular in generalised linear models. We devise a fast and robust feature selection framework for regression that empirically enjoys better performance than other selection methods while remaining computationally feasible when fully exhaustive methods are not. Our method builds on the concepts of model stability, subtractive lack-of-fit measures and repeated model identification. We demonstrate how the multiple implementations add value in a robust regression type context, in particular through utilizing a combination of robust regression coefficient and scale estimates. Through resampling, we construct a robust stability matrix, which contains multiple measures of feature importance for each variable. By constructing this stability matrix and using it to rank features based on importance, we are able to reduce the candidate model space and then perform an exhaustive search on the remaining models. We also introduce two different visualisations to better convey information held within the stability matrix; a subtractive Mosaic Probability Plot and a subtractive Variable Inclusion Plot. We demonstrate how these graphics allow for a better understanding of how variable importance changes under small alterations to the underlying data. Our framework is made available in R through the RobStabR package.


Volume 64, Issue 3, pp. 339-355.

Citations: 1

Multivariate Kruskal-Wallis tests based on principal component score and latent source of independent component analysis

IF 1.1, Zone 4 (Mathematics), Q3 STATISTICS & PROBABILITY

Australian & New Zealand Journal of Statistics

Pub Date : 2022-08-04 DOI: 10.1111/anzs.12371

Amitava Mukherjee, Hidetoshi Murakami

Analysing multivariate and high-dimensional multi-sample data is essential in many scientific fields. One of the most crucial and popular topics in modern nonparametric statistics is multi-sample comparison problems for such multivariate and high-dimensional data. The Kruskal-Wallis test is widely used in the multi-sample problem. For multivariate or high-dimensional data, it is imperative to specify how to determine the ranks of individual vector-valued observations in terms of various distance metrics. Alternatively, one can combine the concept of principal component scores or independent component scores with the Kruskal-Wallis test. A simple but powerful Kruskal-Wallis test based on the principal component scores is discussed in this paper for the multivariate and high-dimensional data. Another type of Kruskal-Wallis test based on latent sources of independent component analysis is constructed as a competitor. These tests are suitable for testing the difference in the location vector, scale matrix or both and can be used with equal and unequal sample sizes. These tests' power performances are thoroughly compared with traditional distance-based Kruskal-Wallis tests for multivariate data using simulation based on Monte Carlo for various population distributions. We include an illustration of the proposed tests using real data. The paper concludes with some remarks and directions for future research.


Volume 64, Issue 3, pp. 356-380.

Citations: 2
