
Australian & New Zealand Journal of Statistics: Latest Articles

Comparisons of distributions of Australian mental health scores
IF 0.8, Zone 4 (Mathematics), Q3 Statistics & Probability. Pub Date: 2023-10-11. DOI: 10.1111/anzs.12399
D. Gunawan, William E. Griffiths, D. Chotikapanich

Bayesian non-parametric estimates of Australian distributions of mental health scores are obtained to assess how the mental health status of the population has changed over time, and to compare the mental health status of female/male and Aboriginal/non-Aboriginal population subgroups. First-order and second-order stochastic dominance are used to compare distributions, with results presented in terms of the posterior probability of dominance and the posterior probability of no dominance. If a criterion for dominance is satisfied, then, in terms of that criterion, the mental health status of the dominant population is superior to that of the dominated population. If neither distribution is dominant, then the mental health status of neither population is superior in the same sense. Our results suggest mental health has deteriorated in recent years, that males' mental health status is better than that of females, and that non-Aboriginal health status is better than that of the Aboriginal population.
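
The dominance criteria named in this abstract can be illustrated on two plain samples. The following is a minimal sketch, not the authors' Bayesian procedure: it assumes higher scores mean better mental health, works on empirical CDFs, and checks weak dominance only. The function names `ecdf` and `dominance` are hypothetical.

```python
from bisect import bisect_right


def ecdf(sample, grid):
    """Empirical CDF of `sample` evaluated at each point of `grid`."""
    ordered = sorted(sample)
    n = len(ordered)
    return [bisect_right(ordered, x) / n for x in grid]


def dominance(sample_a, sample_b):
    """Return (fsd, ssd): does A weakly dominate B at first / second order?

    Assumption: higher scores are better, so A first-order dominates B
    when F_A(x) <= F_B(x) at every x.
    """
    grid = sorted(set(sample_a) | set(sample_b))
    fa, fb = ecdf(sample_a, grid), ecdf(sample_b, grid)

    # First order: A's CDF never lies above B's CDF.
    fsd = all(a <= b for a, b in zip(fa, fb))

    # Second order: the integrated CDF of A never exceeds that of B.
    # Empirical CDFs are step functions, so the integrals are exact sums
    # and it is enough to compare them at the grid points.
    int_a = int_b = 0.0
    ssd = True
    for k in range(1, len(grid)):
        width = grid[k] - grid[k - 1]
        int_a += fa[k - 1] * width
        int_b += fb[k - 1] * width
        if int_a > int_b + 1e-12:
            ssd = False
    return fsd, ssd
```

In a Bayesian setting, one plausible use (again an assumption, not taken from the paper) is to apply such a check to each posterior draw of the two distributions and report the proportion of draws in which the criterion holds.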

Volume 65, Issue 4, pages 287-308. Open access PDF: https://onlinelibrary.wiley.com/doi/epdf/10.1111/anzs.12399
Citations: 0
Embedding latent class regression and latent class distal outcome models into cluster-weighted latent class analysis: a detailed simulation experiment
IF 1.1, Zone 4 (Mathematics), Q3 Statistics & Probability. Pub Date: 2023-09-22. DOI: 10.1111/anzs.12396
Roberto Di Mari, Antonio Punzo, Zsuzsa Bakk

Usually in latent class (LC) analysis, external predictors are taken to be cluster conditional probability predictors (LC models with external predictors), and/or score conditional probability predictors (LC regression models). In such cases, their distribution is not of interest. Class-specific distribution is of interest in the distal outcome model, when the distribution of the external variables is assumed to depend on LC membership. In this paper, we consider a more general formulation that embeds both the LC regression and the distal outcome models, as is typically done in cluster-weighted modelling. This allows us to investigate (1) whether the distribution of the external variables differs across classes, and (2) whether there are significant direct effects of the external variables on the indicators, by modelling jointly the relationship between the external and the latent variables. We show the advantages of the proposed modelling approach through a set of artificial examples, an extensive simulation study and an empirical application about psychological contracts among employees and employers in Belgium and the Netherlands.

Volume 65, Issue 3, pages 213-233. Open access PDF: https://onlinelibrary.wiley.com/doi/epdf/10.1111/anzs.12396
Citations: 0
Bayesian modelling of effects of prenatal alcohol exposure on child cognition based on data from multiple cohorts
IF 1.1, Zone 4 (Mathematics), Q3 Statistics & Probability. Pub Date: 2023-09-08. DOI: 10.1111/anzs.12397
Khue-Dung Dang, Louise M. Ryan, Tugba Akkaya Hocagil, Richard J. Cook, Gale A. Richardson, Nancy L. Day, Claire D. Coles, Heather Carmichael Olson, Sandra W. Jacobson, Joseph L. Jacobson

High levels of prenatal alcohol exposure (PAE) result in significant cognitive deficits in children, but the exact nature of the dose-response relationship is less well understood. To investigate this relationship, data were assembled from six longitudinal birth cohort studies examining the effects of PAE on cognitive outcomes from early school age through adolescence. Structural equation models (SEMs) are a natural approach to consider, because of the way they conceptualise multiple observed outcomes as relating to an underlying latent variable of interest, which can then be modelled as a function of exposure and other predictors of interest. However, conventional SEMs could not be fitted in this context because slightly different outcome measures were used in the six studies. In this paper we propose a multi-group Bayesian SEM that maps the unobserved cognition variable to a broad range of observed outcomes. The relation between these variables and PAE is then examined while controlling for potential confounders via propensity score adjustment. By examining different possible dose-response functions, the proposed framework is used to investigate whether there is a threshold PAE level that results in minimal cognitive deficit.

Volume 65, Issue 3, pages 167-186.
Citations: 0
Statistical methods for astronomical data analysis. By A. K. Chattopadhyay and T. Chattopadhyay. New York: Springer. 2014. 349 pages. UK£49.99 (hardback). ISBN: 978-1-4939-1506-4.
IF 0.8, Zone 4 (Mathematics), Q3 Statistics & Probability. Pub Date: 2023-09-04. DOI: 10.1111/anzs.12398
Soumita Modak
Volume 65, Issue 4, pages 394-395.
Citations: 0
The multivariate component zero-inflated Poisson model for correlated count data analysis
IF 1.1, Zone 4 (Mathematics), Q3 Statistics & Probability. Pub Date: 2023-08-27. DOI: 10.1111/anzs.12395
Qin Wu, Guo-Liang Tian, Tao Li, Man-Lai Tang, Chi Zhang

Multivariate zero-inflated Poisson (ZIP) distributions are important tools for modelling and analysing correlated count data with extra zeros. Unfortunately, existing multivariate ZIP distributions consider only the overall zero-inflation while the component zero-inflation is not well addressed. This paper proposes a flexible multivariate ZIP distribution, called the multivariate component ZIP distribution, in which both the overall and component zero-inflations are taken into account. Likelihood-based inference procedures including the calculation of maximum likelihood estimates of parameters in the model without and with covariates are provided. Simulation studies indicate that the performance of the proposed methods on the multivariate component ZIP model is satisfactory. The Australia health care utilisation data set is analysed to demonstrate that the new distribution is more appropriate than the existing multivariate ZIP distributions.
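
As background for the zero-inflation idea, the sketch below shows the standard univariate ZIP probability mass function only. It is not the multivariate component ZIP distribution proposed in the paper. The name `zip_pmf` and the parameter names `lam` (Poisson rate) and `pi` (extra-zero probability) are illustrative assumptions.

```python
import math


def zip_pmf(k, lam, pi):
    """P(Y = k) for a univariate zero-inflated Poisson variable.

    With probability `pi` the count is a structural zero; otherwise it is
    drawn from a Poisson distribution with rate `lam`.
    """
    if k < 0:
        return 0.0
    poisson = math.exp(-lam) * lam ** k / math.factorial(k)
    extra_zero = pi if k == 0 else 0.0
    return extra_zero + (1.0 - pi) * poisson
```

Under this parameterisation the mean is `(1 - pi) * lam`, and the probability of a zero exceeds the plain Poisson value `exp(-lam)` whenever `pi > 0`.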

Volume 65, Issue 3, pages 234-261.
Citations: 0
Short-term forecasting with a computationally efficient nonparametric transfer function model
IF 1.1, Zone 4 (Mathematics), Q3 Statistics & Probability. Pub Date: 2023-08-01. DOI: 10.1111/anzs.12394
Jun. M. Liu

In this paper a semi-parametric approach is developed to model non-linear relationships in time series data using polynomial splines. Polynomial splines require very little assumption about the functional form of the underlying relationship, so they are very flexible and can be used to model highly non-linear relationships. Polynomial splines are also computationally very efficient. The serial correlation in the data is accounted for by modelling the noise as an autoregressive integrated moving average (ARIMA) process; by doing so, the efficiency in nonparametric estimation is improved and correct inferences can be obtained. The explicit structure of the ARIMA model allows the correlation information to be used to improve forecasting performance. An algorithm is developed to automatically select and estimate the polynomial spline model and the ARIMA model through backfitting. This method is applied to a real-life data set to forecast hourly electricity usage. The non-linear effect of temperature on hourly electricity usage is allowed to be different at different hours of the day and days of the week. The forecasting performance of the developed method is evaluated in post-sample forecasting and compared with several well-accepted models. The results show the performance of the proposed model is comparable with a long short-term memory deep learning model.

Volume 65, Issue 3, pages 187-212.
Citations: 0
Asymptotics of M-estimator in multivariate linear regression models for a class of random errors
IF 1.1, Zone 4 (Mathematics), Q3 Statistics & Probability. Pub Date: 2023-07-21. DOI: 10.1111/anzs.12393
Yi Wu, Wei Yu, Xuejun Wang

It is known that linear regression models have immense applications in various areas such as engineering technology, economics and social sciences. In this paper, we investigate the asymptotic properties of M-estimator in multivariate linear regression model based on a class of random errors satisfying a generalised Bernstein-type inequality. By using the generalised Bernstein-type inequality, we obtain a general result on almost sure convergence for a class of random variables and then obtain the strong consistency for the M-estimator in multivariate linear regression models under some mild conditions. The result extends or improves some existing ones in the literature. Moreover, we also consider the case when the dimension $p$ tends to infinity by establishing the rate of almost sure convergence for a class of random variables satisfying generalised Bernstein-type inequality. Some numerical simulations are also provided to verify the validity of the theoretical results.
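
For readers unfamiliar with M-estimation, the simplest instance is a robust location estimate. The sketch below is an illustration under stated assumptions, not the multivariate regression setting of the paper: it uses the Huber loss, takes the scale as known and equal to 1, and solves the estimating equation by iteratively reweighted averaging. The name `huber_location` and the default tuning constant `c = 1.345` are choices made here.

```python
def huber_location(xs, c=1.345, n_iter=100, tol=1e-10):
    """Huber M-estimate of location, assuming unit scale.

    Observations within `c` of the current estimate get weight 1;
    more distant ones are down-weighted by c / |residual|.
    """
    mu = sorted(xs)[len(xs) // 2]  # start from the (upper) median
    for _ in range(n_iter):
        weights = [1.0 if abs(x - mu) <= c else c / abs(x - mu) for x in xs]
        new_mu = sum(w * x for w, x in zip(weights, xs)) / sum(weights)
        if abs(new_mu - mu) < tol:
            return new_mu
        mu = new_mu
    return mu
```

On a sample with one gross outlier, the estimate stays close to the bulk of the data, whereas the sample mean is pulled towards the outlier.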

Volume 65, Issue 3, pages 262-285.
Citations: 0
On the selection of predictors by using greedy algorithms and information theoretic criteria
IF 1.1, Zone 4 (Mathematics), Q3 Statistics & Probability. Pub Date: 2023-06-29. DOI: 10.1111/anzs.12387
Fangyao Li, Christopher M. Triggs, Ciprian Doru Giurcăneanu

We discuss the use of the following greedy algorithms in the prediction of multivariate time series: Matching Pursuit Algorithm (MPA), Orthogonal Matching Pursuit (OMP), Relaxed Matching Pursuit (RMP), Frank–Wolfe Algorithm (FWA) and Constrained Matching Pursuit (CMP). The last two are known to be solvers for the lasso problem. Some of the algorithms are well-known (e.g. OMP), while others are less popular (e.g. RMP). We provide a unified presentation of all the algorithms, and evaluate their computational complexity for the high-dimensional case and for the big data case. We show how 12 information theoretic (IT) criteria can be used jointly with the greedy algorithms. As part of this effort, we derive new theoretical results that allow modification of the IT criteria so that they are compatible with RMP. The prediction capabilities are tested in experiments with two data sets. The first one involves air pollution data measured in Auckland (New Zealand) and the second one concerns the House Price Index in England (the United Kingdom).
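
To make the greedy idea concrete, here is a minimal sketch of plain matching pursuit in its textbook form. It is not the paper's unified presentation and does not include any stopping rule based on IT criteria. The function name `matching_pursuit` is hypothetical, the number of steps is fixed by the caller, and every predictor column is assumed to be non-zero.

```python
def matching_pursuit(columns, y, n_steps):
    """Greedy matching pursuit.

    `columns` is a list of predictor vectors (assumed non-zero) and `y` is
    the response. Returns the coefficient list and the final residual.
    """
    residual = [float(v) for v in y]
    coef = [0.0] * len(columns)
    sq_norms = [sum(v * v for v in col) for col in columns]
    for _ in range(n_steps):
        # Inner product of each predictor with the current residual.
        inner = [sum(c * r for c, r in zip(col, residual)) for col in columns]
        # Select the predictor most correlated with the residual.
        j = max(range(len(columns)),
                key=lambda i: abs(inner[i]) / sq_norms[i] ** 0.5)
        # Project the residual onto that predictor and subtract the fit.
        step = inner[j] / sq_norms[j]
        coef[j] += step
        residual = [r - step * c for r, c in zip(residual, columns[j])]
    return coef, residual
```

Orthogonal matching pursuit differs in that, after each selection, it refits all selected predictors jointly by least squares rather than updating a single coefficient.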

Volume 65, Issue 2, pages 77-100. Open access PDF: https://onlinelibrary.wiley.com/doi/epdf/10.1111/anzs.12387
Citations: 1
Visual assessment of matrix-variate normality
IF 1.1, Zone 4 (Mathematics), Q3 Statistics & Probability. Pub Date: 2023-06-17. DOI: 10.1111/anzs.12388
Nikola Počuča, Michael P.B. Gallaugher, Katharine M. Clark, Paul D. McNicholas

In recent years, the analysis of three-way data has become ever more prevalent in the literature. It is becoming increasingly common to analyse such data by means of matrix-variate distributions, the most prevalent of which is the matrix-variate normal distribution. Although many methods exist for assessing multivariate normality, there is a relative paucity of approaches for assessing matrix-variate normality. Herein, a new visual method is proposed for assessing matrix-variate normality by means of a distance–distance plot. In addition, a testing procedure is discussed to be used in tandem with the proposed visual method. The proposed approach is illustrated via simulated data as well as an application on analysing handwritten digits.

Volume 65, Issue 2, pages 152-165.
Citations: 1
Robust PCA for high-dimensional data based on characteristic transformation
IF 1.1, Zone 4 (Mathematics), Q3 Statistics & Probability. Pub Date: 2023-06-13. DOI: 10.1111/anzs.12385
Lingyu He, Yanrong Yang, Bo Zhang

In this paper, we propose a novel robust principal component analysis (PCA) for high-dimensional data in the presence of various heterogeneities, in particular strong tailing and outliers. A transformation motivated by the characteristic function is constructed to improve the robustness of the classical PCA. The suggested method has the distinct advantage of dealing with heavy-tail-distributed data, whose covariances may be non-existent (positively infinite, for instance), in addition to the usual outliers. The proposed approach is also a case of kernel principal component analysis (KPCA) and employs the robust and non-linear properties via a bounded and non-linear kernel function. The merits of the new method are illustrated by some statistical properties, including the upper bound of the excess error and the behaviour of the large eigenvalues under a spiked covariance model. Additionally, using a variety of simulations, we demonstrate the benefits of our approach over the classical PCA. Finally, using data on protein expression in mice of various genotypes in a biological study, we apply the novel robust PCA to categorise the mice and find that our approach is more effective at identifying abnormal mice than the classical PCA.

Australian & New Zealand Journal of Statistics, vol. 65, no. 2, pp. 127-151.
Citations: 0