Australian & New Zealand Journal of Statistics: Latest Articles


John Newton Darroch, 1930–2024

IF 0.8 | Zone 4, Mathematics | Q3 STATISTICS & PROBABILITY

Australian & New Zealand Journal of Statistics

Pub Date : 2025-01-16 DOI: 10.1111/anzs.12430

Gary Glonek

John Darroch was born in Melksham, Wiltshire, England, to George Darroch and Phyllis Lacey on 22 October 1930 and died, aged 93 years, on 15 April 2024.

He attended grammar school where he excelled in the classroom and at sports, and subsequently gained admission to study civil engineering at the University of Bristol. Recognising his talent, his mathematics teacher persuaded him to postpone university and take the exam for the Cambridge Open Scholarship to study mathematics.

John was successful and studied for 4 years, specialising in theoretical physics and completing the Diploma in Mathematical Statistics under the supervision of Dennis Lindley. He studied briefly with R.A. Fisher in this time, although this appears not to have been a significant factor in his future career choices. It was also there that he met his future wife, Elisabeth Pennington. He completed 2 years of national service in the RAF, at the rank of Pilot Officer, teaching mathematics, and in 1955 he and Elisabeth set sail for Cape Town to take up his new position as a lecturer at the University.

John arrived at the University of Cape Town ‘with no thought of doing any research’. Within a few months, an enquiry from the professor of biology awakened his instinct for research, culminating in his 1958 Biometrika paper on capture-recapture experiments. He enrolled in a Ph.D. by Publication program at Cape Town, and was subsequently awarded the degree on the basis of his series of three Biometrika papers on capture-recapture (Darroch 1958, 1959, 1961). This seminal contribution proposed models and provided maximum likelihood estimates for a number of capture-recapture settings, and provided a basis for much of the development that followed. It was through this work, undertaken without supervision and with only rudimentary access to the literature, that John developed his first-principles approach to research.

John's academic career flourished. He and Elisabeth returned to England to take up a lectureship at the University of Manchester, where he supervised George Seber for a period, introducing him to a problem in capture-recapture. Keen to escape the cold winters, John and his young family moved to Adelaide, where he took up a senior lectureship at the University of Adelaide. This was followed by a position at the University of Michigan, Ann Arbor, but John and his family were drawn back to Adelaide. In 1966 he was offered and accepted the Inaugural Chair in Statistics at Flinders University, a position he held until his retirement in 1996.

John is best known for his contributions to statistical methodology, especially in the area of multivariate categorical data. His work was recognised in 2005 through the award of the SSA Pitman Medal. Writing in support of the award, Stephen Fienberg observed: Someone once remarked to me that there are few statisticians who have a major new idea that coul

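First, his insights were often highly original, elegant and relevant. No doubt this was due in part to the independence of thought that came from his early experience in Cape Town. Working in near isolation, John recognised the advantage of being unencumbered by the ideas of others and carried that advantage into his later work. Second, although John was himself a highly capable mathematician, he valued conceptual insight over mathematical technicality. His instinct was that, once the right concepts had been chosen, the mathematical justification would usually follow. He thought critically about his own ideas and was not seduced by the superficial elegance of concepts that lacked rigorous justification. Finally, John's methodological research covered a wide variety of topics, but a common thread was his interest in dependence between variables: the origins and interpretation of complex relationships, and how to represent them in models and methods of inference. He thought deeply about these questions and developed a sophisticated view that properly accounted for the multiple sources of dependence usually present in any observable association.

John's interests and influence extended beyond academic publication; of particular note was his involvement in the Splatt Royal Commission in 1983. The Royal Commission inquired into the 1978 murder conviction of Edward Splatt, and John assisted defence counsel at the Commission with the statistical aspects of the case. By applying the logic of probability and Bayes' theorem, he identified serious flaws in the logic of the prosecution's argument. The report of the Royal Commission accepted his arguments, and the conviction of Edward Splatt was overturned. The Splatt Royal Commission became the subject of John's presidential address to the Statistical Society of Australia. The address was published in the February 1985 SSAI Newsletter, and a version later appeared in The Professional Statistician (Darroch 1987). John was subsequently introduced by Terry Speed to Sir Richard Eggleston, a distinguished judge, Chancellor of Monash University and author of Evidence, Proof and Probability. John greatly valued the exchange that followed, which involved extensive correspondence and papers presented at two international conferences.

Although John was best known for his research, he made many important contributions to the statistics profession, most notably to statistical education. His initial appointment was as the first lecturer in probability and statistics at the University of Cape Town, where his task was to introduce the material into the curriculum. When he arrived at the University of Adelaide 7 years later, there was only one statistics subject. Within months, John had introduced a second subject in mathematical statistics at third-year level and a subject on Markov chains at fourth-year level. Both subjects were innovative syntheses of what was then recent material. These subjects stimulated Eugene Seneta's further study with John, leading eventually to their previously mentioned work on quasi-stationary distributions.

As Inaugural Chair in Statistics at Flinders University, John was responsible for designing and implementing the statistics curriculum. At its peak, the program provided a comprehensive honours degree for a generation of statisticians. It covered many of the then-recent developments, notably the coordinate-free geometric approach to linear statistical models. John worked hard for many years to sustain the program in the face of declining enrolments. He recognised the need to establish a service-teaching role for the discipline, a task made harder by the inexplicable opposition of the then head of the mathematical sciences department. Supervising research students also gave John great satisfaction throughout his career. He was a role model, mentor and friend to his students, many of whom went on to very successful careers.

[Photograph: John at his desk in his office at Flinders University, May 1995. Source: unknown.]

John made wider contributions to the statistics profession, serving at various times as President of the South Australian Branch and as national President of the Statistical Society of Australia, and as President of the Australasian Region of the International Biometric Society.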

Volume 67, Issue 1, pp. 130-134. PDF: https://onlinelibrary.wiley.com/doi/epdf/10.1111/anzs.12430

PanIC: Consistent information criteria for general model selection problems
Pub Date : 2024-10-31 DOI: 10.1111/anzs.12426

Hien Duy Nguyen

Model selection is a ubiquitous problem that arises in the application of many statistical and machine learning methods. In the likelihood and related settings, it is typical to use the method of information criteria (ICs) to choose the most parsimonious among competing models by penalizing the likelihood-based objective function. Theorems guaranteeing the consistency of ICs can often be difficult to verify and are often specific and bespoke. We present a set of results that guarantee consistency for a class of ICs, which we call PanIC (from the Greek root ‘pan’, meaning ‘of everything’), with easily verifiable regularity conditions. PanICs are applicable in any loss-based learning problem and are not exclusive to likelihood problems. We illustrate the verification of regularity conditions for model selection problems regarding finite mixture models, least absolute deviation and support vector regression and principal component analysis, and demonstrate the effectiveness of PanICs for such problems via numerical simulations. Furthermore, we present new sufficient conditions for the consistency of BIC-like estimators and provide comparisons of the BIC with PanIC.
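As a point of reference for the information-criterion approach described in this abstract, the following is a minimal sketch of selection by the classical BIC, not of the PanIC penalties themselves, which the abstract does not specify. The function names and the log-likelihood values are hypothetical and serve only to illustrate the "penalise, then pick the minimum" pattern.

```python
import math


def bic(log_likelihood: float, n_params: int, n_obs: int) -> float:
    """Classical BIC: -2 * log-likelihood + (number of parameters) * log(sample size)."""
    return -2.0 * log_likelihood + n_params * math.log(n_obs)


def select_model(candidates: dict, n_obs: int) -> str:
    """Return the name of the candidate with the smallest BIC.

    `candidates` maps a model name to (maximised log-likelihood, number of parameters).
    """
    return min(candidates, key=lambda name: bic(*candidates[name], n_obs))


# Hypothetical fits (illustrative numbers, not taken from the paper):
# mixtures with 1, 2 and 3 components fitted to n = 500 observations.
fits = {"k=1": (-1220.4, 2), "k=2": (-1105.7, 5), "k=3": (-1103.9, 8)}
best = select_model(fits, n_obs=500)
print(best)
```

The third model fits slightly better than the second, but the log(n) penalty on its three extra parameters outweighs the gain, so the more parsimonious model is chosen.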


Volume 66, Issue 4, pp. 441-466. PDF: https://onlinelibrary.wiley.com/doi/epdf/10.1111/anzs.12426

Prediction de-correlated inference: A safe approach for post-prediction inference
Pub Date : 2024-10-24 DOI: 10.1111/anzs.12429

Feng Gan, Wanfeng Liang, Changliang Zou

In modern data analysis, it is common to use machine learning methods to predict outcomes on unlabelled datasets and then use these pseudo-outcomes in subsequent statistical inference. Inference in this setting is often called post-prediction inference. We propose a novel assumption-lean framework for statistical inference under the post-prediction setting, called prediction de-correlated inference (PDC). Our approach is safe, in the sense that PDC can automatically adapt to any black-box machine-learning model and consistently outperform the supervised counterparts. The PDC framework also offers easy extensibility for accommodating multiple predictive models. Both numerical results and real-world data analysis demonstrate the superiority of PDC over the state-of-the-art methods.


Volume 66, Issue 4, pp. 417-440.

Telling Stories with Data: With Application in R. By Rohan Alexander. CRC Press. 2023. 622 pages. AU$129.60 (hardback). ISBN: 978-1-0321-3477-2.
Pub Date : 2024-09-30 DOI: 10.1111/anzs.12428

Emi Tanaka

Volume 66, Issue 4, pp. 467-470.

Full Bayesian analysis of triple seasonal autoregressive models
Pub Date : 2024-09-27 DOI: 10.1111/anzs.12427

Ayman A. Amin

Seasonal autoregressive (SAR) time series models have been extended to fit time series exhibiting multiple seasonalities. However, hardly any research in Bayesian literature has been done on modelling multiple seasonalities. In this article, we propose a full Bayesian analysis of triple SAR (TSAR) models for time series with triple seasonality, considering identification, estimation and prediction for these TSAR models. In this Bayesian analysis of TSAR models, we assume the model errors to be normally distributed and the model order to be a random variable with a known maximum value, and we employ the g prior for the model coefficients and variance. Accordingly, we first derive the posterior mass function of the TSAR order in closed form, which then enables us to identify the best order of the TSAR model as the order value with the highest posterior probability. In addition, we derive the conditional posteriors to be a multivariate normal for the TSAR coefficients and to be an inverse gamma for the TSAR variance; also, we derive the conditional predictive distribution to be a multivariate normal for future observations. Since these derived conditional distributions are in closed forms, we introduce the Gibbs sampler to present the Bayesian analysis of TSAR models and to easily produce multiple-step-ahead predictions. Using the Julia programming language, we conduct an extensive simulation study, aiming to evaluate the accuracy of our proposed full Bayesian analysis for TSAR models. In addition, we apply our work on time series to hourly electricity load in some European countries.
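The normal and inverse-gamma conditional posteriors mentioned in this abstract are what make a Gibbs sampler convenient. The sketch below shows that alternating-draw pattern on a deliberately simpler model than TSAR: independent observations y_i ~ N(mu, sigma2) with the prior p(mu, sigma2) proportional to 1/sigma2. The model, prior, function name and simulated data are all assumptions of this illustration, and it is written in Python rather than the Julia used by the author.

```python
import random
import statistics


def gibbs_normal(y, n_iter=2000, burn_in=500, seed=0):
    """Gibbs sampler for y_i ~ N(mu, sigma2) under the prior p(mu, sigma2) ∝ 1/sigma2."""
    rng = random.Random(seed)
    n = len(y)
    ybar = statistics.fmean(y)
    sigma2 = statistics.pvariance(y)  # starting value for the variance
    draws = []
    for it in range(n_iter):
        # mu | sigma2, y ~ Normal(ybar, sigma2 / n)
        mu = rng.gauss(ybar, (sigma2 / n) ** 0.5)
        # sigma2 | mu, y ~ Inverse-Gamma(shape = n / 2, scale = ss / 2),
        # drawn as the reciprocal of a Gamma(shape = n / 2, scale = 2 / ss) variate.
        ss = sum((v - mu) ** 2 for v in y)
        sigma2 = 1.0 / rng.gammavariate(n / 2.0, 2.0 / ss)
        if it >= burn_in:
            draws.append((mu, sigma2))
    return draws


# Simulated data with true mean 3 and true variance 4 (illustrative only).
data_rng = random.Random(1)
y = [data_rng.gauss(3.0, 2.0) for _ in range(200)]
draws = gibbs_normal(y)
post_mean_mu = statistics.fmean([m for m, _ in draws])
post_mean_sigma2 = statistics.fmean([s for _, s in draws])
print(round(post_mean_mu, 2), round(post_mean_sigma2, 2))
```

In the TSAR setting the scalar mean would be replaced by the vector of autoregressive coefficients with a multivariate normal conditional, but the loop structure stays the same.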


Volume 66, Issue 4, pp. 389-416.

Examining collinearities
Pub Date : 2024-08-29 DOI: 10.1111/anzs.12425

Zillur R. Shabuz, Paul H. Garthwaite

The cos-max method is a little-known method of identifying collinearities. It is based on the cos-max transformation, which makes minimal adjustment to a set of vectors to create orthogonal components with a one-to-one correspondence between the original vectors and the components. The aim of the transformation is that each vector should be close to the orthogonal component with which it is paired. Vectors involved in a collinearity must be adjusted substantially in order to create orthogonal components, while other vectors will typically be adjusted far less. The cos-max method uses the size of adjustments to identify collinearities. It gives a coherent relationship between collinear sets of variables and variance inflation factors (VIFs) and identifies collinear sets using more information than traditional methods. In this paper we describe these features of the method and examine its performance in examples, comparing it with alternative methods. In each example, the collinearities identified by the cos-max method only contained variables with high VIFs and contained all variables with high VIFs. The collinearities identified by other methods did not have such a close link to VIFs. Also, the collinearities identified by the cos-max method were as simple as or simpler than those given by other methods, with less overlap between collinearities in the variables that they contained.
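Variance inflation factors are the yardstick against which the abstract judges the collinear sets. For readers unfamiliar with them, here is a minimal sketch of the VIF in the simplest case of exactly two predictors, where both share the value 1 / (1 - r^2) with r their sample correlation. This is the standard textbook quantity, not the cos-max transformation, and the function names and data are hypothetical.

```python
import math


def pearson_r(x, y):
    """Sample Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def vif_two_predictors(x1, x2):
    """VIF shared by both predictors when a regression has exactly two of them."""
    r = pearson_r(x1, x2)
    return 1.0 / (1.0 - r * r)


# Illustrative data: x2 is almost a multiple of x1, while x3 is uncorrelated with x1.
x1 = [1.0, 2.0, 3.0, 4.0, 5.0]
x2 = [2.1, 3.9, 6.2, 7.8, 10.1]
x3 = [1.0, -1.0, 0.0, -1.0, 1.0]
vif_collinear = vif_two_predictors(x1, x2)
vif_orthogonal = vif_two_predictors(x1, x3)
print(round(vif_collinear, 1), round(vif_orthogonal, 1))
```

A VIF near 1 indicates no inflation, while the near-collinear pair produces a value in the hundreds. With more than two predictors the VIFs are the diagonal entries of the inverse correlation matrix.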


Volume 66, Issue 3, pp. 367-388.

Exact samples sizes for clinical trials subject to size and power constraints
Pub Date : 2024-08-29 DOI: 10.1111/anzs.12424

Chris J. Lloyd

This paper first describes the difficulties in providing the required sample sizes for clinical trials that guarantee type 1 and type 2 error control. The required sample sizes obviously depend on the test employed, and in this study we use the so-called E-test, which is known to have extremely favourable size properties and higher power than alternatives. Computing exact powers for this test in real time is not currently feasible, so a corpus of pre-computed exact powers (and sizes) was created, covering sample sizes up to 500. When there are no solutions within the corpus, a novel extrapolation technique is used. Exact size can be computed after the sample sizes have been extracted; however, for the E-test the exact size is virtually always very close to the nominal target. All the code has been converted into an R package, which is available on CRAN and illustrated.
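To make "exact power" and "exact size" concrete, the sketch below enumerates every outcome of a two-arm binomial trial and sums the probabilities of those in the rejection region. It assumes a one-sided pooled z-test with a fixed critical value as the test, which is a stand-in chosen for brevity and is not the E-test used in the paper. All function names and the example design are hypothetical.

```python
from math import comb, sqrt


def binom_pmf(k, n, p):
    """Probability of k successes in n Bernoulli(p) trials."""
    return comb(n, k) * p ** k * (1.0 - p) ** (n - k)


def rejects(x0, n0, x1, n1, z_crit=1.645):
    """One-sided pooled z-test of H0: p1 <= p0 (an assumed stand-in test)."""
    pooled = (x0 + x1) / (n0 + n1)
    se = sqrt(pooled * (1.0 - pooled) * (1.0 / n0 + 1.0 / n1))
    if se == 0.0:  # all successes or all failures: no evidence either way
        return False
    return (x1 / n1 - x0 / n0) / se > z_crit


def exact_rejection_prob(n0, n1, p0, p1, z_crit=1.645):
    """Exact probability of rejection, summed over all (x0, x1) outcomes."""
    return sum(
        binom_pmf(x0, n0, p0) * binom_pmf(x1, n1, p1)
        for x0 in range(n0 + 1)
        for x1 in range(n1 + 1)
        if rejects(x0, n0, x1, n1, z_crit)
    )


# Size at one null point and power at one alternative, for 20 and 40 per arm.
size_20 = exact_rejection_prob(20, 20, 0.3, 0.3)
power_20 = exact_rejection_prob(20, 20, 0.3, 0.6)
power_40 = exact_rejection_prob(40, 40, 0.3, 0.6)
print(round(size_20, 3), round(power_20, 3), round(power_40, 3))
```

The double loop costs on the order of n0 * n1 evaluations per design point, and a sample-size search repeats it over many candidate designs and nuisance-parameter values, which gives some intuition for why a pre-computed corpus is attractive.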


Volume 66, Issue 3, pp. 297-305. PDF: https://onlinelibrary.wiley.com/doi/epdf/10.1111/anzs.12424

Bayesian analysis of multivariate mixed longitudinal ordinal and continuous data
Pub Date : 2024-08-13 DOI: 10.1111/anzs.12421

Xiao Zhang

Multivariate longitudinal ordinal and continuous data exist in many scientific fields. However, it is a demanding task to analyse them jointly, owing to the complicated correlation structures of such mixed data and the lack of a multivariate distribution. The multivariate probit model, which assumes a multivariate normal latent variable for each multivariate ordinal observation, becomes a natural modelling choice for longitudinal ordinal data, especially for joint analysis with longitudinal continuous data. However, the identifiable multivariate probit model requires the variances of the latent normal variables to be fixed at 1; thus the joint covariance matrix of the latent variables and the continuous multivariate normal variables is restricted at some of the diagonal elements. This restriction complicates the development of both classical and Bayesian methods for analysing mixed ordinal and continuous data. In this investigation, we proposed three Markov chain Monte Carlo (MCMC) methods: a Metropolis–Hastings within Gibbs algorithm based on the identifiable model, and a Gibbs sampling algorithm and parameter-expanded data augmentation based on the constructed non-identifiable model. Through simulation studies and a real data application, we illustrated the performance of these three methods and offered observations on using the non-identifiable model to develop MCMC sampling methods.


{"title":"Bayesian analysis of multivariate mixed longitudinal ordinal and continuous data","authors":"Xiao Zhang","doi":"10.1111/anzs.12421","DOIUrl":"10.1111/anzs.12421","url":null,"abstract":"Multivariate longitudinal ordinal and continuous data exist in many scientific fields. However, it is a rigorous task to jointly analyse them due to the complicated correlated structures of those mixed data and the lack of a multivariate distribution. The multivariate probit model, assuming there is a multivariate normal latent variable for each multivariate ordinal data, becomes a natural modeling choice for longitudinal ordinal data especially for jointly analysing with longitudinal continuous data. However, the identifiable multivariate probit model requires the variances of the latent normal variables to be fixed at 1, thus the joint covariance matrix of the latent variables and the continuous multivariate normal variables is restricted at some of the diagonal elements. This constrains to develop both the classical and Bayesian methods to analyse mixed ordinal and continuous data. In this investigation, we proposed three Markov chain Monte Carlo (MCMC) methods: Metropolis–Hastings within Gibbs algorithm based on the identifiable model, and a Gibbs sampling algorithm and parameter-expanded data augmentation based on the constructed non-identifiable model. 
Through simulation studies and a real data application, we illustrated the performance of these three methods and provided an observation of using non-identifiable model to develop MCMC sampling methods.","PeriodicalId":55428,"journal":{"name":"Australian & New Zealand Journal of Statistics","volume":"66 3","pages":"325-346"},"PeriodicalIF":0.8,"publicationDate":"2024-08-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1111/anzs.12421","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142211798","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0

Distributional modelling of positively skewed data via the flexible Weibull extension distribution
IF 0.8 Zone 4 (Mathematics) Q3 STATISTICS & PROBABILITY
Australian & New Zealand Journal of Statistics
Pub Date : 2024-08-11 DOI: 10.1111/anzs.12423

Freddy Hernández-Barajas, Olga Usuga-Manco, Carmen Patino-Rodríguez, Fernando Marmolejo-Ramos

The time until an event occurs is often known to have a skewed distribution. To model this, a statistical distribution called the two-parameter flexible Weibull extension (FWE) has been proposed. In this paper, the FWE distribution is used to model datasets through the use of generalised additive models for location, scale and shape (GAMLSS) distributional regression. GAMLSS is the only regression technique that can examine the effects of both categorical and numeric predictors on all the parameters of the distribution used to fit the dependent variable. To make it easier to use the FWE distribution through GAMLSS, the RelDists R package is proposed. A simulation study shows that FWE modelling through GAMLSS provides reliable parameter estimates even in the presence of factors that affect the distribution.
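For readers unfamiliar with the distribution, the following is a minimal sketch of the two-parameter FWE in its commonly cited form, with CDF F(t) = 1 - exp(-exp(alpha*t - beta/t)) for t > 0. The parameter names `alpha` and `beta` and the helper functions are illustrative assumptions; they are not the RelDists interface, whose parameterisation may differ.

```python
import math

def fwe_cdf(t, alpha, beta):
    """CDF of the flexible Weibull extension (assumed form, alpha > 0, beta > 0)."""
    if t <= 0:
        return 0.0
    # 1 - exp(-exp(z)) computed via expm1 for accuracy near zero
    return -math.expm1(-math.exp(alpha * t - beta / t))

def fwe_pdf(t, alpha, beta):
    """Density: derivative of the CDF above with respect to t."""
    if t <= 0:
        return 0.0
    z = alpha * t - beta / t
    return (alpha + beta / t**2) * math.exp(z - math.exp(z))

def fwe_quantile(p, alpha, beta):
    """Inverse CDF for 0 < p < 1: solve alpha*t - beta/t = log(-log(1 - p)) for t > 0."""
    c = math.log(-math.log1p(-p))
    # Positive root of alpha*t**2 - c*t - beta = 0
    return (c + math.sqrt(c * c + 4.0 * alpha * beta)) / (2.0 * alpha)

if __name__ == "__main__":
    median = fwe_quantile(0.5, alpha=1.0, beta=2.0)
    print(median, fwe_cdf(median, 1.0, 2.0), fwe_pdf(median, 1.0, 2.0))
```

The closed-form quantile makes inverse-transform sampling straightforward: pass uniform draws on (0, 1) to `fwe_quantile`.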

Volume 66, Issue 3, pp. 306-324. Published 2024-08-11. Journal Article. PDF: https://onlinelibrary.wiley.com/doi/epdf/10.1111/anzs.12423
Citations: 0

Spline linear mixed-effects models for causal mediation analysis with longitudinal data
IF 0.8 Zone 4 (Mathematics) Q3 STATISTICS & PROBABILITY
Australian & New Zealand Journal of Statistics
Pub Date : 2024-07-26 DOI: 10.1111/anzs.12422

Jeffrey M. Albert, Hongxu Zhu, Tanujit Dey, Jiayang Sun, Wojbor A. Woyczynski, Gregory Powers, Meeyoung Min

Often, causal mediation analysis is of interest when both the mediator and the final outcome are repeatedly measured, but limited work has been done for this situation (as opposed to where only the mediator is repeatedly measured). Available methods are primarily based on parametric models and tend to be sensitive to model assumptions. This article presents semiparametric, continuous-time models to provide a flexible and robust approach to causal mediation analysis for longitudinal data, which allows these data to be unbalanced or irregular. Specifically, the method uses spline linear mixed-effects models for the mediator and for the final outcome, with a two-step approach to model-fitting in which a predicted mediator is used as a covariate in the final outcome model. The models allow flexible functions for both the mean and individual response functions for each outcome. We derive estimated natural direct and indirect effects as a function of time using an extended mediation formula and sequential ignorability assumption. In simulation studies, we compare properties of estimated direct and indirect effects, and a delta method estimate of the standard error of the latter, under alternative approaches for predicting the mediator. The approach is illustrated using harmonised data from two cohort studies to examine attention as a mediator of the effect of prenatal tobacco exposure on externalising behaviour in children.

Volume 66, Issue 3, pp. 347-366. Published 2024-07-26. Journal Article.
Citations: 0
