The Probability Mass Function of the Kaplan–Meier Product–Limit Estimator
Pub Date: 2023-01-02 | DOI: 10.1080/00031305.2022.2070279
Yuxin Qin, H. Sasinowska, L. Leemis
Abstract: Kaplan and Meier's 1958 article developed a nonparametric estimator of the survivor function from a right-censored dataset. Determining the size of the support of the estimator as a function of the sample size provides a challenging exercise for students in an advanced course in mathematical statistics. We devise two algorithms for calculating the support size, and we calculate the associated probability mass function for small sample sizes and particular probability distributions for the failure and censoring times.
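The product-limit estimator itself is short to state in code. The sketch below (Python; an illustration, not the authors' support-size algorithms) evaluates the Kaplan–Meier estimate for one hypothetical right-censored sample. The support-size question the article studies concerns how many distinct values such estimates can take across all possible samples of a given size.

```python
import numpy as np

def kaplan_meier(times, observed):
    """Kaplan-Meier product-limit estimate of the survivor function S(t).

    times    : event or censoring times
    observed : 1 for an observed failure, 0 for a right-censored time
    Returns the step function as (time, S(t)) pairs at the failure times.
    """
    times = np.asarray(times, dtype=float)
    observed = np.asarray(observed, dtype=bool)
    steps, s = [], 1.0
    for t in np.unique(times[observed]):
        n_at_risk = np.sum(times >= t)           # still under observation just before t
        d = np.sum((times == t) & observed)      # failures at t
        s *= 1.0 - d / n_at_risk                 # multiplicative step down
        steps.append((float(t), s))
    return steps

# Hypothetical sample: failures at 3, 5, 9; censorings at 5 and 7.
print(kaplan_meier([3, 5, 5, 7, 9], [1, 1, 0, 0, 1]))
# [(3.0, 0.8), (5.0, 0.6), (9.0, 0.0)]
```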
{"title":"The Probability Mass Function of the Kaplan–Meier Product–Limit Estimator","authors":"Yuxin Qin, H. Sasinowska, L. Leemis","doi":"10.1080/00031305.2022.2070279","DOIUrl":"https://doi.org/10.1080/00031305.2022.2070279","url":null,"abstract":"Abstract Kaplan and Meier’s 1958 article developed a nonparametric estimator of the survivor function from a right-censored dataset. Determining the size of the support of the estimator as a function of the sample size provides a challenging exercise for students in an advanced course in mathematical statistics. We devise two algorithms for calculating the support size and calculate the associated probability mass function for small sample sizes and particular probability distributions for the failure and censoring times.","PeriodicalId":342642,"journal":{"name":"The American Statistician","volume":"75 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-01-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126176402","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Object Oriented Data Analysis
Pub Date: 2023-01-02 | DOI: 10.1080/00031305.2022.2160590
James O. Ramsay
This book, which wants to be called OODA, is in an emerging genre, the statistical autobiography. Marron and Dryden have expanded the frontiers of data analysis in many directions over their careers, and they document the challenges encountered along the way. Their data illustrate growth in the statistical landscape in both size and complexity.

Functional data analysis, and its cousin shape analysis, took data analysis beyond the familiar matrix format by replacing frequently unordered columns with continuous and usually differentiable curves. The functional transition was in one sense easy because it remained within the Hilbert space framework. But the space of operations on curves is larger than linear algebra, since it includes differentiation to fit data with differential equations, integration to compute arc length, and the nonlinear transformation of domains so as to align curve features. The authors add to the mix the graph structures trees and networks, as well as curved manifolds. This binding of new data objects to new transformation groups coincided roughly with the advent of object-oriented programming systems, and hence the title.

The first three chapters provide short overviews of several example analyses, followed by tutorial material on variants of principal component analysis. Chapters 4, 5, and 6 provide examples of data exploration, data confirmation, and tips on visualizing results, respectively. Chapter 7 turns from PCA to distance-based analyses and multidimensional scaling, and Chapter 8 to shape and manifold representations. Chapter 9 illustrates data alignment using domain warping by the Fisher–Rao method. Chapter 10 looks at tree graphs and networks as data. Chapters 11 and 12 consider novel classification and clustering techniques. Chapters 13 and 14 offer methods for inference and asymptotics, respectively, in high-dimensional contexts. Chapter 15 describes the statistical graphics tool SiZer, and Chapter 16 outlines robust estimation techniques. The book concludes with additional material on PCA and a final chapter of general reflections on object-oriented data.

By my count the book examines 19 substantial and varied datasets, most of which are available on GitHub along with analyses using MATLAB. The authors supplement these with a number of toy datasets used as illustrations. Their use of color and other statistical graphics tools is outstanding and makes the displays exciting, even if not always essential. My personal favorite is the display of 3D rectum-prostate-bladder structures, which must have required a solid background in finite element analysis to produce. The target audience is graduate students in statistics and machine learning, and the book provides a gold mine of fascinating potential class projects. As a teaching tool, however, it does have some limitations. The many literature citations that seem to accompany every assertion make for cluttered reading; restricting these to an annotated resource section at the end of each chapter would reduce the clutter.
{"title":"Quantitative Drug Safety and Benefit-Risk Evaluation: Practical and Cross-Disciplinary Approaches","authors":"Huan Wang","doi":"10.1080/00031305.2022.2160592","DOIUrl":"https://doi.org/10.1080/00031305.2022.2160592","url":null,"abstract":"This book, which wants to be called OODA, is in an emerging genre, the statistical autobiography. Marron and Dryden have expanded the frontiers of data analysis in many directions over their careers, and they document the challenges encountered along the way. Their data illustrate growth in the statistical landscape in both size and complexity. Functional data analysis, and its cousin shape analysis, took data analysis beyond the familiar matrix format by replacing frequently unordered columns by continuous and usually differentiable curves. The functional transition in one sense was easy because it remained within the Hilbert space framework. But the space of operations on curves is larger than linear algebra, since it includes differentiation to fit data with differential equations, integration to compute arc length and the nonlinear transformation of domains so as to align curve features. The authors add to the mix the graph structures trees and networks, as well as curved manifolds. This binding of new data objects to new transformation groups coincided roughly with the advent of object oriented programming systems, and hence the title. The first three chapters provide short overviews of several example analyses followed by tutorial material on variants of principle component analysis. Chapters 4 , 5, and 6 provide examples of data exploration and confirmation, respectively, as well as tips on visualizing results. Chapter 7 turns from PCA to distance based analyses and multidimensional scaling, and chapter 8 to shape and manifold representations. Chapter 9 illustrates data alignment using domain warping by the Fisher-Rao method. Chapter 10 looks at tree graphs and networks as data. Chapters 11 and 12 consider novel classification and clustering techniques. Chapters 13 and 14 offer methods for inference and asymptotic, respectively, in high-dimensional contexts. Chapter 15 describes the statistical graphics tool SiZer and chapter 16 outlines robust estimation techniques. The book concludes with additional material on PCA and a final chapter on general reflections on object oriented data. By my count the book examines 19 substantial and varied datasets, most of which are available on gitHub along with analyses using Matlab. They also add to these a number of toy sets used as illustrations. Their use of color and other statistical graphics tools is outstanding, and makes displays exciting even if not always essential. My personal favorite is the display of 3D rectum-prostate-bladder structures, require a solid background in finite element analysis to produce. The target audience is graduate students in statistics and machine learning, and the book provides a gold mine of fascinating potential class projects. However, as a teaching tool it does have some limitations. The many literature citations that seem to accompany any assertion, and make for cluttered reading. 
Restricting these to an annotated resource section at the end of each chapter woul","PeriodicalId":342642,"journal":{"name":"The American Statistician","volume":"18 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-01-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128073728","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Quantitative Drug Safety and Benefit-Risk Evaluation: Practical and Cross-Disciplinary Approaches
Pub Date: 2023-01-02 | DOI: 10.1080/00031305.2022.2160592
Huan Wang
Statistical Challenges in Online Controlled Experiments: A Review of A/B Testing Methodology
Pub Date: 2022-12-21 | DOI: 10.1080/00031305.2023.2257237
Nicholas Larsen, Jonathan W. Stallrich, Srijan Sengupta, Alex Deng, Ron Kohavi, Nathaniel T. Stevens
The rise of internet-based services and products in the late 1990s brought about an unprecedented opportunity for online businesses to engage in large-scale data-driven decision making. Over the past two decades, organizations such as Airbnb, Alibaba, Amazon, Baidu, Booking, Alphabet's Google, LinkedIn, Lyft, Meta's Facebook, Microsoft, Netflix, Twitter, Uber, and Yandex have invested tremendous resources in online controlled experiments (OCEs) to assess the impact of innovation on their customers and businesses. Running OCEs at scale has presented a host of challenges requiring solutions from many domains. In this paper we review challenges that require new statistical methodologies to address them. In particular, we discuss the practice and culture of online experimentation, as well as its statistics literature, placing the current methodologies within their relevant statistical lineages and providing illustrative examples of OCE applications. Our goal is to raise academic statisticians' awareness of these new research opportunities to increase collaboration between academia and the online industry.
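As a concrete baseline for what an OCE analysis looks like, the sketch below runs the textbook two-proportion z-test on hypothetical conversion counts. The methodological challenges the paper reviews (interference, sequential monitoring, variance reduction, and so on) arise on top of this simple comparison.

```python
import numpy as np
from scipy.stats import norm

def two_proportion_ztest(conv_a, n_a, conv_b, n_b):
    """Two-sided pooled z-test for a difference in conversion rates,
    the canonical analysis of a simple two-arm A/B experiment."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    p_pool = (conv_a + conv_b) / (n_a + n_b)
    se = np.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    p_value = 2 * norm.sf(abs(z))
    return z, p_value

# Hypothetical counts: 10,000 users per arm, 2.8% vs. 3.1% conversion.
z, p = two_proportion_ztest(280, 10_000, 310, 10_000)
print(f"z = {z:.2f}, p = {p:.3f}")
```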
{"title":"Statistical Challenges in Online Controlled Experiments: A Review of A/B Testing Methodology","authors":"Nicholas Larsen, Jonathan W. Stallrich, Srijan Sengupta, Alex Deng, Ron Kohavi, Nathaniel T. Stevens","doi":"10.1080/00031305.2023.2257237","DOIUrl":"https://doi.org/10.1080/00031305.2023.2257237","url":null,"abstract":"The rise of internet-based services and products in the late 1990's brought about an unprecedented opportunity for online businesses to engage in large scale data-driven decision making. Over the past two decades, organizations such as Airbnb, Alibaba, Amazon, Baidu, Booking, Alphabet's Google, LinkedIn, Lyft, Meta's Facebook, Microsoft, Netflix, Twitter, Uber, and Yandex have invested tremendous resources in online controlled experiments (OCEs) to assess the impact of innovation on their customers and businesses. Running OCEs at scale has presented a host of challenges requiring solutions from many domains. In this paper we review challenges that require new statistical methodologies to address them. In particular, we discuss the practice and culture of online experimentation, as well as its statistics literature, placing the current methodologies within their relevant statistical lineages and providing illustrative examples of OCE applications. Our goal is to raise academic statisticians' awareness of these new research opportunities to increase collaboration between academia and the online industry.","PeriodicalId":342642,"journal":{"name":"The American Statistician","volume":"60 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-12-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129899586","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Quantifying the Inspection Paradox with Random Time
Pub Date: 2022-12-19 | DOI: 10.1080/00031305.2022.2151510
Diana Rauwolf, U. Kamps
Abstract: The well-known inspection paradox of renewal theory states that, in general, the expected length of the inspection interval exceeds that of a common renewal interval. For a random inspection time, which includes the deterministic case, and a delayed renewal process, representations of the expected length of an inspection interval and related inequalities in terms of covariances are shown. Datasets of eruption times of Beehive Geyser and Riverside Geyser in Yellowstone National Park, as well as several distributional examples, illustrate the findings.
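A small simulation makes the paradox tangible. The sketch below is an illustration under assumed exponential interarrival times (not the authors' covariance representations): it compares the mean renewal interval with the mean length of the interval that happens to cover a fixed inspection time.

```python
import numpy as np

rng = np.random.default_rng(1)

def covering_interval_lengths(t_inspect, mean=1.0, n_reps=20_000):
    """Simulate renewal processes with Exp(mean) interarrival times and
    record the length of the renewal interval that covers t_inspect."""
    lengths = np.empty(n_reps)
    for k in range(n_reps):
        t = 0.0
        while True:
            x = rng.exponential(mean)
            if t + x > t_inspect:      # this interval covers the inspection time
                lengths[k] = x
                break
            t += x
    return lengths

L = covering_interval_lengths(t_inspect=10.0)
print(f"mean renewal interval: 1.000, mean inspected interval: {L.mean():.3f}")
# With exponential interarrivals the inspected interval averages close to
# twice the ordinary renewal interval -- a length-biased sampling effect.
```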
{"title":"Quantifying the Inspection Paradox with Random Time","authors":"Diana Rauwolf, U. Kamps","doi":"10.1080/00031305.2022.2151510","DOIUrl":"https://doi.org/10.1080/00031305.2022.2151510","url":null,"abstract":"Abstract The well-known inspection paradox of renewal theory states that, in expectation, the inspection interval is larger than a common renewal interval, in general. For a random inspection time, which includes the deterministic case, and a delayed renewal process, representations of the expected length of an inspection interval and related inequalities in terms of covariances are shown. Datasets of eruption times of Beehive Geyser and Riverside Geyser in Yellowstone National Park, as well as several distributional examples, illustrate the findings.","PeriodicalId":342642,"journal":{"name":"The American Statistician","volume":"44 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-12-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128729027","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Integrating Ethics into the Guidelines for Assessment and Instruction in Statistics Education (GAISE)
Pub Date: 2022-12-13 | DOI: 10.1080/00031305.2022.2156612
R. Raman, J. Utts, Andrew I. Cohen, Matthew J. Hayat
Abstract: Statistics education at all levels involves data collected on human subjects. Thus, statistics educators have a responsibility to educate their students about the ethical aspects of collecting those data. The changing statistics education landscape has seen instruction move from being formula-based to being focused on statistical reasoning. The widely implemented Guidelines for Assessment and Instruction in Statistics Education (GAISE) Report has paved the way for instructors to present introductory statistics to students in a way that is both approachable and engaging. However, with technological advancement and the increasing availability of real-world datasets, it is necessary that instruction also address the ethical aspects of data sources, such as privacy, how the data were obtained, and whether participants consented to the use of their data. In this article, we propose incorporating ethics into established curricula by integrating it into undergraduate-level introductory statistics courses based on recommendations in the GAISE Report. We provide a few examples of how to prompt students to think constructively about their ethical responsibilities when working with data.
{"title":"Integrating Ethics into the Guidelines for Assessment and Instruction in Statistics Education (GAISE)","authors":"R. Raman, J. Utts, Andrew I. Cohen, Matthew J. Hayat","doi":"10.1080/00031305.2022.2156612","DOIUrl":"https://doi.org/10.1080/00031305.2022.2156612","url":null,"abstract":"Abstract Statistics education at all levels includes data collected on human subjects. Thus, statistics educators have a responsibility to educate their students about the ethical aspects related to the collection of those data. The changing statistics education landscape has seen instruction moving from being formula-based to being focused on statistical reasoning. The widely implemented Guidelines for Assessment and Instruction in Statistics Education (GAISE) Report has paved the way for instructors to present introductory statistics to students in a way that is both approachable and engaging. However, with technological advancement and the increase in availability of real-world datasets, it is necessary that instruction also integrate the ethical aspects around data sources, such as privacy, how the data were obtained and whether participants consent to the use of their data. In this article, we propose incorporating ethics into established curricula and integrating ethics into undergraduate-level introductory statistics courses based on recommendations in the GAISE Report. We provide a few examples of how to prompt students to constructively think about their ethical responsibilities when working with data.","PeriodicalId":342642,"journal":{"name":"The American Statistician","volume":"3 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-12-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127034149","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Selection Criterion of Working Correlation Structure for Spatially Correlated Data
Pub Date: 2022-12-13 | DOI: 10.1080/00031305.2022.2157874
Marcelo dos Santos, F. De Bastiani, M. Uribe-Opazo, M. Galea
Abstract: To obtain regression parameter estimates in generalized estimating equation modeling, whether for longitudinal or spatially correlated data, it is necessary to specify the structure of the working correlation matrix, and the regression parameter estimates can be affected by this choice. Within spatial statistics, the correlation matrix also influences how spatial variability is modeled. This study therefore proposes a new method for selecting a working correlation matrix, based on the conditioning of the naive variance-covariance matrix. The method's performance is evaluated in an extensive simulation study using normal, Poisson, and gamma marginal distributions for spatially correlated data. The correlation structure specification is based on semivariogram models from the Wendland, Matérn, and spherical families. The results reveal that, in terms of hit rates for the true spatial correlation structure of the simulated data, the proposed criterion outperforms competing criteria: the quasi-likelihood under the independence model criterion (QIC), the correlation information criterion (CIC), and the Rotnitzky–Jewell criterion (RJC). The selection of an appropriate spatial correlation structure is illustrated using average rainfall data for the first semester of 2021 in the state of Pernambuco, Brazil.
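The semivariogram machinery the abstract refers to is straightforward to sketch. The snippet below is an assumed illustration (the paper's conditioning-based selection criterion is not reproduced): it computes the classical empirical semivariogram and fits a spherical model to synthetic, made-up spatial data.

```python
import numpy as np
from scipy.optimize import curve_fit

def spherical(h, nugget, sill, a):
    """Spherical semivariogram model with range parameter a."""
    h = np.asarray(h, dtype=float)
    inside = nugget + (sill - nugget) * (1.5 * h / a - 0.5 * (h / a) ** 3)
    return np.where(h < a, inside, sill)

def empirical_semivariogram(coords, z, bins):
    """Classical (Matheron) semivariogram estimate on distance bins."""
    d = np.sqrt(((coords[:, None, :] - coords[None, :, :]) ** 2).sum(-1))
    g = 0.5 * (z[:, None] - z[None, :]) ** 2
    iu = np.triu_indices(len(z), k=1)      # each pair counted once
    d, g = d[iu], g[iu]
    centers, gamma = [], []
    for lo, hi in zip(bins[:-1], bins[1:]):
        m = (d >= lo) & (d < hi)
        if m.any():
            centers.append(d[m].mean())
            gamma.append(g[m].mean())
    return np.array(centers), np.array(gamma)

# Hypothetical stations and values (stand-ins for rainfall observations):
rng = np.random.default_rng(0)
coords = rng.uniform(0, 100, size=(80, 2))
z = np.sin(coords[:, 0] / 30.0) + rng.normal(0, 0.2, 80)
h, gamma = empirical_semivariogram(coords, z, bins=np.linspace(0, 60, 10))
params, _ = curve_fit(spherical, h, gamma, p0=[0.05, gamma.max(), 30.0], maxfev=10_000)
print("fitted nugget, sill, range:", np.round(params, 3))
```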
{"title":"Selection Criterion of Working Correlation Structure for Spatially Correlated Data","authors":"Marcelo dos Santos, F. De Bastiani, M. Uribe-Opazo, M. Galea","doi":"10.1080/00031305.2022.2157874","DOIUrl":"https://doi.org/10.1080/00031305.2022.2157874","url":null,"abstract":"Abstract To obtain regression parameter estimates in generalized estimation equation modeling, whether in longitudinal or spatially correlated data, it is necessary to specify the structure of the working correlation matrix. The regression parameter estimates can be affected by the choice of this matrix. Within spatial statistics, the correlation matrix also influences how spatial variability is modeled. Therefore, this study proposes a new method for selecting a working matrix, based on conditioning the variance-covariance matrix naive. The method performance is evaluated by an extensive simulation study, using the marginal distributions of normal, Poisson, and gamma for spatially correlated data. The correlation structure specification is based on semivariogram models, using the Wendland, Matérn, and spherical model families. The results reveal that regarding the hit rates of the true spatial correlation structure of simulated data, the proposed criterion resulted in better performance than competing criteria: quasi-likelihood under the independence model criterion QIC, correlation information criterion CIC, and the Rotnizky–Jewell criterion RJC. The application of an appropriate spatial correlation structure selection was shown using the first-semester average rainfall data of 2021 in the state of Pernambuco, Brazil.","PeriodicalId":342642,"journal":{"name":"The American Statistician","volume":"66 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-12-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129350046","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Learning to forecast: The probabilistic time series forecasting challenge
Pub Date: 2022-11-29 | DOI: 10.1080/00031305.2023.2199800
J. Bracher, Nils Koster, Fabian Krüger, Sebastian Lerch
We report on a course project in which students submit weekly probabilistic forecasts of two weather variables and one financial variable. This real-time format allows students to engage in practical forecasting, which requires a diverse set of skills in data science and applied statistics. We describe the context and aims of the course, and discuss design parameters like the selection of target variables, the forecast submission process, the evaluation of forecast performance, and the feedback provided to students. Furthermore, we describe empirical properties of students' probabilistic forecasts, as well as some lessons learned on our part.
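The abstract does not specify the scoring rule used for the weekly submissions; a common choice for quantile-format probabilistic forecasts is the pinball (quantile) loss, sketched below on made-up numbers.

```python
import numpy as np

def pinball_loss(y, q_pred, tau):
    """Average pinball (quantile) loss of predicted tau-quantiles q_pred
    for realized values y; lower is better."""
    y, q_pred = np.asarray(y, dtype=float), np.asarray(q_pred, dtype=float)
    diff = y - q_pred
    return np.mean(np.maximum(tau * diff, (tau - 1) * diff))

# Hypothetical: score a student's 10%/50%/90% temperature forecasts
# against three realized values.
y = np.array([3.2, 4.1, 2.7])
for tau, q in [(0.1, np.array([1.0, 2.0, 0.5])),
               (0.5, np.array([3.0, 4.0, 2.5])),
               (0.9, np.array([5.0, 6.0, 4.5]))]:
    print(f"tau={tau}: {pinball_loss(y, q, tau):.3f}")
```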
{"title":"Learning to forecast: The probabilistic time series forecasting challenge","authors":"J. Bracher, Nils Koster, Fabian Kruger, Sebastian Lerch","doi":"10.1080/00031305.2023.2199800","DOIUrl":"https://doi.org/10.1080/00031305.2023.2199800","url":null,"abstract":"We report on a course project in which students submit weekly probabilistic forecasts of two weather variables and one financial variable. This real-time format allows students to engage in practical forecasting, which requires a diverse set of skills in data science and applied statistics. We describe the context and aims of the course, and discuss design parameters like the selection of target variables, the forecast submission process, the evaluation of forecast performance, and the feedback provided to students. Furthermore, we describe empirical properties of students' probabilistic forecasts, as well as some lessons learned on our part.","PeriodicalId":342642,"journal":{"name":"The American Statistician","volume":"34 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-11-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125685634","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Improved approximation and visualization of the correlation matrix
Pub Date: 2022-11-23 | DOI: 10.1080/00031305.2023.2186952
J. Graffelman, Jan de Leeuw
The graphical representation of the correlation matrix by means of different multivariate statistical methods is reviewed, the procedures are compared on an example dataset, and an improved representation with better fit is proposed. Principal component analysis is widely used for picturing correlation structure, but a weighted alternating least squares approach that avoids fitting the diagonal of the correlation matrix outperforms both principal component analysis and principal factor analysis in approximating a correlation matrix. Weighted alternating least squares is a very strong competitor to principal component analysis, in particular when the correlation matrix is the focus of the study, because it improves the representation of the correlation matrix, often at the expense of only a minor percentage of explained variance for the original data matrix when the latter is mapped onto the correlation biplot by regression. In this article, we propose to combine weighted alternating least squares with an additive adjustment of the correlation matrix, which leads to a further improved approximation of the correlation matrix.
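To see what "avoiding the fitting of the diagonal" means in practice, the sketch below computes a low-rank least squares approximation of a correlation matrix in which the diagonal is treated as free, via iterated eigendecomposition with diagonal imputation. It is an unweighted stand-in for the weighted alternating least squares method of the article and omits the proposed additive adjustment.

```python
import numpy as np

def lowrank_corr_offdiag(R, rank=2, n_iter=200):
    """Rank-`rank` approximation of a correlation matrix R that fits only
    the off-diagonal entries: impute the diagonal from the current fit,
    then truncate the eigendecomposition, and repeat."""
    A = R.copy()
    for _ in range(n_iter):
        w, V = np.linalg.eigh(A)
        idx = np.argsort(w)[::-1][:rank]             # leading eigenpairs
        F = V[:, idx] * np.sqrt(np.maximum(w[idx], 0.0))
        A_hat = F @ F.T                               # current low-rank fit
        A = R.copy()
        np.fill_diagonal(A, np.diag(A_hat))           # diagonal treated as free
    return A_hat

# Small demo on a hypothetical 4x4 correlation matrix:
R = np.array([[1.0, 0.6, 0.3, 0.1],
              [0.6, 1.0, 0.5, 0.2],
              [0.3, 0.5, 1.0, 0.4],
              [0.1, 0.2, 0.4, 1.0]])
off = ~np.eye(4, dtype=bool)
err = lowrank_corr_offdiag(R) - R
print("off-diagonal RMSE:", np.sqrt(np.mean(err[off] ** 2)))
```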
{"title":"Improved approximation and visualization of the correlation matrix","authors":"J. Graffelman, Jan de Leeuw","doi":"10.1080/00031305.2023.2186952","DOIUrl":"https://doi.org/10.1080/00031305.2023.2186952","url":null,"abstract":"The graphical representation of the correlation matrix by means of different multivariate statistical methods is reviewed, a comparison of the different procedures is presented with the use of an example data set, and an improved representation with better fit is proposed. Principal component analysis is widely used for making pictures of correlation structure, though as shown a weighted alternating least squares approach that avoids the fitting of the diagonal of the correlation matrix outperforms both principal component analysis and principal factor analysis in approximating a correlation matrix. Weighted alternating least squares is a very strong competitor for principal component analysis, in particular if the correlation matrix is the focus of the study, because it improves the representation of the correlation matrix, often at the expense of only a minor percentage of explained variance for the original data matrix, if the latter is mapped onto the correlation biplot by regression. In this article, we propose to combine weighted alternating least squares with an additive adjustment of the correlation matrix, and this is seen to lead to further improved approximation of the correlation matrix.","PeriodicalId":342642,"journal":{"name":"The American Statistician","volume":"6 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-11-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121780467","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Comment on “On Optimal Correlation-Based Prediction,” by Bottai et al. (2022)
Pub Date: 2022-11-15 | DOI: 10.1080/00031305.2022.2141879
S. Lipovetsky
The best linear prediction (1) and the restricted predictor (2) discussed in the comment are

ŷ = μ1 + ρ (σ1/σ2) (x − μ2),   (1)
ŷ = μ1 + sgn(ρ) (σ1/σ2) (x − μ2),   (2)

where μ1 and μ2 are the means, and σ1 and σ2 are the standard deviations, of the dependent variable y and the predictor x, respectively, and sgn(ρ) is the sign of the Pearson correlation ρ between these variables. In contrast to the best linear prediction (1), the slope of the predictor (2), obtained under the restriction that the variance of the predictor of y equal the variance of y itself, replaces the actual value of ρ in (1) with sgn(ρ). Formula (1) corresponds to simple regression, while formula (2) coincides with the so-called diagonal regression. Diagonal regression was proposed by Ragnar Frisch (1934), one of the founders of modern economics and the first economics Nobel laureate, who coined such terms as econometrics and collinearity. Up to centering of the variables, formula (2) defines the slope as the signed quotient of the standard deviations of the dependent and independent variables, and diagonal regression for one and two predictors was considered in Cobb (1939, 1943). For one predictor, the model of form (2) is identical to the so-called geometric mean regression, standard (reduced) major axis regression, and several others, reviewed by Xe (2014) together with an extensive list of researchers who independently proposed and developed these models. Derivation of the diagonal regression (2) for models with measurement errors in both variables via the maximum likelihood criterion is described in Leser (1974, chap. 2). More references on diagonal regression can be found in these works.
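A few lines of code make the contrast concrete: on simulated data, the diagonal-regression slope (2) equals the ordinary least squares slope (1) divided by |ρ|, so it is always at least as steep, and the two coincide only when |ρ| = 1.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(0.0, 2.0, 500)
y = 1.5 * x + rng.normal(0.0, 2.0, 500)   # noisy linear relation

rho = np.corrcoef(x, y)[0, 1]
sy, sx = y.std(ddof=1), x.std(ddof=1)

b_ols = rho * sy / sx             # slope of the best linear prediction (1)
b_diag = np.sign(rho) * sy / sx   # diagonal (geometric mean) regression slope (2)
print(f"rho = {rho:.3f}, OLS slope = {b_ols:.3f}, diagonal slope = {b_diag:.3f}")
```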
{"title":"Comment on “On Optimal Correlation-Based Prediction”, By Bottai et al. (2022)","authors":"S. Lipovetsky","doi":"10.1080/00031305.2022.2141879","DOIUrl":"https://doi.org/10.1080/00031305.2022.2141879","url":null,"abstract":"where μ1 and μ2 are the means, and σ1 and σ2 are the standard errors of the dependent variable y and the predictor x, respectively, and sgn(ρ) is the sign of Pearson correlation ρ of these variables. In contrast to the best linear prediction (1), the slope of the best linear predictor (2), obtained with the restriction that the variance of the predictor of y equals the variance of y itself, is expressed by the sgn(ρ) replacing the actual value of ρ in (1). The formula (1) corresponds to the simple regression, while the formula (2) coincides with the so-called diagonal regression. The diagonal regression was proposed by Ragnar Frisch (1934), one of the founders of modern economics and the first economics Nobel laureate, who coined such terms as econometrics and collinearity. Up to the variables centering, the formula (2) defines the slope as the signed quotient of the standard deviations of the dependent and independent variables, and the diagonal regression for one and two predictors was considered in Cobb (1939, 1943). The model of the form (2) for one predictor is identical to the so-called geometric mean regression, standard (reduced) major axis regression, and some others, reviewed in the work by Xe (2014), with an extensive list of many researchers independently proposed and developed these models. Derivation of the diagonal regression (2) for the models with errors in measurement by both variables via the maximum likelihood criterion is described in Leser (1974, Chapt. 2). More references on diagonal regres-","PeriodicalId":342642,"journal":{"name":"The American Statistician","volume":"75 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-11-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132470124","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}