The Probability Mass Function of the Kaplan–Meier Product–Limit Estimator
Pub Date: 2023-01-02 | DOI: 10.1080/00031305.2022.2070279
Yuxin Qin, H. Sasinowska, L. Leemis
Abstract: Kaplan and Meier's 1958 article developed a nonparametric estimator of the survivor function from a right-censored dataset. Determining the size of the support of the estimator as a function of the sample size provides a challenging exercise for students in an advanced course in mathematical statistics. We devise two algorithms for calculating the support size, and we calculate the associated probability mass function for small sample sizes and particular probability distributions for the failure and censoring times.
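The product-limit estimator itself is short to state in code. The sketch below (Python; an illustration, not the authors' support-size algorithms) evaluates the Kaplan–Meier estimate for one hypothetical right-censored sample. The support-size question the article studies concerns how many distinct values such estimates can take across all possible samples of a given size.

```python
import numpy as np

def kaplan_meier(times, observed):
    """Kaplan-Meier product-limit estimate of the survivor function S(t).

    times    : event or censoring times
    observed : 1 for an observed failure, 0 for a right-censored time
    Returns the step function as (time, S(t)) pairs at the failure times.
    """
    times = np.asarray(times, dtype=float)
    observed = np.asarray(observed, dtype=bool)
    steps, s = [], 1.0
    for t in np.unique(times[observed]):
        n_at_risk = np.sum(times >= t)           # still under observation just before t
        d = np.sum((times == t) & observed)      # failures at t
        s *= 1.0 - d / n_at_risk                 # multiplicative step down
        steps.append((float(t), s))
    return steps

# Hypothetical sample: failures at 3, 5, 9; censorings at 5 and 7.
print(kaplan_meier([3, 5, 5, 7, 9], [1, 1, 0, 0, 1]))
# [(3.0, 0.8), (5.0, 0.6), (9.0, 0.0)]
```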
{"title":"The Probability Mass Function of the Kaplan–Meier Product–Limit Estimator","authors":"Yuxin Qin, H. Sasinowska, L. Leemis","doi":"10.1080/00031305.2022.2070279","DOIUrl":"https://doi.org/10.1080/00031305.2022.2070279","url":null,"abstract":"Abstract Kaplan and Meier’s 1958 article developed a nonparametric estimator of the survivor function from a right-censored dataset. Determining the size of the support of the estimator as a function of the sample size provides a challenging exercise for students in an advanced course in mathematical statistics. We devise two algorithms for calculating the support size and calculate the associated probability mass function for small sample sizes and particular probability distributions for the failure and censoring times.","PeriodicalId":342642,"journal":{"name":"The American Statistician","volume":"75 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-01-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126176402","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Object Oriented Data Analysis
Pub Date: 2023-01-02 | DOI: 10.1080/00031305.2022.2160590
James O. Ramsay
This book, which wants to be called OODA, is in an emerging genre, the statistical autobiography. Marron and Dryden have expanded the frontiers of data analysis in many directions over their careers, and they document the challenges encountered along the way. Their data illustrate growth in the statistical landscape in both size and complexity.

Functional data analysis, and its cousin shape analysis, took data analysis beyond the familiar matrix format by replacing frequently unordered columns with continuous and usually differentiable curves. The functional transition was in one sense easy because it remained within the Hilbert space framework. But the space of operations on curves is larger than linear algebra, since it includes differentiation to fit data with differential equations, integration to compute arc length, and the nonlinear transformation of domains so as to align curve features. The authors add to the mix the graph structures trees and networks, as well as curved manifolds. This binding of new data objects to new transformation groups coincided roughly with the advent of object-oriented programming systems, and hence the title.

The first three chapters provide short overviews of several example analyses, followed by tutorial material on variants of principal component analysis. Chapters 4, 5, and 6 provide examples of data exploration, data confirmation, and tips on visualizing results, respectively. Chapter 7 turns from PCA to distance-based analyses and multidimensional scaling, and Chapter 8 to shape and manifold representations. Chapter 9 illustrates data alignment using domain warping by the Fisher–Rao method. Chapter 10 looks at tree graphs and networks as data. Chapters 11 and 12 consider novel classification and clustering techniques. Chapters 13 and 14 offer methods for inference and asymptotics, respectively, in high-dimensional contexts. Chapter 15 describes the statistical graphics tool SiZer, and Chapter 16 outlines robust estimation techniques. The book concludes with additional material on PCA and a final chapter of general reflections on object-oriented data.

By my count the book examines 19 substantial and varied datasets, most of which are available on GitHub along with analyses using MATLAB. The authors supplement these with a number of toy datasets used as illustrations. Their use of color and other statistical graphics tools is outstanding and makes the displays exciting, even if not always essential. My personal favorite is the display of 3D rectum-prostate-bladder structures, which must have required a solid background in finite element analysis to produce. The target audience is graduate students in statistics and machine learning, and the book provides a gold mine of fascinating potential class projects. As a teaching tool, however, it does have some limitations. The many literature citations that seem to accompany every assertion make for cluttered reading; restricting these to an annotated resource section at the end of each chapter would reduce the clutter.
{"title":"Quantitative Drug Safety and Benefit-Risk Evaluation: Practical and Cross-Disciplinary Approaches","authors":"Huan Wang","doi":"10.1080/00031305.2022.2160592","DOIUrl":"https://doi.org/10.1080/00031305.2022.2160592","url":null,"abstract":"This book, which wants to be called OODA, is in an emerging genre, the statistical autobiography. Marron and Dryden have expanded the frontiers of data analysis in many directions over their careers, and they document the challenges encountered along the way. Their data illustrate growth in the statistical landscape in both size and complexity. Functional data analysis, and its cousin shape analysis, took data analysis beyond the familiar matrix format by replacing frequently unordered columns by continuous and usually differentiable curves. The functional transition in one sense was easy because it remained within the Hilbert space framework. But the space of operations on curves is larger than linear algebra, since it includes differentiation to fit data with differential equations, integration to compute arc length and the nonlinear transformation of domains so as to align curve features. The authors add to the mix the graph structures trees and networks, as well as curved manifolds. This binding of new data objects to new transformation groups coincided roughly with the advent of object oriented programming systems, and hence the title. The first three chapters provide short overviews of several example analyses followed by tutorial material on variants of principle component analysis. Chapters 4 , 5, and 6 provide examples of data exploration and confirmation, respectively, as well as tips on visualizing results. Chapter 7 turns from PCA to distance based analyses and multidimensional scaling, and chapter 8 to shape and manifold representations. Chapter 9 illustrates data alignment using domain warping by the Fisher-Rao method. Chapter 10 looks at tree graphs and networks as data. Chapters 11 and 12 consider novel classification and clustering techniques. Chapters 13 and 14 offer methods for inference and asymptotic, respectively, in high-dimensional contexts. Chapter 15 describes the statistical graphics tool SiZer and chapter 16 outlines robust estimation techniques. The book concludes with additional material on PCA and a final chapter on general reflections on object oriented data. By my count the book examines 19 substantial and varied datasets, most of which are available on gitHub along with analyses using Matlab. They also add to these a number of toy sets used as illustrations. Their use of color and other statistical graphics tools is outstanding, and makes displays exciting even if not always essential. My personal favorite is the display of 3D rectum-prostate-bladder structures, require a solid background in finite element analysis to produce. The target audience is graduate students in statistics and machine learning, and the book provides a gold mine of fascinating potential class projects. However, as a teaching tool it does have some limitations. The many literature citations that seem to accompany any assertion, and make for cluttered reading. 
Restricting these to an annotated resource section at the end of each chapter woul","PeriodicalId":342642,"journal":{"name":"The American Statistician","volume":"18 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-01-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128073728","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Quantitative Drug Safety and Benefit-Risk Evaluation: Practical and Cross-Disciplinary Approaches
Pub Date: 2023-01-02 | DOI: 10.1080/00031305.2022.2160592
Huan Wang
Statistical Challenges in Online Controlled Experiments: A Review of A/B Testing Methodology
Pub Date: 2022-12-21 | DOI: 10.1080/00031305.2023.2257237
Nicholas Larsen, Jonathan W. Stallrich, Srijan Sengupta, Alex Deng, Ron Kohavi, Nathaniel T. Stevens
The rise of internet-based services and products in the late 1990s brought about an unprecedented opportunity for online businesses to engage in large-scale data-driven decision making. Over the past two decades, organizations such as Airbnb, Alibaba, Amazon, Baidu, Booking, Alphabet's Google, LinkedIn, Lyft, Meta's Facebook, Microsoft, Netflix, Twitter, Uber, and Yandex have invested tremendous resources in online controlled experiments (OCEs) to assess the impact of innovation on their customers and businesses. Running OCEs at scale has presented a host of challenges requiring solutions from many domains. In this paper we review challenges that require new statistical methodologies to address them. In particular, we discuss the practice and culture of online experimentation, as well as its statistics literature, placing the current methodologies within their relevant statistical lineages and providing illustrative examples of OCE applications. Our goal is to raise academic statisticians' awareness of these new research opportunities to increase collaboration between academia and the online industry.
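As a concrete baseline for what an OCE analysis looks like, the sketch below runs the textbook two-proportion z-test on hypothetical conversion counts. The methodological challenges the paper reviews (interference, sequential monitoring, variance reduction, and so on) arise on top of this simple comparison.

```python
import numpy as np
from scipy.stats import norm

def two_proportion_ztest(conv_a, n_a, conv_b, n_b):
    """Two-sided pooled z-test for a difference in conversion rates,
    the canonical analysis of a simple two-arm A/B experiment."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    p_pool = (conv_a + conv_b) / (n_a + n_b)
    se = np.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    p_value = 2 * norm.sf(abs(z))
    return z, p_value

# Hypothetical counts: 10,000 users per arm, 2.8% vs. 3.1% conversion.
z, p = two_proportion_ztest(280, 10_000, 310, 10_000)
print(f"z = {z:.2f}, p = {p:.3f}")
```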
{"title":"Statistical Challenges in Online Controlled Experiments: A Review of A/B Testing Methodology","authors":"Nicholas Larsen, Jonathan W. Stallrich, Srijan Sengupta, Alex Deng, Ron Kohavi, Nathaniel T. Stevens","doi":"10.1080/00031305.2023.2257237","DOIUrl":"https://doi.org/10.1080/00031305.2023.2257237","url":null,"abstract":"The rise of internet-based services and products in the late 1990's brought about an unprecedented opportunity for online businesses to engage in large scale data-driven decision making. Over the past two decades, organizations such as Airbnb, Alibaba, Amazon, Baidu, Booking, Alphabet's Google, LinkedIn, Lyft, Meta's Facebook, Microsoft, Netflix, Twitter, Uber, and Yandex have invested tremendous resources in online controlled experiments (OCEs) to assess the impact of innovation on their customers and businesses. Running OCEs at scale has presented a host of challenges requiring solutions from many domains. In this paper we review challenges that require new statistical methodologies to address them. In particular, we discuss the practice and culture of online experimentation, as well as its statistics literature, placing the current methodologies within their relevant statistical lineages and providing illustrative examples of OCE applications. Our goal is to raise academic statisticians' awareness of these new research opportunities to increase collaboration between academia and the online industry.","PeriodicalId":342642,"journal":{"name":"The American Statistician","volume":"60 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-12-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129899586","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Quantifying the Inspection Paradox with Random Time
Pub Date: 2022-12-19 | DOI: 10.1080/00031305.2022.2151510
Diana Rauwolf, U. Kamps
Abstract: The well-known inspection paradox of renewal theory states that, in general, the expected length of the inspection interval exceeds that of a common renewal interval. For a random inspection time, which includes the deterministic case, and a delayed renewal process, representations of the expected length of an inspection interval and related inequalities in terms of covariances are shown. Datasets of eruption times of Beehive Geyser and Riverside Geyser in Yellowstone National Park, as well as several distributional examples, illustrate the findings.
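A small simulation makes the paradox tangible. The sketch below is an illustration under assumed exponential interarrival times (not the authors' covariance representations): it compares the mean renewal interval with the mean length of the interval that happens to cover a fixed inspection time.

```python
import numpy as np

rng = np.random.default_rng(1)

def covering_interval_lengths(t_inspect, mean=1.0, n_reps=20_000):
    """Simulate renewal processes with Exp(mean) interarrival times and
    record the length of the renewal interval that covers t_inspect."""
    lengths = np.empty(n_reps)
    for k in range(n_reps):
        t = 0.0
        while True:
            x = rng.exponential(mean)
            if t + x > t_inspect:      # this interval covers the inspection time
                lengths[k] = x
                break
            t += x
    return lengths

L = covering_interval_lengths(t_inspect=10.0)
print(f"mean renewal interval: 1.000, mean inspected interval: {L.mean():.3f}")
# With exponential interarrivals the inspected interval averages close to
# twice the ordinary renewal interval -- a length-biased sampling effect.
```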
{"title":"Quantifying the Inspection Paradox with Random Time","authors":"Diana Rauwolf, U. Kamps","doi":"10.1080/00031305.2022.2151510","DOIUrl":"https://doi.org/10.1080/00031305.2022.2151510","url":null,"abstract":"Abstract The well-known inspection paradox of renewal theory states that, in expectation, the inspection interval is larger than a common renewal interval, in general. For a random inspection time, which includes the deterministic case, and a delayed renewal process, representations of the expected length of an inspection interval and related inequalities in terms of covariances are shown. Datasets of eruption times of Beehive Geyser and Riverside Geyser in Yellowstone National Park, as well as several distributional examples, illustrate the findings.","PeriodicalId":342642,"journal":{"name":"The American Statistician","volume":"44 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-12-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128729027","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Integrating Ethics into the Guidelines for Assessment and Instruction in Statistics Education (GAISE)
Pub Date: 2022-12-13 | DOI: 10.1080/00031305.2022.2156612
R. Raman, J. Utts, Andrew I. Cohen, Matthew J. Hayat
Abstract: Statistics education at all levels involves data collected on human subjects. Thus, statistics educators have a responsibility to educate their students about the ethical aspects of collecting those data. The changing statistics education landscape has seen instruction move from being formula-based to being focused on statistical reasoning. The widely implemented Guidelines for Assessment and Instruction in Statistics Education (GAISE) Report has paved the way for instructors to present introductory statistics to students in a way that is both approachable and engaging. However, with technological advancement and the increasing availability of real-world datasets, it is necessary that instruction also address the ethical aspects of data sources, such as privacy, how the data were obtained, and whether participants consented to the use of their data. In this article, we propose incorporating ethics into established curricula by integrating it into undergraduate-level introductory statistics courses based on recommendations in the GAISE Report. We provide a few examples of how to prompt students to think constructively about their ethical responsibilities when working with data.
{"title":"Integrating Ethics into the Guidelines for Assessment and Instruction in Statistics Education (GAISE)","authors":"R. Raman, J. Utts, Andrew I. Cohen, Matthew J. Hayat","doi":"10.1080/00031305.2022.2156612","DOIUrl":"https://doi.org/10.1080/00031305.2022.2156612","url":null,"abstract":"Abstract Statistics education at all levels includes data collected on human subjects. Thus, statistics educators have a responsibility to educate their students about the ethical aspects related to the collection of those data. The changing statistics education landscape has seen instruction moving from being formula-based to being focused on statistical reasoning. The widely implemented Guidelines for Assessment and Instruction in Statistics Education (GAISE) Report has paved the way for instructors to present introductory statistics to students in a way that is both approachable and engaging. However, with technological advancement and the increase in availability of real-world datasets, it is necessary that instruction also integrate the ethical aspects around data sources, such as privacy, how the data were obtained and whether participants consent to the use of their data. In this article, we propose incorporating ethics into established curricula and integrating ethics into undergraduate-level introductory statistics courses based on recommendations in the GAISE Report. We provide a few examples of how to prompt students to constructively think about their ethical responsibilities when working with data.","PeriodicalId":342642,"journal":{"name":"The American Statistician","volume":"3 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-12-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127034149","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Selection Criterion of Working Correlation Structure for Spatially Correlated Data
Pub Date: 2022-12-13 | DOI: 10.1080/00031305.2022.2157874
Marcelo dos Santos, F. De Bastiani, M. Uribe-Opazo, M. Galea
Abstract: To obtain regression parameter estimates in generalized estimating equation modeling, whether for longitudinal or spatially correlated data, it is necessary to specify the structure of the working correlation matrix, and the regression parameter estimates can be affected by this choice. Within spatial statistics, the correlation matrix also influences how spatial variability is modeled. This study therefore proposes a new method for selecting a working correlation matrix, based on the conditioning of the naive variance-covariance matrix. The method's performance is evaluated in an extensive simulation study using normal, Poisson, and gamma marginal distributions for spatially correlated data. The correlation structure specification is based on semivariogram models from the Wendland, Matérn, and spherical families. The results reveal that, in terms of hit rates for the true spatial correlation structure of the simulated data, the proposed criterion outperforms competing criteria: the quasi-likelihood under the independence model criterion (QIC), the correlation information criterion (CIC), and the Rotnitzky–Jewell criterion (RJC). The selection of an appropriate spatial correlation structure is illustrated using average rainfall data for the first semester of 2021 in the state of Pernambuco, Brazil.
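The semivariogram machinery the abstract refers to is straightforward to sketch. The snippet below is an assumed illustration (the paper's conditioning-based selection criterion is not reproduced): it computes the classical empirical semivariogram and fits a spherical model to synthetic, made-up spatial data.

```python
import numpy as np
from scipy.optimize import curve_fit

def spherical(h, nugget, sill, a):
    """Spherical semivariogram model with range parameter a."""
    h = np.asarray(h, dtype=float)
    inside = nugget + (sill - nugget) * (1.5 * h / a - 0.5 * (h / a) ** 3)
    return np.where(h < a, inside, sill)

def empirical_semivariogram(coords, z, bins):
    """Classical (Matheron) semivariogram estimate on distance bins."""
    d = np.sqrt(((coords[:, None, :] - coords[None, :, :]) ** 2).sum(-1))
    g = 0.5 * (z[:, None] - z[None, :]) ** 2
    iu = np.triu_indices(len(z), k=1)      # each pair counted once
    d, g = d[iu], g[iu]
    centers, gamma = [], []
    for lo, hi in zip(bins[:-1], bins[1:]):
        m = (d >= lo) & (d < hi)
        if m.any():
            centers.append(d[m].mean())
            gamma.append(g[m].mean())
    return np.array(centers), np.array(gamma)

# Hypothetical stations and values (stand-ins for rainfall observations):
rng = np.random.default_rng(0)
coords = rng.uniform(0, 100, size=(80, 2))
z = np.sin(coords[:, 0] / 30.0) + rng.normal(0, 0.2, 80)
h, gamma = empirical_semivariogram(coords, z, bins=np.linspace(0, 60, 10))
params, _ = curve_fit(spherical, h, gamma, p0=[0.05, gamma.max(), 30.0], maxfev=10_000)
print("fitted nugget, sill, range:", np.round(params, 3))
```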
{"title":"Selection Criterion of Working Correlation Structure for Spatially Correlated Data","authors":"Marcelo dos Santos, F. De Bastiani, M. Uribe-Opazo, M. Galea","doi":"10.1080/00031305.2022.2157874","DOIUrl":"https://doi.org/10.1080/00031305.2022.2157874","url":null,"abstract":"Abstract To obtain regression parameter estimates in generalized estimation equation modeling, whether in longitudinal or spatially correlated data, it is necessary to specify the structure of the working correlation matrix. The regression parameter estimates can be affected by the choice of this matrix. Within spatial statistics, the correlation matrix also influences how spatial variability is modeled. Therefore, this study proposes a new method for selecting a working matrix, based on conditioning the variance-covariance matrix naive. The method performance is evaluated by an extensive simulation study, using the marginal distributions of normal, Poisson, and gamma for spatially correlated data. The correlation structure specification is based on semivariogram models, using the Wendland, Matérn, and spherical model families. The results reveal that regarding the hit rates of the true spatial correlation structure of simulated data, the proposed criterion resulted in better performance than competing criteria: quasi-likelihood under the independence model criterion QIC, correlation information criterion CIC, and the Rotnizky–Jewell criterion RJC. The application of an appropriate spatial correlation structure selection was shown using the first-semester average rainfall data of 2021 in the state of Pernambuco, Brazil.","PeriodicalId":342642,"journal":{"name":"The American Statistician","volume":"66 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-12-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129350046","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Learning to forecast: The probabilistic time series forecasting challenge
Pub Date: 2022-11-29 | DOI: 10.1080/00031305.2023.2199800
J. Bracher, Nils Koster, Fabian Krüger, Sebastian Lerch
We report on a course project in which students submit weekly probabilistic forecasts of two weather variables and one financial variable. This real-time format allows students to engage in practical forecasting, which requires a diverse set of skills in data science and applied statistics. We describe the context and aims of the course, and discuss design parameters like the selection of target variables, the forecast submission process, the evaluation of forecast performance, and the feedback provided to students. Furthermore, we describe empirical properties of students' probabilistic forecasts, as well as some lessons learned on our part.
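The abstract does not specify the scoring rule used for the weekly submissions; a common choice for quantile-format probabilistic forecasts is the pinball (quantile) loss, sketched below on made-up numbers.

```python
import numpy as np

def pinball_loss(y, q_pred, tau):
    """Average pinball (quantile) loss of predicted tau-quantiles q_pred
    for realized values y; lower is better."""
    y, q_pred = np.asarray(y, dtype=float), np.asarray(q_pred, dtype=float)
    diff = y - q_pred
    return np.mean(np.maximum(tau * diff, (tau - 1) * diff))

# Hypothetical: score a student's 10%/50%/90% temperature forecasts
# against three realized values.
y = np.array([3.2, 4.1, 2.7])
for tau, q in [(0.1, np.array([1.0, 2.0, 0.5])),
               (0.5, np.array([3.0, 4.0, 2.5])),
               (0.9, np.array([5.0, 6.0, 4.5]))]:
    print(f"tau={tau}: {pinball_loss(y, q, tau):.3f}")
```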
{"title":"Learning to forecast: The probabilistic time series forecasting challenge","authors":"J. Bracher, Nils Koster, Fabian Kruger, Sebastian Lerch","doi":"10.1080/00031305.2023.2199800","DOIUrl":"https://doi.org/10.1080/00031305.2023.2199800","url":null,"abstract":"We report on a course project in which students submit weekly probabilistic forecasts of two weather variables and one financial variable. This real-time format allows students to engage in practical forecasting, which requires a diverse set of skills in data science and applied statistics. We describe the context and aims of the course, and discuss design parameters like the selection of target variables, the forecast submission process, the evaluation of forecast performance, and the feedback provided to students. Furthermore, we describe empirical properties of students' probabilistic forecasts, as well as some lessons learned on our part.","PeriodicalId":342642,"journal":{"name":"The American Statistician","volume":"34 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-11-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125685634","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Improved approximation and visualization of the correlation matrix
Pub Date: 2022-11-23 | DOI: 10.1080/00031305.2023.2186952
J. Graffelman, Jan de Leeuw
The graphical representation of the correlation matrix by means of different multivariate statistical methods is reviewed, the procedures are compared on an example dataset, and an improved representation with better fit is proposed. Principal component analysis is widely used for picturing correlation structure, but a weighted alternating least squares approach that avoids fitting the diagonal of the correlation matrix outperforms both principal component analysis and principal factor analysis in approximating a correlation matrix. Weighted alternating least squares is a very strong competitor to principal component analysis, in particular when the correlation matrix is the focus of the study, because it improves the representation of the correlation matrix, often at the expense of only a minor percentage of explained variance for the original data matrix when the latter is mapped onto the correlation biplot by regression. In this article, we propose to combine weighted alternating least squares with an additive adjustment of the correlation matrix, which leads to a further improved approximation of the correlation matrix.
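To see what "avoiding the fitting of the diagonal" means in practice, the sketch below computes a low-rank least squares approximation of a correlation matrix in which the diagonal is treated as free, via iterated eigendecomposition with diagonal imputation. It is an unweighted stand-in for the weighted alternating least squares method of the article and omits the proposed additive adjustment.

```python
import numpy as np

def lowrank_corr_offdiag(R, rank=2, n_iter=200):
    """Rank-`rank` approximation of a correlation matrix R that fits only
    the off-diagonal entries: impute the diagonal from the current fit,
    then truncate the eigendecomposition, and repeat."""
    A = R.copy()
    for _ in range(n_iter):
        w, V = np.linalg.eigh(A)
        idx = np.argsort(w)[::-1][:rank]             # leading eigenpairs
        F = V[:, idx] * np.sqrt(np.maximum(w[idx], 0.0))
        A_hat = F @ F.T                               # current low-rank fit
        A = R.copy()
        np.fill_diagonal(A, np.diag(A_hat))           # diagonal treated as free
    return A_hat

# Small demo on a hypothetical 4x4 correlation matrix:
R = np.array([[1.0, 0.6, 0.3, 0.1],
              [0.6, 1.0, 0.5, 0.2],
              [0.3, 0.5, 1.0, 0.4],
              [0.1, 0.2, 0.4, 1.0]])
off = ~np.eye(4, dtype=bool)
err = lowrank_corr_offdiag(R) - R
print("off-diagonal RMSE:", np.sqrt(np.mean(err[off] ** 2)))
```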
{"title":"Improved approximation and visualization of the correlation matrix","authors":"J. Graffelman, Jan de Leeuw","doi":"10.1080/00031305.2023.2186952","DOIUrl":"https://doi.org/10.1080/00031305.2023.2186952","url":null,"abstract":"The graphical representation of the correlation matrix by means of different multivariate statistical methods is reviewed, a comparison of the different procedures is presented with the use of an example data set, and an improved representation with better fit is proposed. Principal component analysis is widely used for making pictures of correlation structure, though as shown a weighted alternating least squares approach that avoids the fitting of the diagonal of the correlation matrix outperforms both principal component analysis and principal factor analysis in approximating a correlation matrix. Weighted alternating least squares is a very strong competitor for principal component analysis, in particular if the correlation matrix is the focus of the study, because it improves the representation of the correlation matrix, often at the expense of only a minor percentage of explained variance for the original data matrix, if the latter is mapped onto the correlation biplot by regression. In this article, we propose to combine weighted alternating least squares with an additive adjustment of the correlation matrix, and this is seen to lead to further improved approximation of the correlation matrix.","PeriodicalId":342642,"journal":{"name":"The American Statistician","volume":"6 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-11-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121780467","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Comment on “On Optimal Correlation-Based Prediction,” by Bottai et al. (2022)
Pub Date: 2022-11-15 | DOI: 10.1080/00031305.2022.2141879
S. Lipovetsky
The best linear prediction (1) and the restricted predictor (2) discussed in the comment are

ŷ = μ1 + ρ (σ1/σ2) (x − μ2),   (1)
ŷ = μ1 + sgn(ρ) (σ1/σ2) (x − μ2),   (2)

where μ1 and μ2 are the means, and σ1 and σ2 are the standard deviations, of the dependent variable y and the predictor x, respectively, and sgn(ρ) is the sign of the Pearson correlation ρ between these variables. In contrast to the best linear prediction (1), the slope of the predictor (2), obtained under the restriction that the variance of the predictor of y equal the variance of y itself, replaces the actual value of ρ in (1) with sgn(ρ). Formula (1) corresponds to simple regression, while formula (2) coincides with the so-called diagonal regression. Diagonal regression was proposed by Ragnar Frisch (1934), one of the founders of modern economics and the first economics Nobel laureate, who coined such terms as econometrics and collinearity. Up to centering of the variables, formula (2) defines the slope as the signed quotient of the standard deviations of the dependent and independent variables, and diagonal regression for one and two predictors was considered in Cobb (1939, 1943). For one predictor, the model of form (2) is identical to the so-called geometric mean regression, standard (reduced) major axis regression, and several others, reviewed by Xe (2014) together with an extensive list of researchers who independently proposed and developed these models. Derivation of the diagonal regression (2) for models with measurement errors in both variables via the maximum likelihood criterion is described in Leser (1974, chap. 2). More references on diagonal regression can be found in these works.
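A few lines of code make the contrast concrete: on simulated data, the diagonal-regression slope (2) equals the ordinary least squares slope (1) divided by |ρ|, so it is always at least as steep, and the two coincide only when |ρ| = 1.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(0.0, 2.0, 500)
y = 1.5 * x + rng.normal(0.0, 2.0, 500)   # noisy linear relation

rho = np.corrcoef(x, y)[0, 1]
sy, sx = y.std(ddof=1), x.std(ddof=1)

b_ols = rho * sy / sx             # slope of the best linear prediction (1)
b_diag = np.sign(rho) * sy / sx   # diagonal (geometric mean) regression slope (2)
print(f"rho = {rho:.3f}, OLS slope = {b_ols:.3f}, diagonal slope = {b_diag:.3f}")
```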
{"title":"Comment on “On Optimal Correlation-Based Prediction”, By Bottai et al. (2022)","authors":"S. Lipovetsky","doi":"10.1080/00031305.2022.2141879","DOIUrl":"https://doi.org/10.1080/00031305.2022.2141879","url":null,"abstract":"where μ1 and μ2 are the means, and σ1 and σ2 are the standard errors of the dependent variable y and the predictor x, respectively, and sgn(ρ) is the sign of Pearson correlation ρ of these variables. In contrast to the best linear prediction (1), the slope of the best linear predictor (2), obtained with the restriction that the variance of the predictor of y equals the variance of y itself, is expressed by the sgn(ρ) replacing the actual value of ρ in (1). The formula (1) corresponds to the simple regression, while the formula (2) coincides with the so-called diagonal regression. The diagonal regression was proposed by Ragnar Frisch (1934), one of the founders of modern economics and the first economics Nobel laureate, who coined such terms as econometrics and collinearity. Up to the variables centering, the formula (2) defines the slope as the signed quotient of the standard deviations of the dependent and independent variables, and the diagonal regression for one and two predictors was considered in Cobb (1939, 1943). The model of the form (2) for one predictor is identical to the so-called geometric mean regression, standard (reduced) major axis regression, and some others, reviewed in the work by Xe (2014), with an extensive list of many researchers independently proposed and developed these models. Derivation of the diagonal regression (2) for the models with errors in measurement by both variables via the maximum likelihood criterion is described in Leser (1974, Chapt. 2). More references on diagonal regres-","PeriodicalId":342642,"journal":{"name":"The American Statistician","volume":"75 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-11-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132470124","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}