Linear Regression of Sampling Distributions of the Mean
David J Torres, Ana Vasilic, Jose Pacheco
Journal of bioinformatics and systems biology : Open access, vol. 7, no. 1, pp. 63-80 (2024). Epub 2024-03-04.
DOI: https://doi.org/10.26502/jbsb.5107079
PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11108041/pdf/
Abstract
We show that the simple and multiple linear regression coefficients and the coefficient of determination R² computed from sampling distributions of the mean (with or without replacement) are equal to the regression coefficients and coefficient of determination computed from the individual data. Moreover, the standard error of estimate is reduced by a factor equal to the square root of the group size for sampling distributions of the mean. This result has applications when formulating a distance measure between two genes in a hierarchical clustering algorithm. We also show that the Pearson R coefficient can measure how differential expression in one gene correlates with differential expression in a second gene.
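The equality claimed above can be checked numerically on a small example. The following is a minimal sketch, not the authors' code. It covers only simple regression with sampling with replacement, the data values are made up, and `fit_line` is a hypothetical helper. The residual spread is measured as a root-mean-square residual (sum of squared residuals divided by the number of points), which is an assumption, since the abstract does not give the exact definition of the standard error of estimate.

```python
from itertools import product
from math import sqrt


def fit_line(xs, ys):
    """Least-squares fit y = a + b*x; returns (a, b, r2, rms_residual)."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = my - b * mx
    sse = sum((y - a - b * x) ** 2 for x, y in zip(xs, ys))
    r2 = 1.0 - sse / syy
    # Assumption: residual spread uses n in the denominator, not n - 2.
    return a, b, r2, sqrt(sse / n)


# Made-up individual data (illustrative only, not from the paper).
x = [1.0, 2.0, 4.0, 5.0, 7.0]
y = [2.1, 2.9, 5.2, 5.8, 9.1]
m = 3  # group size

# Every ordered sample of size m drawn with replacement, reduced to its mean.
groups = list(product(range(len(x)), repeat=m))
x_bar = [sum(x[i] for i in g) / m for g in groups]
y_bar = [sum(y[i] for i in g) / m for g in groups]

a_ind, b_ind, r2_ind, rms_ind = fit_line(x, y)
a_grp, b_grp, r2_grp, rms_grp = fit_line(x_bar, y_bar)

print(f"individual:  a={a_ind:.6f} b={b_ind:.6f} R2={r2_ind:.6f}")
print(f"group means: a={a_grp:.6f} b={b_grp:.6f} R2={r2_grp:.6f}")
print(f"RMS residual ratio: {rms_ind / rms_grp:.6f} (sqrt(m) = {sqrt(m):.6f})")
```

Enumerating all 5³ = 125 ordered samples keeps the check exact up to floating-point error rather than relying on random draws. Under these assumptions the two fits agree, and the residual ratio matches the square root of the group size.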