Estimating the mean of a small sample under the two parameter lognormal distribution
P. Hingley
Biomath Communications Supplement, vol. 12 (1), published 2018-03-03
DOI: 10.11145/TEXTS.2018.02.027 (https://doi.org/10.11145/TEXTS.2018.02.027)
Citations: 1
Abstract
When making statistical inferences about the means of small samples, the confidence limits for the mean are often calculated assuming a normal distribution. However, many biological variables follow the lognormal distribution [Johnson], for example the birth weights of babies (e.g. the data in [Iannelli]). Here, sampling distributions (probability density functions) are found for the maximum likelihood estimates (MLEs) of the sample mean and variance when the data are lognormally distributed. They are derived analytically, making some use of the Technique for Estimator Densities (TED) [Hingley], and then checked by simulation with random numbers. For an i.i.d. sample of size n under lognormal estimation, the MLE of the mean has a lognormal distribution that is conditional on the estimated variance. The MLE of the variance of the logarithms follows the usual scaled central chi-squared distribution: n times the estimate, divided by the true variance, is chi-squared with n − 1 degrees of freedom. The joint distribution of the sample mean and variance shows the extent to which the estimate of the mean is affected by the variance. When a normal distribution is wrongly used for estimation on lognormally distributed data, the sample mean still has a lognormal distribution, but the distribution of the MLE of the variance differs. Starting from the distribution for a single observation, the distributions for larger sample sizes can be approached by repeated convolution. Assuming a normal estimation model biases the confidence interval for the mean. The practical importance of this bias is discussed for small samples of birth weights and other lognormally distributed data sets.
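The abstract says the analytic results were checked by simulation with random numbers. The following is a minimal Monte Carlo sketch of that kind of check, using only the Python standard library. The function names `lognormal_mle` and `simulate`, the parameter values (n = 5, μ = 0, σ = 0.5), the seed, and the number of replications are hypothetical choices for illustration, not taken from the paper, and the sketch does not reproduce the TED derivations or the convolution results. It checks three things: the mean of the logs is normal with variance σ²/n (so, given the estimated variance, exp(μ̂ + σ̂²/2) is lognormal); nσ̂²/σ² has the chi-squared moments for n − 1 degrees of freedom; and how the lognormal MLE of the mean compares with the arithmetic mean that a normal model would use.

```python
import math
import random
import statistics


def lognormal_mle(sample):
    """MLEs under a two-parameter lognormal model (hypothetical helper).

    Returns (mu_hat, s2_hat, mean_hat): the mean and variance of the logs
    (the variance divides by n, as an MLE does) and the implied estimate
    of the arithmetic mean, exp(mu_hat + s2_hat / 2).
    """
    logs = [math.log(x) for x in sample]
    n = len(logs)
    mu_hat = sum(logs) / n
    s2_hat = sum((y - mu_hat) ** 2 for y in logs) / n
    return mu_hat, s2_hat, math.exp(mu_hat + s2_hat / 2)


def simulate(n, mu, sigma, reps, seed=1):
    """Draw `reps` i.i.d. lognormal samples of size n and collect the estimators."""
    rng = random.Random(seed)
    mu_hats, scaled_vars, lognormal_means, normal_means = [], [], [], []
    for _ in range(reps):
        sample = [rng.lognormvariate(mu, sigma) for _ in range(n)]
        mu_hat, s2_hat, mean_hat = lognormal_mle(sample)
        mu_hats.append(mu_hat)
        # Should follow a central chi-squared distribution with n - 1 d.f.
        scaled_vars.append(n * s2_hat / sigma ** 2)
        lognormal_means.append(mean_hat)
        # Under a (misspecified) normal model the MLE of the mean is the arithmetic mean.
        normal_means.append(sum(sample) / n)
    return mu_hats, scaled_vars, lognormal_means, normal_means


# Hypothetical settings chosen only for illustration.
n, mu, sigma = 5, 0.0, 0.5
mu_hats, scaled_vars, lognormal_means, normal_means = simulate(n, mu, sigma, reps=20000)
true_mean = math.exp(mu + sigma ** 2 / 2)

# mu_hat ~ N(mu, sigma^2 / n), independent of s2_hat, so exp(mu_hat + s2_hat / 2)
# is lognormal conditional on s2_hat.
print("var(mu_hat):", statistics.variance(mu_hats), "theory:", sigma ** 2 / n)
print("chi-squared mean:", statistics.fmean(scaled_vars), "theory:", n - 1)
print("chi-squared variance:", statistics.variance(scaled_vars), "theory:", 2 * (n - 1))
print("true mean:", true_mean)
print("average lognormal MLE of the mean:", statistics.fmean(lognormal_means))
print("average arithmetic mean:", statistics.fmean(normal_means))
```

Because σ̂² divides by n, the chi-squared quantity is nσ̂²/σ² rather than (n − 1)s²/σ². The sketch only compares moments; a fuller check in the spirit of the paper would compare histograms of the simulated estimates against the analytic densities.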