Microbiome analysis is the process of identifying the composition and function of a community of microorganisms in a particular location, which is essential in understanding human and environmental health. Properly quantifying microbial composition, however, remains challenging and relies on statistical modeling of either the raw taxonomic abundances or the relative abundances. Relative abundance measures are commonly preferred over absolute abundances for microbiome analysis because absolute abundance values depend on the sequencing depth and sequencing method. Despite this, the literature on modeling relative abundance with a meaningful probability distribution, followed by subsequent statistical inference, is limited. In this work, the Dirichlet distribution is proposed to model the relative abundances of taxa directly, without the use of any further transformation (e.g., additive log-ratio transform, isometric log-ratio transform). In a comprehensive simulation study, we compare the biases and standard errors of two method-of-moments estimators (MMEs) and the maximum likelihood estimator (MLE) of the Dirichlet distribution. The estimators are compared over three cases of differing sample size and dimension: (i) small dimension and small sample size; (ii) small dimension and large sample size; (iii) large dimension with both small and large sample sizes. As expected, the MLE shows the best overall performance because it is based on the (minimal) sufficient statistics and therefore loses no information. We then explore the asymptotic properties of the MLE using the Fisher information alongside our simulation results.
We demonstrate the applicability of the Dirichlet modeling methodology with four real-world microbiome datasets and show how the estimated mean relative abundances obtained from the Dirichlet MLE (DMLE) differ from those obtained by a commonly used method, the Bayesian Dirichlet-multinomial estimator (BDME), which works with absolute abundances. For all four datasets, the DMLE results are comparable to the BDME results while requiring much less computational time, both for single uses and for large simulations.
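
As a rough illustration of the estimators discussed above, the following Python sketch fits a Dirichlet distribution to simulated relative abundances with one moment-matching estimator and with the MLE. It is a minimal sketch under stated assumptions, not the authors' implementation: the function names, the particular moment estimator (which need not be either of the two MMEs compared in the study), the fixed-point iteration used for the MLE, and the simulated parameter values are all illustrative choices. It also assumes strictly positive proportions, so zero abundances would need separate handling. Only the standard library is used; the series approximations of the digamma and trigamma functions could be replaced by `scipy.special.digamma` and `scipy.special.polygamma`.

```python
import math
import random

def digamma(x):
    """Approximate digamma for x > 0: recurrence up to x >= 6, then asymptotic series."""
    r = 0.0
    while x < 6.0:
        r -= 1.0 / x
        x += 1.0
    f = 1.0 / (x * x)
    return r + math.log(x) - 0.5 / x - f * (1.0 / 12 - f * (1.0 / 120 - f / 252))

def trigamma(x):
    """Approximate trigamma for x > 0, same recurrence-plus-series strategy."""
    r = 0.0
    while x < 6.0:
        r += 1.0 / (x * x)
        x += 1.0
    f = 1.0 / (x * x)
    return r + 1.0 / x + f / 2 + (f / x) * (1.0 / 6 - f * (1.0 / 30 - f / 42))

def inv_digamma(y, iters=5):
    """Solve digamma(x) = y by Newton's method from a standard starting guess."""
    x = math.exp(y) + 0.5 if y >= -2.22 else -1.0 / (y + 0.5772156649015329)
    for _ in range(iters):
        x -= (digamma(x) - y) / trigamma(x)
    return x

def sample_dirichlet(alpha, rng):
    """Draw one composition by normalising independent Gamma(alpha_j, 1) variates."""
    g = [rng.gammavariate(a, 1.0) for a in alpha]
    s = sum(g)
    return [v / s for v in g]

def dirichlet_mme(data):
    """Moment matching (illustrative choice): Var(p_j) = m_j (1 - m_j) / (alpha_0 + 1)."""
    n, k = len(data), len(data[0])
    mean = [sum(row[j] for row in data) / n for j in range(k)]
    var = [sum((row[j] - mean[j]) ** 2 for row in data) / (n - 1) for j in range(k)]
    # Average the per-component estimates of the precision alpha_0.
    a0 = sum(mean[j] * (1 - mean[j]) / var[j] - 1 for j in range(k)) / k
    return [a0 * m for m in mean]

def dirichlet_mle(data, tol=1e-8, max_iter=1000):
    """MLE via the fixed point digamma(alpha_j) = digamma(sum(alpha)) + mean(log p_j)."""
    n, k = len(data), len(data[0])
    # The sufficient statistics are the sample means of log p_j (requires p_j > 0).
    mean_log = [sum(math.log(row[j]) for row in data) / n for j in range(k)]
    alpha = dirichlet_mme(data)  # moment estimate as the starting value
    for _ in range(max_iter):
        psi_sum = digamma(sum(alpha))
        new = [inv_digamma(psi_sum + ml) for ml in mean_log]
        if max(abs(a - b) for a, b in zip(alpha, new)) < tol:
            return new
        alpha = new
    return alpha

# Hypothetical example: three taxa with made-up concentration parameters.
rng = random.Random(0)
true_alpha = [2.0, 5.0, 3.0]
data = [sample_dirichlet(true_alpha, rng) for _ in range(2000)]
mme_hat = dirichlet_mme(data)
mle_hat = dirichlet_mle(data)
print("MME:", [round(a, 3) for a in mme_hat])
print("MLE:", [round(a, 3) for a in mle_hat])
```

The estimated mean relative abundance of taxon j is then the estimate of alpha_j divided by the sum of all estimated alpha values. Repeating the simulation over many replicates and sample sizes would give empirical biases and standard errors of the kind compared in the study.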
