Association of body index with fecal microbiome in children cohorts with ethnic-geographic factor interaction: accurately using a Bayesian zero-inflated negative binomial regression model.
Jian Huang, Yanzhuan Lu, Fengwei Tian, Yongqing Ni
mSystems, e0134524 (Journal Article). Published 2024-11-21. Impact factor: 5.0. JCR: Q1, Microbiology. Region category: Biology (region 2). Not open access.
DOI: 10.1128/msystems.01345-24 (https://doi.org/10.1128/msystems.01345-24)
Citations: 0
Abstract
The exponential growth of high-throughput sequencing (HTS) data on microbial communities presents researchers with an unparalleled opportunity to delve deeper into the association of microorganisms with host phenotype. However, this growth also poses a challenge, as microbial data are complex, sparse, discrete, and prone to zero inflation. Herein, by analyzing simulated data with 10 distinct count models, we propose an innovative Bayesian zero-inflated negative binomial (ZINB) regression model that is capable of identifying differentially abundant taxa associated with distinctive host phenotypes and quantifying the effects of covariates on these taxa. Our proposed model is more accurate than conventional hurdle and INLA models, especially in scenarios characterized by zero inflation and overdispersion. Moreover, we confirm that dispersion parameters significantly affect the accuracy of model results, with the resulting inaccuracies gradually diminishing as the number of analyzed samples increases. Subsequently applying our model to amplicon data from a real multi-ethnic cohort of children, we found that only a subset of taxa were identified as having zero inflation, suggesting that the prevailing understanding and processing of microbial count data in most previous microbiome studies were overly dogmatic. In practice, our pipeline for integrating bacterial differential abundance in microbiome data with relevant covariates is effective and feasible. Taken together, we expect our method to be extended to microbiota studies of various multi-cohort populations.
Importance: The microbiome is closely associated with physical indicators of the body, such as height, weight, age, and BMI, which can be used as measures of human health. Accurately identifying which taxa in the microbiome are closely related to indicators of physical development is valuable because such taxa can serve as microbial markers of regional child growth trajectories. The zero-inflated negative binomial (ZINB) model, a type of Bayesian generalized linear model, can effectively model complex biological systems. We present an innovative ZINB regression model that is capable of identifying differentially abundant taxa associated with distinctive host phenotypes and quantifying the effects of covariates on these taxa, and we demonstrate that its accuracy is superior to that of traditional hurdle and INLA models. Our pipeline for integrating bacterial differential abundance in microbiome data with relevant covariates is effective and feasible.
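To make the model class named in the abstract concrete, the following is a minimal sketch of a zero-inflated negative binomial likelihood for a single taxon count. It is not the authors' implementation: the mean/dispersion (NB2) parameterization, the log link, and every function and parameter name (`nb_log_pmf`, `zinb_log_pmf`, `log_link_mean`, `mu`, `theta`, `pi`) are assumptions chosen for illustration, and the Bayesian part (priors and posterior sampling) is omitted.

```python
import math

def nb_log_pmf(y, mu, theta):
    """Log-probability of count y under a negative binomial with mean mu
    and dispersion theta (assumed NB2 form: variance = mu + mu**2 / theta)."""
    return (math.lgamma(y + theta) - math.lgamma(theta) - math.lgamma(y + 1)
            + theta * math.log(theta / (theta + mu))
            + y * math.log(mu / (theta + mu)))

def zinb_log_pmf(y, mu, theta, pi):
    """Log-probability of count y under a zero-inflated negative binomial.
    pi is the probability of a structural (excess) zero, with 0 <= pi < 1."""
    log_nb = nb_log_pmf(y, mu, theta)
    if y == 0:
        # A zero is either structural (pi) or sampled from the NB part.
        return math.log(pi + (1.0 - pi) * math.exp(log_nb))
    # A positive count can only come from the NB part.
    return math.log(1.0 - pi) + log_nb

def log_link_mean(intercept, beta, x):
    """Hypothetical regression step: NB mean from one covariate via a log link."""
    return math.exp(intercept + beta * x)

# Illustrative values only, not estimates from the paper.
mu = log_link_mean(intercept=math.log(5.0), beta=0.0, x=1.0)
p_zero_nb = math.exp(nb_log_pmf(0, mu, theta=2.0))
p_zero_zinb = math.exp(zinb_log_pmf(0, mu, theta=2.0, pi=0.3))
print(f"P(y=0): NB={p_zero_nb:.4f}, ZINB={p_zero_zinb:.4f}")
```

In this sketch a smaller `theta` means stronger overdispersion, and setting `pi` to 0 recovers the plain negative binomial, which is one way to express the abstract's finding that only some taxa need the zero-inflation component.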
mSystems: Biochemistry, Genetics and Molecular Biology - Biochemistry
CiteScore
10.50
Self-citation rate
3.10%
Articles published
308
Review time
13 weeks
About the journal:
mSystems™ will publish preeminent work that stems from applying technologies for high-throughput analyses to achieve insights into the metabolic and regulatory systems at the scale of both the single cell and microbial communities. The scope of mSystems™ encompasses all important biological and biochemical findings drawn from analyses of large data sets, as well as new computational approaches for deriving these insights. mSystems™ will welcome submissions from researchers who focus on the microbiome, genomics, metagenomics, transcriptomics, metabolomics, proteomics, glycomics, bioinformatics, and computational microbiology. mSystems™ will provide streamlined decisions, while carrying on ASM's tradition of rigorous peer review.