Association of body index with fecal microbiome in children cohorts with ethnic-geographic factor interaction: accurately using a Bayesian zero-inflated negative binomial regression model.

IF 5 | Zone 2, Biology | Q1 Microbiology | mSystems | Pub Date: 2024-11-21 | Article: e0134524 | DOI: 10.1128/msystems.01345-24
Jian Huang, Yanzhuan Lu, Fengwei Tian, Yongqing Ni
Citations: 0

Abstract

The exponential growth of high-throughput sequencing (HTS) data on microbial communities gives researchers an unparalleled opportunity to study how microorganisms are associated with host phenotype. This growth also poses a challenge, because microbial data are complex, sparse, discrete, and prone to zero inflation. Here, by analyzing simulated data with 10 distinct count models, we propose a Bayesian zero-inflated negative binomial (ZINB) regression model that identifies differentially abundant taxa associated with distinct host phenotypes and quantifies the effects of covariates on these taxa. The proposed model is more accurate than conventional Hurdle and INLA models, especially under zero inflation and overdispersion. We also confirm that dispersion parameters significantly affect the accuracy of model results, and that this shortcoming gradually diminishes as the number of analyzed samples increases. Applying the model to amplicon data from a real multi-ethnic cohort of children, we found that only a subset of taxa showed zero inflation, which suggests that the prevailing understanding and processing of microbial count data in most previous microbiome studies have been overly dogmatic. In practice, our pipeline for integrating bacterial differential abundance in microbiome data with relevant covariates is effective and feasible. Taken together, we expect that the method can be extended to microbiota studies of various multi-cohort populations.
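To make the zero-inflation idea concrete, here is a minimal sketch of a ZINB probability mass function in Python. It is not the authors' implementation: the abstract does not give their priors, link functions, or parameterization, so the mean/dispersion form and the names `mu`, `theta`, and `pi` are assumptions chosen for illustration.

```python
import math

def nb_log_pmf(y: int, mu: float, theta: float) -> float:
    """Log-probability of count y under a negative binomial with mean mu > 0
    and dispersion theta > 0 (variance = mu + mu**2 / theta)."""
    return (
        math.lgamma(y + theta) - math.lgamma(theta) - math.lgamma(y + 1)
        + theta * math.log(theta / (theta + mu))
        + y * math.log(mu / (theta + mu))
    )

def zinb_log_pmf(y: int, mu: float, theta: float, pi: float) -> float:
    """Log-probability of y under a ZINB mixture (assumed 0 <= pi < 1):
    with probability pi the count is a structural zero, otherwise it is
    drawn from the negative binomial above."""
    if y == 0:
        # Zeros come from two sources: structural zeros and NB sampling zeros.
        return math.log(pi + (1.0 - pi) * math.exp(nb_log_pmf(0, mu, theta)))
    return math.log(1.0 - pi) + nb_log_pmf(y, mu, theta)
```

In a regression setting, `mu` and `pi` would each depend on covariates through a link function, and a Bayesian fit would place priors on the coefficients and on `theta`; those choices are left open here because the abstract does not specify them.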

Importance: The microbiome is closely associated with physical indicators of the body, such as height, weight, age, and BMI, which can be used as measures of human health. Accurately identifying which taxa in the microbiome are closely related to indicators of physical development is valuable, because such taxa can serve as microbial markers of regional child growth trajectories. The zero-inflated negative binomial (ZINB) model, used here as a Bayesian generalized linear model, can effectively model complex biological systems. We present a ZINB regression model that identifies differentially abundant taxa associated with distinct host phenotypes and quantifies the effects of covariates on these taxa, and we demonstrate that it is more accurate than traditional Hurdle and INLA models. Our pipeline for integrating bacterial differential abundance in microbiome data with relevant covariates is effective and feasible.
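The comparison with Hurdle models rests on a structural difference in how zeros are generated. The sketch below contrasts the two under an assumed mean/dispersion negative binomial; the function names and the parameters `mu`, `theta`, `pi`, and `p_zero` are hypothetical and are not taken from the paper.

```python
import math

def nb_pmf(y: int, mu: float, theta: float) -> float:
    """Negative binomial probability with mean mu and dispersion theta."""
    log_p = (
        math.lgamma(y + theta) - math.lgamma(theta) - math.lgamma(y + 1)
        + theta * math.log(theta / (theta + mu))
        + y * math.log(mu / (theta + mu))
    )
    return math.exp(log_p)

def zinb_pmf(y: int, mu: float, theta: float, pi: float) -> float:
    """Zero-inflated NB: structural zeros (probability pi) are mixed with
    an ordinary NB, so zeros arise from both components."""
    base = (1.0 - pi) * nb_pmf(y, mu, theta)
    return pi + base if y == 0 else base

def hurdle_nb_pmf(y: int, mu: float, theta: float, p_zero: float) -> float:
    """Hurdle NB: all zeros come from one binary process (probability
    p_zero); positive counts follow an NB truncated at zero."""
    if y == 0:
        return p_zero
    return (1.0 - p_zero) * nb_pmf(y, mu, theta) / (1.0 - nb_pmf(0, mu, theta))

# Illustrative values only, not estimates from the paper.
mu, theta = 5.0, 2.0
nb_zero = nb_pmf(0, mu, theta)                           # plain NB zero probability
zinb_zero = zinb_pmf(0, mu, theta, pi=0.3)               # never below nb_zero
hurdle_zero = hurdle_nb_pmf(0, mu, theta, p_zero=0.05)   # may fall below nb_zero
```

A zero-inflated model can only add zeros on top of what the count distribution already produces, whereas a hurdle model sets the zero probability freely. This is one reason the two families can give different answers on the same taxon, and it fits the abstract's observation that only a subset of taxa showed zero inflation.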

Source journal: mSystems (Biochemistry, Genetics and Molecular Biology: Biochemistry)
CiteScore: 10.50
Self-citation rate: 3.10%
Articles published: 308
Review time: 13 weeks
Journal description: mSystems™ will publish preeminent work that stems from applying technologies for high-throughput analyses to achieve insights into the metabolic and regulatory systems at the scale of both the single cell and microbial communities. The scope of mSystems™ encompasses all important biological and biochemical findings drawn from analyses of large data sets, as well as new computational approaches for deriving these insights. mSystems™ will welcome submissions from researchers who focus on the microbiome, genomics, metagenomics, transcriptomics, metabolomics, proteomics, glycomics, bioinformatics, and computational microbiology. mSystems™ will provide streamlined decisions, while carrying on ASM's tradition of rigorous peer review.