GDmicro: classifying host disease status with GCN and Deep adaptation network based on the human gut microbiome data

IF 4.4 · Zone 3 (Biology) · Q1 BIOCHEMICAL RESEARCH METHODS · Bioinformatics · Pub Date: 2023-12-12 · DOI: 10.1093/bioinformatics/btad747
Herui Liao, Jiayu Shang, Yanni Sun

Abstract

Motivation: With advances in metagenomic sequencing technologies, a growing number of studies reveal associations between the human gut microbiome and human diseases. These associations suggest that gut microbiome data can be used to distinguish case and control samples of a specific disease, a task also called host disease status classification. Importantly, learning-based models that distinguish disease and control samples are expected to identify important biomarkers more accurately than abundance-based statistical analysis. However, available tools have not fully addressed two challenges associated with this task: limited labeled microbiome data and decreased accuracy in cross-study settings. Confounding factors such as diet and technical biases in sample collection and sequencing across studies and cohorts often undermine the generalization of the learning model.

Results: To address these challenges, we develop a new tool, GDmicro, which combines semi-supervised learning and domain adaptation to achieve a more generalized model using limited labeled samples. We evaluated GDmicro on human gut microbiome data from 11 cohorts covering 5 different diseases. The results show that GDmicro has better performance and robustness than state-of-the-art tools. In particular, it improves the AUC from 0.783 to 0.949 in identifying inflammatory bowel disease. Furthermore, GDmicro can identify potential biomarkers with greater accuracy than abundance-based statistical analysis methods. It also reveals the contribution of these biomarkers to the host's disease status.

Availability and implementation: https://github.com/liaoherui/GDmicro

Supplementary information: Supplementary data are available at Bioinformatics online.
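The title and abstract name two ingredients: a graph convolutional network (GCN) for semi-supervised learning and a deep adaptation network for domain adaptation. The following is a minimal NumPy sketch of the generic building blocks usually associated with those terms: one graph-convolution step over a sample graph, and a maximum mean discrepancy (MMD) estimate that measures how far apart two cohorts sit in feature space. It is not taken from the GDmicro code; the function names, the RBF kernel, the `gamma` value, and the synthetic data are all illustrative assumptions.

```python
import numpy as np


def normalize_adjacency(adj):
    """Symmetrically normalize A + I (standard GCN preprocessing)."""
    a_hat = adj + np.eye(adj.shape[0])        # add self-loops
    d_inv_sqrt = 1.0 / np.sqrt(a_hat.sum(axis=1))
    return a_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]


def gcn_layer(adj_norm, features, weights):
    """One graph-convolution step: ReLU(A_norm @ X @ W)."""
    return np.maximum(adj_norm @ features @ weights, 0.0)


def rbf_kernel(x, y, gamma):
    """Pairwise RBF kernel exp(-gamma * ||x_i - y_j||^2)."""
    sq = (x ** 2).sum(axis=1)[:, None] + (y ** 2).sum(axis=1)[None, :] - 2.0 * x @ y.T
    return np.exp(-gamma * np.maximum(sq, 0.0))


def mmd2(source, target, gamma=0.5):
    """Biased estimate of squared MMD between two sets of feature vectors."""
    return (rbf_kernel(source, source, gamma).mean()
            + rbf_kernel(target, target, gamma).mean()
            - 2.0 * rbf_kernel(source, target, gamma).mean())


# Hypothetical data: two "cohorts" of 50 samples with 2 latent features each.
rng = np.random.default_rng(0)
cohort_a = rng.normal(size=(50, 2))
cohort_b = rng.normal(loc=3.0, size=(50, 2))  # shifted cohort, e.g. a batch effect

print("MMD^2 between shifted cohorts:", mmd2(cohort_a, cohort_b))
print("MMD^2 of a cohort with itself:", mmd2(cohort_a, cohort_a))
```

In a domain-adaptation setup of this kind, an MMD term is typically added to the classification loss so that features learned from the labeled (source) cohort and the unlabeled (target) cohort are pulled toward the same distribution, while the graph layer lets unlabeled samples borrow information from similar labeled neighbors.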