GDmicro: classifying host disease status with GCN and Deep adaptation network based on the human gut microbiome data

Bioinformatics (IF 5.4; Zone 3, Biology; JCR Q1, Biochemical Research Methods) Pub Date: 2023-12-12 DOI: 10.1093/bioinformatics/btad747

Herui Liao, Jiayu Shang, Yanni Sun



Abstract

Motivation: With advances in metagenomic sequencing technologies, a growing number of studies have revealed associations between the human gut microbiome and some human diseases. These associations motivate the use of gut microbiome data to distinguish case and control samples for a specific disease, a task known as host disease status classification. Importantly, learning-based models that distinguish disease from control samples are expected to identify important biomarkers more accurately than abundance-based statistical analysis. However, available tools have not fully addressed two challenges associated with this task: limited labeled microbiome data and decreased accuracy in cross-study settings. Confounding factors such as diet and technical biases in sample collection and sequencing across different studies or cohorts often jeopardize the generalization of the learning model.

Results: To address these challenges, we develop a new tool, GDmicro, which combines semi-supervised learning and domain adaptation to achieve a more generalizable model using limited labeled samples. We evaluated GDmicro on human gut microbiome data from 11 cohorts covering 5 different diseases. The results show that GDmicro has better performance and robustness than state-of-the-art tools. In particular, it improves the AUC from 0.783 to 0.949 in identifying inflammatory bowel disease. Furthermore, GDmicro can identify potential biomarkers with greater accuracy than abundance-based statistical analysis methods. It also reveals the contribution of these biomarkers to the host's disease status.

Availability and implementation: https://github.com/liaoherui/GDmicro

Supplementary information: Supplementary data are available at Bioinformatics online.
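For readers unfamiliar with the AUC figure quoted in the abstract, the following is a minimal sketch of how an AUC can be computed for a case/control classifier. It is not taken from GDmicro; the function name roc_auc and the toy labels and scores are assumptions made purely for illustration. It uses the standard interpretation of AUC as the probability that a randomly chosen case sample receives a higher score than a randomly chosen control sample.

```python
def roc_auc(labels, scores):
    """Illustrative AUC: fraction of (case, control) pairs ranked correctly.

    labels: 1 for case (disease), 0 for control.
    scores: predicted disease scores; higher means more likely a case.
    Tied pairs count as half a correct ranking.
    """
    case_scores = [s for y, s in zip(labels, scores) if y == 1]
    control_scores = [s for y, s in zip(labels, scores) if y == 0]
    if not case_scores or not control_scores:
        raise ValueError("need at least one case and one control sample")

    correct = 0.0
    for c in case_scores:
        for k in control_scores:
            if c > k:
                correct += 1.0
            elif c == k:
                correct += 0.5
    return correct / (len(case_scores) * len(control_scores))


# Hypothetical toy data: three cases and three controls.
toy_labels = [1, 1, 1, 0, 0, 0]
toy_scores = [0.9, 0.8, 0.4, 0.5, 0.3, 0.2]
toy_auc = roc_auc(toy_labels, toy_scores)  # 8 of 9 pairs ranked correctly
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation of cases from controls, which gives context for the reported improvement from 0.783 to 0.949.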




Source journal

Bioinformatics (Biology: Biochemical Research Methods)

CiteScore: 11.20

Self-citation rate: 5.20%

Articles published: 753

Review time: 2.1 months

About the journal: The leading journal in its field, Bioinformatics publishes the highest quality scientific papers and review articles of interest to academic and industrial researchers. Its main focus is on new developments in genome bioinformatics and computational biology. Two distinct sections within the journal, Discovery Notes and Application Notes, focus on shorter papers: the former reports biologically interesting discoveries made using computational methods; the latter explores the applications used for experiments.