Jakob Wirbel, Konrad Zych, Morgan Essex, Nicolai Karcher, Ece Kartal, Guillem Salazar, Peer Bork, Shinichi Sunagawa, Georg Zeller
{"title":"利用 SIAMCAT 机器学习工具箱进行微生物组元分析和跨疾病比较。","authors":"Jakob Wirbel, Konrad Zych, Morgan Essex, Nicolai Karcher, Ece Kartal, Guillem Salazar, Peer Bork, Shinichi Sunagawa, Georg Zeller","doi":"10.1186/s13059-021-02306-1","DOIUrl":null,"url":null,"abstract":"<p><p>The human microbiome is increasingly mined for diagnostic and therapeutic biomarkers using machine learning (ML). However, metagenomics-specific software is scarce, and overoptimistic evaluation and limited cross-study generalization are prevailing issues. To address these, we developed SIAMCAT, a versatile R toolbox for ML-based comparative metagenomics. We demonstrate its capabilities in a meta-analysis of fecal metagenomic studies (10,803 samples). When naively transferred across studies, ML models lost accuracy and disease specificity, which could however be resolved by a novel training set augmentation strategy. This reveals some biomarkers to be disease-specific, with others shared across multiple conditions. SIAMCAT is freely available from siamcat.embl.de .</p>","PeriodicalId":48922,"journal":{"name":"Genome Biology","volume":"22 1","pages":"93"},"PeriodicalIF":12.3000,"publicationDate":"2021-03-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8008609/pdf/","citationCount":"0","resultStr":"{\"title\":\"Microbiome meta-analysis and cross-disease comparison enabled by the SIAMCAT machine learning toolbox.\",\"authors\":\"Jakob Wirbel, Konrad Zych, Morgan Essex, Nicolai Karcher, Ece Kartal, Guillem Salazar, Peer Bork, Shinichi Sunagawa, Georg Zeller\",\"doi\":\"10.1186/s13059-021-02306-1\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<p><p>The human microbiome is increasingly mined for diagnostic and therapeutic biomarkers using machine learning (ML). However, metagenomics-specific software is scarce, and overoptimistic evaluation and limited cross-study generalization are prevailing issues. To address these, we developed SIAMCAT, a versatile R toolbox for ML-based comparative metagenomics. We demonstrate its capabilities in a meta-analysis of fecal metagenomic studies (10,803 samples). When naively transferred across studies, ML models lost accuracy and disease specificity, which could however be resolved by a novel training set augmentation strategy. This reveals some biomarkers to be disease-specific, with others shared across multiple conditions. SIAMCAT is freely available from siamcat.embl.de .</p>\",\"PeriodicalId\":48922,\"journal\":{\"name\":\"Genome Biology\",\"volume\":\"22 1\",\"pages\":\"93\"},\"PeriodicalIF\":12.3000,\"publicationDate\":\"2021-03-30\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8008609/pdf/\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Genome Biology\",\"FirstCategoryId\":\"99\",\"ListUrlMain\":\"https://doi.org/10.1186/s13059-021-02306-1\",\"RegionNum\":1,\"RegionCategory\":\"生物学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q1\",\"JCRName\":\"Agricultural and Biological Sciences\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Genome Biology","FirstCategoryId":"99","ListUrlMain":"https://doi.org/10.1186/s13059-021-02306-1","RegionNum":1,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"Agricultural and Biological Sciences","Score":null,"Total":0}
引用次数: 0
摘要
人们越来越多地利用机器学习(ML)技术从人类微生物组中挖掘诊断和治疗生物标志物。然而,针对元基因组学的软件非常稀缺,过度乐观的评估和有限的跨研究泛化是普遍存在的问题。为了解决这些问题,我们开发了 SIAMCAT,这是一个用于基于 ML 的比较元基因组学的多功能 R 工具箱。我们在粪便元基因组研究(10803 个样本)的荟萃分析中展示了它的能力。当在不同研究之间进行简单移植时,ML 模型会失去准确性和疾病特异性,但这可以通过一种新颖的训练集增强策略来解决。这揭示了一些生物标志物具有疾病特异性,而另一些生物标志物则在多种疾病中共享。SIAMCAT 可从 siamcat.embl.de 免费获取。
Microbiome meta-analysis and cross-disease comparison enabled by the SIAMCAT machine learning toolbox.
The human microbiome is increasingly mined for diagnostic and therapeutic biomarkers using machine learning (ML). However, metagenomics-specific software is scarce, and overoptimistic evaluation and limited cross-study generalization are prevailing issues. To address these, we developed SIAMCAT, a versatile R toolbox for ML-based comparative metagenomics. We demonstrate its capabilities in a meta-analysis of fecal metagenomic studies (10,803 samples). When naively transferred across studies, ML models lost accuracy and disease specificity, which could however be resolved by a novel training set augmentation strategy. This reveals some biomarkers to be disease-specific, with others shared across multiple conditions. SIAMCAT is freely available from siamcat.embl.de .
期刊介绍:
Genome Biology is a leading research journal that focuses on the study of biology and biomedicine from a genomic and post-genomic standpoint. The journal consistently publishes outstanding research across various areas within these fields.
With an impressive impact factor of 12.3 (2022), Genome Biology has earned its place as the 3rd highest-ranked research journal in the Genetics and Heredity category, according to Thomson Reuters. Additionally, it is ranked 2nd among research journals in the Biotechnology and Applied Microbiology category. It is important to note that Genome Biology is the top-ranking open access journal in this category.
In summary, Genome Biology sets a high standard for scientific publications in the field, showcasing cutting-edge research and earning recognition among its peers.