Using gut microbiome metagenomic hypervariable features for diabetes screening and typing through supervised machine learning.

IF 4 | Zone 2 (Biology) | Q1 GENETICS & HEREDITY | Microbial Genomics | Pub Date: 2025-03-01 | DOI: 10.1099/mgen.0.001365
Xavier Chavarria, Hyun Seo Park, Singeun Oh, Dongjun Kang, Jun Ho Choi, Myungjun Kim, Yoon Hee Cho, Myung-Hee Yi, Ju Yeong Kim
Microbial Genomics, volume 11, issue 3. Journal Article. PMC full text: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11893737/pdf/
Citations: 0

Abstract

Diabetes mellitus is a complex metabolic disorder and one of the fastest-growing global public health concerns. The gut microbiota is implicated in the pathophysiology of various diseases, including diabetes. This study used 16S rRNA metagenomic data from a volunteer citizen science initiative to investigate microbial markers associated with diabetes status (positive or negative) and type (type 1 or type 2 diabetes mellitus) using supervised machine learning (ML) models. The diversity of the microbiome varied according to diabetes status and type. Differential microbial signatures between the diabetes types and the negative group revealed an increased presence of Brucellaceae, Ruminococcaceae, Clostridiaceae, Micrococcaceae, Barnesiellaceae and Fusobacteriaceae in subjects with type 1 diabetes, and of Veillonellaceae, Streptococcaceae and the class Gammaproteobacteria in subjects with type 2 diabetes. Decision tree, elastic net, random forest (RF) and radial-kernel support vector machine algorithms were trained to screen for and type diabetes based on the microbial profiles of 76 subjects with type 1 diabetes, 366 subjects with type 2 diabetes and 250 subjects without diabetes. Using the 1000 most variable features, tree-based models were the highest-performing algorithms. The RF screening models achieved the best performance, with an average area under the receiver operating characteristic curve (AUC) of 0.76, although all models lacked sensitivity. Reducing the dataset to 500 features produced an AUC of 0.77, with sensitivity increasing by 74%, from 0.46 to 0.80. Model performance improved for the classification of negative-status and type 2 diabetes. Diabetes type models performed best with 500 features, but the metric performed poorly across all model iterations. ML has the potential to facilitate early diagnosis of diabetes based on microbial profiles of the gut microbiome.
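
The quantities named in the abstract (most variable features, sensitivity, AUC, and the 74% relative gain in sensitivity) can be illustrated with a minimal Python sketch. This is not the authors' pipeline. The abstract does not say how feature variability was measured, so ranking by per-feature variance is an assumption here, and every function name and the toy data are hypothetical. No model is trained; the sketch only shows how the reported numbers are defined.

```python
from statistics import pvariance


def top_variable_features(rows, k):
    """Return the sorted column indices of the k highest-variance features.

    Assumption: "most variable" is taken to mean largest variance across samples.
    """
    n_features = len(rows[0])
    variances = [pvariance([row[j] for row in rows]) for j in range(n_features)]
    ranked = sorted(range(n_features), key=lambda j: variances[j], reverse=True)
    return sorted(ranked[:k])


def sensitivity(y_true, y_pred):
    """True-positive rate, TP / (TP + FN), with 1 as the positive label."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp / (tp + fn)


def roc_auc(y_true, scores):
    """AUC as the probability that a positive outscores a negative (ties count 0.5)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def relative_increase(old, new):
    """Relative change from old to new, as a fraction of old."""
    return (new - old) / old


# Hypothetical toy abundance table: 4 samples x 3 features.
toy = [[0, 5, 1], [0, 9, 1], [1, 1, 1], [0, 3, 3]]
selected = top_variable_features(toy, 2)

# Sensitivity values reported in the abstract: 0.46 -> 0.80.
gain_percent = round(relative_increase(0.46, 0.80) * 100)
```

With the reported sensitivities, `relative_increase(0.46, 0.80)` is about 0.739, which rounds to the 74% stated in the abstract. In practice a library such as scikit-learn would typically handle model fitting and metric computation; the pure-Python versions above are kept only so the definitions are visible.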
