PhyloMix: Enhancing microbiome-trait association prediction through phylogeny-mixing augmentation.

Yifan Jiang, Disen Liao, Qiyun Zhu, Yang Young Lu
Journal: Bioinformatics (Oxford, England)
Publication date: 2025-01-12
Publication type: Journal Article
DOI: 10.1093/bioinformatics/btaf014 (https://doi.org/10.1093/bioinformatics/btaf014)
Citations: 0

Abstract


Motivation: Understanding the associations between traits and microbial composition is a fundamental objective in microbiome research. Recently, researchers have turned to machine learning (ML) models to achieve this goal with promising results. However, the effectiveness of advanced ML models is often limited by the unique characteristics of microbiome data, which are typically high-dimensional, compositional, and imbalanced. These characteristics can hinder the models' ability to fully explore the relationships among taxa in predictive analyses. To address this challenge, data augmentation has become crucial. It involves generating synthetic samples with artificial labels based on existing data and incorporating these samples into the training set to improve ML model performance.
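As a point of reference for what "synthetic samples with artificial labels" can mean in practice, the following is a minimal sketch of mixup-style interpolation, one widely used sample-mixing scheme. It is an illustration only and is not taken from the PhyloMix code; the function name `mixup`, the toy abundance vectors, and the fixed mixing weight `lam` are all assumptions made for this example.

```python
import numpy as np

def mixup(x_a, y_a, x_b, y_b, lam):
    """Blend two samples and their labels with weight lam in [0, 1]."""
    x_a = np.asarray(x_a, dtype=float)
    x_b = np.asarray(x_b, dtype=float)
    x_new = lam * x_a + (1.0 - lam) * x_b   # synthetic feature vector
    y_new = lam * y_a + (1.0 - lam) * y_b   # synthetic (soft) label
    return x_new, y_new

# Toy relative-abundance profiles over three taxa (hypothetical data).
x_a, y_a = [0.5, 0.5, 0.0], 1.0
x_b, y_b = [0.1, 0.1, 0.8], 0.0

# lam is fixed here for reproducibility; in practice it is usually sampled.
x_new, y_new = mixup(x_a, y_a, x_b, y_b, lam=0.25)
```

Because the result is a convex combination, two relative-abundance vectors that each sum to 1 yield a synthetic vector that also sums to 1. The synthetic pair would then be appended to the training set alongside the real samples.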

Results: Here we propose PhyloMix, a novel data augmentation method specifically designed for microbiome data to enhance predictive analyses. PhyloMix leverages the phylogenetic relationships among microbiome taxa as an informative prior to guide the generation of synthetic microbial samples. Leveraging phylogeny, PhyloMix creates new samples by removing a subtree from one sample and combining it with the corresponding subtree from another sample. Notably, PhyloMix is designed to address the compositional nature of microbiome data, effectively handling both raw counts and relative abundances. This approach introduces sufficient diversity into the augmented samples, leading to improved predictive performance. We empirically evaluated PhyloMix on six real microbiome datasets across five commonly used ML models. PhyloMix significantly outperforms distinct baseline methods including sample-mixing-based data augmentation techniques like vanilla mixup and compositional cutmix, as well as the phylogeny-based method TADA. We also demonstrated the wide applicability of PhyloMix in both supervised learning and contrastive representation learning.
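The subtree exchange described above can be sketched roughly as follows. This is not the authors' implementation (see the linked repository for that): the toy tree, the function names `subtree_swap` and `mix_label`, the rule for rescaling the borrowed block, and the label-mixing rule are all assumptions chosen to keep the example small.

```python
import numpy as np

def subtree_swap(x_a, x_b, subtree_leaves):
    """Return a copy of x_a whose entries at subtree_leaves come from x_b.

    Assumption: the borrowed block is rescaled to the mass x_a originally
    had in that subtree, so the mixed sample keeps the same total as x_a.
    This works the same way for raw counts and relative abundances.
    """
    x_a = np.asarray(x_a, dtype=float)
    x_b = np.asarray(x_b, dtype=float)
    idx = np.asarray(subtree_leaves, dtype=int)
    mixed = x_a.copy()
    mass_a = x_a[idx].sum()
    mass_b = x_b[idx].sum()
    if mass_b > 0:
        mixed[idx] = x_b[idx] * (mass_a / mass_b)
    # If x_b has no mass in this subtree, x_a's values are kept unchanged.
    return mixed

def mix_label(y_a, y_b, n_swapped, n_leaves):
    """Assumed rule: weight labels by the fraction of leaves kept from x_a."""
    lam = 1.0 - n_swapped / n_leaves
    return lam * y_a + (1.0 - lam) * y_b

# Hypothetical phylogeny over four leaf taxa: ((t0, t1), (t2, t3)).
# Each internal node is mapped to the indices of the leaves beneath it.
toy_subtrees = {"clade_left": [0, 1], "clade_right": [2, 3]}

x_a = np.array([10.0, 30.0, 40.0, 20.0])  # raw counts, sample A
x_b = np.array([5.0, 5.0, 60.0, 30.0])    # raw counts, sample B

mixed = subtree_swap(x_a, x_b, toy_subtrees["clade_left"])
soft_label = mix_label(1.0, 0.0, n_swapped=2, n_leaves=4)
```

In this toy case the left clade of sample A takes on sample B's within-clade proportions (equal shares for t0 and t1) while the clade's total mass and the sample's overall total stay as they were in A. Other choices, such as renormalizing the whole vector after the swap, would be equally plausible under the description in the abstract.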

Availability: The Apache-licensed source code is available at https://github.com/batmen-lab/phylomix.

Supplementary information: Supplementary data are available at Bioinformatics.
