A machine learning framework of functional biomarker discovery for different microbial communities based on metagenomic data

Wei Fang, Xingzhi Chang, Xiaoquan Su, Jian Xu, Deli Zhang, K. Ning
Published in: 2012 IEEE 6th International Conference on Systems Biology (ISB)
Publication date: 2012-09-27
DOI: 10.1109/ISB.2012.6314121 (https://doi.org/10.1109/ISB.2012.6314121)
Citations: 5

Abstract

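Because more than 90% of the microbes in a community cannot be isolated and cultivated, metagenomic methods are commonly used to analyze microbial communities as a whole. With the rapid accumulation of metagenomic samples, there is growing interest in finding simple biomarkers, especially functional biomarkers, that can distinguish different metagenomic samples. Next-generation sequencing techniques make it possible to measure gene presence (abundance) values accurately in metagenomic studies, and the presence or absence, or the differing abundance, of a set of genes can serve as a biomarker for identifying the phenotype of the corresponding microbial community. However, it is not yet clear how to select such a set of genes (features), or how accurately a selected set of genes predicts the community's phenotype. In this study, we evaluated different machine learning methods, including feature selection methods and classification methods, for selecting biomarkers that distinguish different samples. We then proposed a machine learning framework that discovers biomarkers for different microbial communities by mining metagenomic data. Given a set of features (genes) and their presence values in multiple samples, the framework first selects discriminative features as candidates through feature selection, and then uses a classification method to select, as biomarkers, the feature sets with low classification error rates. We tested the framework on whole genome sequencing data from simulation, from the public domain, and from in-house metagenomic data generation facilities, using it to predict and evaluate biomarkers. The results show that the framework can select functional biomarkers with high accuracy. The framework is therefore a suitable tool for discovering functional biomarkers that distinguish different microbial communities.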
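The two-stage procedure described in the abstract (feature selection to obtain candidate genes, then classification to keep the feature set with the lowest error rate) can be sketched as follows. This is only an illustrative sketch, not the authors' implementation: the abstract does not name the specific methods, so the Welch-style t score, the nearest-centroid classifier, the leave-one-out error estimate, the function names, and the toy abundance matrix are all assumptions made for this example.

```python
from statistics import mean, variance

def t_score(values, labels):
    """Absolute Welch-style t statistic of one gene between class 0 and class 1."""
    a = [v for v, y in zip(values, labels) if y == 0]
    b = [v for v, y in zip(values, labels) if y == 1]
    se = (variance(a) / len(a) + variance(b) / len(b)) ** 0.5
    return abs(mean(a) - mean(b)) / (se + 1e-9)  # small constant avoids division by zero

def select_features(X, y, k):
    """Stage 1 (assumed method): rank genes by t score, return indices of the top k."""
    n_genes = len(X[0])
    scores = [t_score([row[j] for row in X], y) for j in range(n_genes)]
    return sorted(range(n_genes), key=lambda j: scores[j], reverse=True)[:k]

def nearest_centroid_predict(train_X, train_y, x):
    """Assign x to the class whose mean profile is closest (squared Euclidean)."""
    centroids = {}
    for label in set(train_y):
        rows = [r for r, t in zip(train_X, train_y) if t == label]
        centroids[label] = [mean(col) for col in zip(*rows)]
    return min(centroids,
               key=lambda c: sum((p - q) ** 2 for p, q in zip(x, centroids[c])))

def loo_error(X, y, features):
    """Leave-one-out error rate of the classifier restricted to `features`."""
    errors = 0
    for i in range(len(X)):
        train_X = [[r[j] for j in features] for n, r in enumerate(X) if n != i]
        train_y = [t for n, t in enumerate(y) if n != i]
        test_x = [X[i][j] for j in features]
        errors += nearest_centroid_predict(train_X, train_y, test_x) != y[i]
    return errors / len(X)

def discover_biomarkers(X, y, k):
    """Stage 2 (assumed method): among nested candidate subsets, keep the
    smallest one with the lowest leave-one-out error rate."""
    candidates = select_features(X, y, k)
    subsets = [candidates[:m] for m in range(1, k + 1)]
    best = min(subsets, key=lambda s: (loo_error(X, y, s), len(s)))
    return best, loo_error(X, y, best)

# Hypothetical toy data: rows are samples, columns are gene abundance values.
X = [
    [5, 1, 0, 3],
    [4, 2, 1, 3],
    [5, 1, 0, 2],
    [4, 2, 9, 3],
    [5, 1, 8, 2],
    [4, 1, 9, 3],
]
y = [0, 0, 0, 1, 1, 1]  # two community phenotypes

best, err = discover_biomarkers(X, y, k=2)
print(best, err)
```

In this toy matrix only the gene at index 2 differs clearly between the two groups, so it is the one the sketch retains. Note that ranking features on all samples before estimating the error rate gives an optimistic estimate; a more careful version would repeat the feature selection inside each cross-validation fold.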