A machine learning framework of functional biomarker discovery for different microbial communities based on metagenomic data

2012 IEEE 6th International Conference on Systems Biology (ISB), Pub Date: 2012-09-27, DOI: 10.1109/ISB.2012.6314121

Wei Fang, Xingzhi Chang, Xiaoquan Su, Jian Xu, Deli Zhang, K. Ning


Citations: 5

Abstract

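Because more than 90% of the microorganisms in a microbial community cannot be isolated and cultivated, metagenomic methods have been widely used to analyze microbial communities as a whole. With the rapid accumulation of metagenomic samples, finding simple biomarkers, especially functional biomarkers, that can distinguish different metagenomic samples has become a question of considerable interest. Next-generation sequencing techniques have enabled highly accurate measurement of gene presence (abundance) values in metagenomic studies, and the presence/absence or differing abundance values of a set of genes could serve as a biomarker for identifying the phenotype of the corresponding microbial community. However, it is not yet clear how to select such a set of genes (features), or how accurately such a selected set would predict a microbial community's phenotype. In this study, we evaluated different machine learning methods, including feature selection methods and classification methods, for selecting biomarkers that can distinguish different samples. We then proposed a machine learning framework that discovers biomarkers for different microbial communities by mining metagenomic data. Given a set of features (genes) and their presence values in multiple samples, we first used feature selection to choose discriminative features as candidates, and then used classification methods to select, as biomarkers, the feature sets with low error rates and high classification accuracy. We used whole-genome sequencing data from simulations, publicly available sources, and in-house metagenomic data-generation facilities, and tested the framework on the prediction and evaluation of biomarkers. The results showed that the framework could select functional biomarkers with very high accuracy. Therefore, this framework should be a suitable tool for discovering functional biomarkers that distinguish different microbial communities.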
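The abstract does not name the specific feature selection or classification methods the authors evaluated. The following is only a minimal Python sketch of the two-stage idea (rank candidate features, then keep the feature set with the lowest classification error), under stated assumptions: a two-class problem, a Fisher-score ranking, and a nearest-centroid classifier scored by leave-one-out (LOO) cross-validation. These method choices, the function names (`fisher_score`, `rank_features`, `nearest_centroid_predict`, `loo_error`, `select_biomarkers`) and the toy abundance matrix are hypothetical illustrations, not the paper's implementation.

```python
import math
from statistics import mean, pvariance


def fisher_score(values, labels):
    """Hypothetical stage-1 score: two-class Fisher criterion for one feature,
    i.e. squared difference of class means over the sum of class variances."""
    a = [v for v, lab in zip(values, labels) if lab == 0]
    b = [v for v, lab in zip(values, labels) if lab == 1]
    spread = pvariance(a) + pvariance(b)
    return (mean(a) - mean(b)) ** 2 / (spread + 1e-9)  # 1e-9 avoids division by zero


def rank_features(X, labels):
    """Return feature (gene) indices ordered from most to least discriminative."""
    n_features = len(X[0])
    scores = [fisher_score([row[j] for row in X], labels) for j in range(n_features)]
    return sorted(range(n_features), key=lambda j: scores[j], reverse=True)


def nearest_centroid_predict(train_X, train_y, x, features):
    """Assign sample x to the class whose centroid, computed over the chosen
    features only, is closest in squared Euclidean distance."""
    best_label, best_dist = None, math.inf
    for label in sorted(set(train_y)):
        rows = [r for r, lab in zip(train_X, train_y) if lab == label]
        centroid = [mean([r[j] for r in rows]) for j in features]
        dist = sum((x[j] - c) ** 2 for j, c in zip(features, centroid))
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label


def loo_error(X, y, k):
    """Leave-one-out error rate of a k-feature nearest-centroid classifier.
    Features are re-ranked inside every fold, so the held-out sample never
    influences which features are selected."""
    errors = 0
    for i in range(len(X)):
        train_X, train_y = X[:i] + X[i + 1:], y[:i] + y[i + 1:]
        top_k = rank_features(train_X, train_y)[:k]
        if nearest_centroid_predict(train_X, train_y, X[i], top_k) != y[i]:
            errors += 1
    return errors / len(X)


def select_biomarkers(X, y, max_k):
    """Stage 2: keep the smallest feature-set size with the lowest LOO error,
    then return that many top-ranked features from the full data set."""
    errors = {k: loo_error(X, y, k) for k in range(1, max_k + 1)}
    best_k = min(errors, key=lambda k: (errors[k], k))
    return rank_features(X, y)[:best_k], errors[best_k]


# Hypothetical toy data: 8 samples (rows) x 4 genes (columns) of abundance
# values; labels 0 and 1 stand for two microbial communities.
X = [
    [5.0, 1.0, 0.2, 3.0], [6.0, 2.0, 0.1, 1.0], [5.5, 1.5, 0.3, 2.0], [6.5, 2.5, 0.2, 2.5],
    [1.0, 2.0, 0.1, 2.0], [0.5, 1.0, 0.3, 3.0], [1.5, 2.5, 0.2, 1.0], [0.8, 1.5, 0.2, 2.5],
]
y = [0, 0, 0, 0, 1, 1, 1, 1]

biomarkers, err = select_biomarkers(X, y, max_k=3)
print(biomarkers, err)  # prints: [0] 0.0
```

Re-ranking features inside each fold avoids the optimistic bias of selecting features on the full data set before cross-validation. Choosing the set size by the minimum LOO error is still somewhat optimistic, so an independent test set would be needed for an unbiased accuracy estimate.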


