{"title":"Metabolic Network Construction Using Ensemble Algorithms","authors":"S. Kim, Joohyoung Lee, H. Jang, Xiang Zhang","doi":"10.4108/EAI.3-12-2015.2262392","DOIUrl":null,"url":null,"abstract":"One of the most important and challenging \"knowledge extraction\" tasks in bioinformatics is the reverse engineering of genes, proteins, and metabolites networks from biological data. Gaussian graphical models (GGMs) have been proven to be a very powerful formalism to infer biological networks. Standard GGM selection techniques can unfortunately not be used in the \"small N, large P\" data setting. Various methods to overcome this issue have been developed based on regularized estimation, partial least squares method, and limited-order partial correlation graphs. Several studies compared the performances among several network construction algorithms, such as PLSR, SCE, and ES, ICR and PCR, Ridge regression, Lasso and adaptive Lasso, to see which method is the best for biological network constructions. Each comparison analysis resulted in that each construction method has its own advantages as well as disadvantages according to different circumstances, such as the network complexity. However, it is almost impossible to recognize the complexity of the network before estimation. Thus, we develop an Ensemble method which is model averaging to construct a metabolic network. Our simulation studies show that the ensemble averaging based network construction has F1 score larger than these of other methods except only for Adaptive Lasso, reflecting its ability to account for uncertainty of network complexity.","PeriodicalId":415083,"journal":{"name":"International Conference on Bio-inspired Information and Communications Technologies","volume":"9 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2016-05-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"International Conference on Bio-inspired Information and Communications Technologies","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.4108/EAI.3-12-2015.2262392","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 0
Abstract
One of the most important and challenging "knowledge extraction" tasks in bioinformatics is the reverse engineering of genes, proteins, and metabolites networks from biological data. Gaussian graphical models (GGMs) have been proven to be a very powerful formalism to infer biological networks. Standard GGM selection techniques can unfortunately not be used in the "small N, large P" data setting. Various methods to overcome this issue have been developed based on regularized estimation, partial least squares method, and limited-order partial correlation graphs. Several studies compared the performances among several network construction algorithms, such as PLSR, SCE, and ES, ICR and PCR, Ridge regression, Lasso and adaptive Lasso, to see which method is the best for biological network constructions. Each comparison analysis resulted in that each construction method has its own advantages as well as disadvantages according to different circumstances, such as the network complexity. However, it is almost impossible to recognize the complexity of the network before estimation. Thus, we develop an Ensemble method which is model averaging to construct a metabolic network. Our simulation studies show that the ensemble averaging based network construction has F1 score larger than these of other methods except only for Adaptive Lasso, reflecting its ability to account for uncertainty of network complexity.