Biological Pathway Analysis for De Novo Transcriptomes through Multiple Reference Species Selections
Chun-Cheng Liu, Chien-Ming Chen, Cing-Han Yang, Tun-Wen Pai, P. Lim, S. Phang, Sze-Wan Poong, Kok-Keong Lee
2016 10th International Conference on Complex, Intelligent, and Software Intensive Systems (CISIS)
Published: 2016-07-01
DOI: 10.1109/CISIS.2016.73 (https://doi.org/10.1109/CISIS.2016.73)
Citations: 3
Abstract
For de novo transcriptome analysis, choosing the closest reference model species in terms of evolutionary distance is a common approach to gene mapping and genome annotation. However, not every selected reference model species has comprehensive genome annotations and curated information, and the total number of genes that can be mapped to the selected reference species cannot be fully anticipated. When too few genes are mapped to the selected reference model species, the subsequent functional pathway analysis of the transcriptome datasets is seriously affected. To solve this problem, we propose an improved approach based on selecting multiple reference model species, aimed especially at KEGG pathway analysis of differentially expressed genes. By applying union operations to the genes mapped individually from the different selected species, we could significantly improve the completeness of gene mapping results in KEGG pathways and provide realistic P-values for each identified pathway. Furthermore, based on the mapped genes and KGML datasets, we applied various gray levels, colors, and shapes to present gene expression conditions on each biological pathway. Using NGS transcriptomic datasets from an unknown Antarctic green alga species as an experimental example, and selecting three published species (Chlamydomonas reinhardtii, Chlorella variabilis, and Coccomyxa subellipsoidea) as candidate reference species, we compared the results of pathway enrichment analysis under different selections of reference species. We found that integrating all mapped genes from the various model species gave a better result than using any single reference species. Some important biological pathways that were otherwise missed, such as the Ribosome, Pyrimidine metabolism, and ABC transporters pathways, could be retrieved under an identical P-value threshold. Therefore, we believe that appropriate selection of multiple reference species is necessary and significant for transcriptome analysis of de novo species.
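The union step and the per-pathway P-value described in the abstract can be illustrated with a minimal sketch. The abstract does not state which statistical test was used, so the one-sided hypergeometric test below is an assumption (it is a common choice for pathway enrichment). The function names, the toy KEGG ortholog IDs, and the gene sets are hypothetical and are not taken from the paper.

```python
from math import comb


def union_mapped_genes(mappings):
    """Merge the gene IDs mapped through each reference species (set union)."""
    merged = set()
    for genes in mappings.values():
        merged |= set(genes)
    return merged


def hypergeom_pvalue(N, K, n, k):
    """P(X >= k): chance of seeing at least k pathway genes among n drawn
    genes, when K of the N background genes belong to the pathway."""
    upper = min(K, n)
    total = comb(N, n)
    return sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1)) / total


def pathway_enrichment(background, degs, pathway_genes):
    """Enrichment P-value of one pathway (assumed hypergeometric test)."""
    degs = degs & background
    in_pathway = pathway_genes & background
    return hypergeom_pvalue(
        len(background), len(in_pathway), len(degs), len(degs & in_pathway)
    )


# Hypothetical toy data: ortholog IDs mapped through each reference species.
mappings = {
    "Chlamydomonas reinhardtii": {"K00001", "K00002", "K00003", "K00004"},
    "Chlorella variabilis": {"K00003", "K00005", "K00006", "K00007"},
    "Coccomyxa subellipsoidea": {"K00008", "K00009", "K00010"},
}
background = union_mapped_genes(mappings)           # 10 distinct genes
degs = {"K00001", "K00005", "K00008"}               # differentially expressed
pathway = {"K00001", "K00005", "K00008"}            # genes of one pathway

p_union = pathway_enrichment(background, degs, pathway)
p_single = pathway_enrichment(mappings["Chlamydomonas reinhardtii"], degs, pathway)
print(f"union of three species: p = {p_union:.4f}")
print(f"single species only:    p = {p_single:.4f}")
```

In this toy case only one of the three pathway genes is mapped through the single species, so the pathway fails a typical threshold there, while the union recovers all three genes and yields a much smaller P-value. This mirrors the effect the abstract reports, but the numbers are purely illustrative.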