Peng Ken Lim, Ruoxi Wang, Jenet Princy Antony Velankanni, Marek Mutwil
{"title":"Constructing Ensemble Gene Functional Networks Capturing Tissue/condition-specific Co-expression from Unlabled Transcriptomic Data with TEA-GCN","authors":"Peng Ken Lim, Ruoxi Wang, Jenet Princy Antony Velankanni, Marek Mutwil","doi":"10.1101/2024.07.22.604713","DOIUrl":null,"url":null,"abstract":"Gene co-expression networks (GCNs) generated from public transcriptomic datasets can elucidate the co-regulatory and co-functional relationships between genes, making GCNs an important tool to predict gene functions. However, current GCN construction methods are sensitive to the quality of the data, and the interpretability of the identified relationships between genes is still difficult. To address this, we present a novel method: Two-Tier Ensemble Aggregation (TEA-) GCN. TEA-GCN utilizes unsupervised partitioning of big transcriptomic datasets and three correlation coefficients to generate ensemble GCNs in a two-step aggregation process. We show that TEA-GCN outperforms in finding correct functional relationships between genes over the current state-of-the-art across three model species, and is able to not only capture condition/tissue-specific gene co-expression but explain them through the use of natural language processing (NLP). In addition, we found TEA-GCN to be especially performant in identifying relationships between transcription factors and their activation targets, making it effective in inferring gene regulatory networks. TEA-GCN is available at https://github.com/pengkenlim/TEA-GCN.","PeriodicalId":501213,"journal":{"name":"bioRxiv - Systems Biology","volume":"21 1","pages":""},"PeriodicalIF":0.0000,"publicationDate":"2024-07-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"bioRxiv - Systems Biology","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1101/2024.07.22.604713","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 0
Abstract
Gene co-expression networks (GCNs) generated from public transcriptomic datasets can elucidate the co-regulatory and co-functional relationships between genes, making GCNs an important tool to predict gene functions. However, current GCN construction methods are sensitive to the quality of the data, and the interpretability of the identified relationships between genes is still difficult. To address this, we present a novel method: Two-Tier Ensemble Aggregation (TEA-) GCN. TEA-GCN utilizes unsupervised partitioning of big transcriptomic datasets and three correlation coefficients to generate ensemble GCNs in a two-step aggregation process. We show that TEA-GCN outperforms in finding correct functional relationships between genes over the current state-of-the-art across three model species, and is able to not only capture condition/tissue-specific gene co-expression but explain them through the use of natural language processing (NLP). In addition, we found TEA-GCN to be especially performant in identifying relationships between transcription factors and their activation targets, making it effective in inferring gene regulatory networks. TEA-GCN is available at https://github.com/pengkenlim/TEA-GCN.