CoGTEx: Unscaled system-level coexpression estimation from GTEx data forecast novel functional gene partners.

IF 2.9 | CAS Zone 3 (multidisciplinary journals) | JCR Q1 MULTIDISCIPLINARY SCIENCES | PLoS ONE | Pub Date: 2024-10-04 | eCollection Date: 2024-01-01 | DOI: 10.1371/journal.pone.0309961
Miguel-Angel Cortes-Guzman, Víctor Treviño
Citations: 0

Abstract


Motivation: Coexpression estimations are helpful for analyzing pathways, cofactors, regulators, targets, and human health and disease. Ideally, coexpression estimations should consider as many diverse cell types as possible and account for the fact that available data are not uniform across tissues. Importantly, the coexpression estimations available today are performed at the "tissue level", which relies on formulations standardized by cell type. Little or no attention is paid to overall gene expression levels. The tissue-level estimation assumes that variation in expression levels is more important than mean expression levels. Here, we challenge this assumption by estimating coexpression at the "system level", without standardization by tissue, and show that it provides valuable information. We provide a resource to view, download, and analyze both tissue- and system-level coexpression estimations from GTEx human data.
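The difference between the two modes can be illustrated with a toy example. This is a minimal sketch on synthetic data, not the authors' pipeline: the tissue names, sample counts, and expression means are invented, and "tissue level" is approximated here as z-scoring each gene within each tissue before pooling, following the abstract's description of the z-score scale. The helper names `pearson` and `zscore_within_tissue` are hypothetical.

```python
import math
import random

def pearson(x, y):
    """Pearson correlation of two equal-length sequences (assumes nonzero variance)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def zscore_within_tissue(values, labels):
    """Standardize a gene separately inside each tissue (approximating the tissue-level mode)."""
    out = [0.0] * len(values)
    for tissue in set(labels):
        idx = [i for i, lab in enumerate(labels) if lab == tissue]
        vals = [values[i] for i in idx]
        mean = sum(vals) / len(vals)
        sd = math.sqrt(sum((v - mean) ** 2 for v in vals) / (len(vals) - 1))
        for i in idx:
            out[i] = (values[i] - mean) / sd
    return out

# Synthetic data: two hypothetical tissues with 50 samples each. Both genes are
# highly expressed in "tissue_A" and lowly expressed in "tissue_B", but their
# within-tissue fluctuations are independent of each other.
rng = random.Random(0)
labels = ["tissue_A"] * 50 + ["tissue_B"] * 50
means = {"tissue_A": 100.0, "tissue_B": 10.0}
gene_1 = [means[t] + rng.gauss(0, 1) for t in labels]
gene_2 = [means[t] + rng.gauss(0, 1) for t in labels]

# System level: correlate the unscaled values directly.
system_r = pearson(gene_1, gene_2)
# Tissue level: z-score each gene within each tissue, then correlate the pooled values.
tissue_r = pearson(zscore_within_tissue(gene_1, labels),
                   zscore_within_tissue(gene_2, labels))

print(f"system-level r = {system_r:.3f}, tissue-level r = {tissue_r:.3f}")
```

In this construction the shared difference in mean expression between tissues drives a near-perfect system-level correlation, while per-tissue standardization removes those mean differences and leaves only the independent within-tissue fluctuations, so the tissue-level correlation stays near zero.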

Methods: GTEx v8 expression data were globally normalized, batch-processed, and filtered. Then, stringent PCA, clustering, and t-SNE procedures were applied to generate 42 distinct, curated tissue clusters. Coexpression was estimated across these 42 tissue clusters by computing the correlation among 33,445 genes, sampling 70 samples per tissue cluster to avoid overrepresentation of any tissue. This process was repeated 20 times, and the minimum value was taken as a robust estimate. Three metrics (Pearson, Spearman, and G-statistic) were calculated in two data processing modes: at the system level (TPM scale) and at the tissue level (z-score scale).
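The balanced resampling scheme described in the Methods can be sketched for a single gene pair as follows. This is an illustration under stated assumptions, not the authors' code: the function name `resampled_min_correlation`, sampling without replacement, the fixed seeds, and reading "minimum value" as the smallest Pearson coefficient across repeats are all assumptions (the abstract does not say how negative correlations are treated), and the data are synthetic.

```python
import math
import random

def pearson(x, y):
    """Pearson correlation of two equal-length sequences (assumes nonzero variance)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def resampled_min_correlation(gene_a, gene_b, labels, per_tissue=70, repeats=20, seed=0):
    """Hypothetical helper: draw `per_tissue` samples from every tissue cluster,
    correlate the two genes on that balanced subset, repeat, and return the minimum."""
    by_tissue = {}
    for i, lab in enumerate(labels):
        by_tissue.setdefault(lab, []).append(i)
    # Every cluster must be able to supply the requested number of samples.
    for tissue, members in by_tissue.items():
        if len(members) < per_tissue:
            raise ValueError(f"{tissue} has only {len(members)} samples")
    rng = random.Random(seed)
    estimates = []
    for _ in range(repeats):
        idx = []
        for tissue in sorted(by_tissue):
            # Sampling without replacement is an assumption of this sketch.
            idx.extend(rng.sample(by_tissue[tissue], per_tissue))
        estimates.append(pearson([gene_a[i] for i in idx], [gene_b[i] for i in idx]))
    return min(estimates)

# Synthetic example: three hypothetical tissue clusters of 100 samples each.
rng = random.Random(1)
labels = [t for t in ("cluster_1", "cluster_2", "cluster_3") for _ in range(100)]
gene_a = [rng.gauss(50, 10) for _ in labels]
gene_b = [a + rng.gauss(0, 2) for a in gene_a]

estimate = resampled_min_correlation(gene_a, gene_b, labels)
print(f"minimum over 20 balanced resamples: r = {estimate:.3f}")
```

Drawing the same number of samples from each cluster keeps tissues with many GTEx samples from dominating the correlation, and keeping the minimum over repeats yields a conservative value. Under the abstract's description, the system-level mode would pass TPM values to such a routine and the tissue-level mode would pass per-tissue z-scores; Spearman and the G-statistic are not shown here.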

Results: We first validated our tissue-level estimations by comparing them with other databases. Then, through specific analyses of several examples and literature validation of predictions, we show that system-level coexpression estimation differs from tissue-level estimation and that both contain valuable information reflected in biological pathways. We also show that coexpression estimations are associated with transcriptional regulation. Finally, we present CoGTEx, a valuable resource for viewing and analyzing coexpressed genes in adult human tissues from GTEx v8 data, including a web interface to list, view, and explore the coexpressed genes.

Conclusion: We conclude that system-level coexpression is a novel and interesting coexpression metric capable of generating plausible predictions and biological hypotheses, and that CoGTEx is a valuable resource to view, compare, and download system- and tissue-level coexpression estimations from GTEx data.

Availability: The web resource is available at http://bioinformatics.mx/cogtex.
