A benchmark of RNA-seq data normalization methods for transcriptome mapping on human genome-scale metabolic networks.

NPJ Systems Biology and Applications, 10(1):124 · IF 3.5 · Zone 2 (Biology) · JCR Q1 (Mathematical & Computational Biology) · Pub Date: 2024-10-24 · DOI: 10.1038/s41540-024-00448-z

Hatice Büşra Lüleci, Dilara Uzuner, Müberra Fatma Cesur, Atılay İlgün, Elif Düz, Ecehan Abdik, Regan Odongo, Tunahan Çakır



Abstract




Genome-scale metabolic models (GEMs) cover the entire list of metabolic genes in an organism and their associated reactions in a manner that is not specific to any tissue or condition. RNA-seq provides crucial information for making GEMs condition-specific. The Integrative Metabolic Analysis Tool (iMAT) and Integrative Network Inference for Tissues (INIT) are the two most popular algorithms for creating condition-specific GEMs from human transcriptome data. The normalization method chosen for raw RNA-seq count data affects both the content of the models these algorithms produce and their predictive accuracy. However, a benchmark of how RNA-seq normalization methods affect the performance of iMAT and INIT is missing from the literature. Covariates in a dataset, such as age and gender, are another important factor, since they can affect the predictive power of the analysis. In this study, we aimed to compare five RNA-seq data normalization methods (TPM, FPKM, TMM, GeTMM, and RLE), along with covariate-adjusted versions of the normalized data, by mapping them onto a human GEM using the iMAT and INIT algorithms to generate personalized metabolic models. We used RNA-seq data from Alzheimer's disease (AD) and lung adenocarcinoma (LUAD) patients. The results demonstrated that RNA-seq data normalized by the RLE, TMM, or GeTMM methods yielded condition-specific metabolic models with considerably lower variability in the number of active reactions than the within-sample normalization methods (FPKM, TPM). Using these models, we could more accurately capture disease-associated genes with the RLE, TMM, and GeTMM normalization methods (average accuracy of ~0.80 for AD and ~0.67 for LUAD). Accuracy increased for all methods when covariate adjustment was applied. We found a similar accuracy trend when we compared the metabolites of perturbed reactions with metabolome data for AD. Together, our benchmark study shows that, when mapped onto GEMs, the between-sample RNA-seq normalization methods reduce false positive predictions at the expense of missing some true positive genes.
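The abstract contrasts within-sample normalization (TPM, FPKM) with between-sample normalization (RLE, TMM, GeTMM). The following is a minimal sketch of that distinction in plain Python, not the authors' pipeline: it shows textbook TPM and FPKM formulas and the median-of-ratios idea behind RLE size factors. The function names, the toy counts, and the gene lengths are all hypothetical, chosen only for illustration; TMM and GeTMM are omitted because they involve additional trimming and weighting steps.

```python
import math
from statistics import median


def tpm(counts, lengths_kb):
    """Within-sample: transcripts per million for a single sample."""
    rates = [c / l for c, l in zip(counts, lengths_kb)]  # reads per kilobase
    total = sum(rates)
    return [r / total * 1e6 for r in rates]


def fpkm(counts, lengths_kb):
    """Within-sample: fragments per kilobase per million mapped fragments."""
    per_million = sum(counts) / 1e6
    return [c / per_million / l for c, l in zip(counts, lengths_kb)]


def rle_size_factors(samples):
    """Between-sample: median-of-ratios (RLE) size factor for each sample.

    `samples` is a list of per-sample count lists sharing one gene order.
    """
    n_genes = len(samples[0])
    # Only genes with nonzero counts in every sample have a finite log.
    usable = [g for g in range(n_genes) if all(s[g] > 0 for s in samples)]
    # Log of the geometric mean across samples, per usable gene.
    log_geo = {
        g: sum(math.log(s[g]) for s in samples) / len(samples) for g in usable
    }
    # Size factor = median ratio of a sample's counts to the geometric mean.
    return [
        math.exp(median(math.log(s[g]) - log_geo[g] for g in usable))
        for s in samples
    ]


# Hypothetical toy data: sample_b is sample_a sequenced at twice the depth.
lengths_kb = [1.0, 2.0, 1.0, 4.0]
sample_a = [10, 20, 30, 40]
sample_b = [20, 40, 60, 80]

factors = rle_size_factors([sample_a, sample_b])
normalized_a = [c / factors[0] for c in sample_a]
normalized_b = [c / factors[1] for c in sample_b]

print("TPM (sample_a):", tpm(sample_a, lengths_kb))
print("FPKM (sample_a):", fpkm(sample_a, lengths_kb))
print("RLE size factors:", factors)
print("RLE-normalized counts:", normalized_a, normalized_b)
```

In this toy case the two RLE size factors differ by a factor of about 2, so dividing by them brings both samples onto a common scale. TPM and FPKM, by contrast, are computed from one sample at a time and never look at the other samples. In practice such factors are usually computed with established packages (for example, DESeq2 for RLE and edgeR for TMM) rather than by hand.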


Source journal: NPJ Systems Biology and Applications (Mathematics, Applied Mathematics)
CiteScore: 5.80
Self-citation rate: 0.00%
Review time: 8 weeks

About the journal: npj Systems Biology and Applications is an online Open Access journal dedicated to publishing the premier research that takes a systems-oriented approach. The journal aims to provide a forum for the presentation of articles that help define this nascent field, as well as those that apply the advances to wider fields. We encourage studies that integrate, or aid the integration of, data, analyses and insight from molecules to organisms and broader systems. Important areas of interest include not only fundamental biological systems and drug discovery, but also applications to health, medical practice and implementation, big data, biotechnology, food science, human behaviour, broader biological systems and industrial applications of systems biology. We encourage all approaches, including network biology, application of control theory to biological systems, computational modelling and analysis, comprehensive and/or high-content measurements, theoretical, analytical and computational studies of system-level properties of biological systems and computational/software/data platforms enabling such studies.