A benchmark of RNA-seq data normalization methods for transcriptome mapping on human genome-scale metabolic networks.
Authors: Hatice Büşra Lüleci, Dilara Uzuner, Müberra Fatma Cesur, Atılay İlgün, Elif Düz, Ecehan Abdik, Regan Odongo, Tunahan Çakır
Journal: NPJ Systems Biology and Applications, volume 10, issue 1, article 124. Published 2024-10-24.
DOI: 10.1038/s41540-024-00448-z (https://doi.org/10.1038/s41540-024-00448-z)
Genome-scale metabolic models (GEMs) cover the entire list of metabolic genes in an organism and their associated reactions, without being specific to any tissue or condition. RNA-seq provides crucial information to make the GEMs condition-specific. Integrative Metabolic Analysis Tool (iMAT) and Integrative Network Inference for Tissues (INIT) are the two most popular algorithms to create condition-specific GEMs from human transcriptome data. The normalization method chosen for raw RNA-seq count data affects the model content produced by these algorithms and their predictive accuracy. However, a benchmark of the RNA-seq normalization methods on the performance of the iMAT and INIT algorithms is missing in the literature. Covariates in a dataset, such as age and gender, are another important factor and can affect the predictive power of the analysis. In this study, we aimed to compare five different RNA-seq data normalization methods (TPM, FPKM, TMM, GeTMM, and RLE) and covariate-adjusted versions of the normalized data by mapping them on a human GEM using the iMAT and INIT algorithms to generate personalized metabolic models. We used RNA-seq data for Alzheimer's disease (AD) and lung adenocarcinoma (LUAD) patients. The results demonstrated that RNA-seq data normalized by the RLE, TMM, or GeTMM methods enabled the production of condition-specific metabolic models with considerably lower variability in the number of active reactions than the within-sample normalization methods (FPKM, TPM). Using these models, we could more accurately capture the disease-associated genes (average accuracy of ~0.80 for AD and ~0.67 for LUAD) for the RLE, TMM, and GeTMM normalization methods. Accuracy increased for all methods when covariate adjustment was applied. We found a similar accuracy trend when we compared the metabolites of perturbed reactions to metabolome data for AD. Together, our benchmark study shows that the between-sample RNA-seq normalization methods reduce false positive predictions at the expense of missing some true positive genes when mapped on GEMs.
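To make the distinction between within-sample and between-sample normalization concrete, here is a minimal NumPy sketch. It is not the authors' pipeline: the function names, the toy count matrix, and the gene lengths are illustrative assumptions. FPKM and TPM follow their standard textbook definitions, and the size-factor function implements the median-of-ratios idea behind RLE. TMM and GeTMM are not shown.

```python
import numpy as np

def fpkm(counts, lengths_bp):
    """Within-sample: counts per kilobase of gene per million mapped reads."""
    counts = np.asarray(counts, dtype=float)            # genes x samples
    lengths_kb = np.asarray(lengths_bp, dtype=float)[:, None] / 1e3
    per_million = counts.sum(axis=0, keepdims=True) / 1e6
    return counts / lengths_kb / per_million

def tpm(counts, lengths_bp):
    """Within-sample: length-normalized rates rescaled so each sample sums to 1e6."""
    counts = np.asarray(counts, dtype=float)
    rate = counts / (np.asarray(lengths_bp, dtype=float)[:, None] / 1e3)
    return rate / rate.sum(axis=0, keepdims=True) * 1e6

def rle_size_factors(counts):
    """Between-sample: median ratio of each sample to the per-gene geometric mean."""
    counts = np.asarray(counts, dtype=float)
    expressed = (counts > 0).all(axis=1)                # genes with no zero counts
    if not expressed.any():
        raise ValueError("no gene has non-zero counts in every sample")
    log_counts = np.log(counts[expressed])
    log_geo_mean = log_counts.mean(axis=1, keepdims=True)
    return np.exp(np.median(log_counts - log_geo_mean, axis=0))

# Toy data (hypothetical): 4 genes x 2 samples; sample 2 is sequenced about twice as deeply.
counts = np.array([[10, 20], [100, 200], [50, 100], [0, 5]])
lengths_bp = np.array([1000, 2000, 500, 1500])

tpm_values = tpm(counts, lengths_bp)
fpkm_values = fpkm(counts, lengths_bp)
size_factors = rle_size_factors(counts)
rle_counts = counts / size_factors                      # one scaling factor per sample
```

The within-sample methods rescale each sample using only its own totals and gene lengths, whereas the size factors are estimated by comparing every sample against a shared reference built from all samples. That is the property the abstract associates with lower variability across the personalized models.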
About the journal:
npj Systems Biology and Applications is an online Open Access journal dedicated to publishing the premier research that takes a systems-oriented approach. The journal aims to provide a forum for the presentation of articles that help define this nascent field, as well as those that apply the advances to wider fields. We encourage studies that integrate, or aid the integration of, data, analyses and insight from molecules to organisms and broader systems. Important areas of interest include not only fundamental biological systems and drug discovery, but also applications to health, medical practice and implementation, big data, biotechnology, food science, human behaviour, broader biological systems and industrial applications of systems biology.
We encourage all approaches, including network biology, application of control theory to biological systems, computational modelling and analysis, comprehensive and/or high-content measurements, theoretical, analytical and computational studies of system-level properties of biological systems and computational/software/data platforms enabling such studies.