Abstract 198: Quantifying miRNA activity in single cell clusters

Gulden Olgun, Vishaka Gopalan, S. Hannenhalli
DOI: 10.1158/1538-7445.AM2021-198 (https://doi.org/10.1158/1538-7445.AM2021-198). Publication date: 2021-07-01.

Abstract

Background: MicroRNAs (miRNAs) are small noncoding RNAs that mediate gene regulation at the post-transcriptional level via multiple mechanisms, such as mRNA degradation, translational inhibition, and mRNA stabilization. They are involved in several cellular processes, from development to homeostasis, and their deregulation is implicated in several diseases, including cancer. Because miRNAs lack a poly(A) tail, standard single-cell RNA-seq protocols do not capture them, which severely limits our understanding of miRNA function at cellular resolution. To overcome this limitation, we develop a novel machine learning method to infer miRNA activity in a sample from its RNA-seq profile.

Methods: We develop a model using XGBoost to predict the miRNA profile of a sample from its global mRNA profile. We train and test the model using cross-validation in the CCLE collection, as well as in healthy and cancerous human tissue data obtained from GTEx and TCGA. We quantify the method's performance as the correlation between actual and predicted miRNA expression values across the test samples. We validate the model in multiple single-cell datasets for which miRNA and mRNA profiles are available for the same cell types, assessing whether a model trained on independent bulk datasets can identify cell-type-specific miRNAs.

Results: First, we show in the CCLE collection and in multiple TCGA tissues that a model based on all genes is far more accurate than a model based only on known targets. Across 10 tissues with more than 100 paired miRNA and mRNA samples, our mean cross-validation accuracy (the Spearman correlation between predicted and actual expression of a miRNA) is 0.45 (from a minimum of 0.39 in pancreas to a maximum of 0.51 in brain). Compared with the normal tissues in GTEx, the model performs significantly better in their malignant counterparts in TCGA (average cross-validation accuracy improvement of 0.19), owing to the greater heterogeneity, and therefore greater variability in gene expression, of the malignant samples.

We have also validated the model in independent single-cell data. Using a model trained on bulk tissue data, we predict miRNA expression levels in a single cell from its single-cell RNA profile, and we compare the predicted fold difference in a miRNA's expression between two cell types with the actual fold difference. We quantify prediction accuracy as the correlation between the predicted and actual fold differences across all miRNAs. In a total of 4 cell-type pair comparisons (different sets of kidney, brain, breast, and skin), the model achieves an average accuracy of 0.81 (ranging from 0.73 to 0.89), thus strongly validating our model. Our next step is to apply the model to study miRNA activities during T cell development, pancreatic ductal adenocarcinoma, and glioblastoma, in collaboration with experimentalists.

Conclusions: Our method addresses a major bottleneck in studying miRNA activities at cellular resolution and can be applied to any scRNA data to infer miRNA activity.

Citation Format: Gulden Olgun, Vishaka Gopalan, Sridhar Hannenhalli. Quantifying miRNA activity in single cell clusters [abstract]. In: Proceedings of the American Association for Cancer Research Annual Meeting 2021; 2021 Apr 10-15 and May 17-21. Philadelphia (PA): AACR; Cancer Res 2021;81(13_Suppl):Abstract nr 198.