Missing Value Recovery for Single Cell RNA Sequencing Data

Wenjuan Zhang, William Yang, J. Talburt, S. Weissman, Mary Q. Yang
Published in: 2021 International Conference on Computational Science and Computational Intelligence (CSCI)
Publication date: 2021-12-01
DOI: 10.1109/CSCI54926.2021.00129 (https://doi.org/10.1109/CSCI54926.2021.00129)
Citations: 1

Abstract

The emergence of single-cell sequencing technologies has enabled the production of high-resolution data at the individual cell level, providing unprecedented opportunities to capture cell population diversity and dissect the cellular heterogeneity of complex diseases. At the same time, relatively high biological and technical noise poses new challenges for single-cell data analysis. Single-cell RNA sequencing (scRNA-seq) data often contain substantial missing values due to gene dropout events. Here, we developed a convolutional neural network-based model to recover missing values in scRNA-seq data. We first calculated the probability of dropout using a gamma-normal expectation-maximization algorithm. Unlike most existing approaches, our model recovered only the expression values whose dropout probability exceeded a threshold. The mean squared error and Pearson correlation coefficient were used to assess the accuracy of the predicted expression values. Purity and entropy were computed to measure the homogeneity of cell clusters obtained from the imputed gene expression profiles. Across various scRNA-seq datasets, our model demonstrated robust performance and achieved results comparable to or better than those of other imputation methods.
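
The selective-recovery rule and the four evaluation measures named in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names, the default threshold of 0.5, the base-2 logarithm, and the size-weighted averaging of per-cluster entropy are all assumptions chosen for illustration, and the dropout probabilities and predicted values are taken as given inputs rather than produced by an EM fit or a convolutional network.

```python
import numpy as np

def selective_impute(observed, predicted, dropout_prob, threshold=0.5):
    """Replace only entries whose dropout probability exceeds the threshold.

    The 0.5 default is a placeholder; the abstract does not state the value used.
    """
    observed = np.asarray(observed, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    mask = np.asarray(dropout_prob, dtype=float) > threshold
    return np.where(mask, predicted, observed)

def mse(truth, estimate):
    """Mean squared error over all entries."""
    truth = np.asarray(truth, dtype=float)
    estimate = np.asarray(estimate, dtype=float)
    return float(np.mean((truth - estimate) ** 2))

def pearson(truth, estimate):
    """Pearson correlation coefficient between the flattened arrays."""
    t = np.asarray(truth, dtype=float).ravel()
    e = np.asarray(estimate, dtype=float).ravel()
    return float(np.corrcoef(t, e)[0, 1])

def contingency(labels, clusters):
    """Counts table[k, c]: cells in cluster k that carry reference label c."""
    _, cluster_idx = np.unique(np.asarray(clusters).ravel(), return_inverse=True)
    _, label_idx = np.unique(np.asarray(labels).ravel(), return_inverse=True)
    table = np.zeros((cluster_idx.max() + 1, label_idx.max() + 1))
    np.add.at(table, (cluster_idx, label_idx), 1)
    return table

def purity(labels, clusters):
    """Fraction of cells that belong to the majority label of their cluster."""
    table = contingency(labels, clusters)
    return float(table.max(axis=1).sum() / table.sum())

def cluster_entropy(labels, clusters):
    """Size-weighted mean Shannon entropy (base 2) of labels within clusters."""
    table = contingency(labels, clusters)
    sizes = table.sum(axis=1)
    p = table / sizes[:, None]
    # log2 is evaluated only where p > 0, so empty cells contribute zero.
    logp = np.log2(p, where=p > 0, out=np.zeros_like(p))
    per_cluster = -(p * logp).sum(axis=1)
    return float((sizes * per_cluster).sum() / sizes.sum())

# Toy example: only the top-left zero is flagged as a likely dropout.
observed = [[0.0, 5.0], [3.0, 0.0]]
predicted = [[2.0, 5.5], [3.2, 1.0]]
dropout_prob = [[0.9, 0.0], [0.1, 0.2]]
imputed = selective_impute(observed, predicted, dropout_prob)
```

In the toy example the bottom-right zero is left untouched because its dropout probability is below the threshold, which reflects the idea that a zero with a low dropout probability is treated as genuine absence of expression. Higher purity and lower entropy both indicate more homogeneous clusters.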