Wenjuan Zhang, William Yang, J. Talburt, S. Weissman, Mary Q. Yang
2021 International Conference on Computational Science and Computational Intelligence (CSCI), published 2021-12-01. DOI: https://doi.org/10.1109/CSCI54926.2021.00129
Missing Value Recovery for Single Cell RNA Sequencing Data
The emergence of single-cell sequencing technologies has enabled the production of high-resolution data at the individual cell level, providing unprecedented opportunities to capture cell population diversity and dissect the cellular heterogeneity of complex diseases. At the same time, relatively high biological and technical noise poses new challenges for single-cell data analysis. Single-cell RNA sequencing (scRNA-seq) data often contain substantial missing values due to gene dropout events. Here, we developed a convolutional neural network-based model to recover missing values in scRNA-seq data. We first calculated the probability of dropout for each expression value using a gamma-normal expectation-maximization algorithm. Unlike most existing approaches, our model recovered only the expression values whose dropout probability exceeded a threshold. The mean squared error and Pearson correlation coefficient were used to assess the accuracy of the predicted expression values. Purity and entropy were computed to measure the homogeneity of cell clusters obtained from the imputed gene expression profiles. Across various scRNA-seq datasets, our model demonstrated robust performance and achieved results comparable to or better than those of other imputation methods.
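The selective-recovery step and the four evaluation measures named in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names, the threshold of 0.5, and the toy arrays are hypothetical. The purity and entropy formulas are the standard clustering definitions (entropy in bits, weighted by cluster size), which the abstract does not spell out. The dropout probabilities and predicted values are taken as given inputs here, so neither the gamma-normal EM step nor the convolutional network is reproduced.

```python
import numpy as np

def selective_impute(observed, predicted, dropout_prob, threshold=0.5):
    """Replace only entries whose dropout probability exceeds the threshold.

    The default threshold is an assumption; the abstract gives no value.
    """
    mask = dropout_prob > threshold
    return np.where(mask, predicted, observed), mask

def mse(truth, estimate):
    """Mean squared error over all entries."""
    return float(np.mean((truth - estimate) ** 2))

def pearson(truth, estimate):
    """Pearson correlation between the flattened matrices."""
    return float(np.corrcoef(truth.ravel(), estimate.ravel())[0, 1])

def purity(labels_true, labels_pred):
    """Fraction of cells belonging to the majority true class of their cluster."""
    total = 0
    for c in np.unique(labels_pred):
        members = labels_true[labels_pred == c]
        _, counts = np.unique(members, return_counts=True)
        total += counts.max()
    return total / len(labels_true)

def entropy(labels_true, labels_pred):
    """Size-weighted mean Shannon entropy (bits) of true classes per cluster."""
    n = len(labels_true)
    h = 0.0
    for c in np.unique(labels_pred):
        members = labels_true[labels_pred == c]
        _, counts = np.unique(members, return_counts=True)
        p = counts / counts.sum()
        h += (len(members) / n) * -(p * np.log2(p)).sum()
    return float(h)

# Toy data (hypothetical): rows are genes, columns are cells.
observed = np.array([[0.0, 2.0], [3.0, 0.0]])
predicted = np.array([[1.5, 2.2], [2.9, 0.4]])
dropout_prob = np.array([[0.9, 0.1], [0.0, 0.2]])

imputed, mask = selective_impute(observed, predicted, dropout_prob)
# Only the top-left entry is replaced; the zero at the bottom right has a low
# dropout probability and is kept as a likely true zero.
print(imputed)

cell_types = np.array([0, 0, 1, 1])  # known labels (hypothetical)
clusters = np.array([0, 0, 0, 1])    # cluster assignments (hypothetical)
print(purity(cell_types, clusters), entropy(cell_types, clusters))
```

Higher purity and lower entropy both indicate more homogeneous clusters, so the two measures are read in opposite directions when comparing imputation methods.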