Impacts of Cell Ranger versions on Chromium gene expression data
Imad Abugessaisa, Akira Hasegawa, Scott Walker, Shintaro Katayama, Juha Kere, Takeya Kasukawa
bioRxiv - Bioinformatics, 2024-08-10. doi:10.1101/2024.08.10.607413
In droplet-based Chromium single-cell gene expression data from the 10x Genomics platform, cell barcode calling with Cell Ranger (CR) is a standard pipeline. However, the impact of the different released versions of CR on Chromium single-cell gene expression data has not been systematically evaluated. To evaluate this impact comprehensively, we assessed six molecular quality criteria, quantified gene expression, and performed downstream analysis for 12 single-cell Chromium gene expression datasets. Each dataset was processed with 10 versions of CR, resulting in 180 datasets and a total of 702,493 cell barcodes. We show that different CR versions yield different numbers of cell barcodes from the same dataset, with significant variation in molecular quality and average gene expression. Our analysis distinguishes two categories of cell barcodes: common barcodes, called (unmasked) by all CR versions, and specific barcodes, called (unmasked/masked) by only some versions. Surprisingly, we observed variation in molecular quality indices even among common cell barcodes when they were called by different CR versions. The specific barcodes show skewed gene body coverage and form distinct clusters at the edges of UMAP plots. For each dataset, the choice of CR version affects quality scores, average gene expression, clustering results, and top cluster marker genes. Our study demonstrates a quantitative effect of CR version choice on downstream analysis: the same Chromium single-cell gene expression data can yield widely different results depending on the CR version used.
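The split into common and specific barcodes reduces to set operations over the barcodes each CR version calls. The Python below is a minimal sketch of that idea, not the authors' pipeline. It assumes each run's called barcodes are available as a set, for example read from `outs/filtered_feature_bc_matrix/barcodes.tsv.gz` (the layout used by Cell Ranger 3.0 and later; older releases use a different directory layout). The function names, version labels, and four-letter barcodes are hypothetical toy values chosen for illustration.

```python
from __future__ import annotations

import gzip
from collections import Counter
from pathlib import Path


def load_barcodes(path: str | Path) -> set[str]:
    """Read one barcode per line from a barcodes.tsv(.gz) file (assumed layout)."""
    path = Path(path)
    opener = gzip.open if path.suffix == ".gz" else open
    with opener(path, "rt") as handle:
        return {line.strip() for line in handle if line.strip()}


def classify_barcodes(calls: dict[str, set[str]]) -> tuple[set[str], dict[str, set[str]]]:
    """Split barcodes into common ones (called by every version) and, per version,
    specific ones (called by that version but missed by at least one other)."""
    if not calls:
        return set(), {}
    common = set.intersection(*calls.values())
    specific = {version: barcodes - common for version, barcodes in calls.items()}
    return common, specific


def call_frequency(calls: dict[str, set[str]]) -> Counter[str]:
    """Count how many versions called each barcode."""
    return Counter(bc for barcodes in calls.values() for bc in barcodes)


if __name__ == "__main__":
    # Hypothetical toy calls; real 10x barcodes are 16 nt plus a "-1" suffix.
    calls = {
        "v3.1.0": {"AAAC-1", "AAAG-1", "AACT-1"},
        "v6.1.2": {"AAAC-1", "AAAG-1", "AACT-1", "AAGT-1"},
        "v7.2.0": {"AAAC-1", "AAAG-1", "AAGT-1", "ACGT-1"},
    }
    common, specific = classify_barcodes(calls)
    print("common:", sorted(common))  # common: ['AAAC-1', 'AAAG-1']
    for version, barcodes in specific.items():
        print(version, sorted(barcodes))
    # v3.1.0 ['AACT-1']
    # v6.1.2 ['AACT-1', 'AAGT-1']
    # v7.2.0 ['AAGT-1', 'ACGT-1']
```

Defining "specific" per version (barcodes a version called minus the common set) keeps the comparison symmetric across versions; `call_frequency` then shows whether a specific barcode was missed by one version or called by only one.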