Rarity: Discovering rare cell populations from single-cell imaging data

Bioinformatics (IF 4.4; CAS Zone 3, Biology; JCR Q1, Biochemical Research Methods) · Pub Date: 2023-12-12 · DOI: 10.1093/bioinformatics/btad750

Kaspar Märtens, Michele Bortolomeazzi, Lucia Montorsi, Jo Spencer, Francesca Ciccarelli, Christopher Yau


Citation count: 0

Abstract

Motivation: Cell type identification plays an important role in the analysis and interpretation of single-cell data and can be carried out via supervised or unsupervised clustering approaches. Supervised methods are best suited to settings where all cell types and their respective marker genes can be listed a priori, whereas unsupervised clustering algorithms look for groups of cells with similar expression properties. This property permits the identification of both known and unknown cell populations, making unsupervised methods suitable for discovery. Success depends on the relative strength of the expression signature of each group as well as on the number of cells. Rare cell types therefore present a particular challenge, which is magnified when they are defined by the differential expression of only a small number of genes.

Results: Typical unsupervised approaches fail to identify such rare sub-populations, and these cells tend to be absorbed into more prevalent cell types. In order to balance these competing demands, we have developed a novel statistical framework for unsupervised clustering, named Rarity, that enables the discovery process for rare cell types to be more robust, consistent and interpretable. We achieve this by devising a novel clustering method based on a Bayesian latent variable model in which we assign cells to inferred latent binary on/off expression profiles. This lets us achieve increased sensitivity to rare cell populations while also allowing us to control and interpret potential false positive discoveries. We systematically study the challenges associated with rare cell type identification and demonstrate the utility of Rarity on various imaging mass cytometry (IMC) data sets.

Availability: An implementation of Rarity, together with examples, is available from the GitHub repository (https://github.com/kasparmartens/rarity).

Supplementary information: Supplementary data are available at Bioinformatics online.
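The idea of assigning cells to binary on/off expression profiles can be illustrated with a toy example. The sketch below is not the Rarity model and does not use the Rarity package: it replaces the Bayesian latent variable inference described in the abstract with a fixed per-marker threshold, which is purely an assumption made for illustration. The marker count, simulated distributions, threshold values and function names (`simulate`, `binary_signatures`) are all hypothetical. It only shows why grouping cells by on/off signature can keep a small population separate instead of letting it be absorbed into a larger one.

```python
from collections import Counter

import numpy as np

rng = np.random.default_rng(0)


def simulate(profile, n_cells, on_mean=4.0, noise_sd=0.3):
    """Simulate expression for n_cells sharing one on/off marker profile.

    Hypothetical generative assumption: an "off" marker is centred on 0
    and an "on" marker on `on_mean`, both with Gaussian noise.
    """
    means = on_mean * np.asarray(profile, dtype=float)
    return rng.normal(loc=means, scale=noise_sd, size=(n_cells, len(profile)))


def binary_signatures(expr, thresholds):
    """Map each cell (row) to a tuple of 0/1 marker states.

    Assumption: a fixed threshold per marker stands in for the latent
    on/off states that a model-based method would infer from the data.
    """
    on_off = (expr > thresholds).astype(int)
    return [tuple(row) for row in on_off.tolist()]


# A common population and a rare one that differ in a single marker.
n_common, n_rare = 5000, 25
expr = np.vstack([
    simulate((1, 0, 0), n_common),
    simulate((1, 0, 1), n_rare),
])

# Count how many cells fall into each binary signature.
counts = Counter(binary_signatures(expr, thresholds=np.full(3, 2.0)))
for signature, n_cells in counts.most_common():
    print(signature, n_cells)
```

In this simulation the rare cells differ from the common ones in only one marker, yet they form their own signature group because grouping is driven by the on/off pattern rather than by overall distance between cells. Real imaging data are far noisier than this toy setup, which is where inferring the profiles with a probabilistic model, as the abstract describes, becomes relevant.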


Source journal: Bioinformatics (Biology: Biochemical Research Methods)

CiteScore: 11.20
Self-citation rate: 5.20%
Articles published: 753
Review time: 2.1 months

About the journal: The leading journal in its field, Bioinformatics publishes the highest quality scientific papers and review articles of interest to academic and industrial researchers. Its main focus is on new developments in genome bioinformatics and computational biology. Two distinct sections within the journal, Discovery Notes and Application Notes, focus on shorter papers: the former reports biologically interesting discoveries made using computational methods, and the latter explores the applications used for experiments.