Rarity: Discovering rare cell populations from single-cell imaging data

IF 4.4 3区 生物学 Q1 BIOCHEMICAL RESEARCH METHODS Bioinformatics Pub Date : 2023-12-12 DOI:10.1093/bioinformatics/btad750
Kaspar Märtens, Michele Bortolomeazzi, Lucia Montorsi, Jo Spencer, Francesca Ciccarelli, Christopher Yau
{"title":"Rarity: Discovering rare cell populations from single-cell imaging data","authors":"Kaspar Märtens, Michele Bortolomeazzi, Lucia Montorsi, Jo Spencer, Francesca Ciccarelli, Christopher Yau","doi":"10.1093/bioinformatics/btad750","DOIUrl":null,"url":null,"abstract":"Motivation Cell type identification plays an important role in the analysis and interpretation of single-cell data and can be carried out via supervised or unsupervised clustering approaches. Supervised methods are best suited where we can list all cell types and their respective marker genes a priori. While unsupervised clustering algorithms look for groups of cells with similar expression properties. This property permits the identification of both known and unknown cell populations, making unsupervised methods suitable for discovery. Success is dependent on the relative strength of the expression signature of each group as well as the number of cells. Rare cell types therefore present a particular challenge that are magnified when they are defined by differentially expressing a small number of genes. Results Typical unsupervised approaches fail to identify such rare sub-populations, and these cells tend to be absorbed into more prevalent cell types. In order to balance these competing demands, we have developed a novel statistical framework for unsupervised clustering, named Rarity, that enables the discovery process for rare cell types to be more robust, consistent and interpretable. We achieve this by devising a novel clustering method based on a Bayesian latent variable model in which we assign cells to inferred latent binary on/off expression profiles. This lets us achieve increased sensitivity to rare cell populations while also allowing us to control and interpret potential false positive discoveries. We systematically study the challenges associated with rare cell type identification and demonstrate the utility of Rarity on various IMC data sets. Availability Implementation of Rarity together with examples are available from the Github repository (https://github.com/kasparmartens/rarity). Supplementary information Supplementary data are available at Bioinformatics online.","PeriodicalId":8903,"journal":{"name":"Bioinformatics","volume":"19 1","pages":""},"PeriodicalIF":4.4000,"publicationDate":"2023-12-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Bioinformatics","FirstCategoryId":"99","ListUrlMain":"https://doi.org/10.1093/bioinformatics/btad750","RegionNum":3,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"BIOCHEMICAL RESEARCH METHODS","Score":null,"Total":0}
引用次数: 0

Abstract

Motivation Cell type identification plays an important role in the analysis and interpretation of single-cell data and can be carried out via supervised or unsupervised clustering approaches. Supervised methods are best suited where we can list all cell types and their respective marker genes a priori. While unsupervised clustering algorithms look for groups of cells with similar expression properties. This property permits the identification of both known and unknown cell populations, making unsupervised methods suitable for discovery. Success is dependent on the relative strength of the expression signature of each group as well as the number of cells. Rare cell types therefore present a particular challenge that are magnified when they are defined by differentially expressing a small number of genes. Results Typical unsupervised approaches fail to identify such rare sub-populations, and these cells tend to be absorbed into more prevalent cell types. In order to balance these competing demands, we have developed a novel statistical framework for unsupervised clustering, named Rarity, that enables the discovery process for rare cell types to be more robust, consistent and interpretable. We achieve this by devising a novel clustering method based on a Bayesian latent variable model in which we assign cells to inferred latent binary on/off expression profiles. This lets us achieve increased sensitivity to rare cell populations while also allowing us to control and interpret potential false positive discoveries. We systematically study the challenges associated with rare cell type identification and demonstrate the utility of Rarity on various IMC data sets. Availability Implementation of Rarity together with examples are available from the Github repository (https://github.com/kasparmartens/rarity). Supplementary information Supplementary data are available at Bioinformatics online.
查看原文
分享 分享
微信好友 朋友圈 QQ好友 复制链接
本刊更多论文
稀有性:从单细胞成像数据中发现罕见细胞群
细胞类型鉴定在单细胞数据的分析和解读中起着重要作用,可通过有监督或无监督聚类方法进行。有监督的方法最适合我们先验地列出所有细胞类型及其各自的标记基因。而无监督聚类算法则是寻找具有相似表达特性的细胞群。这种特性允许识别已知和未知的细胞群,使无监督方法适用于发现。成功与否取决于每组细胞表达特征的相对强度以及细胞数量。因此,稀有细胞类型是一个特殊的挑战,当它们是由少量基因的差异表达所定义时,这一挑战就会被放大。结果 典型的无监督方法无法识别这种稀有亚群,这些细胞往往会被吸收到更普遍的细胞类型中。为了平衡这些相互竞争的需求,我们为无监督聚类开发了一种名为 "稀有性"(Rarity)的新型统计框架,使稀有细胞类型的发现过程更加稳健、一致和可解释。为了实现这一目标,我们设计了一种基于贝叶斯潜变量模型的新型聚类方法,在该模型中,我们将细胞分配到推断出的潜二元开/关表达谱中。这让我们提高了对罕见细胞群的敏感性,同时也让我们能够控制和解释潜在的假阳性发现。我们系统地研究了与罕见细胞类型鉴定相关的挑战,并在各种 IMC 数据集上展示了 Rarity 的实用性。可用性 Rarity 的实现和示例可从 Github 存储库 (https://github.com/kasparmartens/rarity) 获取。补充信息 补充数据可在 Bioinformatics online 上获取。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 去求助
来源期刊
Bioinformatics
Bioinformatics 生物-生化研究方法
CiteScore
11.20
自引率
5.20%
发文量
753
审稿时长
2.1 months
期刊介绍: The leading journal in its field, Bioinformatics publishes the highest quality scientific papers and review articles of interest to academic and industrial researchers. Its main focus is on new developments in genome bioinformatics and computational biology. Two distinct sections within the journal - Discovery Notes and Application Notes- focus on shorter papers; the former reporting biologically interesting discoveries using computational methods, the latter exploring the applications used for experiments.
期刊最新文献
MEHunter: Transformer-based mobile element variant detection from long reads Metabolic syndrome may be more frequent in treatment-naive sarcoidosis patients. Coracle—A Machine Learning Framework to Identify Bacteria Associated with Continuous Variables CoSIA: an R Bioconductor package for CrOss Species Investigation and Analysis LncLocFormer: a Transformer-based deep learning model for multi-label lncRNA subcellular localization prediction by using localization-specific attention mechanism
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
现在去查看 取消
×
提示
确定
0
微信
客服QQ
Book学术公众号 扫码关注我们
反馈
×
意见反馈
请填写您的意见或建议
请填写您的手机或邮箱
已复制链接
已复制链接
快去分享给好友吧!
我知道了
×
扫码分享
扫码分享
Book学术官方微信
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术
文献互助 智能选刊 最新文献 互助须知 联系我们:info@booksci.cn
Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。
Copyright © 2023 Book学术 All rights reserved.
ghs 京公网安备 11010802042870号 京ICP备2023020795号-1