Joint inference of discrete cell types and continuous type-specific variability in single-cell datasets with MMIDAS
Yeganeh Marghi, Rohan Gala, Fahimeh Baftizadeh, Uygar Sümbül
Nature Computational Science, volume 4, issue 9, pages 706-722. Published 2024-09-23. Journal Article.
DOI: 10.1038/s43588-024-00683-8
https://www.nature.com/articles/s43588-024-00683-8
Abstract
Reproducible definition and identification of cell types is essential to enable investigations into their biological function and to understand their relevance in the context of development, disease and evolution. Current approaches model variability in data as continuous latent factors, followed by clustering as a separate step, or immediately apply clustering on the data. We show that such approaches can suffer from qualitative mistakes in identifying cell types robustly, particularly when the number of such cell types is in the hundreds or even thousands. Here we propose an unsupervised method, Mixture Model Inference with Discrete-coupled AutoencoderS (MMIDAS), which combines a generalized mixture model with a multi-armed deep neural network to jointly infer the discrete type and continuous type-specific variability. Using four recent datasets of brain cells spanning different technologies, species and conditions, we demonstrate that MMIDAS can identify reproducible cell types and infer cell type-dependent continuous variability in both unimodal and multimodal datasets.

Clustering in high-dimensional spaces with a large number of clusters and identifying common aspects of within-cluster variability remain challenging. Here the authors develop an unsupervised method for this purpose and demonstrate it on brain single-cell datasets.
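To make the phrase "jointly infer the discrete type and continuous type-specific variability" concrete, the following toy sketch may help. It is not MMIDAS and does not use a neural network: it assumes a small linear-Gaussian mixture in which each type has its own mean and its own direction of continuous variation, and it picks the type and continuous state together by maximum a posteriori scoring. All names and values (`MEANS`, `DIRECTIONS`, `NOISE`, `sample_cell`, `infer`, `demo_accuracy`) are hypothetical and chosen only for illustration.

```python
import random

# Hypothetical toy parameters: three "cell types" in a 2-D feature space.
MEANS = [[0.0, 0.0], [5.0, 5.0], [10.0, 0.0]]        # one mean per type
DIRECTIONS = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]    # type-specific axis of variation
NOISE = 0.05                                         # observation noise (std. dev.)

def sample_cell(rng):
    """Draw one toy cell: discrete type c, continuous state s, observation x."""
    c = rng.randrange(len(MEANS))                    # discrete type, uniform prior
    s = rng.gauss(0.0, 1.0)                          # continuous state, N(0, 1) prior
    x = [m + s * d + rng.gauss(0.0, NOISE)
         for m, d in zip(MEANS[c], DIRECTIONS[c])]
    return c, s, x

def infer(x):
    """Jointly choose (type, state) that minimises the negative log posterior."""
    best = None
    for c, (mu, d) in enumerate(zip(MEANS, DIRECTIONS)):
        r = [xi - mi for xi, mi in zip(x, mu)]       # residual from the type mean
        # MAP estimate of s given type c (ridge-regularised projection onto d).
        s = sum(ri * di for ri, di in zip(r, d)) / (sum(di * di for di in d) + NOISE ** 2)
        err = sum((ri - s * di) ** 2 for ri, di in zip(r, d))
        score = err / NOISE ** 2 + s ** 2            # likelihood term + prior on s
        if best is None or score < best[0]:
            best = (score, c, s)
    return best[1], best[2]

def demo_accuracy(n=200, seed=0):
    """Fraction of sampled toy cells whose type is recovered by infer()."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n):
        c, _, x = sample_cell(rng)
        hits += infer(x)[0] == c
    return hits / n
```

In this toy setting the type-specific lines cross one another, so reconstruction error alone can be ambiguous about the type; scoring the type and the continuous state together, with a prior on the state, resolves that. The method in the paper addresses the much harder unsupervised case, where the per-type parameters are unknown and must be learned.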