Nonparametric Bayesian multiarmed bandits for single-cell experiment design
F. Camerlenghi, Bianca Dumitrascu, F. Ferrari, B. Engelhardt, S. Favaro
arXiv: Applications. Published 2019-10-11. DOI: 10.1214/20-aoas1370 (https://doi.org/10.1214/20-aoas1370)
Citations: 7
Abstract
Maximizing cell type discovery under budget constraints is a fundamental challenge in the collection and analysis of single-cell RNA-sequencing (scRNA-seq) data. In this paper, we introduce a simple, computationally efficient, and scalable Bayesian nonparametric sequential approach to optimizing budget allocation when designing a large-scale collection of scRNA-seq data for purposes including, but not limited to, creating cell atlases. Our approach relies on (i) a hierarchical Pitman-Yor prior that recapitulates biological assumptions regarding cellular differentiation, and (ii) a Thompson sampling multi-armed bandit strategy that balances exploitation and exploration to prioritize experiments across a sequence of trials. Posterior inference is performed using a sequential Monte Carlo approach, which allows us to fully exploit the sequential nature of our species sampling problem. We show empirically that our approach outperforms state-of-the-art methods and achieves near-oracle performance on both simulated and real data. HPY-TS code is available at this https URL.
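To make the exploration/exploitation idea concrete, here is a minimal sketch of Thompson sampling applied to a species-discovery problem. It is not the HPY-TS method: as a loudly simplified stand-in, each arm (a hypothetical tissue or experiment) gets an independent Beta-Bernoulli posterior on its chance of yielding an unseen cell type, in place of the hierarchical Pitman-Yor prior and sequential Monte Carlo inference described in the abstract. The function name `thompson_discovery`, the `populations` input, and the uniform sampling within each arm are all assumptions made for illustration.

```python
import random


def thompson_discovery(populations, budget, seed=0):
    """Spend `budget` draws across arms, trying to discover distinct types.

    populations[j] is the list of type labels that arm j can yield
    (hypothetical data; each draw is uniform with replacement).
    Returns (set of discovered types, number of pulls per arm).
    """
    rng = random.Random(seed)
    n_arms = len(populations)
    # Beta(1, 1) prior on each arm's probability of yielding an unseen type.
    # Assumption: arms are independent, unlike the hierarchical prior in the paper.
    successes = [1.0] * n_arms
    failures = [1.0] * n_arms
    seen = set()
    pulls = [0] * n_arms

    for _ in range(budget):
        # Thompson step: draw one plausible discovery rate per arm from its
        # posterior, then sample from the arm whose draw is largest.
        draws = [rng.betavariate(successes[j], failures[j]) for j in range(n_arms)]
        arm = max(range(n_arms), key=draws.__getitem__)

        cell_type = rng.choice(populations[arm])
        pulls[arm] += 1

        # Reward is 1 if the sampled type has not been seen in any arm so far.
        if cell_type in seen:
            failures[arm] += 1.0
        else:
            seen.add(cell_type)
            successes[arm] += 1.0

    return seen, pulls
```

A call such as `thompson_discovery([[f"a{i}" for i in range(50)], ["b0", "b1"]], budget=100)` returns the discovered types and the per-arm allocation. Note that the discovery rate of an arm falls as its types are exhausted, which a stationary Beta posterior only tracks slowly; handling that decay is one thing a species-sampling prior such as the Pitman-Yor process is designed for.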