Biases in machine-learning models of human single-cell data
Theresa Willem, Vladimir A. Shitov, Malte D. Luecken, Niki Kilbertus, Stefan Bauer, Marie Piraud, Alena Buyx, Fabian J. Theis
Nature Cell Biology, published 2025-02-19. DOI: 10.1038/s41556-025-01619-8 (https://doi.org/10.1038/s41556-025-01619-8)
Citations: 0
Abstract
Recent machine-learning (ML)-based advances in single-cell data science have enabled the stratification of human tissue donors at single-cell resolution, promising to provide valuable diagnostic and prognostic insights. However, such insights are susceptible to biases. Here we discuss various biases that emerge along the pipeline of ML-based single-cell analysis, ranging from societal biases affecting whose samples are collected, to clinical and cohort biases that influence the generalizability of single-cell datasets, biases stemming from single-cell sequencing, ML biases specific to (weakly supervised or unsupervised) ML models trained on human single-cell samples and biases during the interpretation of results from ML models. We end by providing methods for single-cell data scientists to assess and mitigate biases, and call for efforts to address the root causes of biases.
About the journal:
Nature Cell Biology publishes papers of the highest quality across all areas of cell biology, with a particular focus on elucidating the mechanisms underlying fundamental cell biological processes. The journal's scope includes, but is not limited to:
- Autophagy
- Cancer biology
- Cell adhesion and migration
- Cell cycle and growth
- Cell death
- Chromatin and epigenetics
- Cytoskeletal dynamics
- Developmental biology
- DNA replication and repair
- Mechanisms of human disease
- Mechanobiology
- Membrane traffic and dynamics
- Metabolism
- Nuclear organization and dynamics
- Organelle biology
- Proteolysis and quality control
- RNA biology
- Signal transduction
- Stem cell biology