Biases in machine-learning models of human single-cell data

Nature Cell Biology (IF 19.1; Zone 1, Biology; JCR Q1, Cell Biology) Pub Date: 2025-02-19 DOI: 10.1038/s41556-025-01619-8
Theresa Willem, Vladimir A. Shitov, Malte D. Luecken, Niki Kilbertus, Stefan Bauer, Marie Piraud, Alena Buyx, Fabian J. Theis
Nature Cell Biology, volume 27, issue 3, pages 384-392. Journal Article. https://www.nature.com/articles/s41556-025-01619-8

Abstract

Recent machine-learning (ML)-based advances in single-cell data science have enabled the stratification of human tissue donors at single-cell resolution, promising to provide valuable diagnostic and prognostic insights. However, such insights are susceptible to biases. Here we discuss various biases that emerge along the pipeline of ML-based single-cell analysis, ranging from societal biases affecting whose samples are collected, to clinical and cohort biases that influence the generalizability of single-cell datasets, biases stemming from single-cell sequencing, ML biases specific to (weakly supervised or unsupervised) ML models trained on human single-cell samples and biases during the interpretation of results from ML models. We end by providing methods for single-cell data scientists to assess and mitigate biases, and call for efforts to address the root causes of biases.

This Perspective discusses the various biases that can emerge along the pipeline of machine learning-based single-cell analysis and presents methods to train models on human single-cell data in order to assess and mitigate these biases.

Source journal: Nature Cell Biology (Biology - Cell Biology)
CiteScore: 28.40
Self-citation rate: 0.90%
Articles published: 219
Review time: 3 months

About the journal: Nature Cell Biology, a prestigious journal, upholds a commitment to publishing papers of the highest quality across all areas of cell biology, with a particular focus on elucidating mechanisms underlying fundamental cell biological processes. The journal's broad scope encompasses various areas of interest, including but not limited to:
- Autophagy
- Cancer biology
- Cell adhesion and migration
- Cell cycle and growth
- Cell death
- Chromatin and epigenetics
- Cytoskeletal dynamics
- Developmental biology
- DNA replication and repair
- Mechanisms of human disease
- Mechanobiology
- Membrane traffic and dynamics
- Metabolism
- Nuclear organization and dynamics
- Organelle biology
- Proteolysis and quality control
- RNA biology
- Signal transduction
- Stem cell biology