Systematic evaluation with practical guidelines for single-cell and spatially resolved transcriptomics data simulation under multiple scenarios

IF 10.1 1区 生物学 Q1 BIOTECHNOLOGY & APPLIED MICROBIOLOGY Genome Biology Pub Date : 2024-06-03 DOI:10.1186/s13059-024-03290-y
Hongrui Duo, Yinghong Li, Yang Lan, Jingxin Tao, Qingxia Yang, Yingxue Xiao, Jing Sun, Lei Li, Xiner Nie, Xiaoxi Zhang, Guizhao Liang, Mingwei Liu, Youjin Hao, Bo Li
{"title":"Systematic evaluation with practical guidelines for single-cell and spatially resolved transcriptomics data simulation under multiple scenarios","authors":"Hongrui Duo, Yinghong Li, Yang Lan, Jingxin Tao, Qingxia Yang, Yingxue Xiao, Jing Sun, Lei Li, Xiner Nie, Xiaoxi Zhang, Guizhao Liang, Mingwei Liu, Youjin Hao, Bo Li","doi":"10.1186/s13059-024-03290-y","DOIUrl":null,"url":null,"abstract":"Single-cell RNA sequencing (scRNA-seq) and spatially resolved transcriptomics (SRT) have led to groundbreaking advancements in life sciences. To develop bioinformatics tools for scRNA-seq and SRT data and perform unbiased benchmarks, data simulation has been widely adopted by providing explicit ground truth and generating customized datasets. However, the performance of simulation methods under multiple scenarios has not been comprehensively assessed, making it challenging to choose suitable methods without practical guidelines. We systematically evaluated 49 simulation methods developed for scRNA-seq and/or SRT data in terms of accuracy, functionality, scalability, and usability using 152 reference datasets derived from 24 platforms. SRTsim, scDesign3, ZINB-WaVE, and scDesign2 have the best accuracy performance across various platforms. Unexpectedly, some methods tailored to scRNA-seq data have potential compatibility for simulating SRT data. Lun, SPARSim, and scDesign3-tree outperform other methods under corresponding simulation scenarios. Phenopath, Lun, Simple, and MFA yield high scalability scores but they cannot generate realistic simulated data. Users should consider the trade-offs between method accuracy and scalability (or functionality) when making decisions. Additionally, execution errors are mainly caused by failed parameter estimations and appearance of missing or infinite values in calculations. We provide practical guidelines for method selection, a standard pipeline Simpipe ( https://github.com/duohongrui/simpipe ; https://doi.org/10.5281/zenodo.11178409 ), and an online tool Simsite ( https://www.ciblab.net/software/simshiny/ ) for data simulation. No method performs best on all criteria, thus a good-yet-not-the-best method is recommended if it solves problems effectively and reasonably. Our comprehensive work provides crucial insights for developers on modeling gene expression data and fosters the simulation process for users.","PeriodicalId":12611,"journal":{"name":"Genome Biology","volume":null,"pages":null},"PeriodicalIF":10.1000,"publicationDate":"2024-06-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Genome Biology","FirstCategoryId":"99","ListUrlMain":"https://doi.org/10.1186/s13059-024-03290-y","RegionNum":1,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"BIOTECHNOLOGY & APPLIED MICROBIOLOGY","Score":null,"Total":0}
引用次数: 0

Abstract

Single-cell RNA sequencing (scRNA-seq) and spatially resolved transcriptomics (SRT) have led to groundbreaking advancements in life sciences. To develop bioinformatics tools for scRNA-seq and SRT data and perform unbiased benchmarks, data simulation has been widely adopted by providing explicit ground truth and generating customized datasets. However, the performance of simulation methods under multiple scenarios has not been comprehensively assessed, making it challenging to choose suitable methods without practical guidelines. We systematically evaluated 49 simulation methods developed for scRNA-seq and/or SRT data in terms of accuracy, functionality, scalability, and usability using 152 reference datasets derived from 24 platforms. SRTsim, scDesign3, ZINB-WaVE, and scDesign2 have the best accuracy performance across various platforms. Unexpectedly, some methods tailored to scRNA-seq data have potential compatibility for simulating SRT data. Lun, SPARSim, and scDesign3-tree outperform other methods under corresponding simulation scenarios. Phenopath, Lun, Simple, and MFA yield high scalability scores but they cannot generate realistic simulated data. Users should consider the trade-offs between method accuracy and scalability (or functionality) when making decisions. Additionally, execution errors are mainly caused by failed parameter estimations and appearance of missing or infinite values in calculations. We provide practical guidelines for method selection, a standard pipeline Simpipe ( https://github.com/duohongrui/simpipe ; https://doi.org/10.5281/zenodo.11178409 ), and an online tool Simsite ( https://www.ciblab.net/software/simshiny/ ) for data simulation. No method performs best on all criteria, thus a good-yet-not-the-best method is recommended if it solves problems effectively and reasonably. Our comprehensive work provides crucial insights for developers on modeling gene expression data and fosters the simulation process for users.
查看原文
分享 分享
微信好友 朋友圈 QQ好友 复制链接
本刊更多论文
系统评估单细胞和空间分辨转录组学数据在多种情况下的模拟,并提供实用指南
单细胞 RNA 测序(scRNA-seq)和空间解析转录组学(SRT)在生命科学领域取得了突破性进展。为了开发用于 scRNA-seq 和 SRT 数据的生物信息学工具并执行无偏基准,数据模拟已被广泛采用,提供明确的基本事实并生成定制数据集。然而,模拟方法在多种情况下的性能尚未得到全面评估,因此在没有实用指南的情况下选择合适的方法具有挑战性。我们利用来自 24 个平台的 152 个参考数据集,从准确性、功能性、可扩展性和可用性等方面系统地评估了针对 scRNA-seq 和/或 SRT 数据开发的 49 种模拟方法。在各种平台上,SRTsim、scDesign3、ZINB-WaVE 和 scDesign2 的准确性表现最好。出乎意料的是,一些针对 scRNA-seq 数据定制的方法具有模拟 SRT 数据的潜在兼容性。在相应的模拟场景下,Lun、SPARSim 和 scDesign3-tree 优于其他方法。Phenopath、Lun、Simple 和 MFA 的可扩展性得分较高,但它们无法生成真实的模拟数据。用户在做决定时应考虑方法的准确性和可扩展性(或功能性)之间的权衡。此外,执行错误主要是由参数估计失败和计算中出现缺失值或无限值造成的。我们提供了实用的方法选择指南、标准管道 Simpipe ( https://github.com/duohongrui/simpipe ; https://doi.org/10.5281/zenodo.11178409 ) 和用于数据模拟的在线工具 Simsite ( https://www.ciblab.net/software/simshiny/ )。没有哪种方法在所有标准上都能达到最佳效果,因此,如果某种方法能有效、合理地解决问题,那么我们就会推荐使用这种 "虽好但非最佳 "的方法。我们的综合工作为基因表达数据建模开发人员提供了重要的见解,并促进了用户的模拟过程。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 去求助
来源期刊
Genome Biology
Genome Biology Biochemistry, Genetics and Molecular Biology-Genetics
CiteScore
21.00
自引率
3.30%
发文量
241
审稿时长
2 months
期刊介绍: Genome Biology stands as a premier platform for exceptional research across all domains of biology and biomedicine, explored through a genomic and post-genomic lens. With an impressive impact factor of 12.3 (2022),* the journal secures its position as the 3rd-ranked research journal in the Genetics and Heredity category and the 2nd-ranked research journal in the Biotechnology and Applied Microbiology category by Thomson Reuters. Notably, Genome Biology holds the distinction of being the highest-ranked open-access journal in this category. Our dedicated team of highly trained in-house Editors collaborates closely with our esteemed Editorial Board of international experts, ensuring the journal remains on the forefront of scientific advances and community standards. Regular engagement with researchers at conferences and institute visits underscores our commitment to staying abreast of the latest developments in the field.
期刊最新文献
Atlas of telomeric repeat diversity in Arabidopsis thaliana ESCHR: a hyperparameter-randomized ensemble approach for robust clustering across diverse datasets Splam: a deep-learning-based splice site predictor that improves spliced alignments Dimension reduction, cell clustering, and cell–cell communication inference for single-cell transcriptomics with DcjComm A comprehensive map of the aging blood methylome in humans
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
现在去查看 取消
×
提示
确定
0
微信
客服QQ
Book学术公众号 扫码关注我们
反馈
×
意见反馈
请填写您的意见或建议
请填写您的手机或邮箱
已复制链接
已复制链接
快去分享给好友吧!
我知道了
×
扫码分享
扫码分享
Book学术官方微信
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术
文献互助 智能选刊 最新文献 互助须知 联系我们:info@booksci.cn
Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。
Copyright © 2023 Book学术 All rights reserved.
ghs 京公网安备 11010802042870号 京ICP备2023020795号-1