Benchmarking clustering, alignment, and integration methods for spatial transcriptomics
Authors: Yunfei Hu, Manfei Xie, Yikang Li, Mingxing Rao, Wenjun Shen, Can Luo, Haoran Qin, Jihoon Baek, Xin Maizie Zhou
Journal: Genome Biology
Publication date: 2024-08-09
DOI: https://doi.org/10.1186/s13059-024-03361-0
Abstract
Spatial transcriptomics (ST) is advancing our understanding of complex tissues and organisms. However, building a robust clustering algorithm to define spatially coherent regions in a single tissue slice, and aligning or integrating multiple tissue slices from diverse sources for essential downstream analyses, remain challenging. Numerous clustering, alignment, and integration methods have been designed specifically for ST data by leveraging its spatial information. The absence of comprehensive benchmark studies complicates both method selection and future method development. In this study, we systematically benchmark a variety of state-of-the-art algorithms on a wide range of real and simulated datasets of varying sizes, technologies, species, and complexity. We analyze the strengths and weaknesses of each method using diverse quantitative and qualitative metrics and analyses, including eight metrics for spatial clustering accuracy and contiguity, uniform manifold approximation and projection (UMAP) visualization, layer-wise and spot-to-spot alignment accuracy, and 3D reconstruction, which are designed to assess method performance as well as data quality. The code used for evaluation is available on our GitHub. Additionally, we provide online notebook tutorials and documentation to facilitate the reproduction of all benchmarking results and to support the study of new methods and new datasets. Our analyses lead to comprehensive recommendations that cover multiple aspects, helping users select optimal tools for their specific needs and guiding future method development.
Genome Biology
Biochemistry, Genetics and Molecular Biology - Genetics
CiteScore: 21.00
Self-citation rate: 3.30%
Articles published: 241
Review time: 2 months
About the journal:
Genome Biology stands as a premier platform for exceptional research across all domains of biology and biomedicine, explored through a genomic and post-genomic lens.
With an impressive impact factor of 12.3 (2022),* the journal secures its position as the 3rd-ranked research journal in the Genetics and Heredity category and the 2nd-ranked research journal in the Biotechnology and Applied Microbiology category by Thomson Reuters. Notably, Genome Biology holds the distinction of being the highest-ranked open-access journal in this category.
Our dedicated team of highly trained in-house Editors collaborates closely with our esteemed Editorial Board of international experts, ensuring the journal remains at the forefront of scientific advances and community standards. Regular engagement with researchers at conferences and institute visits underscores our commitment to staying abreast of the latest developments in the field.