Deciphering spatial domains from spatially resolved transcriptomics with Siamese graph autoencoder.

IF 11.8 2区 生物学 Q1 MULTIDISCIPLINARY SCIENCES GigaScience Pub Date : 2024-01-02 DOI:10.1093/gigascience/giae003
Lei Cao, Chao Yang, Luni Hu, Wenjian Jiang, Yating Ren, Tianyi Xia, Mengyang Xu, Yishuai Ji, Mei Li, Xun Xu, Yuxiang Li, Yong Zhang, Shuangsang Fang
{"title":"Deciphering spatial domains from spatially resolved transcriptomics with Siamese graph autoencoder.","authors":"Lei Cao, Chao Yang, Luni Hu, Wenjian Jiang, Yating Ren, Tianyi Xia, Mengyang Xu, Yishuai Ji, Mei Li, Xun Xu, Yuxiang Li, Yong Zhang, Shuangsang Fang","doi":"10.1093/gigascience/giae003","DOIUrl":null,"url":null,"abstract":"<p><strong>Background: </strong>Cell clustering is a pivotal aspect of spatial transcriptomics (ST) data analysis as it forms the foundation for subsequent data mining. Recent advances in spatial domain identification have leveraged graph neural network (GNN) approaches in conjunction with spatial transcriptomics data. However, such GNN-based methods suffer from representation collapse, wherein all spatial spots are projected onto a singular representation. Consequently, the discriminative capability of individual representation feature is limited, leading to suboptimal clustering performance.</p><p><strong>Results: </strong>To address this issue, we proposed SGAE, a novel framework for spatial domain identification, incorporating the power of the Siamese graph autoencoder. SGAE mitigates the information correlation at both sample and feature levels, thus improving the representation discrimination. We adapted this framework to ST analysis by constructing a graph based on both gene expression and spatial information. SGAE outperformed alternative methods by its effectiveness in capturing spatial patterns and generating high-quality clusters, as evaluated by the Adjusted Rand Index, Normalized Mutual Information, and Fowlkes-Mallows Index. Moreover, the clustering results derived from SGAE can be further utilized in the identification of 3-dimensional (3D) Drosophila embryonic structure with enhanced accuracy.</p><p><strong>Conclusions: </strong>Benchmarking results from various ST datasets generated by diverse platforms demonstrate compelling evidence for the effectiveness of SGAE against other ST clustering methods. Specifically, SGAE exhibits potential for extension and application on multislice 3D reconstruction and tissue structure investigation. The source code and a collection of spatial clustering results can be accessed at https://github.com/STOmics/SGAE/.</p>","PeriodicalId":12581,"journal":{"name":"GigaScience","volume":"13 1","pages":""},"PeriodicalIF":11.8000,"publicationDate":"2024-01-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10939418/pdf/","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"GigaScience","FirstCategoryId":"99","ListUrlMain":"https://doi.org/10.1093/gigascience/giae003","RegionNum":2,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"MULTIDISCIPLINARY SCIENCES","Score":null,"Total":0}
引用次数: 0

Abstract

Background: Cell clustering is a pivotal aspect of spatial transcriptomics (ST) data analysis as it forms the foundation for subsequent data mining. Recent advances in spatial domain identification have leveraged graph neural network (GNN) approaches in conjunction with spatial transcriptomics data. However, such GNN-based methods suffer from representation collapse, wherein all spatial spots are projected onto a singular representation. Consequently, the discriminative capability of individual representation feature is limited, leading to suboptimal clustering performance.

Results: To address this issue, we proposed SGAE, a novel framework for spatial domain identification, incorporating the power of the Siamese graph autoencoder. SGAE mitigates the information correlation at both sample and feature levels, thus improving the representation discrimination. We adapted this framework to ST analysis by constructing a graph based on both gene expression and spatial information. SGAE outperformed alternative methods by its effectiveness in capturing spatial patterns and generating high-quality clusters, as evaluated by the Adjusted Rand Index, Normalized Mutual Information, and Fowlkes-Mallows Index. Moreover, the clustering results derived from SGAE can be further utilized in the identification of 3-dimensional (3D) Drosophila embryonic structure with enhanced accuracy.

Conclusions: Benchmarking results from various ST datasets generated by diverse platforms demonstrate compelling evidence for the effectiveness of SGAE against other ST clustering methods. Specifically, SGAE exhibits potential for extension and application on multislice 3D reconstruction and tissue structure investigation. The source code and a collection of spatial clustering results can be accessed at https://github.com/STOmics/SGAE/.

查看原文
分享 分享
微信好友 朋友圈 QQ好友 复制链接
本刊更多论文
利用连体图自动编码器从空间解析转录组学中解密空间域
背景:细胞聚类是空间转录组学(ST)数据分析的关键环节,因为它是后续数据挖掘的基础。空间域识别的最新进展是将图神经网络(GNN)方法与空间转录组学数据相结合。然而,这种基于图神经网络的方法存在表征坍塌的问题,即所有空间点都被投射到一个单一的表征上。因此,单个表征特征的分辨能力有限,导致聚类效果不理想:为了解决这个问题,我们提出了 SGAE,这是一种用于空间域识别的新型框架,它结合了连体图自动编码器的强大功能。SGAE 可减轻样本和特征层面的信息相关性,从而提高表征识别能力。我们通过构建基于基因表达和空间信息的图,将这一框架应用于 ST 分析。根据调整后兰德指数、归一化互信息和 Fowlkes-Mallows 指数的评估,SGAE 在捕捉空间模式和生成高质量聚类方面的效果优于其他方法。此外,SGAE 得出的聚类结果还可进一步用于识别三维果蝇胚胎结构,提高识别的准确性:来自不同平台生成的各种 ST 数据集的基准测试结果表明,SGAE 与其他 ST 聚类方法相比具有令人信服的有效性。特别是,SGAE 在多切片三维重建和组织结构研究方面具有扩展和应用潜力。源代码和空间聚类结果集可通过 https://github.com/STOmics/SGAE/ 访问。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 去求助
来源期刊
GigaScience
GigaScience MULTIDISCIPLINARY SCIENCES-
CiteScore
15.50
自引率
1.10%
发文量
119
审稿时长
1 weeks
期刊介绍: GigaScience seeks to transform data dissemination and utilization in the life and biomedical sciences. As an online open-access open-data journal, it specializes in publishing "big-data" studies encompassing various fields. Its scope includes not only "omic" type data and the fields of high-throughput biology currently serviced by large public repositories, but also the growing range of more difficult-to-access data, such as imaging, neuroscience, ecology, cohort data, systems biology and other new types of large-scale shareable data.
期刊最新文献
IPEV: identification of prokaryotic and eukaryotic virus-derived sequences in virome using deep learning Large-scale genomic survey with deep learning-based method reveals strain-level phage specificity determinants An effective strategy for assembling the sex-limited chromosome Enhanced bovine genome annotation through integration of transcriptomics and epi-transcriptomics datasets facilitates genomic biology Korea4K: whole genome sequences of 4,157 Koreans with 107 phenotypes derived from extensive health check-ups
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
现在去查看 取消
×
提示
确定
0
微信
客服QQ
Book学术公众号 扫码关注我们
反馈
×
意见反馈
请填写您的意见或建议
请填写您的手机或邮箱
已复制链接
已复制链接
快去分享给好友吧!
我知道了
×
扫码分享
扫码分享
Book学术官方微信
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术
文献互助 智能选刊 最新文献 互助须知 联系我们:info@booksci.cn
Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。
Copyright © 2023 Book学术 All rights reserved.
ghs 京公网安备 11010802042870号 京ICP备2023020795号-1