scHNTL: single-cell RNA-seq data clustering augmented by high-order neighbors and triplet loss.

Hua Meng, Chuan Qin, Zhiguo Long
Journal: Bioinformatics (Oxford, England). Published: 2025-02-04. Publication type: Journal Article.
DOI: https://doi.org/10.1093/bioinformatics/btaf044
Full text (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11878765/pdf/

Abstract

Motivation: The rapid development of single-cell RNA sequencing (scRNA-seq) has significantly advanced biomedical research. Clustering analysis, which is crucial for scRNA-seq data, faces challenges including data sparsity, high dimensionality, and variable gene expression. Good low-dimensional embeddings of such data should preserve intrinsic information while placing similar cells close together and dissimilar cells far apart. However, existing neural-network-based methods typically focus on minimizing reconstruction loss and preserving similarity between the embeddings of directly related cells. They do not consider dissimilarity, so their embeddings lack separability, which limits clustering performance.

Results: We propose a novel clustering algorithm called scHNTL (scRNA-seq data clustering augmented by high-order neighbors and triplet loss). It first constructs an auxiliary similarity graph and uses a Graph Attentional Autoencoder to learn initial embeddings of cells. It then identifies similar and dissimilar cells by exploring high-order structures of the similarity graph, and applies a triplet loss from contrastive learning so that the embeddings better preserve structural information by separating dissimilar pairs. Finally, this embedding refinement and the clustering objective are fused in a self-optimizing clustering framework to obtain the clusters. Experimental evaluations on 16 real-world datasets demonstrate the superiority of scHNTL in clustering over state-of-the-art single-cell clustering algorithms.
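
The idea of mining similar and dissimilar cells from the graph and penalizing them with a triplet loss can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not state how scHNTL selects positives and negatives, so the rule used here (cells within two hops are "similar", all others "dissimilar"), the helper names `knn_graph`, `within_hops`, and `triplet_loss`, the margin value, and the toy 2-D points standing in for learned embeddings are all assumptions made for illustration. The Graph Attentional Autoencoder and the self-optimizing clustering step are omitted.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def knn_graph(points, k):
    """Symmetric k-nearest-neighbor graph as a dict: node -> set of neighbors."""
    n = len(points)
    adj = {i: set() for i in range(n)}
    for i in range(n):
        order = sorted((j for j in range(n) if j != i),
                       key=lambda j: euclidean(points[i], points[j]))
        for j in order[:k]:
            adj[i].add(j)
            adj[j].add(i)
    return adj

def within_hops(adj, start, hops):
    """Nodes reachable from `start` in at most `hops` edges, excluding `start`."""
    seen = {start}
    frontier = {start}
    for _ in range(hops):
        frontier = {v for u in frontier for v in adj[u]} - seen
        seen |= frontier
    return seen - {start}

def triplet_loss(emb, anchor, positive, negative, margin=1.0):
    """Standard hinge-style triplet loss: max(0, d(a, p) - d(a, n) + margin)."""
    d_pos = euclidean(emb[anchor], emb[positive])
    d_neg = euclidean(emb[anchor], emb[negative])
    return max(0.0, d_pos - d_neg + margin)

# Toy "embeddings": two well-separated groups of three cells each.
cells = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1),
         (5.0, 5.0), (5.1, 5.0), (5.0, 5.1)]

adj = knn_graph(cells, k=2)

# Assumed rule (hypothetical): cells within 2 hops of the anchor are similar,
# every other cell is dissimilar.
similar = within_hops(adj, 0, hops=2)
dissimilar = set(range(len(cells))) - similar - {0}

# A well-separated triplet incurs no loss; a violated triplet incurs a penalty.
good = triplet_loss(cells, anchor=0, positive=min(similar), negative=min(dissimilar))
bad = triplet_loss(cells, anchor=0, positive=min(dissimilar), negative=min(similar))
print(sorted(similar), sorted(dissimilar), good, bad)
```

In a trainable model the loss would be computed on the encoder's output and minimized by gradient descent, so that triplets like the second one (where the supposed positive is farther than the negative) pull the embedding toward better separation.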

Availability and implementation: A Python implementation of scHNTL is available at Figshare (https://doi.org/10.6084/m9.figshare.27001090) and GitHub (https://github.com/SWJTU-ML/scHNTL-code).

Supplementary information: Supplementary data are available at Bioinformatics online.