Single-cell Curriculum Learning-based Deep Graph Embedding Clustering

Huifa Li, Jie Fu, Xinpeng Ling, Zhiyu Sun, Kuncan Wang, Zhili Chen
arXiv - QuanBio - Genomics, 2024-08-20. DOI: arxiv-2408.10511 (https://doi.org/arxiv-2408.10511)

Abstract

The swift advancement of single-cell RNA sequencing (scRNA-seq) technologies enables the investigation of tissue heterogeneity at the cellular level. Cell annotation contributes significantly to the downstream analysis of scRNA-seq data. However, analyzing scRNA-seq data for biological inference is challenging owing to its intricate and indeterminate distribution, characterized by a substantial volume and a high frequency of dropout events. Furthermore, the quality of training samples varies greatly, and the performance of graph neural networks (GNNs), a popular solution for scRNA-seq data clustering, can be harmed by two types of low-quality training nodes: 1) nodes on the boundary; 2) nodes that contribute little additional information to the graph. To address these problems, we propose single-cell curriculum learning-based deep graph embedding clustering (scCLG). We first propose a Chebyshev graph convolutional autoencoder with multi-decoder (ChebAE) that combines three optimization objectives corresponding to three decoders, namely the topology reconstruction loss of cell graphs, the zero-inflated negative binomial (ZINB) loss, and the clustering loss, to learn cell-cell topology representations. Meanwhile, we employ a selective training strategy that trains the GNN based on the features and entropy of nodes and prunes difficult nodes according to their difficulty scores to keep a high-quality graph. Empirical results on a variety of gene expression datasets show that our model outperforms state-of-the-art methods.
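
The ZINB loss named in the abstract can be illustrated with a minimal sketch. The abstract does not give the authors' parameterization, so the code below assumes the standard mean/dispersion form of the negative binomial with a dropout probability; the function names `nb_log_pmf` and `zinb_nll` and the symbols `mu`, `theta`, and `pi` are hypothetical, chosen for illustration only, and this is not the scCLG implementation.

```python
import math


def nb_log_pmf(x, mu, theta):
    """Log-probability of count x under a negative binomial.

    Assumed parameterization (not taken from the paper):
    mu > 0 is the mean, theta > 0 is the dispersion.
    """
    return (
        math.lgamma(x + theta) - math.lgamma(theta) - math.lgamma(x + 1)
        + theta * math.log(theta / (theta + mu))
        + x * math.log(mu / (theta + mu))
    )


def zinb_nll(x, mu, theta, pi):
    """Negative log-likelihood of one count x under a ZINB model.

    pi in [0, 1) is the probability of a dropout (structural zero).
    A zero can come from dropout or from the negative binomial itself;
    a non-zero count can only come from the negative binomial.
    """
    nb_prob = math.exp(nb_log_pmf(x, mu, theta))
    if x == 0:
        prob = pi + (1.0 - pi) * nb_prob
    else:
        prob = (1.0 - pi) * nb_prob
    return -math.log(prob)


if __name__ == "__main__":
    # With mu = 1 and theta = 1 the negative binomial puts mass 0.5 on zero,
    # so a dropout probability of 0.5 gives P(x = 0) = 0.5 + 0.5 * 0.5 = 0.75.
    print(zinb_nll(0, mu=1.0, theta=1.0, pi=0.5))
```

In a training loop this per-entry value would typically be averaged over all cells and genes, with `mu`, `theta`, and `pi` produced by a decoder; that wiring is outside the scope of this sketch.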