Single-cell Curriculum Learning-based Deep Graph Embedding Clustering
Huifa Li, Jie Fu, Xinpeng Ling, Zhiyu Sun, Kuncan Wang, Zhili Chen
arXiv - QuanBio - Genomics, 2024-08-20. DOI: arxiv-2408.10511
Citations: 0
Abstract
The rapid advancement of single-cell RNA sequencing (scRNA-seq) technologies
enables the investigation of tissue heterogeneity at cellular resolution. Cell
annotation contributes substantially to the downstream analysis of scRNA-seq
data. However, using scRNA-seq data for biological inference is challenging
because the data distribution is complex and uncertain, with large data
volumes and a high frequency of dropout events.
Furthermore, the quality of training samples varies greatly, and the
performance of graph neural networks (GNNs), a popular solution for clustering
scRNA-seq data, can be harmed by two types of low-quality training nodes:
1) nodes on the boundary; 2) nodes that contribute little additional
information to the graph. To address these problems, we propose scCLG, a
single-cell curriculum learning-based deep graph embedding clustering method.
We first propose a Chebyshev graph convolutional autoencoder with multiple
decoders (ChebAE) that combines three optimization objectives, one per
decoder: a topology reconstruction loss on the cell graph, a zero-inflated
negative binomial (ZINB) loss, and a clustering loss. Together, these
objectives are used to learn a representation of cell-cell topology. In
addition, we employ a selective training strategy that trains the GNN
according to node features and entropy, and prunes difficult nodes based on
their difficulty scores to keep the graph high-quality. Empirical results on a
variety of gene expression datasets show that our model outperforms
state-of-the-art methods.
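The ChebAE encoder is built on Chebyshev graph convolution, but the abstract does not define the layer. The following minimal NumPy sketch implements the standard K-order Chebyshev filter (ChebNet style) under two assumptions: the cell-cell adjacency matrix is symmetric and non-negative, and the filter operates on the rescaled normalized Laplacian. The names `scaled_laplacian` and `cheb_conv` are illustrative. They are not taken from the scCLG code.

```python
import numpy as np

# Assumption: standard ChebNet-style filter; not the authors' implementation.

def scaled_laplacian(adj: np.ndarray) -> np.ndarray:
    """Rescaled Laplacian 2L/lambda_max - I of the symmetric normalized Laplacian L."""
    adj = np.asarray(adj, dtype=float)
    n = adj.shape[0]
    deg = adj.sum(axis=1)
    # Isolated nodes get an inverse square-root degree of 0 instead of inf.
    with np.errstate(divide="ignore"):
        d_inv_sqrt = np.where(deg > 0, deg ** -0.5, 0.0)
    lap = np.eye(n) - d_inv_sqrt[:, None] * adj * d_inv_sqrt[None, :]
    lam_max = np.linalg.eigvalsh(lap).max()  # L is symmetric, so eigvalsh applies
    return 2.0 * lap / lam_max - np.eye(n)

def cheb_conv(x: np.ndarray, adj: np.ndarray, weights: list) -> np.ndarray:
    """Chebyshev graph convolution: sum_k T_k(L_tilde) @ x @ weights[k].

    Uses the recurrence T_0 = I, T_1 = L_tilde, T_k = 2 L_tilde T_{k-1} - T_{k-2},
    applied directly to the feature matrix x (n_cells x n_features).
    """
    l_tilde = scaled_laplacian(adj)
    t_prev, t_curr = x, l_tilde @ x  # T_0 x and T_1 x
    out = t_prev @ weights[0]
    if len(weights) > 1:
        out = out + t_curr @ weights[1]
    for w in weights[2:]:
        # Right-hand side uses the previous T_{k-1} x and T_{k-2} x.
        t_prev, t_curr = t_curr, 2.0 * (l_tilde @ t_curr) - t_prev
        out = out + t_curr @ w
    return out

# Usage: 4 cells on a path graph, 3 input features, 2 output channels, K = 3.
rng = np.random.default_rng(0)
adj = np.array([[0, 1, 0, 0], [1, 0, 1, 0], [0, 1, 0, 1], [0, 0, 1, 0]], dtype=float)
x = rng.normal(size=(4, 3))
weights = [rng.normal(size=(3, 2)) for _ in range(3)]
h = cheb_conv(x, adj, weights)
print(h.shape)  # (4, 2)
```

With K weight matrices, each output row depends on neighbours up to K-1 hops away. An autoencoder would stack such layers with nonlinearities between them to map expression profiles to cell embeddings.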
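The ZINB objective treats each count as a mixture of a point mass at zero (dropout) and a negative binomial: ZINB(x) = pi * 1[x = 0] + (1 - pi) * NB(x; mu, theta), where NB(x; mu, theta) = Gamma(x + theta) / (Gamma(theta) Gamma(x + 1)) * (theta / (theta + mu))^theta * (mu / (theta + mu))^x. The abstract does not give ChebAE's exact parameterization. The sketch below assumes this common form, with mean mu, inverse dispersion theta, and dropout probability pi, in which a decoder would output all three per cell and gene. The function names are hypothetical.

```python
import math

# Assumption: standard ZINB parameterization (mu, theta, pi); not taken from scCLG.

def nb_log_pmf(x: int, mu: float, theta: float) -> float:
    """log NB(x; mu, theta) with mean mu > 0 and inverse dispersion theta > 0."""
    return (math.lgamma(x + theta) - math.lgamma(theta) - math.lgamma(x + 1)
            + theta * math.log(theta / (theta + mu))
            + x * math.log(mu / (theta + mu)))

def zinb_nll(x: int, mu: float, theta: float, pi: float) -> float:
    """Negative log-likelihood of one count x under ZINB(mu, theta, pi), 0 <= pi < 1."""
    if x == 0:
        # A zero can come from dropout (prob. pi) or from the NB component itself.
        nb_zero = (theta / (theta + mu)) ** theta
        return -math.log(pi + (1.0 - pi) * nb_zero)
    # A non-zero count can only come from the NB component.
    return -math.log(1.0 - pi) - nb_log_pmf(x, mu, theta)

def zinb_loss(counts, mus, thetas, pis) -> float:
    """Mean ZINB negative log-likelihood over matching sequences of entries."""
    terms = [zinb_nll(x, m, t, p) for x, m, t, p in zip(counts, mus, thetas, pis)]
    return sum(terms) / len(terms)

# Usage: three observed counts with hypothetical decoder outputs.
loss = zinb_loss([0, 3, 1], [1.5, 2.0, 0.8], [2.0, 2.0, 1.0], [0.3, 0.1, 0.2])
print(loss)
```

Modelling the zero component separately means a large fraction of zeros does not force mu toward zero. This is why ZINB terms are widely used for dropout-heavy scRNA-seq counts.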
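The abstract says scCLG's difficulty scores are derived from node features and entropy, but it gives no formula. The sketch below therefore covers only an entropy-based component, and it rests on explicit assumptions:

- Soft cluster assignments come from a Student's t kernel, as in deep embedded clustering.
- A node's difficulty is the normalized Shannon entropy of its assignment, so ambiguous, boundary-like cells score near 1.
- A hypothetical `prune_ratio` fraction of the hardest nodes is removed from the cell graph.

Sorting the kept nodes from easy to hard yields a simple curriculum order.

```python
import numpy as np

# Assumption: entropy-only difficulty and DEC-style soft assignment; the
# feature-based part of scCLG's score is not specified in the abstract.

def soft_assign(z: np.ndarray, centroids: np.ndarray, alpha: float = 1.0) -> np.ndarray:
    """Student's t soft assignment of embeddings z (n x d) to centroids (k x d)."""
    d2 = ((z[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=-1)
    q = (1.0 + d2 / alpha) ** (-(alpha + 1.0) / 2.0)
    return q / q.sum(axis=1, keepdims=True)

def entropy_difficulty(q: np.ndarray) -> np.ndarray:
    """Normalized Shannon entropy of each row of q (k >= 2 clusters), in [0, 1]."""
    plogp = q * np.log(np.clip(q, 1e-12, None))  # 0 * log(0) is treated as 0
    return -plogp.sum(axis=1) / np.log(q.shape[1])

def prune_hardest(adj: np.ndarray, difficulty: np.ndarray, prune_ratio: float):
    """Drop the hardest fraction of nodes.

    Returns the kept node indices in easy-to-hard (curriculum) order and the
    adjacency matrix restricted to those nodes, rows/columns in the same order.
    """
    n = adj.shape[0]
    n_keep = n - int(prune_ratio * n)
    order = np.argsort(difficulty, kind="stable")  # easiest first
    keep = order[:n_keep]
    return keep, adj[np.ix_(keep, keep)]

# Usage: 6 cells embedded in 2-D, two hypothetical centroids, prune 20% of nodes.
rng = np.random.default_rng(0)
z = rng.normal(size=(6, 2))
centroids = np.array([[-1.0, 0.0], [1.0, 0.0]])
difficulty = entropy_difficulty(soft_assign(z, centroids))
keep, sub_adj = prune_hardest(np.ones((6, 6)) - np.eye(6), difficulty, prune_ratio=0.2)
print(keep, sub_adj.shape)
```

A curriculum schedule could then train on the first (easiest) part of `keep` and gradually add harder nodes. How scCLG actually schedules this is not stated in the abstract.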