Inductive graph neural network framework for imputation of single-cell RNA sequencing data
Boneshwar V K, Deepesh Agarwal, Bala Natarajan, Babji Srinivasan
Computers & Chemical Engineering, Volume 195, Article 109031. Published 2025-02-06. DOI: 10.1016/j.compchemeng.2025.109031
https://www.sciencedirect.com/science/article/pii/S0098135425000353
Abstract
Single-cell RNA sequencing (scRNA-seq) has transformed biological research, enabling detailed analysis of disease pathways, cellular differentiation, and immune responses at the level of individual cells. However, the noisy and sparse nature of scRNA-seq datasets often impedes accurate downstream analyses. Cell clustering and gene imputation are foundational tasks in harnessing scRNA-seq data for complex biological insights. While various graph-based methods have been developed to improve imputation and clustering accuracy, traditional transductive models require the entire graph during training, which limits computational efficiency on large biological networks. This study introduces a novel inductive framework that learns relationships among graph nodes by using subgraphs rather than full neighbor sets to generate node embeddings, significantly reducing computational demands while maintaining robust performance. The proposed model achieves improvements of up to 60% in Silhouette score, 14.9% in Adjusted Rand Index, 48% in runtime, and 4.5% in L1 median error over baseline models, validating the effectiveness of inductive graph learning. The framework is evaluated on three diverse scRNA-seq datasets: GSE75748 (progenitor cell types derived from human embryonic stem cells (hESCs)), GSE131928 (adult and pediatric IDH-wildtype glioblastomas (GBM)), and Goolam et al. (blastomeres from early-stage Mus musculus (mouse) embryos collected at the 2-cell, 4-cell, 8-cell, 16-cell, and 32-cell stages of preimplantation development). Across these datasets it demonstrates scalability and adaptability, offering a reliable approach for future applications in trajectory inference and gene pathway analysis.
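The abstract does not give implementation details, so the following is only a minimal sketch of the general idea of generating a node embedding from a sampled subgraph rather than the full neighbor set. It uses a GraphSAGE-style mean aggregator as a stand-in, which is an assumption and not necessarily the authors' method. The function names, the toy cell graph, the sample size `k`, and the random weight matrix are all hypothetical.

```python
import numpy as np


def sample_neighbors(adj_list, node, k, rng):
    """Return at most k neighbors of `node`, sampled without replacement."""
    neighbors = adj_list[node]
    if len(neighbors) <= k:
        return list(neighbors)
    return [int(n) for n in rng.choice(neighbors, size=k, replace=False)]


def sampled_mean_embedding(features, adj_list, node, k, weight, rng):
    """One mean-aggregation layer over a sampled neighborhood.

    Computes ReLU(concat(self features, mean of sampled neighbor features) @ weight).
    The weight matrix is shared by all nodes, so the same function can embed
    nodes that were never seen during training (the inductive setting).
    """
    sampled = sample_neighbors(adj_list, node, k, rng)
    if sampled:
        neigh_mean = features[sampled].mean(axis=0)
    else:
        # Isolated node: fall back to a zero neighborhood summary.
        neigh_mean = np.zeros(features.shape[1])
    combined = np.concatenate([features[node], neigh_mean])
    return np.maximum(combined @ weight, 0.0)


# Hypothetical toy data: 6 cells described by 4 gene-expression features.
rng = np.random.default_rng(0)
features = rng.random((6, 4))
adj_list = {0: [1, 2, 3, 4], 1: [0, 2], 2: [0, 1], 3: [0], 4: [0], 5: []}
weight = rng.standard_normal((8, 3))  # maps 2 * 4 inputs to a 3-dim embedding

# Embed cell 0 using only 2 of its 4 neighbors.
embedding = sampled_mean_embedding(features, adj_list, 0, 2, weight, rng)
print(embedding.shape)
```

Because each embedding touches at most `k` neighbors per layer, the cost per node stays bounded regardless of graph size, which is the kind of saving the abstract attributes to working with subgraphs.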
About the journal:
Computers & Chemical Engineering is primarily a journal of record for new developments in the application of computing and systems technology to chemical engineering problems.