Inductive graph neural network framework for imputation of single-cell RNA sequencing data

Computers & Chemical Engineering, Volume 195, Article 109031 · Published 2025-02-06 · DOI: 10.1016/j.compchemeng.2025.109031
Impact factor 3.9 · CAS zone 2 (Engineering & Technology) · JCR Q2, Computer Science, Interdisciplinary Applications
Article page: https://www.sciencedirect.com/science/article/pii/S0098135425000353
Boneshwar V K, Deepesh Agarwal, Bala Natarajan, Babji Srinivasan

Abstract

Single-cell RNA sequencing (scRNA-seq) has transformed biological research, enabling detailed analysis of disease pathways, cellular differentiation, and immune responses at the level of individual cells. However, the noise and sparsity of scRNA-seq datasets often impede accurate downstream analyses. Cell clustering and gene imputation are foundational tasks in extracting complex biological insights from scRNA-seq data. While various graph-based methods have been developed to improve imputation and clustering accuracy, traditional transductive models require the entire graph during training, which limits their computational efficiency on large biological networks. This study introduces a novel inductive framework that learns relationships among graph nodes from subgraphs, rather than full neighbor sets, when generating node embeddings, significantly reducing computational demands while maintaining robust performance. Relative to baseline models, the proposed model achieves improvements of up to 60% in Silhouette score, 14.9% in Adjusted Rand Index, 48% in runtime, and 4.5% in L1 median error, validating the effectiveness of inductive graph learning. Evaluated on diverse scRNA-seq datasets, namely GSE75748 (progenitor cell types derived from human embryonic stem cells, hESCs), GSE131928 (adult and pediatric IDH-wildtype glioblastomas, GBM), and Goolam et al. (blastomeres from early-stage Mus musculus (mouse) embryos collected at the 2-, 4-, 8-, 16-, and 32-cell stages of preimplantation development), the framework demonstrates scalability and adaptability, offering a reliable approach for future applications in trajectory inference and gene pathway analysis.
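The abstract does not give the architecture, so the following is only a minimal sketch of the general idea of inductive embedding from subgraphs. It assumes a GraphSAGE-style mean aggregator over a k-nearest-neighbour cell graph, where each cell is embedded from a small uniform sample of its neighbours instead of its full neighbour set. An unseen cell can then be embedded with the same weights and no retraining. All function names are hypothetical, the weights are random (there is no training loop), and `l1_median_error` assumes the metric is the median absolute difference over held-out (masked) entries, a common convention in scRNA-seq imputation benchmarks.

```python
import numpy as np


def knn_graph(X, k):
    """Build a k-nearest-neighbour cell graph as adjacency lists (Euclidean distance)."""
    d = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
    np.fill_diagonal(d, np.inf)  # exclude self-loops
    return [np.argsort(row)[:k] for row in d]


def sample_neighbors(nbrs, s, rng):
    """Draw at most s neighbours uniformly without replacement (a sampled subgraph)."""
    if len(nbrs) <= s:
        return nbrs
    return rng.choice(nbrs, size=s, replace=False)


def sage_embed(x_self, x_nbrs, W_self, W_neigh):
    """Mean-aggregate neighbour features, combine with the cell's own features,
    apply ReLU, then L2-normalise (GraphSAGE-style; an assumption, not the paper's model)."""
    z = x_self @ W_self + x_nbrs.mean(axis=0) @ W_neigh
    z = np.maximum(z, 0.0)
    return z / max(np.linalg.norm(z), 1e-12)


def embed_all(X, adj, W_self, W_neigh, s, rng):
    """Embed every training cell using only s sampled neighbours per cell."""
    return np.stack([
        sage_embed(X[v], X[sample_neighbors(adj[v], s, rng)], W_self, W_neigh)
        for v in range(X.shape[0])
    ])


def embed_new_cell(x_new, X_train, W_self, W_neigh, k, s, rng):
    """Inductive step: embed an unseen cell with the same weights, without retraining."""
    d = ((X_train - x_new) ** 2).sum(axis=1)
    nbrs = np.argsort(d)[:k]
    return sage_embed(x_new, X_train[sample_neighbors(nbrs, s, rng)], W_self, W_neigh)


def l1_median_error(true, imputed, mask):
    """Median absolute error over held-out (masked) entries only (assumed definition)."""
    return float(np.median(np.abs(true[mask] - imputed[mask])))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = np.log1p(rng.poisson(1.0, size=(50, 20)).astype(float))  # toy cells x genes
    W_self = rng.normal(size=(20, 8))   # hypothetical untrained weights
    W_neigh = rng.normal(size=(20, 8))
    adj = knn_graph(X, k=10)
    H = embed_all(X, adj, W_self, W_neigh, s=5, rng=rng)
    h_new = embed_new_cell(X[0] + 0.1, X, W_self, W_neigh, k=10, s=5, rng=rng)
    print(H.shape, h_new.shape)  # (50, 8) (8,)
```

The per-cell cost depends on the sample size `s` rather than on the full degree or graph size, which is the kind of saving the abstract attributes to using subgraphs instead of full neighbour sets.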
Source journal
Computers & Chemical Engineering (Engineering & Technology: Chemical Engineering)
CiteScore: 8.70
Self-citation rate: 14.00%
Articles published: 374
Review time: 70 days
About the journal: Computers & Chemical Engineering is primarily a journal of record for new developments in the application of computing and systems technology to chemical engineering problems.