Mixed Effects Deep Learning for the interpretable analysis of single cell RNA sequencing data by quantifying and visualizing batch effects.

ArXiv Pub Date: 2024-11-13
Aixa X Andrade, Son Nguyen, Albert Montillo
Full text (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11601787/pdf/
Citation count: 0

Abstract

Single-cell RNA sequencing (scRNA-seq) data are often confounded by technical or biological batch effects. Existing deep learning models mitigate these effects but often discard batch-specific information, potentially losing valuable biological insights. We propose a Mixed Effects Deep Learning (MEDL) autoencoder framework that separately models batch-invariant (fixed effects) and batch-specific (random effects) components. By decoupling batch-invariant biological states from batch variations, our framework integrates both into predictive models. Our approach also generates 2D visualizations of how the same cell appears across batches, enhancing interpretability. Retaining both fixed and random effect latent spaces improves classification accuracy. We applied our framework to three datasets spanning the cardiovascular system (Healthy Heart), Autism Spectrum Disorder (ASD), and Acute Myeloid Leukemia (AML). With 147 batches in the Healthy Heart dataset, far exceeding typical numbers, we tested our framework's ability to handle many batches. In the ASD dataset, our approach captured donor heterogeneity between autistic and healthy individuals. In the AML dataset, it distinguished donor heterogeneity despite missing cell types and diseased donors exhibiting both healthy and malignant cells. These results highlight our framework's ability to characterize fixed and random effects, enhance batch effect visualization, and improve prediction accuracy across diverse datasets.
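
The split into batch-invariant (fixed) and batch-specific (random) components can be illustrated with a toy example. The sketch below is not the MEDL autoencoder described above: it swaps in a simple linear random-intercept decomposition in NumPy, purely to show the idea of keeping both parts and re-projecting one cell into every batch. The function names, the `shrink` parameter, and the simulated data are all hypothetical.

```python
import numpy as np


def split_fixed_random(X, batch, shrink=1.0):
    """Toy random-intercept split so that X == fixed + random.

    X      : (cells, genes) expression matrix
    batch  : (cells,) integer batch labels
    shrink : hypothetical penalty pulling small-batch offsets toward zero
    """
    X = np.asarray(X, dtype=float)
    batch = np.asarray(batch)
    grand_mean = X.mean(axis=0)
    offsets = {}
    random_part = np.zeros_like(X)
    for b in np.unique(batch):
        mask = batch == b
        n_b = mask.sum()
        # Shrunken per-batch offset from the grand mean (the "random effect").
        offset = (n_b / (n_b + shrink)) * (X[mask].mean(axis=0) - grand_mean)
        offsets[int(b)] = offset
        random_part[mask] = offset
    # What remains after removing the batch offset (the "fixed effect" part).
    fixed_part = X - random_part
    return fixed_part, random_part, offsets


def views_across_batches(fixed_cell, offsets):
    """Re-add each batch's offset to one cell's batch-invariant profile."""
    return {b: fixed_cell + off for b, off in offsets.items()}


# Simulated data (an assumption for illustration): 6 cells, 3 genes, 2 batches,
# with batch 1 shifted upward by a constant technical offset.
rng = np.random.default_rng(0)
X = rng.normal(size=(6, 3))
batch = np.array([0, 0, 0, 1, 1, 1])
X[batch == 1] += 2.0

fixed, rand, offsets = split_fixed_random(X, batch, shrink=0.0)
views = views_across_batches(fixed[0], offsets)
print("cell 0 as seen in each batch:", views)
```

With `shrink=0.0` the offsets are the raw batch means minus the grand mean, so the batch-invariant part has identical per-batch means; a positive `shrink` pulls the offsets of small batches toward zero, in the spirit of a random-effects prior. In a real analysis both parts would be learned nonlinearly rather than computed from means.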
