Latent space arithmetic on data embeddings from healthy multi-tissue human RNA-seq decodes disease modules.

Patterns (IF 6.7; Q1, Computer Science, Artificial Intelligence). Pub Date: 2024-10-31. eCollection Date: 2024-11-08. DOI: 10.1016/j.patter.2024.101093
Hendrik A de Weerd, Dimitri Guala, Mika Gustafsson, Jane Synnergren, Jesper Tegnér, Zelmina Lubovac-Pilav, Rasmus Magnusson
Patterns, volume 5, issue 11, article 101093. Publication type: Journal Article. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11573900/pdf/
Citation count: 0

Abstract

Computational analyses of transcriptomic data have dramatically improved our understanding of complex diseases. However, such approaches are limited by small sample sets of disease-affected material. We asked if a variational autoencoder trained on large groups of healthy human RNA sequencing (RNA-seq) data can capture the fundamental gene regulation system and generalize to unseen disease changes. Importantly, we found this model to successfully compress unseen transcriptomic changes from 25 independent disease datasets. We decoded disease-specific signals from the latent space and found them to contain more disease-specific genes than the corresponding differential expression analysis in 20 of 25 cases. Finally, we matched these disease signals with known drug targets and extracted sets of known and potential pharmaceutical candidates. In summary, our study demonstrates how data-driven representation learning enables the arithmetic deconstruction of the latent space, facilitating the dissection of disease mechanisms and drug targets.
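
To make the idea of latent space arithmetic concrete, here is a minimal sketch in plain Python. It is an illustration only, not the authors' model or pipeline: the abstract does not specify the architecture or the exact arithmetic, so the linear `encode` and `decode` functions, the weight matrices `W_ENC` and `W_DEC`, the gene names, and the expression values are all hypothetical stand-ins for a trained variational autoencoder and real RNA-seq data. The sketch assumes that a disease signal can be approximated by shifting the mean control embedding toward the mean case embedding and decoding the resulting change.

```python
from statistics import fmean


def encode(x, w_enc):
    """Toy linear encoder z = W_enc @ x (hypothetical stand-in for a VAE encoder mean)."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in w_enc]


def decode(z, w_dec):
    """Toy linear decoder x_hat = W_dec @ z (hypothetical stand-in for a VAE decoder)."""
    return [sum(w * zi for w, zi in zip(row, z)) for row in w_dec]


def mean_vector(vectors):
    """Element-wise mean of equally sized vectors."""
    return [fmean(column) for column in zip(*vectors)]


def disease_signal(cases, controls, w_enc, w_dec):
    """Decode the shift between the mean case and mean control embeddings."""
    z_case = mean_vector([encode(x, w_enc) for x in cases])
    z_ctrl = mean_vector([encode(x, w_enc) for x in controls])
    # Latent arithmetic: the disease direction is a difference of latent means.
    z_shift = [a - b for a, b in zip(z_case, z_ctrl)]
    # Decode at the control centroid with and without the shift, so the same
    # recipe would also apply to a nonlinear decoder.
    shifted = decode([b + d for b, d in zip(z_ctrl, z_shift)], w_dec)
    baseline = decode(z_ctrl, w_dec)
    return [s - b for s, b in zip(shifted, baseline)]


def top_genes(signal, gene_names, k):
    """Return the k genes with the largest absolute decoded change."""
    ranked = sorted(zip(gene_names, signal), key=lambda p: abs(p[1]), reverse=True)
    return [name for name, _ in ranked[:k]]


# Hypothetical toy setup: 4 genes, 2 latent dimensions.
GENES = ["GENE_A", "GENE_B", "GENE_C", "GENE_D"]
W_ENC = [[0.5, 0.5, 0.0, 0.0], [0.0, 0.0, 0.5, 0.5]]
W_DEC = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]
controls = [[1.0, 1.0, 1.0, 1.0], [1.0, 1.0, 1.0, 1.0]]
cases = [[3.0, 3.0, 1.0, 1.0], [5.0, 5.0, 1.0, 1.0]]

signal = disease_signal(cases, controls, W_ENC, W_DEC)
print(signal)                       # decoded per-gene change
print(top_genes(signal, GENES, 2))  # genes carrying the largest change
```

In this toy setup only the first two genes differ between cases and controls, so the decoded signal is concentrated on them. With a real model, the ranked genes would be the candidates to compare against a differential expression analysis or a list of known drug targets.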

Source journal: Patterns
Subject area: Decision Sciences, Decision Sciences (all)
CiteScore: 10.60
Self-citation rate: 4.60%
Articles published: 153
Review time: 19 weeks