Hendrik A de Weerd, Dimitri Guala, Mika Gustafsson, Jane Synnergren, Jesper Tegnér, Zelmina Lubovac-Pilav, Rasmus Magnusson
Title: Latent space arithmetic on data embeddings from healthy multi-tissue human RNA-seq decodes disease modules.
Journal: Patterns, vol. 5, no. 11, 101093 (published 2024-10-31)
DOI: https://doi.org/10.1016/j.patter.2024.101093
Full text (PDF): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11573900/pdf/
Citations: 0
Abstract
Computational analyses of transcriptomic data have dramatically improved our understanding of complex diseases. However, such approaches are limited by small sample sets of disease-affected material. We asked if a variational autoencoder trained on large groups of healthy human RNA sequencing (RNA-seq) data can capture the fundamental gene regulation system and generalize to unseen disease changes. Importantly, we found this model to successfully compress unseen transcriptomic changes from 25 independent disease datasets. We decoded disease-specific signals from the latent space and found them to contain more disease-specific genes than the corresponding differential expression analysis in 20 of 25 cases. Finally, we matched these disease signals with known drug targets and extracted sets of known and potential pharmaceutical candidates. In summary, our study demonstrates how data-driven representation learning enables the arithmetic deconstruction of the latent space, facilitating the dissection of disease mechanisms and drug targets.
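The latent space arithmetic described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the linear `encode` and `decode` functions are hypothetical stand-ins for a trained variational autoencoder, the expression matrices are synthetic, and the names `disease_signal`, `W_enc`, and `W_dec` are assumptions made for illustration only. The idea shown is to take the shift between the mean latent embeddings of disease and healthy samples, decode that shift back to gene space, and rank genes by the size of the decoded change.

```python
import numpy as np

rng = np.random.default_rng(0)
n_genes, n_latent = 50, 8

# Hypothetical stand-in for a trained VAE: fixed random linear maps.
# A real model would be a nonlinear network trained on healthy RNA-seq data.
W_enc = rng.normal(size=(n_genes, n_latent)) / np.sqrt(n_genes)
W_dec = np.linalg.pinv(W_enc)  # shape (n_latent, n_genes)


def encode(x):
    """Map expression profiles (samples x genes) to latent vectors."""
    return x @ W_enc


def decode(z):
    """Map latent vectors back to gene-expression space."""
    return z @ W_dec


def disease_signal(healthy, disease):
    """Decode the latent shift between disease and healthy samples."""
    z_healthy = encode(healthy).mean(axis=0)
    z_shift = encode(disease).mean(axis=0) - z_healthy
    # Difference of decoded points; this form also applies to a
    # nonlinear decoder, where decode(z_shift) alone would not.
    return decode(z_healthy + z_shift) - decode(z_healthy)


# Synthetic data: the first five genes are shifted in the disease samples.
healthy = rng.normal(size=(40, n_genes))
disease = rng.normal(size=(10, n_genes))
disease[:, :5] += 3.0

signal = disease_signal(healthy, disease)
# Rank genes by the absolute size of the decoded change.
top_genes = np.argsort(-np.abs(signal))[:5]
print(top_genes)
```

Because the stand-in model compresses 50 genes into 8 latent dimensions with random weights, the top-ranked genes need not coincide with the five shifted genes. In the study, the corresponding step relies on a model that has learned gene regulation structure from large healthy datasets.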