How Molecules Impact Cells: Unlocking Contrastive PhenoMolecular Retrieval
Philip Fradkin, Puria Azadi, Karush Suri, Frederik Wenkel, Ali Bashashati, Maciej Sypetkowski, Dominique Beaini
arXiv:2409.08302 · arXiv - QuanBio - Quantitative Methods · Published 2024-09-10
Abstract
Predicting molecular impact on cellular function is a core challenge in
therapeutic design. Phenomic experiments, designed to capture cellular
morphology, utilize microscopy-based techniques and offer a high-throughput
solution for uncovering molecular impact on the cell. In this work,
we learn a joint latent space between molecular structures and microscopy
phenomic experiments, aligning paired samples with contrastive learning.
Specifically, we study the problem of Contrastive PhenoMolecular Retrieval,
which consists of zero-shot molecular structure identification conditioned on
phenomic experiments. We assess challenges in multi-modal learning of phenomics
and molecular modalities such as experimental batch effects, inactive molecule
perturbations, and encoding perturbation concentration. We demonstrate improved
multi-modal learner retrieval through (1) a uni-modal pre-trained phenomics
model, (2) a novel inter-sample similarity-aware loss, and (3) models
conditioned on a representation of molecular concentration. Following this
recipe, we propose MolPhenix, a molecular phenomics model. MolPhenix leverages
a pre-trained phenomics model to demonstrate significant performance gains
across perturbation concentrations, molecular scaffolds, and activity
thresholds. In particular, we demonstrate an 8.1x improvement in zero-shot
molecular retrieval of active molecules over the previous state-of-the-art,
reaching 77.33% in top-1% accuracy. These results open the door for machine
learning to be applied in virtual phenomics screening, which can significantly
benefit drug discovery applications.
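
As a rough illustration of the setup described in the abstract, the sketch below aligns paired phenomic and molecular embeddings with a contrastive objective and then scores zero-shot retrieval by top-k% accuracy. It is not the MolPhenix implementation. The loss shown is a standard symmetric InfoNCE (CLIP-style) objective, used here as a stand-in for the inter-sample similarity-aware loss, whose details the abstract does not give. All function names, the temperature value, and the toy embeddings are assumptions made for this example.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def similarity_matrix(pheno_embs, mol_embs):
    """Cosine similarity between every phenomic and every molecular embedding."""
    p = [l2_normalize(v) for v in pheno_embs]
    m = [l2_normalize(v) for v in mol_embs]
    return [[sum(a * b for a, b in zip(pv, mv)) for mv in m] for pv in p]


def info_nce(sim, temperature=0.1):
    """Symmetric InfoNCE loss (a stand-in, not the paper's loss).

    Row i and column i of `sim` are assumed to be the matched pair.
    """
    n = len(sim)

    def cross_entropy(rows):
        total = 0.0
        for i, row in enumerate(rows):
            logits = [s / temperature for s in row]
            peak = max(logits)  # subtract the max for numerical stability
            log_z = peak + math.log(sum(math.exp(x - peak) for x in logits))
            total += log_z - logits[i]
        return total / n

    cols = [list(c) for c in zip(*sim)]  # molecule-to-phenomics direction
    return 0.5 * (cross_entropy(sim) + cross_entropy(cols))


def top_percent_accuracy(sim, percent=1.0):
    """Fraction of phenomic queries whose true molecule ranks in the top `percent`%."""
    n_candidates = len(sim[0])
    k = max(1, math.ceil(n_candidates * percent / 100.0))
    hits = 0
    for i, row in enumerate(sim):
        # Count candidates scored strictly higher than the true molecule.
        rank = sum(1 for s in row if s > row[i])
        hits += rank < k
    return hits / len(sim)


# Toy, made-up embeddings: phenomic sample i is paired with molecule i.
pheno = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
mol = [[0.9, 0.1], [0.1, 0.9], [-0.9, 0.1]]

sim = similarity_matrix(pheno, mol)
loss = info_nce(sim)
# Each true molecule is the highest-scoring candidate for its query, so acc is 1.0.
acc = top_percent_accuracy(sim, percent=1.0)

# Breaking the pairing (rotating the rows) should increase the loss.
shuffled_loss = info_nce(sim[1:] + sim[:1])
```

In a real pipeline the embeddings would come from trained phenomics and molecular encoders, and the candidate pool would be large enough that the top 1% covers more than a single molecule.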