Toward Unified AI Drug Discovery with Multimodal Knowledge.

Health data science; Pub Date: 2024-02-23; eCollection Date: 2024-01-01; DOI: 10.34133/hds.0113

Yizhen Luo, Xing Yi Liu, Kai Yang, Kui Huang, Massimo Hong, Jiahuan Zhang, Yushuai Wu, Zaiqing Nie

Volume 4, article 0113. Publication type: Journal Article. Full-text PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10886071/pdf/

Citations: 0

Abstract

Background: In real-world drug discovery, human experts draw molecular knowledge about drugs and proteins from multimodal sources: molecular structures, structured knowledge from knowledge bases, and unstructured knowledge from biomedical literature. Existing multimodal approaches in AI drug discovery integrate either structured or unstructured knowledge, but not both, which compromises a holistic understanding of biomolecules. They also fail to address the missing modality problem, in which multimodal information is unavailable for novel drugs and proteins.

Methods: We present KEDD, a unified, end-to-end deep learning framework that jointly incorporates structured and unstructured knowledge for a wide range of AI drug discovery tasks. The framework first uses independent representation learning models to extract the underlying characteristics of each modality. It then applies a feature fusion technique to compute the prediction results. To mitigate the missing modality problem, we leverage sparse attention and a modality masking technique to reconstruct the missing features from the most relevant molecules.

Results: By drawing on both structured and unstructured knowledge, our framework achieves a deeper understanding of biomolecules. KEDD outperforms state-of-the-art models by an average of 5.2% on drug-target interaction prediction, 2.6% on drug property prediction, 1.2% on drug-drug interaction prediction, and 4.1% on protein-protein interaction prediction. Qualitative analysis further shows KEDD's potential to assist real-world applications.

Conclusions: By incorporating biomolecular expertise from multimodal knowledge, KEDD shows promise for accelerating drug discovery.
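The reconstruction step in the Methods can be illustrated with a toy example. The sketch below is not the authors' implementation: the function names, the dot-product similarity, the choice of k, and the concatenation-based fusion are all assumptions made for illustration. It shows one plausible reading of the idea: when a novel molecule has no knowledge features, attend only to the top-k most structurally similar molecules in a reference bank and take a weighted average of their knowledge features.

```python
import math
import random


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def topk_sparse_attention(query, keys, values, k):
    """Attend only to the k keys most similar to the query (assumed scheme)."""
    scores = [dot(query, key) / math.sqrt(len(query)) for key in keys]
    # Indices of the k highest-scoring neighbours; all others get zero weight.
    top = sorted(range(len(keys)), key=lambda i: scores[i], reverse=True)[:k]
    peak = max(scores[i] for i in top)  # subtract the max for numerical stability
    exp = {i: math.exp(scores[i] - peak) for i in top}
    total = sum(exp.values())
    weights = [exp.get(i, 0.0) / total for i in range(len(keys))]
    dim = len(values[0])
    recon = [sum(weights[i] * values[i][d] for i in top) for d in range(dim)]
    return recon, weights


def mask_modality(features, drop_prob, rng):
    """Modality masking (assumed form): hide the features with probability
    drop_prob so that training also exercises the reconstruction path."""
    return None if rng.random() < drop_prob else features


def fuse(structure_feat, knowledge_feat):
    """Hypothetical fusion: plain concatenation of the two feature vectors."""
    return structure_feat + knowledge_feat


# Toy reference bank: structure embeddings paired with knowledge features.
bank_structure = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [-1.0, 0.0]]
bank_knowledge = [[1.0, 0.0, 0.0], [0.8, 0.2, 0.0], [0.0, 0.0, 1.0], [0.0, 1.0, 0.0]]

# A novel molecule: structure is known, knowledge features are missing.
novel_structure = [1.0, 0.05]
recon, weights = topk_sparse_attention(
    novel_structure, bank_structure, bank_knowledge, k=2
)
fused = fuse(novel_structure, recon)
```

In this toy setup only the two nearest bank entries receive non-zero attention weight, so the reconstructed knowledge vector is a blend of their features alone. A learned model would replace the raw dot product with trained projections; that detail is outside what the abstract specifies.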

Source journal

Health data science

CiteScore

3.70

Self-citation rate

0.00%

Number of articles published