Self-Supervised Molecular Representation Learning With Topology and Geometry

Impact factor: 6.7 | Zone 2 (Medicine) | JCR Q1 (Computer Science, Information Systems) | IEEE Journal of Biomedical and Health Informatics | Publication date: 2024-10-14 | DOI: 10.1109/JBHI.2024.3479194

Xuan Zang; Junjie Zhang; Buzhou Tang

Volume 29, issue 1, pages 700-710. Journal article; not open access. IEEE Xplore: https://ieeexplore.ieee.org/document/10715653/

Citation count: 0

Abstract

Molecular representation learning is of great importance for the analysis of drug molecules. Self-supervised pre-training has shown great promise in this area as a way to overcome the scarcity of labeled molecular property data. Recent studies concentrate on pre-training molecular representation encoders by integrating both 2D topological and 3D geometric structures. However, existing methods rely on molecule-level or atom-level alignment between views and overlook hierarchical self-supervised learning, which can capture both inter-molecule and intra-molecule correlations. Additionally, most methods employ 2D or 3D encoders individually to extract local or global molecular characteristics for molecular property prediction, and the potential for effectively fusing these two molecular representations remains to be explored. In this work, we propose a Multi-View Molecular Representation Learning method (MVMRL) for molecular property prediction. First, hierarchical pre-training pretext tasks are designed, including fine-grained atom-level tasks for 2D molecular graphs and coarse-grained molecule-level tasks for 3D molecular graphs, so that the two views provide complementary information to each other. Subsequently, a motif-level fusion pattern for multi-view molecular representations is applied during fine-tuning to enhance the performance of molecular property prediction. We evaluate MVMRL against state-of-the-art baselines on molecular property prediction tasks, and the experimental results demonstrate the superiority of MVMRL.
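The abstract does not spell out how the motif-level fusion is computed, so the following is only a minimal sketch of one plausible reading, not the authors' implementation. It assumes that a 2D encoder and a 3D encoder have each already produced one embedding per atom, that every atom is assigned to exactly one motif, and that fusion means mean-pooling atoms within each motif per view, concatenating the two views per motif, and averaging over motifs. The names `mean_pool`, `motif_pool`, and `fuse_views` are hypothetical.

```python
from typing import List, Sequence

Vector = List[float]


def mean_pool(rows: Sequence[Vector]) -> Vector:
    """Average a non-empty list of equal-length vectors, column by column."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def motif_pool(atom_emb: Sequence[Vector], motif_of_atom: Sequence[int]) -> List[Vector]:
    """Mean-pool atom embeddings within each motif.

    Assumption: motif ids run from 0 to K-1 and every id has at least one atom.
    """
    num_motifs = max(motif_of_atom) + 1
    groups: List[List[Vector]] = [[] for _ in range(num_motifs)]
    for emb, motif_id in zip(atom_emb, motif_of_atom):
        groups[motif_id].append(emb)
    return [mean_pool(group) for group in groups]


def fuse_views(h2d: Sequence[Vector], h3d: Sequence[Vector],
               motif_of_atom: Sequence[int]) -> Vector:
    """Hypothetical motif-level fusion of 2D and 3D atom embeddings."""
    motifs_2d = motif_pool(h2d, motif_of_atom)
    motifs_3d = motif_pool(h3d, motif_of_atom)
    # Concatenate the two views for each motif (list + list is concatenation).
    fused = [m2 + m3 for m2, m3 in zip(motifs_2d, motifs_3d)]
    # Average over motifs to get one molecule-level vector.
    return mean_pool(fused)


# Toy molecule: three atoms, the first two in motif 0, the third in motif 1.
h2d = [[1.0, 0.0], [3.0, 0.0], [0.0, 2.0]]  # made-up 2D-view atom embeddings
h3d = [[0.0, 1.0], [0.0, 1.0], [4.0, 0.0]]  # made-up 3D-view atom embeddings
molecule_vec = fuse_views(h2d, h3d, motif_of_atom=[0, 0, 1])
print(molecule_vec)  # [1.0, 1.0, 2.0, 0.5]
```

In a real pipeline the fused vector would feed a property-prediction head during fine-tuning; a learned fusion such as attention over motifs could replace the plain concatenation and averaging used here.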




Source journal

IEEE Journal of Biomedical and Health Informatics
Categories: Computer Science, Information Systems; Computer Science, Interdisciplinary Applications

CiteScore: 13.60
Self-citation rate: 6.50%
Publication count: 1151

Journal description: IEEE Journal of Biomedical and Health Informatics publishes original papers presenting recent advances where information and communication technologies intersect with health, healthcare, life sciences, and biomedicine. Topics include acquisition, transmission, storage, retrieval, management, and analysis of biomedical and health information. The journal covers applications of information technologies in healthcare, patient monitoring, preventive care, early disease diagnosis, therapy discovery, and personalized treatment protocols. It explores electronic medical and health records, clinical information systems, decision support systems, medical and biological imaging informatics, wearable systems, body area/sensor networks, and more. Integration-related topics like interoperability, evidence-based medicine, and secure patient data are also addressed.