MMCL-CPI: A multi-modal compound-protein interaction prediction model incorporating contrastive learning pre-training

IF 2.6 4区 生物学 Q2 BIOLOGY Computational Biology and Chemistry Pub Date : 2024-07-25 DOI:10.1016/j.compbiolchem.2024.108137
{"title":"MMCL-CPI: A multi-modal compound-protein interaction prediction model incorporating contrastive learning pre-training","authors":"","doi":"10.1016/j.compbiolchem.2024.108137","DOIUrl":null,"url":null,"abstract":"<div><h3>Motivation</h3><p>Compound-protein interaction (CPI) prediction plays a crucial role in drug discovery and drug repositioning. Early researchers relied on time-consuming and labor-intensive wet laboratory experiments. However, the advent of deep learning has significantly accelerated this progress. Most existing deep learning methods utilize deep neural networks to extract compound features from sequences and graphs, either separately or in combination. Our team’s previous research has demonstrated that compound images contain valuable information that can be leveraged for CPI task. However, there is a scarcity of multimodal methods that effectively combine sequence and image representations of compounds in CPI. Currently, the use of text-image pairs for contrastive language-image pre-training is a popular approach in the multimodal field. Further research is needed to explore how the integration of sequence and image representations can enhance the accuracy of CPI task.</p></div><div><h3>Results</h3><p>This paper presents a novel method called MMCL-CPI, which encompasses two key highlights: 1) Firstly, we propose extracting compound features from two modalities: one-dimensional SMILES and two-dimensional images. This approach enables us to capture both sequence and spatial features, enhancing the prediction accuracy for CPI. Based on this, we design a novel multimodal model. 2) Secondly, we introduce a multimodal pre-training strategy that leverages comparative learning on a large-scale unlabeled dataset to establish the correspondence between SMILES string and compound’s image. This pre-training approach significantly improves compound feature representations for downstream CPI task. Our method has shown competitive results on multiple datasets.</p></div>","PeriodicalId":10616,"journal":{"name":"Computational Biology and Chemistry","volume":null,"pages":null},"PeriodicalIF":2.6000,"publicationDate":"2024-07-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Computational Biology and Chemistry","FirstCategoryId":"99","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S1476927124001257","RegionNum":4,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"BIOLOGY","Score":null,"Total":0}
引用次数: 0

Abstract

Motivation

Compound-protein interaction (CPI) prediction plays a crucial role in drug discovery and drug repositioning. Early researchers relied on time-consuming and labor-intensive wet laboratory experiments. However, the advent of deep learning has significantly accelerated this progress. Most existing deep learning methods utilize deep neural networks to extract compound features from sequences and graphs, either separately or in combination. Our team’s previous research has demonstrated that compound images contain valuable information that can be leveraged for CPI task. However, there is a scarcity of multimodal methods that effectively combine sequence and image representations of compounds in CPI. Currently, the use of text-image pairs for contrastive language-image pre-training is a popular approach in the multimodal field. Further research is needed to explore how the integration of sequence and image representations can enhance the accuracy of CPI task.

Results

This paper presents a novel method called MMCL-CPI, which encompasses two key highlights: 1) Firstly, we propose extracting compound features from two modalities: one-dimensional SMILES and two-dimensional images. This approach enables us to capture both sequence and spatial features, enhancing the prediction accuracy for CPI. Based on this, we design a novel multimodal model. 2) Secondly, we introduce a multimodal pre-training strategy that leverages comparative learning on a large-scale unlabeled dataset to establish the correspondence between SMILES string and compound’s image. This pre-training approach significantly improves compound feature representations for downstream CPI task. Our method has shown competitive results on multiple datasets.

查看原文
分享 分享
微信好友 朋友圈 QQ好友 复制链接
本刊更多论文
MMCL-CPI:结合对比学习预训练的多模态化合物-蛋白质相互作用预测模型
动机化合物-蛋白质相互作用(CPI)预测在药物发现和药物重新定位中起着至关重要的作用。早期的研究人员依赖于耗时耗力的湿实验室实验。然而,深度学习的出现大大加快了这一进程。现有的深度学习方法大多利用深度神经网络从序列和图形中单独或组合提取化合物特征。我们团队之前的研究表明,复合图像包含有价值的信息,可用于 CPI 任务。然而,在 CPI 中有效结合化合物序列和图像表征的多模态方法还很匮乏。目前,使用文本-图像对进行语言-图像对比预训练是多模态领域的一种流行方法。本文提出了一种名为 MMCL-CPI 的新方法,该方法有两大亮点:1)首先,我们建议从两种模式中提取复合特征:一维 SMILES 和二维图像。这种方法可以同时捕捉序列和空间特征,从而提高 CPI 的预测准确性。在此基础上,我们设计了一个新颖的多模态模型。2) 其次,我们引入了一种多模态预训练策略,利用大规模无标记数据集上的比较学习来建立 SMILES 字符串与化合物图像之间的对应关系。这种预训练方法大大改进了下游 CPI 任务的复合特征表征。我们的方法已在多个数据集上显示出具有竞争力的结果。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 去求助
来源期刊
Computational Biology and Chemistry
Computational Biology and Chemistry 生物-计算机:跨学科应用
CiteScore
6.10
自引率
3.20%
发文量
142
审稿时长
24 days
期刊介绍: Computational Biology and Chemistry publishes original research papers and review articles in all areas of computational life sciences. High quality research contributions with a major computational component in the areas of nucleic acid and protein sequence research, molecular evolution, molecular genetics (functional genomics and proteomics), theory and practice of either biology-specific or chemical-biology-specific modeling, and structural biology of nucleic acids and proteins are particularly welcome. Exceptionally high quality research work in bioinformatics, systems biology, ecology, computational pharmacology, metabolism, biomedical engineering, epidemiology, and statistical genetics will also be considered. Given their inherent uncertainty, protein modeling and molecular docking studies should be thoroughly validated. In the absence of experimental results for validation, the use of molecular dynamics simulations along with detailed free energy calculations, for example, should be used as complementary techniques to support the major conclusions. Submissions of premature modeling exercises without additional biological insights will not be considered. Review articles will generally be commissioned by the editors and should not be submitted to the journal without explicit invitation. However prospective authors are welcome to send a brief (one to three pages) synopsis, which will be evaluated by the editors.
期刊最新文献
Genome-wide identification of alternative splicing related with transcription factors and splicing regulators in breast cancer stem cells responding to fasting-mimicking diet Design and characterization of defined alpha-helix mini-proteins with intrinsic cell permeability Identify the key genes and pathways of melatonin in age-dependent mice hippocampus regulation by transcriptome analysis Integrating (deep) machine learning and cheminformatics for predicting human intestinal absorption of small molecules Investigating pH-induced conformational switch in PIM-1: An integrated multi spectroscopic and MD simulation study
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
现在去查看 取消
×
提示
确定
0
微信
客服QQ
Book学术公众号 扫码关注我们
反馈
×
意见反馈
请填写您的意见或建议
请填写您的手机或邮箱
已复制链接
已复制链接
快去分享给好友吧!
我知道了
×
扫码分享
扫码分享
Book学术官方微信
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术
文献互助 智能选刊 最新文献 互助须知 联系我们:info@booksci.cn
Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。
Copyright © 2023 Book学术 All rights reserved.
ghs 京公网安备 11010802042870号 京ICP备2023020795号-1