Improving drug-target affinity prediction by adaptive self-supervised learning.

PeerJ Computer Science · IF 2.5 · CAS Region 4 (Computer Science) · JCR Q2 (Computer Science, Artificial Intelligence) · Pub Date: 2025-01-03 · eCollection Date: 2025-01-01 · DOI: 10.7717/peerj-cs.2622

Qing Ye, Yaxin Sun

PeerJ Computer Science, volume 11, e2622 (journal article). Full-text PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11784864/pdf/

Citations: 0

Abstract

Computational drug-target affinity prediction is important for drug screening and discovery. Self-supervised learning methods currently face two major challenges in this task. The first is sample mismatch: self-supervised learning processes drug and target samples independently, whereas actual prediction requires integrating drug-target pairs. The second is a mismatch between the breadth of self-supervised learning objectives and the precision of the biological mechanism underlying drug-target affinity (i.e., the induced-fit principle): the former focuses on global feature extraction, while the latter depends on precise local matching. To address these issues, we designed an adaptive self-supervised learning-based drug-target affinity prediction method (ASSLDTA). ASSLDTA combines a novel adaptive self-supervised learning (ASSL) module with a high-level feature learning network for feature extraction. The ASSL module uses a large amount of unlabeled training data to capture low-level features of drugs and targets. Its goal is to retain as much of the original feature information as possible, thereby bridging the objective gap between self-supervised learning and drug-target affinity prediction and alleviating the sample mismatch problem. The high-level feature learning network, in turn, extracts high-level features for affinity prediction from a small amount of labeled data. In this two-stage design, each stage handles a specific task, so the method exploits the strengths of each model while integrating information from different data sources, yielding a more accurate and comprehensive solution for drug-target affinity prediction.
In our experiments, ASSLDTA outperforms other deep learning methods, and its results improve significantly when the adaptive self-supervised features are used, which supports the effectiveness of ASSLDTA.
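
The two-stage idea described in the abstract can be illustrated with a toy sketch. The abstract does not specify the ASSL architecture or the high-level network, so everything here is an assumption made for illustration: PCA (the best linear encoder under squared reconstruction error) stands in for the information-retaining self-supervised stage, ridge regression stands in for the supervised stage, and all function names, dimensions, and the random synthetic data are hypothetical. It is not the authors' implementation, and its outputs carry no scientific meaning.

```python
import numpy as np


def fit_pca_encoder(X, k):
    """Stage 1 stand-in: learn a linear encoder from UNLABELED rows of X.

    PCA keeps the k directions that minimise squared reconstruction error,
    a simple proxy for "retain as much original information as possible".
    """
    mean = X.mean(axis=0)
    # Rows of vt are the principal directions of the centred data.
    _, _, vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, vt[:k].T  # encoder matrix has shape (n_features, k)


def encode(X, mean, W):
    """Project samples onto the learned low-dimensional features."""
    return (X - mean) @ W


def pair_features(z_drug, z_target):
    """Combine per-drug and per-target codes into one drug-target pair vector.

    The element-wise product is a hypothetical way to expose interactions.
    """
    return np.hstack([z_drug, z_target, z_drug * z_target])


def fit_ridge(Z, y, lam=1e-3):
    """Stage 2 stand-in: ridge regression on a small LABELED set of pairs."""
    Zb = np.hstack([Z, np.ones((len(Z), 1))])  # append a bias column
    A = Zb.T @ Zb + lam * np.eye(Zb.shape[1])
    return np.linalg.solve(A, Zb.T @ y)


def predict(Z, w):
    """Predict affinities for encoded pair features."""
    return np.hstack([Z, np.ones((len(Z), 1))]) @ w


def demo(seed=0):
    rng = np.random.default_rng(seed)
    # Large unlabeled pools, processed independently per modality (synthetic).
    drugs_unlabeled = rng.normal(size=(500, 16))
    targets_unlabeled = rng.normal(size=(500, 24))
    d_mean, d_W = fit_pca_encoder(drugs_unlabeled, k=4)
    t_mean, t_W = fit_pca_encoder(targets_unlabeled, k=4)

    # Small labeled set of drug-target pairs with synthetic affinities.
    drugs = rng.normal(size=(80, 16))
    targets = rng.normal(size=(80, 24))
    affinity = rng.normal(size=80)
    feats = pair_features(encode(drugs, d_mean, d_W),
                          encode(targets, t_mean, t_W))

    w = fit_ridge(feats[:60], affinity[:60])  # train on 60 labeled pairs
    return predict(feats[60:], w)             # predict the 20 held-out pairs


if __name__ == "__main__":
    print(demo()[:5])
```

The split mirrors the division of labor in the abstract: the encoders never see affinity labels, and the regressor only sees pairs. A real system would presumably replace both stand-ins with learned neural components.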


Source journal

PeerJ Computer Science (Computer Science: General Computer Science)

CiteScore: 6.10
Self-citation rate: 5.30%
Articles published: 332
Review time: 10 weeks

About the journal: PeerJ Computer Science is an open access journal covering all subject areas in computer science, with the backing of a prestigious advisory board and more than 300 academic editors.