Does protein pretrained language model facilitate the prediction of protein–ligand interaction?

IF 4.2 · JCR Q1 (Biochemical Research Methods) · Zone 3, Biology · Methods · Pub Date: 2023-11-01 · DOI: 10.1016/j.ymeth.2023.08.016
Weihong Zhang, Fan Hu, Wang Li, Peng Yin
{"title":"蛋白质预训练的语言模型是否有助于预测蛋白质与配体的相互作用?","authors":"Weihong Zhang ,&nbsp;Fan Hu ,&nbsp;Wang Li ,&nbsp;Peng Yin","doi":"10.1016/j.ymeth.2023.08.016","DOIUrl":null,"url":null,"abstract":"<div><p>Protein-ligand interaction (PLI) is a critical step for drug discovery. Recently, protein pretrained language models (PLMs) have showcased exceptional performance across a wide range of protein-related tasks. However, a significant heterogeneity exists between the PLM and PLI tasks, leading to a degree of uncertainty. In this study, we propose a method that quantitatively assesses the significance of protein PLMs in PLI prediction. Specifically, we analyze the performance of three widely-used protein PLMs (TAPE, ESM-1b, and ProtTrans) on three PLI tasks (PDBbind, Kinase, and DUD-E). The model with pre-training consistently achieves improved performance and decreased time cost, demonstrating that enhance both the accuracy and efficiency of PLI prediction. By quantitatively assessing the transferability, the optimal PLM for each PLI task is identified without the need for costly transfer experiments. Additionally, we examine the contributions of PLMs on the distribution of feature space, highlighting the improved discriminability after pre-training. Our findings provide insights into the mechanisms underlying PLMs in PLI prediction and pave the way for the design of more interpretable and accurate PLMs in the future. Code and data are freely available at <span>https://github.com/brian-zZZ/PLM-PLI</span><svg><path></path></svg>.</p></div>","PeriodicalId":390,"journal":{"name":"Methods","volume":"219 ","pages":"Pages 8-15"},"PeriodicalIF":4.2000,"publicationDate":"2023-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Does protein pretrained language model facilitate the prediction of protein–ligand interaction?\",\"authors\":\"Weihong Zhang ,&nbsp;Fan Hu ,&nbsp;Wang Li ,&nbsp;Peng Yin\",\"doi\":\"10.1016/j.ymeth.2023.08.016\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<div><p>Protein-ligand interaction (PLI) is a critical step for drug discovery. Recently, protein pretrained language models (PLMs) have showcased exceptional performance across a wide range of protein-related tasks. However, a significant heterogeneity exists between the PLM and PLI tasks, leading to a degree of uncertainty. In this study, we propose a method that quantitatively assesses the significance of protein PLMs in PLI prediction. Specifically, we analyze the performance of three widely-used protein PLMs (TAPE, ESM-1b, and ProtTrans) on three PLI tasks (PDBbind, Kinase, and DUD-E). The model with pre-training consistently achieves improved performance and decreased time cost, demonstrating that enhance both the accuracy and efficiency of PLI prediction. By quantitatively assessing the transferability, the optimal PLM for each PLI task is identified without the need for costly transfer experiments. Additionally, we examine the contributions of PLMs on the distribution of feature space, highlighting the improved discriminability after pre-training. Our findings provide insights into the mechanisms underlying PLMs in PLI prediction and pave the way for the design of more interpretable and accurate PLMs in the future. 
Code and data are freely available at <span>https://github.com/brian-zZZ/PLM-PLI</span><svg><path></path></svg>.</p></div>\",\"PeriodicalId\":390,\"journal\":{\"name\":\"Methods\",\"volume\":\"219 \",\"pages\":\"Pages 8-15\"},\"PeriodicalIF\":4.2000,\"publicationDate\":\"2023-11-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Methods\",\"FirstCategoryId\":\"99\",\"ListUrlMain\":\"https://www.sciencedirect.com/science/article/pii/S1046202323001469\",\"RegionNum\":3,\"RegionCategory\":\"生物学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q1\",\"JCRName\":\"BIOCHEMICAL RESEARCH METHODS\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Methods","FirstCategoryId":"99","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S1046202323001469","RegionNum":3,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"BIOCHEMICAL RESEARCH METHODS","Score":null,"Total":0}
Citations: 0

Abstract


Protein-ligand interaction (PLI) prediction is a critical step in drug discovery. Recently, protein pretrained language models (PLMs) have showcased exceptional performance across a wide range of protein-related tasks. However, significant heterogeneity exists between PLM pretraining and PLI tasks, introducing a degree of uncertainty. In this study, we propose a method that quantitatively assesses the significance of protein PLMs in PLI prediction. Specifically, we analyze the performance of three widely used protein PLMs (TAPE, ESM-1b, and ProtTrans) on three PLI tasks (PDBbind, Kinase, and DUD-E). The models with pre-training consistently achieve improved performance and decreased time cost, demonstrating that pre-training enhances both the accuracy and efficiency of PLI prediction. By quantitatively assessing transferability, the optimal PLM for each PLI task can be identified without costly transfer experiments. Additionally, we examine the contributions of PLMs to the distribution of the feature space, highlighting the improved discriminability after pre-training. Our findings provide insights into the mechanisms underlying PLMs in PLI prediction and pave the way for the design of more interpretable and accurate PLMs in the future. Code and data are freely available at https://github.com/brian-zZZ/PLM-PLI.
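To make the pipeline the abstract describes concrete — frozen PLM embeddings for the protein, a fixed fingerprint for the ligand, and a small trainable head producing the PLI score — here is a minimal sketch. It is an illustration, not the authors' implementation (their code is in the repository above): it assumes the public fair-esm package for ESM-1b and RDKit for the ligand fingerprint, and the mean pooling, 1024-bit Morgan fingerprint, and MLP head are hypothetical choices made for the example.

```python
# Minimal, illustrative PLI-scoring sketch (NOT the paper's pipeline).
# Assumes: pip install fair-esm torch rdkit. The pooling strategy,
# fingerprint size, and MLP head are hypothetical illustration choices.
import torch
import torch.nn as nn
import esm
from rdkit import Chem
from rdkit.Chem import AllChem

# Load the pretrained ESM-1b protein language model (downloads weights).
model, alphabet = esm.pretrained.esm1b_t33_650M_UR50S()
model.eval()
batch_converter = alphabet.get_batch_converter()

def protein_embedding(sequence: str) -> torch.Tensor:
    """Mean-pooled final-layer ESM-1b representation (1280-d)."""
    _, _, tokens = batch_converter([("query", sequence)])
    with torch.no_grad():
        out = model(tokens, repr_layers=[33])
    reps = out["representations"][33]
    # Drop the BOS/EOS special tokens, then average over residues.
    return reps[0, 1 : len(sequence) + 1].mean(dim=0)

def ligand_fingerprint(smiles: str, n_bits: int = 1024) -> torch.Tensor:
    """Morgan (ECFP4-style) bit fingerprint of the ligand."""
    mol = Chem.MolFromSmiles(smiles)
    fp = AllChem.GetMorganFingerprintAsBitVect(mol, 2, nBits=n_bits)
    return torch.tensor(list(fp), dtype=torch.float32)

class PLIHead(nn.Module):
    """Small trainable head over concatenated protein/ligand features."""
    def __init__(self, protein_dim: int = 1280, ligand_dim: int = 1024):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(protein_dim + ligand_dim, 512),
            nn.ReLU(),
            nn.Linear(512, 1),  # affinity value or interaction logit
        )

    def forward(self, prot: torch.Tensor, lig: torch.Tensor) -> torch.Tensor:
        return self.mlp(torch.cat([prot, lig], dim=-1))

# Score one (protein, ligand) pair; the head would be trained on a PLI set.
prot = protein_embedding("MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQ")
lig = ligand_fingerprint("CC(=O)Oc1ccccc1C(=O)O")  # aspirin
score = PLIHead()(prot, lig)
print(score.item())
```

Under this framing, the paper's question becomes an embedding-comparison problem: swap ESM-1b for TAPE or ProtTrans and ask which frozen representation lets the same head fit a given PLI task best — which is exactly what a transferability estimate tries to predict without running each fine-tuning experiment.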

Source journal
Methods (Biology — Biochemical Research Methods)
CiteScore: 9.80
Self-citation rate: 2.10%
Articles published per year: 222
Review turnaround: 11.3 weeks
Journal description: Methods focuses on rapidly developing techniques in the experimental biological and medical sciences. Each topical issue, organized by a guest editor who is an expert in the area covered, consists solely of invited quality articles by specialist authors, many of them reviews. Issues are devoted to specific technical approaches with emphasis on clear detailed descriptions of protocols that allow them to be reproduced easily. The background information provided enables researchers to understand the principles underlying the methods; other helpful sections include comparisons of alternative methods giving the advantages and disadvantages of particular methods, guidance on avoiding potential pitfalls, and suggestions for troubleshooting.
Latest articles from this journal:
Ab-Amy 2.0: Predicting light chain amyloidogenic risk of therapeutic antibodies based on antibody language model.
SITP: A single cell bioinformatics analysis flow captures proteasome markers in the development of breast cancer.
Data Preprocessing Methods for Selective Sweep Detection using Convolutional Neural Networks.
Exploring drug-target interaction prediction on cold-start scenarios via meta-learning-based graph transformer.
MVCLST: A spatial transcriptome data analysis pipeline for cell type classification based on multi-view comparative learning.