PatchProt: hydrophobic patch prediction using protein foundation models.

Bioinformatics Advances (IF 2.4; JCR Q2, Mathematical & Computational Biology) · Published: 2024-10-14 · eCollection date: 2024-01-01 · DOI: 10.1093/bioadv/vbae154
Dea Gogishvili, Emmanuel Minois-Genin, Jan van Eck, Sanne Abeln
Volume 4, issue 1, article vbae154 · Citations: 0 · Full text (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11525051/pdf/

Abstract

Motivation: Hydrophobic patches on protein surfaces play important functional roles in protein-protein and protein-ligand interactions. Large hydrophobic surfaces are also involved in the progression of aggregation diseases. Predicting exposed hydrophobic patches from a protein sequence has proven to be a difficult task. Fine-tuning a foundation model adapts it to the specific nuances of a new task using a much smaller dataset. Multitask deep learning additionally offers a promising way to address data gaps while also outperforming single-task methods.

Results: In this study, we used ESM-2 (Evolutionary Scale Modeling), a recently released, leading protein language model. We fine-tuned ESM-2 efficiently with a recently developed parameter-efficient fine-tuning method. This approach made it possible to train across the model's layers without an excessive number of trainable parameters and without a computationally expensive multiple sequence alignment (MSA). We explored several related tasks at the local (residue) and global (protein) levels to improve the model's representation. As a result, our model, PatchProt, not only predicts hydrophobic patch areas but also outperforms existing methods on primary tasks, including secondary structure and surface accessibility prediction. Importantly, our analysis shows that including related local tasks can improve predictions on more difficult global tasks. This research sets a new standard for sequence-based protein property prediction and highlights the remarkable potential of fine-tuning foundation models and enriching their representations by training on related tasks.
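The idea of training one model on related residue-level and protein-level tasks can be illustrated with a minimal sketch. It assumes, purely for illustration, that per-task losses are combined as a weighted sum, with a residue-level classification task (secondary structure) averaged over unpadded residues and a protein-level regression task (a patch area). The abstract does not specify PatchProt's loss functions, task weights, or output heads, and every function name below is hypothetical.

```python
"""Hypothetical sketch of a multitask loss mixing residue-level and protein-level tasks.

Not PatchProt's implementation: the weighted-sum scheme, the task choices and
the function names are illustrative assumptions.
"""
import math
from typing import Dict, List, Sequence


def cross_entropy(logits: Sequence[float], target: int) -> float:
    """Softmax cross-entropy for one residue, using a numerically stable log-sum-exp."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_z - logits[target]


def residue_task_loss(
    logits: List[Sequence[float]], labels: List[int], mask: List[bool]
) -> float:
    """Mean cross-entropy over residues with mask=True (padding/unlabeled residues are skipped)."""
    losses = [
        cross_entropy(res_logits, label)
        for res_logits, label, keep in zip(logits, labels, mask)
        if keep
    ]
    return sum(losses) / len(losses) if losses else 0.0


def protein_task_loss(prediction: float, target: float) -> float:
    """Squared error for one protein-level regression target (e.g. a patch area)."""
    return (prediction - target) ** 2


def multitask_loss(task_losses: Dict[str, float], weights: Dict[str, float]) -> float:
    """Weighted sum of per-task losses; every task must have a weight."""
    return sum(weights[name] * loss for name, loss in task_losses.items())


if __name__ == "__main__":
    # Three residues with 3-class secondary-structure logits (e.g. helix/strand/coil);
    # the last position is padding and is excluded by the mask.
    ss_logits = [[2.0, 0.5, 0.1], [0.2, 1.5, 0.3], [0.0, 0.0, 0.0]]
    ss_labels = [0, 1, 2]
    ss_mask = [True, True, False]

    losses = {
        "secondary_structure": residue_task_loss(ss_logits, ss_labels, ss_mask),
        "patch_area": protein_task_loss(prediction=0.8, target=1.0),
    }
    weights = {"secondary_structure": 1.0, "patch_area": 0.5}  # illustrative weights
    print(f"total loss: {multitask_loss(losses, weights):.4f}")
```

When all task heads share one backbone, gradients from the residue-level terms also update the shared representation that the protein-level head reads from. This is one plausible mechanism behind the observation that related local tasks help global ones, not a description of PatchProt's internals.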

Availability and implementation: https://github.com/Deagogishvili/chapter-multi-task.
