Recurrent neural network-based prediction of O-GlcNAcylation sites in mammalian proteins

Computers & Chemical Engineering, vol. 189, Article 108818. IF 3.9; Zone 2 (Engineering & Technology); JCR Q2 (Computer Science, Interdisciplinary Applications). Pub Date: 2024-10-01. Epub Date: 2024-07-26. DOI: 10.1016/j.compchemeng.2024.108818
Pedro Seber, Richard D. Braatz

Abstract

O-GlcNAcylation has the potential to be an important target for therapeutics, but no motif or algorithm that reliably predicts O-GlcNAcylation sites is available. Current predictive models are insufficient as they fail to generalize, and many are no longer available. This article constructs recurrent neural network models to predict O-GlcNAcylation sites based on protein sequences. Different datasets are evaluated separately and assessed in terms of strengths and issues. Within a given dataset, results are robust to changes in cross-validation and test data as determined by nested validation. The best model achieves an F1 score of 36% (more than 3.5-fold greater than the previous best model) and a Matthews Correlation Coefficient of 35% (more than 4.5-fold greater than the previous best model). Its F1 score is also 7.6-fold higher than the F1 score obtained without using any model. Shapley values are used to interpret the model’s predictions and provide biological insight into O-GlcNAcylation.
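As a reading aid for the two metrics quoted in the abstract, the sketch below computes the F1 score and the Matthews Correlation Coefficient from confusion-matrix counts using their standard definitions. It is not the authors' code. The function names and the counts in the usage lines are hypothetical, and the helper `f1_all_positive` rests on an assumption that the abstract does not confirm: that "not using any model" means labeling every candidate site as positive.

```python
import math


def f1_score(tp: int, fp: int, fn: int) -> float:
    """F1 = 2*TP / (2*TP + FP + FN); defined as 0.0 when the denominator is 0."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def mcc(tp: int, fp: int, tn: int, fn: int) -> float:
    """Matthews Correlation Coefficient; defined as 0.0 when any marginal is 0."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


def f1_all_positive(prevalence: float) -> float:
    """F1 of a trivial classifier that labels every site positive.

    ASSUMPTION: this is one possible reading of the "no model" baseline.
    Such a classifier has recall 1 and precision equal to the prevalence,
    so F1 = 2p / (1 + p).
    """
    return 2 * prevalence / (1 + prevalence)


if __name__ == "__main__":
    # Hypothetical counts for illustration only; they are not results from the paper.
    tp, fp, tn, fn = 30, 70, 860, 40
    print(f"F1  = {f1_score(tp, fp, fn):.3f}")
    print(f"MCC = {mcc(tp, fp, tn, fn):.3f}")
    print(f"All-positive F1 at 5% prevalence = {f1_all_positive(0.05):.3f}")
```

Unlike F1, the MCC also uses the true-negative count, which is one reason the two are often reported together for heavily imbalanced problems such as modification-site prediction.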
