A novel interpretability framework for enzyme turnover number prediction boosted by pre-trained enzyme embeddings and adaptive gate network

IF 4.3; Zone 3, Biology; Q1 BIOCHEMICAL RESEARCH METHODS; Methods; Pub Date: 2025-05-01; Epub Date: 2025-02-26; DOI: 10.1016/j.ymeth.2025.02.010
Bing-Xue Du, Haoyang Yu, Bei Zhu, Yahui Long, Min Wu, Jian-Yu Shi
Methods, volume 237, pages 45-52. Journal Article. https://www.sciencedirect.com/science/article/pii/S1046202325000519
Citations: 0

Abstract

Determining the enzyme turnover number (kcat) is a vital step in synthetic biology and early-stage drug discovery. Recently, deep learning methods have made encouraging progress in predicting kcat, aided by the growth of multi-species turnover number data for enzyme-substrate pairs. However, the performance of existing approaches still depends heavily on how effectively features are extracted from enzymes and substrates, and on how well these two types of features are fused. Furthermore, it is essential to identify the key molecular substructures that significantly affect kcat prediction. To address these issues, we develop GELKcat, a novel end-to-end dual-representation interpretability framework that harnesses graph transformers for substrate molecular encoding and CNNs for enzyme word2vec embeddings. We further integrate substrate and enzyme features using an adaptive gate network, which assigns optimal weights to capture the most suitable feature combinations. Comparison with several state-of-the-art methods demonstrates the superiority of GELKcat, and ablation studies further illuminate the invaluable roles of its three main components. In addition, case studies illustrate the interpretability of GELKcat by identifying the key functional groups in a substrate that are significantly associated with turnover number. We anticipate that this work can bridge current gaps in enzyme-substrate representation and offer guidance for drug discovery and synthetic biology.
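The abstract does not spell out how the adaptive gate network combines the two feature streams. As a rough illustration only, the sketch below shows one common form of gated fusion: a sigmoid gate computed from the concatenated substrate and enzyme vectors, used to mix the two element-wise. The function names, the gate formula, the vector sizes, and the random parameters are all assumptions made for this example and are not taken from GELKcat.

```python
import math
import random


def sigmoid(x: float) -> float:
    """Logistic function, mapping any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def gated_fusion(substrate, enzyme, weights, bias):
    """Fuse two equal-length feature vectors with an element-wise gate.

    Assumed form (hypothetical, not the paper's definition):
      gate_i  = sigmoid(weights[i] . concat(substrate, enzyme) + bias[i])
      fused_i = gate_i * substrate_i + (1 - gate_i) * enzyme_i
    """
    if len(substrate) != len(enzyme):
        raise ValueError("feature vectors must have the same length")
    joint = list(substrate) + list(enzyme)
    fused = []
    for i, (s, e) in enumerate(zip(substrate, enzyme)):
        logit = sum(w * x for w, x in zip(weights[i], joint)) + bias[i]
        gate = sigmoid(logit)
        # Each fused value is a convex combination of the two inputs.
        fused.append(gate * s + (1.0 - gate) * e)
    return fused


# Toy inputs standing in for the substrate and enzyme encoder outputs.
rng = random.Random(0)
dim = 4
substrate_vec = [rng.uniform(-1, 1) for _ in range(dim)]
enzyme_vec = [rng.uniform(-1, 1) for _ in range(dim)]
# Randomly initialised gate parameters; in a trained model these are learned.
gate_weights = [[rng.uniform(-1, 1) for _ in range(2 * dim)] for _ in range(dim)]
gate_bias = [0.0] * dim

fused_vec = gated_fusion(substrate_vec, enzyme_vec, gate_weights, gate_bias)
```

Because the gate lies strictly between 0 and 1, each fused component falls between the corresponding substrate and enzyme values. In this assumed form, a gate near 1 favours the substrate feature and a gate near 0 favours the enzyme feature.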
Source journal: Methods (Biology, Biochemical Research Methods)
CiteScore: 9.80
Self-citation rate: 2.10%
Articles published: 222
Review time: 11.3 weeks
Journal description: Methods focuses on rapidly developing techniques in the experimental biological and medical sciences. Each topical issue, organized by a guest editor who is an expert in the area covered, consists solely of invited quality articles by specialist authors, many of them reviews. Issues are devoted to specific technical approaches with emphasis on clear detailed descriptions of protocols that allow them to be reproduced easily. The background information provided enables researchers to understand the principles underlying the methods; other helpful sections include comparisons of alternative methods giving the advantages and disadvantages of particular methods, guidance on avoiding potential pitfalls, and suggestions for troubleshooting.