DORA-XGB: an improved enzymatic reaction feasibility classifier trained using a novel synthetic data approach†

Molecular Systems Design & Engineering, pp. 129-142 · Published 2024-11-02 · DOI: 10.1039/D4ME00118D
Journal metrics: impact factor 3.2 · JCR Q2 (Chemistry, Physical) · CAS Zone 3 (Engineering & Technology)
Yash Chainani, Zhuofu Ni, Kevin M. Shebek, Linda J. Broadbelt and Keith E. J. Tyo

Abstract

Retrobiosynthesis tools harness the inherent promiscuities of enzymes for the de novo design of novel biosynthetic pathways to key small molecules. Many existing pathway search algorithms rely on exhaustively enumerating the space of all possible enzymatic reactions using generalized rules, followed by an extensive analysis of the ensuing reaction network to extract candidate pathways for experimental validation. While comprehensive, this approach often generates many false-positive reactions because such reaction rules are permissive. Here, we have developed DORA-XGB, an enzymatic reaction feasibility classifier. DORA-XGB can be used within our DORAnet framework to assess whether newly enumerated enzymatic reactions and pathways would be feasible. To curate a training dataset for our model, we extracted enzymatic reactions from public databases and screened them for their general thermodynamic feasibility. We then considered alternate reaction centers on known substrates to strategically generate infeasible reactions with high confidence, thereby circumventing the lack of negative data in the literature. In training our model, we also experimented with various molecular fingerprinting techniques and configurations for assembling reaction fingerprints, taking into account not just primary substrate and primary product structures but cofactor structures as well. Our model's utility is demonstrated through favorable benchmarking against a previously published classifier, the successful recovery of newly published reactions, and the ranking of previously predicted pathways for the biosynthesis of propionic acid from pyruvate.
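The abstract mentions assembling reaction fingerprints from primary substrate, primary product, and cofactor structures. As a rough illustration of what such an assembly can look like, the sketch below combines per-molecule bit vectors into one reaction vector in two common ways: concatenation and a product-minus-substrate difference. It is a hypothetical example, not DORA-XGB's actual featurization: the function names, the summing of multiple cofactors per side, and the two modes are assumptions, and the per-molecule bit vectors are assumed to come from an upstream fingerprinting tool (for example, Morgan/ECFP fingerprints computed with RDKit).

```python
from typing import List, Sequence

# Assumption: each molecule has already been converted to a fixed-length
# 0/1 bit vector by an upstream fingerprinting step (not shown here).
Fingerprint = Sequence[int]


def sum_fingerprints(fps: Sequence[Fingerprint], length: int) -> List[int]:
    """Element-wise sum of several fingerprints, e.g. all cofactors on one side.

    An empty list yields an all-zero vector, so reactions without cofactors
    still produce a vector of the expected length.
    """
    total = [0] * length
    for fp in fps:
        if len(fp) != length:
            raise ValueError("all fingerprints must have the same length")
        for i, bit in enumerate(fp):
            total[i] += bit
    return total


def assemble_reaction_fingerprint(
    substrate_fp: Fingerprint,
    product_fp: Fingerprint,
    lhs_cofactor_fps: Sequence[Fingerprint],
    rhs_cofactor_fps: Sequence[Fingerprint],
    mode: str = "concat",
) -> List[int]:
    """Hypothetical reaction featurization combining substrate, product and cofactors.

    mode="concat":     [substrate | LHS cofactors | product | RHS cofactors]
    mode="difference": (product + RHS cofactors) - (substrate + LHS cofactors)
    """
    length = len(substrate_fp)
    if len(product_fp) != length:
        raise ValueError("substrate and product fingerprints must have the same length")
    lhs_cof = sum_fingerprints(lhs_cofactor_fps, length)
    rhs_cof = sum_fingerprints(rhs_cofactor_fps, length)

    if mode == "concat":
        # Keeps each role in its own block of positions; total length is 4 * n.
        return list(substrate_fp) + lhs_cof + list(product_fp) + rhs_cof
    if mode == "difference":
        # Compact length-n vector; each entry is the net change in that bit's count.
        lhs = [s + c for s, c in zip(substrate_fp, lhs_cof)]
        rhs = [p + c for p, c in zip(product_fp, rhs_cof)]
        return [r - l for l, r in zip(lhs, rhs)]
    raise ValueError(f"unknown mode: {mode!r}")


if __name__ == "__main__":
    # Toy 4-bit vectors standing in for real molecular fingerprints.
    substrate = [1, 0, 1, 0]
    product = [1, 1, 0, 0]
    lhs_cofactors = [[0, 0, 0, 1]]
    rhs_cofactors = [[0, 0, 1, 1]]
    print(assemble_reaction_fingerprint(substrate, product, lhs_cofactors, rhs_cofactors))
    # -> [1, 0, 1, 0, 0, 0, 0, 1, 1, 1, 0, 0, 0, 0, 1, 1]
    print(assemble_reaction_fingerprint(
        substrate, product, lhs_cofactors, rhs_cofactors, mode="difference"))
    # -> [0, 1, 0, 0]
```

Concatenation preserves which role (substrate, product, or cofactor) each bit came from at the cost of a four-times-longer vector, while the difference form is compact and highlights what changes across the reaction. Either vector could then be passed to a tree-based classifier; this sketch does not reproduce the specific configurations evaluated in the paper.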

