{"title":"Robust enzyme discovery and engineering with deep learning using CataPro","authors":"Zechen Wang, Dongqi Xie, Dong Wu, Xiaozhou Luo, Sheng Wang, Yangyang Li, Yanmei Yang, Weifeng Li, Liangzhen Zheng","doi":"10.1038/s41467-025-58038-4","DOIUrl":null,"url":null,"abstract":"<p>Accurate prediction of enzyme kinetic parameters is crucial for enzyme exploration and modification. Existing models face the problem of either low accuracy or poor generalization ability due to overfitting. In this work, we first developed unbiased datasets to evaluate the actual performance of these methods and proposed a deep learning model, CataPro, based on pre-trained models and molecular fingerprints to predict turnover number (<i>k</i><sub><i>c</i><i>a</i><i>t</i></sub>), Michaelis constant (<i>K</i><sub><i>m</i></sub>), and catalytic efficiency (<i>k</i><sub><i>c</i><i>a</i><i>t</i></sub>/<i>K</i><sub><i>m</i></sub>). Compared with previous baseline models, CataPro demonstrates clearly enhanced accuracy and generalization ability on the unbiased datasets. In a representational enzyme mining project, by combining CataPro with traditional methods, we identified an enzyme (SsCSO) with 19.53 times increased activity compared to the initial enzyme (CSO2) and then successfully engineered it to improve its activity by 3.34 times. This reveals the high potential of CataPro as an effective tool for future enzyme discovery and modification.</p>","PeriodicalId":19066,"journal":{"name":"Nature Communications","volume":"33 1","pages":""},"PeriodicalIF":15.7000,"publicationDate":"2025-03-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Nature Communications","FirstCategoryId":"103","ListUrlMain":"https://doi.org/10.1038/s41467-025-58038-4","RegionNum":1,"RegionCategory":"综合性期刊","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"MULTIDISCIPLINARY SCIENCES","Score":null,"Total":0}
引用次数: 0
Abstract
Accurate prediction of enzyme kinetic parameters is crucial for enzyme exploration and modification. Existing models face the problem of either low accuracy or poor generalization ability due to overfitting. In this work, we first developed unbiased datasets to evaluate the actual performance of these methods and proposed a deep learning model, CataPro, based on pre-trained models and molecular fingerprints to predict turnover number (kcat), Michaelis constant (Km), and catalytic efficiency (kcat/Km). Compared with previous baseline models, CataPro demonstrates clearly enhanced accuracy and generalization ability on the unbiased datasets. In a representational enzyme mining project, by combining CataPro with traditional methods, we identified an enzyme (SsCSO) with 19.53 times increased activity compared to the initial enzyme (CSO2) and then successfully engineered it to improve its activity by 3.34 times. This reveals the high potential of CataPro as an effective tool for future enzyme discovery and modification.
期刊介绍:
Nature Communications, an open-access journal, publishes high-quality research spanning all areas of the natural sciences. Papers featured in the journal showcase significant advances relevant to specialists in each respective field. With a 2-year impact factor of 16.6 (2022) and a median time of 8 days from submission to the first editorial decision, Nature Communications is committed to rapid dissemination of research findings. As a multidisciplinary journal, it welcomes contributions from biological, health, physical, chemical, Earth, social, mathematical, applied, and engineering sciences, aiming to highlight important breakthroughs within each domain.