Accurate top protein variant discovery via low-N pick-and-validate machine learning.

Cell Systems. Pub Date: 2024-02-21. Epub Date: 2024-02-09. DOI: 10.1016/j.cels.2024.01.002
Hoi Yee Chu, John H C Fong, Dawn G L Thean, Peng Zhou, Frederic K C Fung, Yuanhua Huang, Alan S L Wong

Abstract

A strategy that yields the greatest number of best-performing variants with the least experimental effort across the vast combinatorial mutational landscape would have enormous utility in boosting resource producibility for protein engineering. Toward this goal, we present a simple and effective machine learning-based strategy that outperforms other state-of-the-art methods. Our strategy integrates zero-shot prediction and multi-round sampling to direct active learning by experimentally testing only a few predicted top variants. We find that four rounds of low-N pick-and-validate sampling of 12 variants each yielded the best accuracy, up to 92.6%, in selecting the true top 1% of variants in combinatorial mutant libraries; two rounds of 24 variants can also be used. We demonstrate the strategy by discovering high-performance protein variants from diverse families, including CRISPR-based genome editors, which supports its general applicability to protein engineering tasks. A record of this paper's transparent peer review process is included in the supplemental information.
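
The pick-and-validate loop described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not name the model, the encoding, or the zero-shot predictor, so the ridge regression, the one-hot encoding, the synthetic fitness landscape, and the noisy zero-shot score below are all assumptions chosen for illustration. The function names (`one_hot`, `fit_ridge`, `pick_and_validate`) are hypothetical. Only the schedule (4 rounds of 12 picks, scored against the true top 1%) comes from the abstract.

```python
import itertools
import numpy as np


def one_hot(variants, n_states):
    """Encode each variant (tuple of per-site residue indices) as a flat one-hot vector."""
    n_sites = len(variants[0])
    X = np.zeros((len(variants), n_sites * n_states))
    for i, variant in enumerate(variants):
        for site, state in enumerate(variant):
            X[i, site * n_states + state] = 1.0
    return X


def fit_ridge(X, y, alpha=1.0):
    """Closed-form ridge regression (assumed stand-in model); returns weights and intercept."""
    intercept = y.mean()
    A = X.T @ X + alpha * np.eye(X.shape[1])
    weights = np.linalg.solve(A, X.T @ (y - intercept))
    return weights, intercept


def pick_and_validate(X, zero_shot, measure, rounds=4, batch=12):
    """Round 1 ranks by the zero-shot score; later rounds rank by a model
    refit on every variant measured so far. Returns measured indices."""
    is_measured = np.zeros(X.shape[0], dtype=bool)
    measured, labels = [], []
    scores = np.asarray(zero_shot, dtype=float)
    for _ in range(rounds):
        # Exclude already-measured variants, then pick the predicted top batch.
        candidates = np.where(is_measured, -np.inf, scores)
        picks = [int(i) for i in np.argsort(candidates)[::-1][:batch]]
        # "Validate": obtain experimental labels for the picks.
        labels.extend(measure(picks))
        measured.extend(picks)
        is_measured[picks] = True
        # Refit on all labels and rescore the whole library.
        weights, intercept = fit_ridge(X[measured], np.array(labels))
        scores = X @ weights + intercept
    return measured


# Synthetic library (assumption): 4 sites x 5 residues = 625 variants.
rng = np.random.default_rng(0)
n_sites, n_states = 4, 5
variants = list(itertools.product(range(n_states), repeat=n_sites))
X = one_hot(variants, n_states)

# Assumed ground truth: additive site effects plus small noise.
fitness = X @ rng.normal(size=X.shape[1]) + 0.1 * rng.normal(size=len(variants))
# Assumed zero-shot predictor: a noisy proxy for the true fitness.
zero_shot = fitness + 2.0 * rng.normal(size=len(variants))

measured = pick_and_validate(
    X, zero_shot, measure=lambda idx: [float(fitness[i]) for i in idx]
)

# Score the run by how many of the true top 1% were measured.
top_k = max(1, round(0.01 * len(variants)))
true_top = {int(i) for i in np.argsort(fitness)[::-1][:top_k]}
recall = len(true_top & set(measured)) / top_k
print(f"measured {len(measured)} variants; recovered {recall:.0%} of the true top 1%")
```

Calling `pick_and_validate(..., rounds=2, batch=24)` gives the alternative schedule mentioned in the abstract with the same total budget of 48 measurements. Results on this toy landscape say nothing about the accuracy reported in the paper.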
