Integrating Deep Learning and Synthetic Biology: A Co-Design Approach for Enhancing Gene Expression via N-Terminal Coding Sequences

IF 3.7 | Zone 2 (Biology) | JCR Q1, BIOCHEMICAL RESEARCH METHODS | ACS Synthetic Biology | Pub Date: 2024-09-04 | DOI: 10.1021/acssynbio.4c00371
Zhanglu Yan*, Weiran Chu, Yuhua Sheng, Kaiwen Tang, Shida Wang, Yanfeng Liu*, and Weng-Fai Wong

Abstract

The N-terminal coding sequence (NCS) influences gene expression by affecting the translation initiation rate. The NCS optimization problem, which is important in genetic engineering, is to find an NCS that maximizes gene expression. However, current methods for NCS optimization, such as rational design and statistics-guided approaches, are labor-intensive and yield only relatively small improvements. This paper introduces a few-shot training workflow for NCS optimization, co-designed across deep learning and synthetic biology. Our method encodes the NCS using k-nearest encoding followed by word2vec, extracts features with attention mechanisms, and constructs a time-series network to predict gene expression intensity; a direct search algorithm then identifies the optimal NCS from limited training data. We used green fluorescent protein (GFP) expressed in Bacillus subtilis as the reporter protein for NCSs and the fluorescence enhancement factor as the metric of NCS optimization. Within just six iterative experiments, our model generated an NCS (MLD62) that increased average GFP expression by 5.41-fold, outperforming state-of-the-art NCS designs. Extending our findings beyond GFP, we showed that MLD62 can effectively boost the production of N-acetylneuraminic acid by enhancing the expression of the rate-limiting GNA1 gene, demonstrating its practical utility. We have open-sourced our NCS expression database and experimental procedures for public use.
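
As a rough illustration of two steps the abstract names, the sketch below tokenizes an NCS into overlapping k-length pieces (the usual input to a word2vec-style embedding) and computes a fluorescence enhancement factor. It is not the authors' code. Two things are assumptions: that "k-nearest encoding" amounts to a sliding window of k nucleotides, and that the enhancement factor is the mean fluorescence with the NCS divided by the mean fluorescence of a control. The function names, the example sequence, and the readings are all hypothetical.

```python
from statistics import mean


def kmer_tokens(seq: str, k: int = 3) -> list[str]:
    """Split a nucleotide sequence into overlapping k-length tokens.

    Assumption: this sliding window stands in for the paper's
    'k-nearest encoding'; the abstract does not define it.
    """
    seq = seq.upper()
    if k <= 0 or k > len(seq):
        raise ValueError("k must be between 1 and len(seq)")
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def enhancement_factor(ncs_readings: list[float],
                       control_readings: list[float]) -> float:
    """Ratio of mean fluorescence with the NCS to mean control fluorescence.

    Assumption: the metric is a simple ratio of means; the abstract
    only names the 'fluorescence enhancement factor'.
    """
    control = mean(control_readings)
    if control <= 0:
        raise ValueError("control fluorescence must be positive")
    return mean(ncs_readings) / control


if __name__ == "__main__":
    # Hypothetical 9-nt sequence, not one from the paper.
    print(kmer_tokens("ATGAAAGCT", k=3))
    # Hypothetical plate-reader values in arbitrary units.
    print(enhancement_factor([300.0, 320.0], [100.0, 100.0]))
```

The token list would feed an embedding model, and the ratio would serve as the regression target during each experimental iteration; both uses follow the workflow order given in the abstract, with the details left open.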

Source journal: ACS Synthetic Biology
CiteScore: 8.00
Self-citation rate: 10.60%
Articles published: 380
Review time: 6-12 weeks
Journal description: The journal is particularly interested in studies on the design and synthesis of new genetic circuits and gene products; computational methods in the design of systems; and integrative applied approaches to understanding disease and metabolism. Topics may include, but are not limited to:
Design and optimization of genetic systems
Genetic circuit design and their principles for their organization into programs
Computational methods to aid the design of genetic systems
Experimental methods to quantify genetic parts, circuits, and metabolic fluxes
Genetic parts libraries: their creation, analysis, and ontological representation
Protein engineering including computational design
Metabolic engineering and cellular manufacturing, including biomass conversion
Natural product access, engineering, and production
Creative and innovative applications of cellular programming
Medical applications, tissue engineering, and the programming of therapeutic cells
Minimal cell design and construction
Genomics and genome replacement strategies
Viral engineering
Automated and robotic assembly platforms for synthetic biology
DNA synthesis methodologies
Metagenomics and synthetic metagenomic analysis
Bioinformatics applied to gene discovery, chemoinformatics, and pathway construction
Gene optimization
Methods for genome-scale measurements of transcription and metabolomics
Systems biology and methods to integrate multiple data sources
In vitro and cell-free synthetic biology and molecular programming
Nucleic acid engineering
Latest articles from this journal:
Efficient Strategy for Synthesizing Vector-Free and Oncolytic Herpes Simplex Type 1 Viruses.
One-Pot Assay for Rapid Detection of Stenotrophomonas maltophilia by RPA-CRISPR/Cas12a.
Correction to "Cell-Free Gene Expression Dynamics in Synthetic Cell Populations".
The Potential of Artificial Cells Functioning under In Situ Deep-Sea Conditions.
Disentangling the Regulatory Response of Agrobacterium tumefaciens CHLDO to Glyphosate for Engineering Whole-Cell Phosphonate Biosensors.