Design and structure of overlapping regions in PCA via deep learning
Yan Zheng, Xi-Chen Cui, Fei Guo, Ming-Liang Dou, Ze-Xiong Xie, Ying-Jin Yuan
Synthetic and Systems Biotechnology, volume 10, issue 2, pages 442-451. Published 2024-12-27. DOI: 10.1016/j.synbio.2024.12.007
https://www.sciencedirect.com/science/article/pii/S2405805X24001595
Journal impact factor: 4.4 (JCR Q1, Biotechnology & Applied Microbiology)
Citations: 0
Abstract
Polymerase cycling assembly (PCA) is the predominant method for synthesizing kilobase-length DNA fragments. The design of overlapping regions is the core factor affecting the success rate of synthesis. However, there still exist DNA sequences that are challenging to design and construct in genome synthesis. Here we propose a deep learning model, trained on extensive synthesis data, that discerns latent sequence representations in overlapping regions and achieves an AUPR of 0.805. Using the model, we developed the SmartCut algorithm, which designs oligonucleotides with the aim of raising the success rate of PCA experiments. The algorithm was successfully applied to sequences with diverse synthesis constraints, 80.4% of which were synthesized in a single round. We further discovered structural differences between overlapping and non-overlapping regions, represented by major groove width, stagger, slide, and centroid distance, which support the model's reasonableness from the perspective of physical chemistry. This comprehensive approach facilitates streamlined and efficient investigations into genome synthesis.
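To make the notion of an "overlapping region" concrete, the sketch below tiles a target sequence into alternating-strand oligonucleotides whose neighbours share a fixed-length overlap, and exposes each overlap so that a simple property such as GC fraction can be inspected. This is not the SmartCut algorithm and involves no learned model: the function names, the fixed-step tiling rule, and the default lengths of 60 and 20 bases are all assumptions chosen for illustration only.

```python
def reverse_complement(seq):
    """Return the reverse complement of an uppercase A/C/G/T sequence."""
    complement = {"A": "T", "C": "G", "G": "C", "T": "A"}
    return "".join(complement[base] for base in reversed(seq))


def gc_fraction(seq):
    """Fraction of bases in seq that are G or C."""
    return (seq.count("G") + seq.count("C")) / len(seq)


def tile_oligos(target, oligo_len=60, overlap_len=20):
    """Naive fixed-step tiling (hypothetical, not SmartCut).

    Consecutive oligos share overlap_len bases of the target. Even-indexed
    oligos are taken from the sense strand and odd-indexed oligos are
    reverse-complemented, so neighbours can anneal at the overlap.
    Returns (oligos, overlaps), with overlaps given on the sense strand.
    """
    if not 0 < overlap_len < oligo_len:
        raise ValueError("overlap_len must be between 0 and oligo_len")
    step = oligo_len - overlap_len
    oligos, overlaps = [], []
    start, index = 0, 0
    while True:
        end = min(start + oligo_len, len(target))
        fragment = target[start:end]
        oligos.append(fragment if index % 2 == 0 else reverse_complement(fragment))
        if end == len(target):
            break
        # Region shared between this oligo and the next one.
        overlaps.append(target[start + step:end])
        start += step
        index += 1
    return oligos, overlaps
```

Calling `tile_oligos(target)` on a sequence and then mapping `gc_fraction` over the returned overlaps gives a crude per-overlap summary. A design method of the kind described in the abstract would instead choose the cut positions themselves, guided by a model score rather than a fixed step.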
About the journal:
Synthetic and Systems Biotechnology aims to promote the communication of original research in synthetic and systems biology, with a strong emphasis on applications in biotechnology. It is a quarterly peer-reviewed journal led by Editor-in-Chief Lixin Zhang. The journal publishes high-quality research focusing on integrative approaches that enable the understanding and design of biological systems, and research that develops the application of systems and synthetic biology to natural systems. It publishes Articles, Short Notes, Methods, Mini Reviews, Commentaries, and Conference Reviews.