Challenges for error-correction coding in DNA data storage: photolithographic synthesis and DNA decay

Digital Discovery | IF 6.2 | JCR Q1 (Chemistry, Multidisciplinary) | Pub Date: 2024-10-18 | pp. 2497-2508 | DOI: 10.1039/D4DD00220B
Andreas L. Gimpel, Wendelin J. Stark, Reinhard Heckel and Robert N. Grass
Abstract

Efficient error-correction codes are crucial for realizing DNA's potential as a long-lasting, high-density storage medium for digital data. At the same time, new workflows promising low-cost, resilient DNA data storage are challenging their design and error-correcting capabilities. This study characterizes the errors and biases in two new additions to the state-of-the-art workflow in DNA data storage: photolithographic synthesis and DNA decay. Photolithographic synthesis offers low-cost, scalable oligonucleotide synthesis but suffers from high error rates, necessitating sophisticated error-correction schemes, for example codes introducing within-sequence redundancy combined with clustering and alignment techniques for retrieval. On the other hand, the decoding of oligo fragments after DNA decay promises unprecedented storage densities, but complicates data recovery by requiring the reassembly of full-length sequences or the use of partial sequences for decoding. Our analysis provides a detailed account of the error patterns and biases present in photolithographic synthesis and DNA decay, and identifies considerable bias stemming from sequencing workflows. We implement our findings into a digital twin of the two workflows, offering a tool for developing error-correction codes and providing benchmarks for the evaluation of codec performance.
