Scalable and robust DNA-based storage via coding theory and deep learning

IF 23.9; Zone 1 (Computer Science); Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE; Nature Machine Intelligence; Pub Date: 2025-02-21; DOI: 10.1038/s42256-025-01003-z
Daniella Bar-Lev, Itai Orr, Omer Sabary, Tuvi Etzion, Eitan Yaakobi
Nature Machine Intelligence, volume 7, issue 4, pages 639-649. Publication type: Journal Article. Not open access. https://www.nature.com/articles/s42256-025-01003-z
Citations: 0

Abstract

The global data sphere is expanding exponentially and is projected to reach 180 zettabytes by 2025, whereas current technologies are not anticipated to scale at nearly the same rate. DNA-based storage emerges as a crucial solution to this gap, enabling digital information to be archived in DNA molecules. This method enjoys major advantages over magnetic and optical storage solutions, such as exceptional information density, enhanced data durability and negligible power consumption to maintain data integrity. To access the data, an information retrieval process is employed, whose main bottlenecks include scalability and accuracy, which naturally trade off against each other. Here we show a modular and holistic approach that combines deep neural networks trained on simulated data, tensor product-based error-correcting codes and a safety margin mechanism into a single coherent pipeline. We demonstrated our solution on 3.1 MB of information using two different sequencing technologies. Our work improves upon the current leading solutions with a 3,200× increase in speed and a 40% improvement in accuracy, and offers a code rate of 1.6 bits per base in a high-noise regime. In a broader sense, our work shows a viable path to commercial DNA storage solutions, which are currently hindered by information retrieval processes.

Bar-Lev et al. propose a high-efficiency DNA-based storage pipeline that integrates deep neural networks, error-correcting codes and safety margins, achieving a 3,200× speed improvement and a 40% accuracy gain, paving the way for commercially viable DNA data storage.
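To put the reported code rate in context, the following is a minimal arithmetic sketch, not the authors' pipeline. It assumes that "3.1 MB" means decimal megabytes and that the code rate counts information bits per synthesized base; the abstract does not state either convention. The function names `bases_needed` and `encode_unconstrained` are hypothetical, and the 2-bits-per-base mapping is a generic textbook mapping with no redundancy, not the tensor product-based code described above.

```python
import math
from fractions import Fraction

# Four nucleotides (A, C, G, T) can carry at most log2(4) = 2 bits per base.
BITS_PER_BASE_MAX = math.log2(4)

def bases_needed(payload_bytes: int, code_rate: Fraction) -> int:
    """Bases required to store payload_bytes at code_rate information bits per base."""
    return math.ceil(Fraction(payload_bytes * 8) / code_rate)

def encode_unconstrained(data: bytes) -> str:
    """Naive mapping of each 2-bit pair to one base (illustration only, no error correction)."""
    alphabet = "ACGT"
    return "".join(
        alphabet[(byte >> shift) & 0b11]
        for byte in data
        for shift in (6, 4, 2, 0)
    )

payload = 3_100_000          # assumption: 3.1 MB in decimal megabytes
rate = Fraction("1.6")       # code rate quoted in the abstract, bits per base

print(bases_needed(payload, rate))          # 15500000 bases under these assumptions
print(float(rate) / BITS_PER_BASE_MAX)      # 0.8 of the 2 bits/base ceiling
print(encode_unconstrained(b"\x1b"))        # ACGT (bit pairs 00, 01, 10, 11)
```

Under these assumptions, a rate of 1.6 bits per base leaves about 20% of the raw capacity for redundancy, which is where error correction and any safety margin would have to fit.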

Source journal: CiteScore 36.90; self-citation rate 2.10%; articles published: 127
About the journal: Nature Machine Intelligence is a distinguished publication that presents original research and reviews on various topics in machine learning, robotics, and AI. Our focus extends beyond these fields, exploring their profound impact on other scientific disciplines, as well as societal and industrial aspects. We recognize limitless possibilities wherein machine intelligence can augment human capabilities and knowledge in domains like scientific exploration, healthcare, medical diagnostics, and the creation of safe and sustainable cities, transportation, and agriculture. Simultaneously, we acknowledge the emergence of ethical, social, and legal concerns due to the rapid pace of advancements. To foster interdisciplinary discussions on these far-reaching implications, Nature Machine Intelligence serves as a platform for dialogue facilitated through Comments, News Features, News & Views articles, and Correspondence. Our goal is to encourage a comprehensive examination of these subjects. Similar to all Nature-branded journals, Nature Machine Intelligence operates under the guidance of a team of skilled editors. We adhere to a fair and rigorous peer-review process, ensuring high standards of copy-editing and production, swift publication, and editorial independence.