DNA palette code for time-series archival data storage

National Science Review (IF 16.3; CAS Zone 1, multidisciplinary journals; JCR Q1, MULTIDISCIPLINARY SCIENCES) Pub Date: 2024-09-09 DOI: 10.1093/nsr/nwae321
Zihui Yan, Haoran Zhang, Boyuan Lu, Tong Han, Xiaoguang Tong, Yingjin Yuan
Citations: 0

Abstract

The long-term preservation of large volumes of infrequently accessed cold data poses challenges to the storage community. Deoxyribonucleic acid (DNA) is considered a promising solution because of its inherent physical stability and high storage density. Information density and decoding sequence coverage are two important metrics that influence the efficiency of DNA data storage. In this study, we propose a novel coding scheme, the DNA Palette code, which is suited to cold data, especially time-series archival datasets. These datasets are accessed infrequently but require reliable long-term storage for retrospective research. The DNA Palette code uses unordered combinations of index-free oligonucleotides (oligos) to represent binary information. It achieves encoding with high net information density and lossless decoding at low sequencing coverage. When sequencing reads are corrupted, it can still recover partial information, preventing complete failure of file retrieval. In vivo testing of clinical brain magnetic resonance imaging (MRI) data storage, together with simulation validations on large-scale public MRI datasets (10 GB), planetary science datasets, and meteorological datasets, demonstrates the advantages of our coding scheme: high information density, low decoding sequence coverage, and wide applicability.
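The abstract does not spell out the construction, so the following is only a minimal sketch of the general idea of representing binary information as an unordered combination of index-free oligos. It swaps in the standard combinatorial number system as the mapping between integers and subsets, which is an assumption and not necessarily the mapping used by the DNA Palette code. The toy `PALETTE` sequences, the parameters `n = 8` and `k = 3`, and all function names are hypothetical, and no error correction is included.

```python
from math import comb

# Hypothetical toy palette of n = 8 distinct oligos (illustrative sequences only).
PALETTE = ["ACGTAC", "TGCATG", "GATCCA", "CTAGGT",
           "AGTCTG", "TCAGAC", "GTACGA", "CAGTTC"]
K = 3  # assumed number of oligos chosen per block


def int_to_combination(value: int, n: int, k: int) -> list[int]:
    """Map an integer in [0, C(n, k)) to a k-subset of range(n)."""
    if not 0 <= value < comb(n, k):
        raise ValueError("value out of range for C(n, k)")
    chosen = []
    for i in range(k, 0, -1):
        c = i - 1  # comb(i - 1, i) == 0, so this is always a valid start
        while comb(c + 1, i) <= value:
            c += 1  # find the largest c with comb(c, i) <= value
        chosen.append(c)
        value -= comb(c, i)
    return sorted(chosen)


def combination_to_int(subset) -> int:
    """Inverse mapping; the order of the input elements does not matter."""
    ranks = sorted(set(subset))
    return sum(comb(c, i) for i, c in enumerate(ranks, start=1))


def bits_per_block(n: int, k: int) -> int:
    """Whole bits one k-subset can carry: floor(log2(C(n, k)))."""
    return comb(n, k).bit_length() - 1


def encode_bits(bits: str) -> set[str]:
    """Encode a fixed-width bit string as an unordered set of oligos."""
    if len(bits) != bits_per_block(len(PALETTE), K):
        raise ValueError("wrong block width")
    indices = int_to_combination(int(bits, 2), len(PALETTE), K)
    return {PALETTE[i] for i in indices}


def decode_reads(reads) -> str:
    """Decode from reads in any order; duplicate reads are collapsed."""
    indices = {PALETTE.index(r) for r in reads}
    if len(indices) != K:
        raise ValueError("expected exactly K distinct oligos")
    width = bits_per_block(len(PALETTE), K)
    return format(combination_to_int(indices), f"0{width}b")
```

In this sketch the oligos carry no address field: the information sits entirely in which members of the palette are present, so reads can arrive in any order and with any multiplicity. With 8 oligos taken 3 at a time there are 56 subsets, so each block carries 5 whole bits.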
Source journal
National Science Review (MULTIDISCIPLINARY SCIENCES)
CiteScore: 24.10
Self-citation rate: 1.90%
Articles published: 249
Review time: 13 weeks
About the journal: National Science Review (NSR; ISSN abbreviation: Natl. Sci. Rev.) is an English-language, peer-reviewed, multidisciplinary open-access scientific journal published by Oxford University Press under the auspices of the Chinese Academy of Sciences. According to Journal Citation Reports, its 2021 impact factor was 23.178. National Science Review publishes review articles and perspectives as well as original research in the form of brief communications and research articles.