A practical DNA data storage using an expanded alphabet introducing 5-methylcytosine.

IF 1.2 GigaByte (Hong Kong, China) Pub Date : 2025-01-24 eCollection Date: 2025-01-01 DOI:10.46471/gigabyte.147
Deruilin Liu, Demin Xu, Liuxin Shi, Jiayuan Zhang, Kewei Bi, Bei Luo, Chen Liu, Yuxiang Li, Guangyi Fan, Wen Wang, Zhi Ping
{"title":"A practical DNA data storage using an expanded alphabet introducing 5-methylcytosine.","authors":"Deruilin Liu, Demin Xu, Liuxin Shi, Jiayuan Zhang, Kewei Bi, Bei Luo, Chen Liu, Yuxiang Li, Guangyi Fan, Wen Wang, Zhi Ping","doi":"10.46471/gigabyte.147","DOIUrl":null,"url":null,"abstract":"<p><p>The DNA molecule is a promising next-generation data storage medium. Recently, it has been theoretically proposed that non-natural or modified bases can serve as extra molecular letters to increase the information density. However, this strategy is challenging due to the difficulty in synthesizing non-natural DNA sequences and their complex structure. Here, we described a practical DNA data storage transcoding scheme named R+ based on an expanded molecular alphabet that introduces 5-methylcytosine (5mC). We demonstrated its experimental validation by encoding one representative file into several 1.3∼1.6 kbps <i>in vitro</i> DNA fragments for nanopore sequencing. Our results show an average data recovery rate of 98.97% and 86.91% with and without reference, respectively. Our work validates the practicability of 5mC in DNA storage systems, with a potentially wide range of applications.</p><p><strong>Availability and implementation: </strong>R+ is implemented in Python and the code is available under a MIT license at https://github.com/Incpink-Liu/DNA-storage-R_plus.</p>","PeriodicalId":73157,"journal":{"name":"GigaByte (Hong Kong, China)","volume":"2025 ","pages":"gigabyte147"},"PeriodicalIF":1.2000,"publicationDate":"2025-01-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11791762/pdf/","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"GigaByte (Hong Kong, China)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.46471/gigabyte.147","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"2025/1/1 0:00:00","PubModel":"eCollection","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 0

Abstract

The DNA molecule is a promising next-generation data storage medium. Recently, it has been theoretically proposed that non-natural or modified bases can serve as extra molecular letters to increase the information density. However, this strategy is challenging due to the difficulty in synthesizing non-natural DNA sequences and their complex structure. Here, we described a practical DNA data storage transcoding scheme named R+ based on an expanded molecular alphabet that introduces 5-methylcytosine (5mC). We demonstrated its experimental validation by encoding one representative file into several 1.3∼1.6 kbps in vitro DNA fragments for nanopore sequencing. Our results show an average data recovery rate of 98.97% and 86.91% with and without reference, respectively. Our work validates the practicability of 5mC in DNA storage systems, with a potentially wide range of applications.

Availability and implementation: R+ is implemented in Python and the code is available under a MIT license at https://github.com/Incpink-Liu/DNA-storage-R_plus.

Abstract Image

Abstract Image

Abstract Image

查看原文
分享 分享
微信好友 朋友圈 QQ好友 复制链接
本刊更多论文
一个实用的DNA数据存储使用扩展字母表引入5-甲基胞嘧啶。
DNA分子是一种很有前途的下一代数据存储介质。近年来,从理论上提出非天然或修饰碱基可以作为额外的分子字母来增加信息密度。然而,由于合成非天然DNA序列的困难及其复杂的结构,这种策略具有挑战性。在这里,我们描述了一种实用的DNA数据存储转编码方案,名为R+,该方案基于扩展的分子字母表,引入了5-甲基胞嘧啶(5mC)。我们通过将一个代表性文件编码为几个1.3 ~ 1.6 kbps的体外DNA片段进行纳米孔测序来验证其实验有效性。结果表明,在有参考文献和无参考文献的情况下,平均数据恢复率分别为98.97%和86.91%。我们的工作验证了5mC在DNA存储系统中的实用性,具有潜在的广泛应用。可用性和实现:R+是用Python实现的,其代码在MIT许可下可在https://github.com/Incpink-Liu/DNA-storage-R_plus上获得。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 去求助
来源期刊
CiteScore
2.60
自引率
0.00%
发文量
0
审稿时长
5 weeks
期刊最新文献
AI in practice: a multilingual survey of 2025 BioHackathon participants. Spatial temporal distribution of Anopheles mosquitoes in different ecological zones of Ghana. Urban malaria vector dynamics in Accra, Ghana. Transcriptomic profile of embryoid bodies under hypoxia at single cell level. First draft genome of the decaploid species, Ludwigia grandiflora subsp. hexapetala, validated through gene expression.
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
现在去查看 取消
×
提示
确定
0
微信
客服QQ
Book学术公众号 扫码关注我们
反馈
×
意见反馈
请填写您的意见或建议
请填写您的手机或邮箱
已复制链接
已复制链接
快去分享给好友吧!
我知道了
×
扫码分享
扫码分享
Book学术官方微信
Book学术官方微信
Book学术文献互助
Book学术文献互助群
群 号:604180095
Book学术
文献互助 智能选刊 最新文献 互助须知 联系我们:info@booksci.cn
Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。
Copyright © 2023 Book学术 All rights reserved.
ghs 京公网安备 11010802042870号 京ICP备2023020795号-1