An Encoding Table Corresponding to ASCII Codes for DNA Data Storage and a New Error Correction Method HMSA

IEEE Transactions on NanoBioscience; IF 3.7; Zone 4 (Biology); JCR Q1 (Biochemical Research Methods); Pub Date: 2024-01-22; DOI: 10.1109/TNB.2024.3356522
Xuncai Zhang;Fuzhen Zhou
IEEE Transactions on NanoBioscience, vol. 23, no. 2, pp. 344-354. https://ieeexplore.ieee.org/document/10410899/
Citations: 0

Abstract

DNA storage stands out from other storage media for its high capacity, eco-friendliness, long lifespan, high stability, low energy consumption, and low data maintenance costs. To standardize the DNA encoding system, keep character representation and transmission consistent, and link binary data, bases, and characters together, this paper combines the encoding method with the ASCII code to construct an ASCII-DNA encoding table. The encoding method can encode not only plain text but also audio and video, and it satisfies both the GC content constraint and the homopolymer constraint, with an encoding density of 1.4 bits/nt. In particular, when encoding text, it skips the binary conversion step entirely, which reduces encoding complexity and raises the encoding density to 1.6 bits/nt. To address errors in sequences, and inspired by heuristic algorithms, this paper proposes a new error correction method (HMSA) that combines minimum Hamming distance, multiple sequence alignment, and the encoding scheme. It can correct substitution, insertion, and deletion errors in reads, including consecutive errors. This greatly improves read utilization and avoids wasting resources. Simulation results show that the read recovery rate increases with the number of sequencing runs. When the number of erroneous bases in a 150 nt sequence reaches 5, the error correction rate can exceed 96% after sequencing the base sequence only 10 times, regardless of whether the errors are consecutive. Additionally, the HMSA error correction method is applicable to all coding schemes based on a lookup code table.
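The ingredients named in the abstract (GC content, homopolymer runs, minimum Hamming distance against a code table, and agreement across repeated reads) can be illustrated with a small Python sketch. This is not the authors' HMSA implementation: the 5-nt codewords in `TOY_CODEBOOK` are hypothetical and are not the paper's ASCII-DNA table, all function names are invented for illustration, and a simple per-column majority vote stands in for a real multiple sequence alignment. The sketch therefore handles substitution errors only, whereas insertions and deletions would require an actual alignment step.

```python
from collections import Counter
from itertools import groupby


def gc_content(seq: str) -> float:
    """Fraction of bases in seq that are G or C."""
    return sum(base in "GC" for base in seq) / len(seq)


def max_homopolymer_run(seq: str) -> int:
    """Length of the longest run of a single repeated base."""
    return max(len(list(run)) for _, run in groupby(seq))


def hamming_distance(a: str, b: str) -> int:
    """Number of positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


def nearest_codeword(block: str, codebook: dict[str, str]) -> str:
    """Return the character whose codeword is closest to block."""
    return min(codebook, key=lambda ch: hamming_distance(block, codebook[ch]))


def column_consensus(reads: list[str]) -> str:
    """Majority vote per column over equal-length reads.

    Assumes the reads are already aligned (substitutions only).
    """
    return "".join(Counter(col).most_common(1)[0][0] for col in zip(*reads))


# Hypothetical toy code table; NOT the paper's ASCII-DNA encoding table.
TOY_CODEBOOK = {"A": "ACGTA", "B": "CATGC", "C": "GTACG"}

# Three noisy reads of the same block; the third has one substitution.
reads = ["ACGTA", "ACGTA", "ACCTA"]
consensus = column_consensus(reads)

# A single damaged block is mapped back to the closest table entry.
decoded = nearest_codeword("ACGAA", TOY_CODEBOOK)
```

In this toy setting the vote across reads repairs errors that differ between reads, while the nearest-codeword lookup repairs errors that survive in a single block; the abstract's claim that recovery improves with more sequencing runs corresponds to giving the vote more reads to work with.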
Source journal
IEEE Transactions on NanoBioscience (Engineering and Technology: Nanotechnology)
CiteScore: 7.00
Self-citation rate: 5.10%
Articles published: 197
Review time: >12 weeks
Journal description: The IEEE Transactions on NanoBioscience reports on original, innovative and interdisciplinary work on all aspects of molecular systems, cellular systems, and tissues (including molecular electronics). Topics covered in the journal span a broad spectrum, covering both foundations and applications. Specifically, methods and techniques, experimental aspects, design and implementation, instrumentation and laboratory equipment, clinical aspects, hardware and software data acquisition and analysis, and computer-based modelling are covered (based on traditional or high-performance computing, such as parallel computers or computer networks).