Iterative Soft Decoding Algorithm for DNA Storage Using Quality Score and Redecoding

IF 3.7 | Zone 4 (Biology) | JCR Q1 (Biochemical Research Methods) | IEEE Transactions on NanoBioscience | Pub Date: 2023-06-09 | DOI: 10.1109/TNB.2023.3284406
Jaeho Jeong;Hosung Park;Hee-Youl Kwak;Jong-Seon No;Hahyeon Jeon;Jeong Wook Lee;Jae-Won Kim

Abstract

Since deoxyribonucleic acid (DNA) was first considered as a next-generation data-storage medium, considerable research effort has gone into correcting errors that occur during the synthesis, storage, and sequencing processes using error-correcting codes (ECCs). Previous work on recovering data from a sequenced DNA pool with errors has used hard decoding algorithms based on a majority decision rule. To improve the correction capability of ECCs and the robustness of the DNA storage system, we propose a new iterative soft decoding algorithm in which soft information is obtained from FASTQ files and channel statistics. In particular, we propose a new formula for log-likelihood ratio (LLR) calculation using quality scores (Q-scores), and a redecoding method that may be suitable for error correction and detection in the DNA sequencing area. Based on the widely adopted encoding scheme of the fountain code structure proposed by Erlich et al., we evaluate performance on three different sets of sequenced data to show consistency. The proposed soft decoding algorithm gives a 2.3% to 7.0% improvement in reading number reduction compared to the state-of-the-art decoding method, and it is shown that it can handle erroneous sequenced oligo reads with insertion and deletion errors.
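
The contrast between majority-rule hard decoding and Q-score-based soft information can be illustrated with a minimal sketch. This is not the LLR formula or the redecoding method proposed in the paper, which the abstract does not state. It assumes the standard Phred+33 FASTQ encoding (error probability 10^(-Q/10)), reads that are already aligned and of equal length (no insertions or deletions), and that an erroneous base is equally likely to be any of the three other bases. All function names are hypothetical.

```python
import math
from collections import Counter

BASES = "ACGT"

def phred_to_error_prob(q_char, offset=33):
    """Convert one FASTQ quality character (Phred+33 assumed) to an error probability."""
    q = ord(q_char) - offset
    return 10 ** (-q / 10)

def majority_vote(reads):
    """Hard decision: most frequent base per position (reads assumed aligned, equal length)."""
    return "".join(Counter(col).most_common(1)[0][0] for col in zip(*reads))

def soft_vote(reads, quals):
    """Soft decision: sum per-read log-likelihoods for each candidate base.

    Assumption (not from the paper): a base call is correct with probability
    1 - p and otherwise uniformly one of the three other bases (p / 3 each).
    """
    consensus = []
    for col_bases, col_quals in zip(zip(*reads), zip(*quals)):
        scores = dict.fromkeys(BASES, 0.0)
        for base, q_char in zip(col_bases, col_quals):
            # Clamp p so both logarithms below stay finite.
            p = min(max(phred_to_error_prob(q_char), 1e-6), 0.75)
            for cand in BASES:
                scores[cand] += math.log(1 - p) if cand == base else math.log(p / 3)
        consensus.append(max(scores, key=scores.get))
    return "".join(consensus)

# Three reads of the same oligo; the last base disagrees.
reads = ["ACGT", "ACGA", "ACGA"]
quals = ["IIII", "III#", "III#"]  # 'I' is Q40, '#' is Q2

hard = majority_vote(reads)
soft = soft_vote(reads, quals)
```

In this toy input, two low-quality reads outvote one high-quality read under the majority rule, while the quality-weighted score favours the high-quality call. A practical decoder would pass such soft values into the ECC decoder rather than taking a per-position maximum.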
Source journal
IEEE Transactions on NanoBioscience (Engineering and Technology: Nanoscience and Nanotechnology)
CiteScore: 7.00
Self-citation rate: 5.10%
Articles published: 197
Review time: >12 weeks
Journal description: The IEEE Transactions on NanoBioscience reports on original, innovative, and interdisciplinary work on all aspects of molecular systems, cellular systems, and tissues (including molecular electronics). Topics covered in the journal span a broad spectrum of aspects, both foundations and applications. Specifically, the journal covers methods and techniques, experimental aspects, design and implementation, instrumentation and laboratory equipment, clinical aspects, hardware and software data acquisition and analysis, and computer-based modelling (based on traditional or high-performance computing, such as parallel computers or computer networks).