Capacity-Achieving Constrained Codes with GC-Content and Runlength Limits for DNA Storage

2022 IEEE International Symposium on Information Theory (ISIT), Pub Date: 2022-06-26, DOI: 10.1109/ISIT50566.2022.9834494

Yajuan Liu, Xuan He, Xiaohu Tang

Citations: 5

Abstract

GC-content and homopolymer runs are two constraints of interest in DNA storage systems. Extensive experiments have shown that if the GC-content of a DNA sequence is too high or too low, or if a homopolymer run exceeds six, insertion, deletion and substitution errors increase dramatically. To handle both constraints jointly, a recent work (Nguyen et al. 2020) proposed a class of (ϵ, ℓ)-constrained codes that approach capacity only asymptotically and may incur a non-negligible loss at finite code lengths. In this paper, we design the first (ϵ, ℓ)-constrained codes based on the enumeration coding technique that achieve capacity regardless of code length. In addition, motivated by the influence of local GC-content, we consider for the first time the nontrivial case in which the prefixes of a DNA sequence must also satisfy the GC-content constraint, and call the resulting codes (δ, ℓ)-prefix constrained codes.
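To make the two constraints concrete, the following is a minimal sketch, not the authors' construction. It assumes the usual reading of an (ϵ, ℓ)-constraint: the GC-content of a sequence lies within ϵ of 0.5 and no homopolymer run is longer than ℓ. The abstract itself does not spell out this definition, and the function names are hypothetical. The brute-force counter only illustrates what an enumeration-based code has to index; it is practical only for very short lengths.

```python
from itertools import groupby, product


def max_run(seq):
    """Length of the longest homopolymer run (0 for an empty sequence)."""
    return max((len(list(group)) for _, group in groupby(seq)), default=0)


def gc_content(seq):
    """Fraction of bases in seq that are G or C."""
    return sum(base in "GC" for base in seq) / len(seq)


def is_constrained(seq, eps, ell):
    """Assumed (eps, ell)-constraint: |GC-content - 0.5| <= eps and max run <= ell."""
    return abs(gc_content(seq) - 0.5) <= eps and max_run(seq) <= ell


def count_constrained(n, eps, ell):
    """Brute-force count of length-n sequences over {A, C, G, T} meeting the constraint."""
    return sum(
        is_constrained(candidate, eps, ell)
        for candidate in product("ACGT", repeat=n)
    )
```

Under these assumptions, `log2(count_constrained(n, eps, ell)) / n` is the highest rate, in bits per nucleotide, that any code of length n could reach, which is the finite-length benchmark the abstract refers to when it speaks of achieving capacity regardless of code length.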
