A New Version of q-Ary Varshamov-Tenengolts Codes With More Efficient Encoders: The Differential VT Codes and The Differential Shifted VT Codes
Tuan Thanh Nguyen; Kui Cai; Paul H. Siegel
IEEE Transactions on Information Theory, vol. 70, no. 10, pp. 6989-7004. Published 2024-06-26.
DOI: 10.1109/TIT.2024.3417894
https://ieeexplore.ieee.org/document/10571999/
Impact factor: 2.2; JCR: Q3 (Computer Science, Information Systems)
Citations: 0
Abstract
The problem of correcting deletions and insertions has recently received significantly increased attention because of DNA-based data storage technology, which suffers from deletions and insertions with extremely high probability. In this work, we study the problem of constructing non-binary burst-deletion/insertion correcting codes. In particular, for the quaternary alphabet, our codes are suited to correcting a burst of deletions/insertions in DNA storage. Non-binary codes correcting a single deletion or insertion were introduced by Tenengolts (1984), and the results were extended to correct a fixed-length burst of deletions or insertions by Schoeny et al. (2017). Recently, Wang et al. (2021) proposed constructions of non-binary codes of length n that correct a burst of length at most two for q-ary alphabets with redundancy $\log n+O(\log q \log \log n)$ bits, for arbitrary even q. The common idea in those constructions is to convert non-binary sequences into binary sequences, so that decoding the q-ary sequences relies mainly on successfully recovering the corresponding binary sequences. In this work, we look at a natural alternative in which the error detection and correction algorithms are performed directly over q-ary sequences; for certain cases, our codes provide a more efficient encoder with lower redundancy than the best-known encoder in the literature. Our contributions are as follows.

(Single-error correction codes) We first present a new version of non-binary VT codes that are capable of correcting a single deletion or single insertion, providing a simpler and more efficient alternative to the encoder for the construction by Tenengolts (1984). Our construction is based on the differential vector, and the codes are referred to as the differential VT codes. In addition, we provide linear-time algorithms that encode user messages into these codes of length n over the q-ary alphabet for $q \geqslant 2$ with at most $\lceil \log _{q} n\rceil +1$ redundant symbols, while the optimal redundancy required is at least $\log _{q} n+\log _{q} (q-1)$ symbols. Our encoder reduces the redundancy of the best-known encoder of Tenengolts (1984) by at least 2 redundant symbols, or equivalently $2\log _{2} q$ bits.

(Burst-error correction codes) We use the idea of the binary shifted VT codes to define the q-ary differential shifted VT codes, and propose non-binary codes correcting a burst of up to two deletions (or two insertions) with redundancy $\log n+3\log \log n+ O(\log q)$ bits, which improves on a recent result of Wang et al. (2021) with redundancy $\log n+O(\log q \log \log n)$ bits for all $q\geqslant 8$. We then extend the construction to design non-binary codes correcting a burst of either exactly or at most t deletions (or insertions) for arbitrary $t\geqslant 2$.
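The single-deletion part of the abstract can be explored with a small brute-force sketch. The abstract does not spell out the definitions, so the following are assumptions: the differential vector of $x$ is taken to be $y_i = x_i - x_{i+1} \pmod q$ for $i < n$ with $y_n = x_n$, and a code class collects all $x$ whose syndrome $\sum_i i\,y_i$ equals a fixed residue modulo $qn$. The function names are hypothetical, and the sketch only checks the code property for tiny parameters; it is not the linear-time encoder from the paper.

```python
from itertools import product
from math import log

def diff_vector(x, q):
    """Assumed definition: y_i = x_i - x_{i+1} (mod q) for i < n, and y_n = x_n."""
    n = len(x)
    return [(x[i] - x[i + 1]) % q for i in range(n - 1)] + [x[-1]]

def syndrome(y):
    """VT-style syndrome: sum of i * y_i with 1-based positions."""
    return sum(i * s for i, s in enumerate(y, start=1))

def diff_vt_class(x, q):
    """Residue in Z_{qn} of the class that x falls into (hypothetical helper)."""
    return syndrome(diff_vector(x, q)) % (q * len(x))

def deletion_ball(x):
    """All words obtained from x by deleting exactly one symbol."""
    return {x[:i] + x[i + 1:] for i in range(len(x))}

def check_single_deletion(n, q):
    """Brute force: within each class, no two distinct words share a
    one-deletion descendant, so a single deletion is always correctable."""
    classes = {}
    for x in product(range(q), repeat=n):
        classes.setdefault(diff_vt_class(x, q), []).append(x)
    for words in classes.values():
        seen = {}
        for x in words:
            for z in deletion_ball(x):
                if seen.setdefault(z, x) != x:
                    return False, classes
    return True, classes

n, q = 5, 4  # quaternary alphabet, as in the DNA storage setting
ok, classes = check_single_deletion(n, q)
largest = max(len(words) for words in classes.values())
redundancy = n - log(largest, q)  # redundancy of the largest class, in q-ary symbols
print(ok, len(classes), round(redundancy, 3))
```

Under these assumptions the words of length n are split into at most $qn$ classes, so by the pigeonhole principle the largest class has redundancy at most $\log_q n + 1$ symbols, which is in line with the $\lceil \log _{q} n\rceil +1$ figure quoted in the abstract.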
About the journal:
The IEEE Transactions on Information Theory is a journal that publishes theoretical and experimental papers concerned with the transmission, processing, and utilization of information. The boundaries of acceptable subject matter are intentionally not sharply delimited. Rather, it is hoped that as the focus of research activity changes, a flexible policy will permit this Transactions to follow suit. Current appropriate topics are best reflected by recent Tables of Contents; they are summarized in the titles of editorial areas that appear on the inside front cover.