Synchronization Strings: Codes for Insertions and Deletions Approaching the Singleton Bound

Bernhard Haeupler, Amirbehshad Shahrasbi
Journal of the ACM (JACM), pp. 1-39. Published 2021-09-13. DOI: 10.1145/3468265 (https://doi.org/10.1145/3468265)
Citations: 6

Abstract

We introduce synchronization strings, which provide a novel way to efficiently deal with synchronization errors, i.e., insertions and deletions. Synchronization errors are strictly more general and much harder to cope with than the more commonly considered Hamming-type errors, i.e., symbol substitutions and erasures. For every ε > 0, synchronization strings allow us to index a sequence with an alphabet of size ε^{-O(1)}, such that one can efficiently transform k synchronization errors into (1 + ε)k Hamming-type errors. This powerful new technique has many applications. In this article, we focus on designing insdel codes, i.e., error-correcting block codes (ECCs) for insertion-deletion channels. While ECCs for both Hamming-type errors and synchronization errors have been intensely studied, the latter have largely resisted progress. As Mitzenmacher puts it in his 2009 survey [30]: “Channels with synchronization errors ... are simply not adequately understood by current theory. Given the near-complete knowledge we have for channels with erasures and errors ... our lack of understanding about channels with synchronization errors is truly remarkable.” Indeed, it took until 1999 for the first insdel codes with constant rate, constant distance, and constant alphabet size to be constructed, and only since 2016 have there been constructions of constant-rate insdel codes for asymptotically large noise rates. Even in the asymptotically large or small noise regimes, these codes are polynomially far from the optimal rate-distance tradeoff. This makes the understanding of insdel codes up to this work equivalent to what was known for regular ECCs after Forney introduced concatenated codes in his doctoral thesis 50 years ago. A straightforward application of our synchronization strings-based indexing method gives a simple black-box construction that transforms any ECC into an equally efficient insdel code with only a small increase in the alphabet size.
This instantly transfers much of the highly developed understanding of regular ECCs into the realm of insdel codes. Most notably, for the complete noise spectrum, we obtain efficient “near-MDS” insdel codes, which get arbitrarily close to the optimal rate-distance tradeoff given by the Singleton bound. In particular, for any δ ∈ (0,1) and ε > 0, we give a family of insdel codes achieving a rate of 1 - δ - ε over a constant-size alphabet that efficiently corrects a δ fraction of insertions or deletions.
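
The indexing idea in the abstract can be illustrated with a toy sketch. It is not the paper's construction: as a labeled assumption, it indexes each symbol with its own position (an index alphabet of size n) instead of a true synchronization string over an alphabet of size ε^{-O(1)}, and the function names `attach_indices` and `reposition` are hypothetical. It only shows how indices let a receiver turn insertions and deletions into erasures that an ordinary ECC could then handle.

```python
from typing import List, Optional, Tuple

# (index symbol, data symbol); the index here is simply the position,
# a stand-in for a real synchronization string (assumption for illustration).
Symbol = Tuple[int, str]


def attach_indices(codeword: str) -> List[Symbol]:
    """Pair every codeword symbol with an index symbol."""
    return list(enumerate(codeword))


def reposition(received: List[Symbol], n: int) -> List[Optional[str]]:
    """Place each received symbol at the position named by its index.

    A position claimed by no symbol (deletion) or by more than one
    symbol (insertion clash) is reported as an erasure (None).
    """
    slots: List[List[str]] = [[] for _ in range(n)]
    for idx, sym in received:
        if 0 <= idx < n:
            slots[idx].append(sym)
    return [claims[0] if len(claims) == 1 else None for claims in slots]


codeword = "SYNCHRONIZE"
sent = attach_indices(codeword)

# Channel: delete the symbol at position 2, then insert a bogus symbol
# that claims index 7.
received = [s for s in sent if s[0] != 2]
received.insert(5, (7, "X"))

recovered = reposition(received, len(codeword))
erasures = sum(1 for r in recovered if r is None)
print(recovered, erasures)
```

In this toy run, one deletion and one insertion become two erasures, and every other position is recovered correctly. The point of the paper's synchronization strings is to obtain a comparable guarantee, (1 + ε)k Hamming-type errors from k synchronization errors, while keeping the index alphabet size independent of n.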