Synchronization strings: codes for insertions and deletions approaching the Singleton bound

Bernhard Haeupler, Amirbehshad Shahrasbi
{"title":"Synchronization strings: codes for insertions and deletions approaching the Singleton bound","authors":"Bernhard Haeupler, Amirbehshad Shahrasbi","doi":"10.1145/3055399.3055498","DOIUrl":null,"url":null,"abstract":"We introduce synchronization strings, which provide a novel way of efficiently dealing with synchronization errors, i.e., insertions and deletions. Synchronization errors are strictly more general and much harder to deal with than more commonly considered half-errors, i.e., symbol corruptions and erasures. For every ε > 0, synchronization strings allow to index a sequence with an ε-O(1) size alphabet such that one can efficiently transform k synchronization errors into (1 + ε)k half-errors. This powerful new technique has many applications. In this paper we focus on designing insdel codes, i.e., error correcting block codes (ECCs) for insertion deletion channels. While ECCs for both half-errors and synchronization errors have been intensely studied, the later has largely resisted progress. As Mitzenmacher puts it in his 2009 survey: \"Channels with synchronization errors ... are simply not adequately understood by current theory. Given the near-complete knowledge we have for channels with erasures and errors ... our lack of understanding about channels with synchronization errors is truly remarkable.\" Indeed, it took until 1999 for the first insdel codes with constant rate, constant distance, and constant alphabet size to be constructed and only since 2016 are there constructions of constant rate indel codes for asymptotically large noise rates. Even in the asymptotically large or small noise regime these codes are polynomially far from the optimal rate-distance tradeoff. This makes the understanding of insdel codes up to this work equivalent to what was known for regular ECCs after Forney introduced concatenated codes in his doctoral thesis 50 years ago. A straight forward application of our synchronization strings based indexing method gives a simple black-box construction which transforms any ECC into an equally efficient insdel code with only a small increase in the alphabet size. This instantly transfers much of the highly developed understanding for regular ECCs over large constant alphabets into the realm of insdel codes. Most notably, for the complete noise spectrum we obtain efficient \"near-MDS\" insdel codes which get arbitrarily close to the optimal rate-distance tradeoff given by the Singleton bound. In particular, for any δ ∈ (0,1) and ε > 0 we give insdel codes achieving a rate of 1 - ξ - ε over a constant size alphabet that efficiently correct a δ fraction of insertions or deletions.","PeriodicalId":20615,"journal":{"name":"Proceedings of the 49th Annual ACM SIGACT Symposium on Theory of Computing","volume":null,"pages":null},"PeriodicalIF":0.0000,"publicationDate":"2017-04-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"67","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 49th Annual ACM SIGACT Symposium on Theory of Computing","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3055399.3055498","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 67

Abstract

We introduce synchronization strings, which provide a novel way of efficiently dealing with synchronization errors, i.e., insertions and deletions. Synchronization errors are strictly more general and much harder to deal with than more commonly considered half-errors, i.e., symbol corruptions and erasures. For every ε > 0, synchronization strings allow to index a sequence with an ε-O(1) size alphabet such that one can efficiently transform k synchronization errors into (1 + ε)k half-errors. This powerful new technique has many applications. In this paper we focus on designing insdel codes, i.e., error correcting block codes (ECCs) for insertion deletion channels. While ECCs for both half-errors and synchronization errors have been intensely studied, the later has largely resisted progress. As Mitzenmacher puts it in his 2009 survey: "Channels with synchronization errors ... are simply not adequately understood by current theory. Given the near-complete knowledge we have for channels with erasures and errors ... our lack of understanding about channels with synchronization errors is truly remarkable." Indeed, it took until 1999 for the first insdel codes with constant rate, constant distance, and constant alphabet size to be constructed and only since 2016 are there constructions of constant rate indel codes for asymptotically large noise rates. Even in the asymptotically large or small noise regime these codes are polynomially far from the optimal rate-distance tradeoff. This makes the understanding of insdel codes up to this work equivalent to what was known for regular ECCs after Forney introduced concatenated codes in his doctoral thesis 50 years ago. A straight forward application of our synchronization strings based indexing method gives a simple black-box construction which transforms any ECC into an equally efficient insdel code with only a small increase in the alphabet size. This instantly transfers much of the highly developed understanding for regular ECCs over large constant alphabets into the realm of insdel codes. Most notably, for the complete noise spectrum we obtain efficient "near-MDS" insdel codes which get arbitrarily close to the optimal rate-distance tradeoff given by the Singleton bound. In particular, for any δ ∈ (0,1) and ε > 0 we give insdel codes achieving a rate of 1 - ξ - ε over a constant size alphabet that efficiently correct a δ fraction of insertions or deletions.
查看原文
分享 分享
微信好友 朋友圈 QQ好友 复制链接
本刊更多论文
同步字符串:接近单例绑定的插入和删除代码
我们引入了同步字符串,它提供了一种有效处理同步错误的新方法,即插入和删除。同步错误严格来说比一般认为的半错误(即符号损坏和擦除)更通用,也更难处理。对于每个ε > 0,同步字符串允许用ε- o(1)大小的字母表索引序列,这样可以有效地将k个同步误差转换为(1 + ε)k个半误差。这项强大的新技术有许多用途。在本文中,我们专注于设计插入码,即插入删除信道的纠错块码(ECCs)。虽然半误差和同步误差的ECCs已经得到了广泛的研究,但后者在很大程度上阻碍了进展。正如米岑马赫在2009年的调查中所说:“有同步错误的频道……是目前的理论无法充分理解的。鉴于我们对带有擦除和错误的通道的近乎完整的了解……我们对存在同步错误的通道缺乏了解,这一点确实值得注意。”实际上,直到1999年才构造出第一个具有恒定速率、恒定距离和恒定字母大小的indel码,直到2016年才构造出具有渐近大噪声率的恒定速率indel码。即使在渐近的大噪声或小噪声条件下,这些码也多项式地远离最佳速率-距离权衡。这使得对这项工作的内部代码的理解相当于50年前Forney在他的博士论文中引入串联代码后对常规ecc的理解。我们基于同步字符串的索引方法的直接应用程序提供了一个简单的黑盒结构,它将任何ECC转换为同样高效的内部代码,仅增加了少量的字母大小。这立即将对大型常量字母的常规ecc的高度理解转移到内部代码领域。最值得注意的是,对于完整的噪声谱,我们获得了有效的“近mds”内码,它可以任意接近由单例界给出的最佳速率-距离权衡。特别地,对于任意δ∈(0,1)和ε >,我们给出了在恒定大小的字母表上实现1 - ξ - ε率的插入码,有效地纠正了δ分数的插入或删除。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 去求助
来源期刊
自引率
0.00%
发文量
0
期刊最新文献
Online service with delay A simpler and faster strongly polynomial algorithm for generalized flow maximization Low rank approximation with entrywise l1-norm error Fast convergence of learning in games (invited talk) Surviving in directed graphs: a quasi-polynomial-time polylogarithmic approximation for two-connected directed Steiner tree
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
现在去查看 取消
×
提示
确定
0
微信
客服QQ
Book学术公众号 扫码关注我们
反馈
×
意见反馈
请填写您的意见或建议
请填写您的手机或邮箱
已复制链接
已复制链接
快去分享给好友吧!
我知道了
×
扫码分享
扫码分享
Book学术官方微信
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术
文献互助 智能选刊 最新文献 互助须知 联系我们:info@booksci.cn
Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。
Copyright © 2023 Book学术 All rights reserved.
ghs 京公网安备 11010802042870号 京ICP备2023020795号-1