{"title":"Synchronization strings: codes for insertions and deletions approaching the Singleton bound","authors":"Bernhard Haeupler, Amirbehshad Shahrasbi","doi":"10.1145/3055399.3055498","DOIUrl":null,"url":null,"abstract":"We introduce synchronization strings, which provide a novel way of efficiently dealing with synchronization errors, i.e., insertions and deletions. Synchronization errors are strictly more general and much harder to deal with than more commonly considered half-errors, i.e., symbol corruptions and erasures. For every ε > 0, synchronization strings allow to index a sequence with an ε-O(1) size alphabet such that one can efficiently transform k synchronization errors into (1 + ε)k half-errors. This powerful new technique has many applications. In this paper we focus on designing insdel codes, i.e., error correcting block codes (ECCs) for insertion deletion channels. While ECCs for both half-errors and synchronization errors have been intensely studied, the later has largely resisted progress. As Mitzenmacher puts it in his 2009 survey: \"Channels with synchronization errors ... are simply not adequately understood by current theory. Given the near-complete knowledge we have for channels with erasures and errors ... our lack of understanding about channels with synchronization errors is truly remarkable.\" Indeed, it took until 1999 for the first insdel codes with constant rate, constant distance, and constant alphabet size to be constructed and only since 2016 are there constructions of constant rate indel codes for asymptotically large noise rates. Even in the asymptotically large or small noise regime these codes are polynomially far from the optimal rate-distance tradeoff. This makes the understanding of insdel codes up to this work equivalent to what was known for regular ECCs after Forney introduced concatenated codes in his doctoral thesis 50 years ago. A straight forward application of our synchronization strings based indexing method gives a simple black-box construction which transforms any ECC into an equally efficient insdel code with only a small increase in the alphabet size. This instantly transfers much of the highly developed understanding for regular ECCs over large constant alphabets into the realm of insdel codes. Most notably, for the complete noise spectrum we obtain efficient \"near-MDS\" insdel codes which get arbitrarily close to the optimal rate-distance tradeoff given by the Singleton bound. In particular, for any δ ∈ (0,1) and ε > 0 we give insdel codes achieving a rate of 1 - ξ - ε over a constant size alphabet that efficiently correct a δ fraction of insertions or deletions.","PeriodicalId":20615,"journal":{"name":"Proceedings of the 49th Annual ACM SIGACT Symposium on Theory of Computing","volume":"26 1","pages":""},"PeriodicalIF":0.0000,"publicationDate":"2017-04-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"67","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 49th Annual ACM SIGACT Symposium on Theory of Computing","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3055399.3055498","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 67
Abstract
We introduce synchronization strings, which provide a novel way of efficiently dealing with synchronization errors, i.e., insertions and deletions. Synchronization errors are strictly more general and much harder to deal with than more commonly considered half-errors, i.e., symbol corruptions and erasures. For every ε > 0, synchronization strings allow to index a sequence with an ε-O(1) size alphabet such that one can efficiently transform k synchronization errors into (1 + ε)k half-errors. This powerful new technique has many applications. In this paper we focus on designing insdel codes, i.e., error correcting block codes (ECCs) for insertion deletion channels. While ECCs for both half-errors and synchronization errors have been intensely studied, the later has largely resisted progress. As Mitzenmacher puts it in his 2009 survey: "Channels with synchronization errors ... are simply not adequately understood by current theory. Given the near-complete knowledge we have for channels with erasures and errors ... our lack of understanding about channels with synchronization errors is truly remarkable." Indeed, it took until 1999 for the first insdel codes with constant rate, constant distance, and constant alphabet size to be constructed and only since 2016 are there constructions of constant rate indel codes for asymptotically large noise rates. Even in the asymptotically large or small noise regime these codes are polynomially far from the optimal rate-distance tradeoff. This makes the understanding of insdel codes up to this work equivalent to what was known for regular ECCs after Forney introduced concatenated codes in his doctoral thesis 50 years ago. A straight forward application of our synchronization strings based indexing method gives a simple black-box construction which transforms any ECC into an equally efficient insdel code with only a small increase in the alphabet size. This instantly transfers much of the highly developed understanding for regular ECCs over large constant alphabets into the realm of insdel codes. Most notably, for the complete noise spectrum we obtain efficient "near-MDS" insdel codes which get arbitrarily close to the optimal rate-distance tradeoff given by the Singleton bound. In particular, for any δ ∈ (0,1) and ε > 0 we give insdel codes achieving a rate of 1 - ξ - ε over a constant size alphabet that efficiently correct a δ fraction of insertions or deletions.