Edit Distance: Sketching, Streaming, and Document Exchange

D. Belazzougui, Qin Zhang
2016 IEEE 57th Annual Symposium on Foundations of Computer Science (FOCS)
DOI: 10.1109/FOCS.2016.15
Publication date: 2016-07-14
Citations: 54

Abstract

We show that in the document exchange problem, where Alice holds x ∈ {0,1}^n and Bob holds y ∈ {0,1}^n, Alice can send Bob a message of O(K(log² K + log n)) bits such that Bob can recover x from the message and his input y if the edit distance between x and y is at most K, and output "error" otherwise. Both encoding and decoding run in Õ(n + poly(K)) time. This result significantly improves on the previous communication bounds achievable with polynomial-time encoding and decoding. We also show that in the referee model, where Alice and Bob hold x and y respectively, they can compute sketches of x and y of poly(K log n) bits each (the encoding) and send them to the referee, who can then compute the edit distance between x and y, together with all the edit operations, if the edit distance is at most K, and output "error" otherwise (the decoding). To the best of our knowledge, this is the first result for sketching edit distance using poly(K log n) bits. Moreover, the encoding phase of our sketching algorithm can be performed in a single pass over the input string. Thus our sketching algorithm also yields the first streaming algorithm that computes the edit distance and all the edits exactly using poly(K log n) bits of space.
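To make the input/output contract concrete ("return the edit distance and all edits if it is at most K, otherwise report an error"), here is a minimal non-sketching baseline. It assumes both strings are fully available in memory and uses the standard banded (Ukkonen-style) dynamic program, which fills only cells with |i - j| ≤ K and so runs in O(nK) time. It is not the authors' sketching, streaming, or document-exchange algorithm. The function names `bounded_edit_distance` and `apply_edits`, and the `(op, position, char)` edit encoding, are hypothetical choices made for illustration.

```python
from typing import List, Optional, Tuple

INF = float("inf")
Edit = Tuple[str, int, str]  # (op, position in original x, character)


def bounded_edit_distance(x: str, y: str, k: int) -> Optional[Tuple[int, List[Edit]]]:
    """Return (distance, edits) turning x into y if ED(x, y) <= k, else None ("error").

    Only DP cells with |i - j| <= k are filled (banded DP), so time and
    space are O(len(x) * k) instead of O(len(x) * len(y)).
    """
    n, m = len(x), len(y)
    if abs(n - m) > k:
        return None  # the length difference alone exceeds the budget
    width = 2 * k + 1
    # dp[i][d] stores ED(x[:i], y[:j]) with j = i + d - k; cells outside the band stay INF.
    dp = [[INF] * width for _ in range(n + 1)]

    def get(i: int, j: int) -> float:
        d = j - i + k
        if 0 <= i <= n and 0 <= j <= m and 0 <= d < width:
            return dp[i][d]
        return INF

    for i in range(n + 1):
        for j in range(max(0, i - k), min(m, i + k) + 1):
            if i == 0:
                val = j
            elif j == 0:
                val = i
            else:
                val = min(
                    get(i - 1, j - 1) + (x[i - 1] != y[j - 1]),  # match or substitute
                    get(i - 1, j) + 1,  # delete x[i-1]
                    get(i, j - 1) + 1,  # insert y[j-1]
                )
            dp[i][j - i + k] = val

    dist = get(n, m)
    if dist > k:
        return None  # corresponds to outputting "error"

    # Backtrack from (n, m) to recover one optimal edit script.
    edits: List[Edit] = []
    i, j = n, m
    while i > 0 or j > 0:
        cur = get(i, j)
        if i > 0 and j > 0 and get(i - 1, j - 1) + (x[i - 1] != y[j - 1]) == cur:
            if x[i - 1] != y[j - 1]:
                edits.append(("sub", i - 1, y[j - 1]))
            i, j = i - 1, j - 1
        elif i > 0 and get(i - 1, j) + 1 == cur:
            edits.append(("del", i - 1, x[i - 1]))
            i -= 1
        else:
            edits.append(("ins", i, y[j - 1]))  # insert before original x[i]
            j -= 1
    edits.reverse()  # now in left-to-right order over x
    return int(dist), edits


def apply_edits(x: str, edits: List[Edit]) -> str:
    """Apply an edit script whose positions refer to the original string x."""
    s = list(x)
    # Right-to-left application keeps the positions of earlier edits valid.
    for op, pos, ch in reversed(edits):
        if op == "sub":
            s[pos] = ch
        elif op == "del":
            del s[pos]
        else:  # "ins"
            s.insert(pos, ch)
    return "".join(s)
```

For example, `bounded_edit_distance("kitten", "sitting", 3)` returns a distance of 3 with three edits, and `apply_edits` on those edits reproduces `"sitting"`; the same call with `k=2` returns `None`. The baseline needs both full strings, whereas the abstract's point is that comparable output can be recovered from messages or sketches of roughly O(K(log² K + log n)) or poly(K log n) bits.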