Fast Manhattan sketches in data streams
Jelani Nelson, David P. Woodruff
Proceedings of the ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems (PODS), pp. 99-110. Published 2010-06-06.
DOI: 10.1145/1807085.1807101 (https://doi.org/10.1145/1807085.1807101)
Citations: 35

Abstract

The $L_1$-distance, also known as the Manhattan or taxicab distance, between two vectors $x, y \in \mathbb{R}^n$ is $\sum_{i=1}^{n} |x_i - y_i|$. Approximating this distance is a fundamental primitive on massive databases, with applications to clustering, nearest neighbor search, network monitoring, regression, sampling, and support vector machines. We give the first 1-pass streaming algorithm for this problem in the turnstile model with $O^*(1/\varepsilon^2)$ space and $O^*(1)$ update time. The $O^*$ notation hides polylogarithmic factors in $\varepsilon$, $n$, and the precision required to store vector entries. All previous algorithms either required $\Omega(1/\varepsilon^3)$ space or $\Omega(1/\varepsilon^2)$ update time and/or could not work in the turnstile model (i.e., support an arbitrary number of updates to each coordinate). Our bounds are optimal up to $O^*(1)$ factors.
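To make the turnstile setting and the update-time bottleneck concrete, here is a minimal Python sketch of the classical 1-stable (Cauchy) sketch due to Indyk. It is one of the earlier approaches whose per-update cost grows with the number of counters. It is not the Nelson-Woodruff algorithm, which the abstract does not describe in enough detail to reproduce. The class `CauchyL1Sketch`, the helper `l1_distance`, the counter count `k = 2000`, and the random test vectors are hypothetical choices made for illustration. The sketch relies on two standard facts. First, a sum of independent standard Cauchy variables weighted by $d_i$ is distributed as $\|d\|_1$ times a standard Cauchy variable. Second, the median of $|C|$ for a standard Cauchy $C$ is 1.

```python
import math
import random
import statistics


class CauchyL1Sketch:
    """Indyk-style 1-stable (Cauchy) linear sketch y = S x of a length-n vector.

    Each counter y_j = sum_i S[j][i] * x_i is distributed as ||x||_1 * C_j with
    C_j standard Cauchy, and median(|C_j|) = 1, so median_j |y_j| estimates
    ||x||_1. With k = O(1/eps^2) counters this gives a (1 +/- eps)-approximation
    with constant probability. This is the earlier baseline, NOT the
    Nelson-Woodruff algorithm.
    """

    def __init__(self, n: int, k: int, seed: int = 0) -> None:
        rng = random.Random(seed)
        # Standard Cauchy samples via the inverse CDF tan(pi * (U - 1/2)).
        # S is stored explicitly only for clarity; this costs n * k memory,
        # whereas a real streaming sketch derives S from a short random seed.
        self.S = [[math.tan(math.pi * (rng.random() - 0.5)) for _ in range(n)]
                  for _ in range(k)]
        self.y = [0.0] * k

    def update(self, i: int, delta: float) -> None:
        """Turnstile update x_i += delta (delta may be negative).

        This touches all k counters, so the update time is Theta(k).
        """
        for j, row in enumerate(self.S):
            self.y[j] += row[i] * delta

    def estimate(self) -> float:
        """Median of |y_j|, an estimate of ||x||_1."""
        return statistics.median([abs(v) for v in self.y])


def l1_distance(x, y):
    """Exact L1 (Manhattan) distance sum_i |x_i - y_i|."""
    return sum(abs(a - b) for a, b in zip(x, y))


n = 50
data_rng = random.Random(1)
x = [data_rng.randint(-10, 10) for _ in range(n)]
y = [data_rng.randint(-10, 10) for _ in range(n)]

# One sketch of the difference x - y: by linearity, updates to y enter with a
# negated sign, so the counters end up equal to S (x - y).
sketch = CauchyL1Sketch(n, k=2000, seed=42)
for i in range(n):
    sketch.update(i, x[i] + 3)  # insert x_i plus an overshoot of 3 ...
    sketch.update(i, -3)        # ... then delete the overshoot (turnstile)
    sketch.update(i, -y[i])     # y contributes with a negative sign

exact = l1_distance(x, y)
approx = sketch.estimate()
print(f"exact L1 = {exact}, sketch estimate = {approx:.1f}")
```

Running it prints the exact distance next to the median estimate. With 2000 counters the two should typically agree to within about 10%. Each `update` call touches every counter. With $k = \Theta(1/\varepsilon^2)$ counters, that is the $\Omega(1/\varepsilon^2)$ per-update cost that the abstract says the new algorithm reduces to $O^*(1)$.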