Linear Hashing Is Awesome

M. B. T. Knudsen
2016 IEEE 57th Annual Symposium on Foundations of Computer Science (FOCS), October 2016. DOI: 10.1109/FOCS.2016.45
Citations: 9

Abstract

The most classic textbook hash function, e.g. taught in CLRS [MIT Press'09], is

$h(x) = ((ax + b) \bmod p) \bmod m$,   (◇)

where $x, a, b \in \{0, 1, \ldots, p-1\}$ and $a, b$ are chosen uniformly at random. It is known that (◇) is 2-independent and almost uniform provided $p$ is a prime and $p \gg m$. This implies that when using (◇) to build a hash table with chaining that contains $n \le m$ keys, the expected query time is $O(1)$ and the expected length of the longest chain is $O(\sqrt{n})$. This result holds for any 2-independent hash function. No hash function can improve on the expected query time, but the upper bound on the expected length of the longest chain is not known to be tight for (◇). Partially addressing this problem, Alon et al. [STOC'97] proved the existence of a class of linear hash functions such that the expected length of the longest chain is $\Omega(\sqrt{n})$, and left as an open problem to decide which nontrivial properties (◇) has.

We make the first progress on this fundamental problem by showing that the expected length of the longest chain is at most $n^{1/3+o(1)}$, which means that the performance of (◇) is similar to that of a 3-independent hash function, for which we can prove an upper bound of $O(n^{1/3})$. As a lemma, we show that within a fixed set of integers there are few pairs such that the height of the ratio of the pair is small. Given two non-zero coprime integers $n, m \in \mathbb{Z}$, the height of $n/m$ is $\max\{|n|, |m|\}$; the height is a way of measuring how complex a fraction is. This is proved using a mixture of techniques from additive combinatorics and number theory, and we believe that the result might be of independent interest.

For a natural variation of (◇), we show that it is possible to apply second-order moment bounds even when a hash value is fixed. As a consequence: for min-wise hashing, it was known that any key from a set of $n$ keys has the smallest hash value with probability $O(1/\sqrt{n})$; we improve this to $n^{-1+o(1)}$. For linear probing, it was known that the worst-case expected query time is $O(\sqrt{n})$; we improve this to $n^{o(1)}$.
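
As an illustration of the setting above, here is a minimal Python sketch of (◇) used for hashing with chaining, assuming integer keys in {0, ..., p-1}. The prime P = 2^61 - 1 and the helper names make_linear_hash and longest_chain are illustrative assumptions, not taken from the paper. A single run only produces one sample of the longest chain, whereas the bounds discussed above concern its expectation.

```python
import random
from collections import Counter
from typing import Callable, Iterable, Optional

# Assumed prime for illustration: the Mersenne prime 2^61 - 1, chosen so that p >> m.
P = (1 << 61) - 1


def make_linear_hash(m: int, p: int = P, rng: Optional[random.Random] = None) -> Callable[[int], int]:
    """Draw a, b uniformly from {0, ..., p-1} and return h(x) = ((a*x + b) mod p) mod m."""
    if rng is None:
        rng = random.Random()
    a = rng.randrange(p)
    b = rng.randrange(p)

    def h(x: int) -> int:
        # Keys are assumed to lie in {0, ..., p-1}, as in the setup of (◇).
        return ((a * x + b) % p) % m

    return h


def longest_chain(keys: Iterable[int], m: int, h: Callable[[int], int]) -> int:
    """Length of the longest chain after inserting `keys` into m buckets with chaining."""
    chain_lengths = Counter(h(x) for x in keys)
    return max(chain_lengths.values(), default=0)


if __name__ == "__main__":
    n = m = 10_000
    h = make_linear_hash(m, rng=random.Random(2016))
    # One draw of (a, b) gives one sample; the abstract's bounds are on the expectation.
    print("longest chain:", longest_chain(range(n), m, h))
```

Averaging longest_chain over many seeds gives a rough empirical estimate of the expected longest chain, which can be compared against √n and the cube root of n; such an experiment illustrates the quantities involved but proves nothing about them.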
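
The height of a fraction, as defined in the abstract, can be made concrete with Python's standard fractions module. This is a sketch under the assumption that the inputs are non-zero integers; height and small_height_pairs are hypothetical helper names. The brute-force pair count only illustrates the quantity that the lemma controls; it does not reproduce the lemma's bound or its proof.

```python
from fractions import Fraction
from itertools import combinations
from typing import Iterable, List, Tuple


def height(q: Fraction) -> int:
    """Height of a non-zero rational: max(|n|, |m|) where q = n/m in lowest terms."""
    if q == 0:
        raise ValueError("the height is defined here for non-zero rationals only")
    # Fraction normalizes to a coprime numerator/denominator with a positive denominator.
    return max(abs(q.numerator), abs(q.denominator))


def small_height_pairs(xs: Iterable[int], max_height: int) -> List[Tuple[int, int]]:
    """Brute-force list of pairs (x, y) from xs whose ratio x/y has height <= max_height.

    Assumes xs contains non-zero integers. Since height(x/y) == height(y/x),
    unordered pairs suffice.
    """
    return [
        (x, y)
        for x, y in combinations(list(xs), 2)
        if height(Fraction(x, y)) <= max_height
    ]
```

For example, height(Fraction(6, 4)) is 3 because 6/4 reduces to 3/2, and small_height_pairs([1, 2, 3, 4], 2) returns [(1, 2), (2, 4)], since both ratios reduce to 1/2.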
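
To show what "query time" measures for linear probing, the following sketch counts probed slots in an open-addressing table driven by (◇). The abstract's n^{o(1)} result concerns a "natural variation of (◇)" whose exact form is not given here, so this sketch assumes the plain (◇); the class LinearProbingTable, the helper make_linear_hash, and the prime P are hypothetical choices for illustration.

```python
import random
from typing import Callable, List, Optional, Tuple

# Assumed prime for illustration (Mersenne prime 2^61 - 1), much larger than m.
P = (1 << 61) - 1


def make_linear_hash(m: int, p: int = P, rng: Optional[random.Random] = None) -> Callable[[int], int]:
    """h(x) = ((a*x + b) mod p) mod m with a, b uniform in {0, ..., p-1}."""
    if rng is None:
        rng = random.Random()
    a, b = rng.randrange(p), rng.randrange(p)
    return lambda x: ((a * x + b) % p) % m


class LinearProbingTable:
    """Open addressing with linear probing; probe counts stand in for query time."""

    def __init__(self, m: int, h: Callable[[int], int]) -> None:
        self.m = m
        self.h = h
        self.slots: List[Optional[int]] = [None] * m

    def insert(self, x: int) -> int:
        """Store x in the first free slot at or after h(x); return the number of probes."""
        i = self.h(x)
        for probes in range(1, self.m + 1):
            if self.slots[i] is None or self.slots[i] == x:
                self.slots[i] = x
                return probes
            i = (i + 1) % self.m  # wrap around the end of the table
        raise RuntimeError("table is full")

    def query(self, x: int) -> Tuple[bool, int]:
        """Return (found, probes): scan from h(x) until x or an empty slot is reached."""
        i = self.h(x)
        for probes in range(1, self.m + 1):
            slot = self.slots[i]
            if slot is None:
                return False, probes
            if slot == x:
                return True, probes
            i = (i + 1) % self.m
        return False, self.m
```

The worst-case expected query time in the abstract corresponds to the largest expected probe count over keys, where the expectation is taken over the random choice of a and b.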