Fingerprinting-based minimal perfect hashing revisited
Piotr Beling
Journal of Experimental Algorithmics, published 2023-05-13. DOI: https://doi.org/10.1145/3596453
In this paper, we study a fingerprint-based minimal perfect hash function (FMPH for short). Although FMPH is not as space-efficient as some other minimal perfect hash functions (for example RecSplit, CHD, or PTHash), it has a number of practical advantages that make it worth considering. FMPH is simple and quite fast to evaluate. Its construction requires very little auxiliary memory, takes little time, and can be parallelized or carried out without holding the keys in memory. We propose an effective method (called FMPHGO) that reduces the size of FMPH, as well as a number of implementation improvements. In addition, we experimentally study FMPHGO performance and find the best values for its parameters. Our benchmarks show that with our method and an efficient structure to support rank queries on a bit vector, the FMPH size can be reduced to about 2.1 bits/key, which is close to the size achieved by state-of-the-art methods and noticeably larger only compared to RecSplit. FMPHGO preserves most of the FMPH advantages mentioned above, but it is significantly slower to construct. However, FMPHGO’s construction speed is still competitive with methods of similar space efficiency (like CHD or PTHash) and seems to be good enough for practical applications.
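The abstract does not spell out the construction, so the following is only a minimal sketch of the general fingerprinting idea as it is commonly described: keys are hashed into a bit array level by level, a slot hit by exactly one key is marked with a 1, colliding keys move on to the next level, and a rank query over the set bits yields the final index. It is not the author's implementation and does not include the FMPHGO optimization. The names `build`, `lookup`, `_slot`, and the `gamma` level-size factor are hypothetical, BLAKE2b is chosen purely for illustration, and the rank query is a plain linear scan rather than an efficient rank structure.

```python
import hashlib


def _slot(key: str, level: int, size: int) -> int:
    """Hash a key to a slot in [0, size), using the level number as a seed."""
    digest = hashlib.blake2b(
        key.encode("utf-8"),
        digest_size=8,
        salt=level.to_bytes(8, "little"),
    ).digest()
    return int.from_bytes(digest, "little") % size


def build(keys, gamma=2.0, max_levels=64):
    """Build per-level bit arrays for a set of distinct string keys.

    Assumption: `gamma` scales each level's size relative to the number
    of keys still unplaced; larger values mean fewer levels but more bits.
    """
    levels = []
    remaining = list(keys)
    for level in range(max_levels):
        if not remaining:
            break
        size = max(1, int(gamma * len(remaining)))
        counts = [0] * size
        for key in remaining:
            counts[_slot(key, level, size)] += 1
        # A slot is set only if exactly one key landed there.
        levels.append([1 if c == 1 else 0 for c in counts])
        # Keys that collided are retried at the next level.
        remaining = [k for k in remaining if counts[_slot(k, level, size)] != 1]
    if remaining:
        raise RuntimeError("keys left unplaced (duplicates or too few levels)")
    return levels


def lookup(levels, key):
    """Return the index of `key` in [0, n); undefined for keys not in the set."""
    offset = 0  # number of keys placed at earlier levels
    for level, bits in enumerate(levels):
        i = _slot(key, level, len(bits))
        if bits[i]:
            # Rank query: count set bits before position i (linear scan here).
            return offset + sum(bits[:i])
        offset += sum(bits)
    return None


if __name__ == "__main__":
    keys = [f"key{i}" for i in range(1000)]
    levels = build(keys)
    indices = sorted(lookup(levels, k) for k in keys)
    print(indices == list(range(len(keys))))
    print(sum(len(b) for b in levels) / len(keys), "bits/key before rank overhead")
```

In this sketch, evaluation only touches one bit per level until a set bit is found, which is why a fast rank structure over the bit vector matters for both speed and total size.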
About the journal:
The ACM JEA is a high-quality, refereed, archival journal devoted to the study of discrete algorithms and data structures through a combination of experimentation and classical analysis and design techniques. It focuses on the following areas in algorithms and data structures: combinatorial optimization, computational biology, computational geometry, graph manipulation, graphics, heuristics, network design, parallel processing, routing and scheduling, searching and sorting, and VLSI design.