Flexible fingerprint cuckoo filter for information retrieval optimization in distributed network
Authors: Wenhan Lian, Jinlin Wang, Jiali You
Journal: Distributed and Parallel Databases (impact factor 1.5; JCR Q3, Computer Science, Information Systems)
Published: 2024-04-11
DOI: 10.1007/s10619-024-07440-w (https://doi.org/10.1007/s10619-024-07440-w)
Citations: 0
Abstract
In a large-scale distributed network, a naming service provides location transparency and effective content discovery. However, fast and accurate name retrieval over a massive name set is difficult. Approximate set membership data structures, such as the Bloom filter and the cuckoo filter, are widely used in distributed information systems. They achieve high query performance and reduce memory requirements by storing a compact representation of the information, but at the cost of introducing query errors (false positives), which ultimately degrade content service quality. In this paper, to obtain higher space utilization and a lower false positive rate, we propose a flexible fingerprint cuckoo filter (FFCF) for information storage and retrieval, which adaptively changes the length and type of its fingerprints. FFCF uses longer fingerprints under low occupancy and can correct errors by changing the type of stored fingerprints. We give a theoretical proof and evaluate the performance of FFCF through simulations with synthetic data sets and real network packets. The results demonstrate that FFCF improves memory utilization, reduces false positive errors by nearly 90% at 50% occupancy, and outperforms the cuckoo filter across the full range of occupancy.
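To make the link between fingerprint length, occupancy, and false positives concrete, the sketch below computes the usual approximation for a standard cuckoo filter, not the FFCF analysis from the paper. It assumes that a lookup probes two buckets, that stored fingerprints behave like independent uniform random values, and that a fraction `load` of the slots is occupied. The function name `cuckoo_fpr` and its parameters are hypothetical and chosen only for illustration.

```python
def cuckoo_fpr(fingerprint_bits: int, bucket_size: int = 4, load: float = 1.0) -> float:
    """Approximate false positive rate of a standard cuckoo filter.

    Assumptions (illustrative, not taken from the paper):
    - each lookup probes 2 candidate buckets of `bucket_size` slots;
    - a fraction `load` of all slots holds a fingerprint;
    - each stored fingerprint matches an absent item independently
      with probability 2 ** -fingerprint_bits.
    """
    if fingerprint_bits <= 0 or bucket_size <= 0:
        raise ValueError("fingerprint_bits and bucket_size must be positive")
    if not 0.0 <= load <= 1.0:
        raise ValueError("load must be in [0, 1]")
    # Expected number of stored fingerprints compared during one lookup.
    comparisons = 2 * bucket_size * load
    # Probability that at least one of those comparisons matches by chance.
    return 1.0 - (1.0 - 2.0 ** -fingerprint_bits) ** comparisons


if __name__ == "__main__":
    # Compare a short and a long fingerprint at 50% occupancy.
    for bits in (8, 16):
        print(bits, cuckoo_fpr(bits, bucket_size=4, load=0.5))
```

Under these assumptions, each additional fingerprint bit roughly halves the false positive rate, which is consistent with the motivation for using longer fingerprints while occupancy is low.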
About the journal:
Distributed and Parallel Databases publishes papers in all the traditional as well as most emerging areas of database research, including:
Availability and reliability;
Benchmarking, performance evaluation, and tuning;
Big Data Storage and Processing;
Cloud Computing and Database-as-a-Service;
Crowdsourcing;
Data curation, annotation and provenance;
Data integration, metadata Management, and interoperability;
Data models, semantics, query languages;
Data mining and knowledge discovery;
Data privacy, security, trust;
Data provenance, workflows, Scientific Data Management;
Data visualization and interactive data exploration;
Data warehousing, OLAP, Analytics;
Graph data management, RDF, social networks;
Information Extraction and Data Cleaning;
Middleware and Workflow Management;
Modern Hardware and In-Memory Database Systems;
Query Processing and Optimization;
Semantic Web and open data;
Social Networks;
Storage, indexing, and physical database design;
Streams, sensor networks, and complex event processing;
Strings, Texts, and Keyword Search;
Spatial, temporal, and spatio-temporal databases;
Transaction processing;
Uncertain, probabilistic, and approximate databases.