Private approximate nearest neighbor search for on-chain data based on locality-sensitive hashing
Authors: Siyuan Shang, Xuehui Du, Xiaohan Wang, Aodi Liu
Journal: Future Generation Computer Systems-The International Journal of Escience, volume 164, Article 107586 (impact factor 6.2; JCR Q1, Computer Science, Theory & Methods)
Published: 2024-11-05
DOI: 10.1016/j.future.2024.107586
URL: https://www.sciencedirect.com/science/article/pii/S0167739X24005508
Citations: 0
Abstract
Blockchain manages data with immutability, decentralization and traceability, offering new solutions for traditional information systems and greatly facilitating data sharing. However, querying on-chain data still faces challenges such as low efficiency and difficulty in protecting privacy. We propose a private Approximate Nearest Neighbor (ANN) search method for on-chain data based on Locality-Sensitive Hashing (LSH), which consists of two main steps: query initialization and query implementation. In query initialization, the data management node builds hash tables for on-chain data using an improved LSH; the tables are encrypted with attribute-based encryption and stored on the blockchain. In query implementation, a node with the correct privileges uses random smart contracts to query on-chain data privately by means of a distributed point function and a privacy protection technique called oblivious masking. To validate the effectiveness of this method, we compare its performance with two ANN search algorithms: the query time is reduced by 57% and 59.2%, the average recall is increased by 4.5% and 2%, the average precision is increased by 7.7% and 6.9%, the average F1-score is increased by 6% and 4.3%, and the average initialization time is reduced by factors of 34 and 122, respectively. We also compare its performance with private ANN search methods that use homomorphic encryption, differential privacy and secure multi-party computation. The results show that our method can reduce the query time by several orders of magnitude, making it more applicable to the blockchain environment. To the best of our knowledge, this is the first private ANN search method for on-chain data that considers both query efficiency and privacy protection, achieving efficient, accurate, and private data queries.
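To make the query-initialization step more concrete, the following is a minimal sketch of how LSH hash tables can be built and then used for an approximate nearest neighbor lookup. It is not the authors' improved LSH: it assumes the generic random-hyperplane (sign random projection) variant, and it leaves out attribute-based encryption, smart contracts, the distributed point function and oblivious masking entirely. All function names and parameters (`make_hyperplanes`, `lsh_key`, `build_tables`, `query`, `n_tables`, `n_bits`) are hypothetical and chosen only for illustration.

```python
import random


def make_hyperplanes(dim, n_bits, seed=0):
    """Draw n_bits random hyperplanes (normal vectors) in dim dimensions."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_bits)]


def lsh_key(vec, planes):
    """Hash a vector to a bit tuple: one bit per hyperplane (which side it lies on)."""
    return tuple(
        1 if sum(p * x for p, x in zip(plane, vec)) >= 0 else 0
        for plane in planes
    )


def build_tables(data, dim, n_tables=8, n_bits=6, seed=0):
    """Build several independent hash tables mapping bucket key -> item indices."""
    tables = []
    for t in range(n_tables):
        planes = make_hyperplanes(dim, n_bits, seed + t)
        buckets = {}
        for idx, vec in enumerate(data):
            buckets.setdefault(lsh_key(vec, planes), []).append(idx)
        tables.append((planes, buckets))
    return tables


def query(q, data, tables, k=1):
    """Collect candidates from the query's bucket in each table, then rank them
    by exact squared Euclidean distance and return the k closest indices."""
    candidates = set()
    for planes, buckets in tables:
        candidates.update(buckets.get(lsh_key(q, planes), []))

    def sq_dist(i):
        return sum((a - b) ** 2 for a, b in zip(data[i], q))

    return sorted(candidates, key=sq_dist)[:k]


if __name__ == "__main__":
    rng = random.Random(42)
    points = [[rng.gauss(0.0, 1.0) for _ in range(4)] for _ in range(50)]
    tables = build_tables(points, dim=4)
    print(query(points[3], points, tables, k=3))
```

In this sketch, using more tables tends to raise recall at the cost of more storage, while using more bits per key makes buckets smaller and queries faster but more likely to miss true neighbors. In the setting the abstract describes, the buckets would additionally be encrypted before being stored on-chain and fetched without revealing which bucket was requested; both of those parts are outside this illustration.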
About the journal:
Computing infrastructures and systems are constantly evolving, resulting in increasingly complex and collaborative scientific applications. To cope with these advancements, there is a growing need for collaborative tools that can effectively map, control, and execute these applications.
Furthermore, with the explosion of Big Data, there is a requirement for innovative methods and infrastructures to collect, analyze, and derive meaningful insights from the vast amount of data generated. This necessitates the integration of computational and storage capabilities, databases, sensors, and human collaboration.
Future Generation Computer Systems aims to pioneer advancements in distributed systems, collaborative environments, high-performance computing, and Big Data analytics. It strives to stay at the forefront of developments in grids, clouds, and the Internet of Things (IoT) to effectively address the challenges posed by these wide-area, fully distributed sensing and computing systems.