Private Information Retrieval Without Storage Overhead: Coding Instead of Replication

IEEE Journal on Selected Areas in Information Theory, Pub Date: 2023-01-01, DOI: 10.1109/JSAIT.2023.3285665

Alexander Vardy; Eitan Yaakobi

Volume 4, pages 286-301. Full text: https://ieeexplore.ieee.org/document/10180061/

Citations: 1

Abstract

Private information retrieval (PIR) protocols allow a user to retrieve a data item from a database without revealing any information about the identity of the item being retrieved. Specifically, in information-theoretic $k$-server PIR, the database is replicated among $k$ non-communicating servers, and each server learns nothing about the item retrieved by the user. The effectiveness of PIR protocols is usually measured in terms of their communication complexity, which is the total number of bits exchanged between the user and the servers. However, another important cost parameter is storage overhead, which is the ratio between the total number of bits stored on all the servers and the number of bits in the database. Since single-server information-theoretic PIR is impossible, the storage overhead of all existing PIR protocols is at least 2 (or $k$, in the case of $k$-server PIR). In this work, we show that information-theoretic PIR can be achieved with storage overhead arbitrarily close to the optimal value of 1, without sacrificing the communication complexity asymptotically. Specifically, we prove that all known linear $k$-server PIR protocols can be efficiently emulated, while preserving both privacy and communication complexity but significantly reducing the storage overhead. To this end, we distribute the $n$ bits of the database among $s+r$ servers, each storing $n/s$ coded bits (rather than replicas). Notably, our coding scheme remains the same, regardless of the specific $k$-server PIR protocol being emulated. For every fixed $k$, the resulting storage overhead $(s+r)/s$ approaches 1 as $s$ grows; explicitly we have $r \le k\sqrt{s}\,(1 + o(1))$. Moreover, in the special case $k = 2$, the storage overhead is only $1 + \frac{1}{s}$. In order to achieve these results, we introduce and study a new kind of binary linear codes, called here $k$-server PIR codes. We then show how such codes can be constructed from one-step majority-logic decodable codes, from Steiner systems, from constant-weight codes, and from certain locally recoverable codes. We also establish several bounds on the parameters of $k$-server PIR codes and finally extend the results to array codes.
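The $k = 2$ case mentioned in the abstract (storage overhead $1 + \frac{1}{s}$, i.e. $r = 1$) can be illustrated with a minimal Python sketch. It rests on assumptions not stated in the abstract: the emulated protocol is taken to be the classic XOR-based linear 2-server PIR scheme (a uniformly random query vector and the same vector with one bit flipped), and the coded storage is taken to be $s$ data servers plus one XOR parity server. The names `build_servers`, `answer`, and `retrieve` are hypothetical and chosen only for this illustration; this is not presented as the authors' exact construction.

```python
import secrets


def xor_bits(a, b):
    """Bitwise XOR of two equal-length bit lists."""
    return [x ^ y for x, y in zip(a, b)]


def answer(stored, query):
    """Linear server answer: inner product of stored bits and query over GF(2)."""
    return sum(b & q for b, q in zip(stored, query)) % 2


def build_servers(database, s):
    """Split the n-bit database into s parts of n/s bits each.

    Servers 0..s-1 store one part each; server s stores the XOR of all parts.
    (Assumed layout: a single parity server, i.e. r = 1.)
    """
    n = len(database)
    assert n % s == 0, "this sketch assumes s divides n"
    m = n // s
    parts = [database[j * m:(j + 1) * m] for j in range(s)]
    parity = [0] * m
    for part in parts:
        parity = xor_bits(parity, part)
    return parts + [parity]


def retrieve(servers, s, index):
    """Recover database[index]; each server sees one uniformly random query."""
    m = len(servers[0])
    j, pos = divmod(index, m)           # part holding the bit, offset inside it
    q1 = [secrets.randbits(1) for _ in range(m)]
    q2 = list(q1)
    q2[pos] ^= 1                        # q2 = q1 with the wanted position flipped
    a1 = answer(servers[j], q1)         # server j answers the first query
    a2 = 0
    for t in range(s + 1):              # all other servers, parity included,
        if t != j:                      # answer the second query
            a2 ^= answer(servers[t], q2)
    # By linearity, a2 equals the answer server j would give to q2,
    # so a1 XOR a2 is the inner product of part j with the unit vector at pos.
    return a1 ^ a2


if __name__ == "__main__":
    db = [1, 0, 1, 1, 0, 0, 1, 0, 1, 1, 1, 0, 0, 1, 0, 1]
    s = 4
    servers = build_servers(db, s)
    recovered = [retrieve(servers, s, i) for i in range(len(db))]
    print(recovered == db)
    print(sum(len(sv) for sv in servers) / len(db))  # storage overhead (s+1)/s
```

Each server receives either `q1` or `q2`, and each of these is uniformly distributed on its own, so under the stated assumptions no single server learns which position was requested. The total storage is $(s+1) \cdot n/s$ bits, which matches the $1 + \frac{1}{s}$ overhead quoted for $k = 2$.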




Source journal

IEEE Journal on Selected Areas in Information Theory

CiteScore

8.20

Self-citation rate

0.00%

Articles published