Private Information Retrieval Without Storage Overhead: Coding Instead of Replication
Authors: Alexander Vardy; Eitan Yaakobi
Journal: IEEE Journal on Selected Areas in Information Theory, vol. 4, pp. 286-301, 2023
DOI: 10.1109/JSAIT.2023.3285665
URL: https://ieeexplore.ieee.org/document/10180061/
Citations: 1
Abstract
Private information retrieval (PIR) protocols allow a user to retrieve a data item from a database without revealing any information about the identity of the item being retrieved. Specifically, in information-theoretic $k$-server PIR, the database is replicated among $k$ non-communicating servers, and each server learns nothing about the item retrieved by the user. The effectiveness of PIR protocols is usually measured in terms of their communication complexity, which is the total number of bits exchanged between the user and the servers. However, another important cost parameter is storage overhead, which is the ratio between the total number of bits stored on all the servers and the number of bits in the database. Since single-server information-theoretic PIR is impossible, the storage overhead of all existing PIR protocols is at least 2 (or $k$, in the case of $k$-server PIR). In this work, we show that information-theoretic PIR can be achieved with storage overhead arbitrarily close to the optimal value of 1, without asymptotically sacrificing communication complexity. Specifically, we prove that all known linear $k$-server PIR protocols can be efficiently emulated, preserving both privacy and communication complexity while significantly reducing the storage overhead. To this end, we distribute the $n$ bits of the database among $s+r$ servers, each storing $n/s$ coded bits (rather than replicas). Notably, our coding scheme remains the same, regardless of the specific $k$-server PIR protocol being emulated. For every fixed $k$, the resulting storage overhead $(s+r)/s$ approaches 1 as $s$ grows; explicitly, we have $r \le k\sqrt{s}(1 + o(1))$. Moreover, in the special case $k = 2$, the storage overhead is only $1 + \frac{1}{s}$. To achieve these results, we introduce and study a new kind of binary linear code, called here a $k$-server PIR code. 
We then show how such codes can be constructed from one-step majority-logic decodable codes, from Steiner systems, from constant-weight codes, and from certain locally recoverable codes. We also establish several bounds on the parameters of $k$-server PIR codes and finally extend these results to array codes.
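
To make the $k = 2$ case concrete, the following is a minimal sketch of how a coded layout with storage overhead $1 + \frac{1}{s}$ might emulate a two-server linear PIR protocol. It rests on assumptions that are not taken from the abstract: the emulated protocol is the classic XOR-based two-server scheme (a uniformly random query and the same query with one bit flipped), the code is a single XOR parity block appended to $s$ data blocks, and the names `encode`, `retrieve`, and `xor_inner` are hypothetical. It illustrates the idea only and is not the authors' construction verbatim.

```python
import secrets


def xor_inner(query, block):
    """Inner product over GF(2) of two equal-length bit lists."""
    acc = 0
    for q, b in zip(query, block):
        acc ^= q & b
    return acc


def encode(database, s):
    """Split n bits into s blocks of n/s bits and append one XOR parity block."""
    assert len(database) % s == 0, "assumption: s divides n"
    m = len(database) // s
    blocks = [database[j * m:(j + 1) * m] for j in range(s)]
    parity = [0] * m
    for block in blocks:
        parity = [p ^ b for p, b in zip(parity, block)]
    # s + 1 servers in total, each storing n/s bits: overhead (s + 1) / s.
    return blocks + [parity]


def retrieve(servers, index):
    """Privately read database[index] from the s + 1 coded servers."""
    m = len(servers[0])
    j, pos = divmod(index, m)  # block holding the bit, and its offset inside it
    q1 = [secrets.randbits(1) for _ in range(m)]  # uniformly random query
    q2 = list(q1)
    q2[pos] ^= 1  # q2 = q1 + e_pos; on its own, q2 is also uniformly random
    # Server j plays the role of the first replica and receives q1.
    answer1 = xor_inner(q1, servers[j])
    # Every other server receives q2. The XOR of their stored blocks equals
    # block j, so the XOR of their answers equals <q2, x_j>.
    answer2 = 0
    for l, stored in enumerate(servers):
        if l != j:
            answer2 ^= xor_inner(q2, stored)
    # <q1, x_j> + <q2, x_j> = <e_pos, x_j> = x_j[pos]
    return answer1 ^ answer2


if __name__ == "__main__":
    db = [1, 0, 1, 1, 0, 0, 1, 0, 1, 1, 1, 0]
    coded = encode(db, s=4)
    print([retrieve(coded, i) for i in range(len(db))] == db)
```

In this sketch each server sees a single query that is uniformly distributed on its own, so no individual server learns the requested position, while the $s + 1$ servers together store $n + n/s$ bits instead of the $2n$ bits that plain replication would require.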