Coeus:一个遗忘文档排序与检索系统

Q3 Computer Science Operating Systems Review (ACM) Pub Date : 2021-10-26 DOI:10.1145/3477132.3483586

Ishtiyaque Ahmad, Laboni Sarker, D. Agrawal, A. E. Abbadi, Trinabh Gupta

{"title":"Coeus:一个遗忘文档排序与检索系统","authors":"Ishtiyaque Ahmad, Laboni Sarker, D. Agrawal, A. E. Abbadi, Trinabh Gupta","doi":"10.1145/3477132.3483586","DOIUrl":null,"url":null,"abstract":"Given a private string q and a remote server that holds a set of public documents D, how can one of the K most relevant documents to q in D be selected and viewed without anyone (not even the server) learning anything about q or the document? This is the oblivious document ranking and retrieval problem. In this paper, we describe Coeus, a system that solves this problem. At a high level, Coeus composes two cryptographic primitives: secure matrix-vector product for scoring document relevance using the widely-used term frequency-inverse document frequency (tf-idf) method, and private information retrieval (PIR) for obliviously retrieving documents. However, Coeus reduces the time to run these protocols, thereby improving the user-perceived latency, which is a key performance metric. Coeus first reduces the PIR overhead by separating out private metadata retrieval from document retrieval, and it then scales secure matrix-vector product to tf-idf matrices with several hundred billion elements through a series of novel cryptographic refinements. For a corpus of English Wikipedia containing 5 million documents, a keyword dictionary with 64K keywords, and on a cluster of 143 machines on AWS, Coeus enables a user to obliviously rank and retrieve a document in 3.9 seconds---a 24x improvement over a baseline system.","PeriodicalId":38935,"journal":{"name":"Operating Systems Review (ACM)","volume":"43 1","pages":""},"PeriodicalIF":0.0000,"publicationDate":"2021-10-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"7","resultStr":"{\"title\":\"Coeus: A System for Oblivious Document Ranking and Retrieval\",\"authors\":\"Ishtiyaque Ahmad, Laboni Sarker, D. Agrawal, A. E. Abbadi, Trinabh Gupta\",\"doi\":\"10.1145/3477132.3483586\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Given a private string q and a remote server that holds a set of public documents D, how can one of the K most relevant documents to q in D be selected and viewed without anyone (not even the server) learning anything about q or the document? This is the oblivious document ranking and retrieval problem. In this paper, we describe Coeus, a system that solves this problem. At a high level, Coeus composes two cryptographic primitives: secure matrix-vector product for scoring document relevance using the widely-used term frequency-inverse document frequency (tf-idf) method, and private information retrieval (PIR) for obliviously retrieving documents. However, Coeus reduces the time to run these protocols, thereby improving the user-perceived latency, which is a key performance metric. Coeus first reduces the PIR overhead by separating out private metadata retrieval from document retrieval, and it then scales secure matrix-vector product to tf-idf matrices with several hundred billion elements through a series of novel cryptographic refinements. For a corpus of English Wikipedia containing 5 million documents, a keyword dictionary with 64K keywords, and on a cluster of 143 machines on AWS, Coeus enables a user to obliviously rank and retrieve a document in 3.9 seconds---a 24x improvement over a baseline system.\",\"PeriodicalId\":38935,\"journal\":{\"name\":\"Operating Systems Review (ACM)\",\"volume\":\"43 1\",\"pages\":\"\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2021-10-26\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"7\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Operating Systems Review (ACM)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1145/3477132.3483586\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q3\",\"JCRName\":\"Computer Science\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Operating Systems Review (ACM)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3477132.3483586","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q3","JCRName":"Computer Science","Score":null,"Total":0}

引用次数: 7

摘要

给定一个私有字符串q和一个持有一组公共文档D的远程服务器，如何在没有任何人(甚至服务器)了解q或文档的情况下选择和查看D中与q最相关的K个文档中的一个?这是无关文档排序和检索问题。在本文中，我们描述了Coeus，一个解决这个问题的系统。在高层次上，Coeus包含两个加密原语:使用广泛使用的术语频率逆文档频率(tf-idf)方法对文档相关性进行评分的安全矩阵-向量积，以及用于隐性检索文档的私有信息检索(PIR)。然而，Coeus减少了运行这些协议的时间，从而提高了用户感知的延迟，这是一个关键的性能指标。Coeus首先通过将私有元数据检索从文档检索中分离出来来减少PIR开销，然后通过一系列新的加密改进将安全矩阵向量积扩展到具有数千亿个元素的tf-idf矩阵。对于包含500万个文档的英文维基百科语料库，一个包含64K个关键字的关键字字典，以及AWS上143台机器的集群，Coeus使用户能够在3.9秒内对文档进行排序和检索，这比基线系统提高了24倍。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文

微信好友朋友圈 QQ好友复制链接

本刊更多论文

Coeus: A System for Oblivious Document Ranking and Retrieval

Given a private string q and a remote server that holds a set of public documents D, how can one of the K most relevant documents to q in D be selected and viewed without anyone (not even the server) learning anything about q or the document? This is the oblivious document ranking and retrieval problem. In this paper, we describe Coeus, a system that solves this problem. At a high level, Coeus composes two cryptographic primitives: secure matrix-vector product for scoring document relevance using the widely-used term frequency-inverse document frequency (tf-idf) method, and private information retrieval (PIR) for obliviously retrieving documents. However, Coeus reduces the time to run these protocols, thereby improving the user-perceived latency, which is a key performance metric. Coeus first reduces the PIR overhead by separating out private metadata retrieval from document retrieval, and it then scales secure matrix-vector product to tf-idf matrices with several hundred billion elements through a series of novel cryptographic refinements. For a corpus of English Wikipedia containing 5 million documents, a keyword dictionary with 64K keywords, and on a cluster of 143 machines on AWS, Coeus enables a user to obliviously rank and retrieve a document in 3.9 seconds---a 24x improvement over a baseline system.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

Operating Systems Review (ACM) Computer Science-Computer Networks and Communications

CiteScore

2.80

自引率

0.00%

发文量

期刊介绍： Operating Systems Review (OSR) is a publication of the ACM Special Interest Group on Operating Systems (SIGOPS), whose scope of interest includes: computer operating systems and architecture for multiprogramming, multiprocessing, and time sharing; resource management; evaluation and simulation; reliability, integrity, and security of data; communications among computing processors; and computer system modeling and analysis.

期刊最新文献

Disaggregated GPU Acceleration for Serverless Applications Navigating Performance-Efficiency Tradeoffs in Serverless Computing: Deduplication to the Rescue! Using Local Cache Coherence for Disaggregated Memory Systems Make It Real: An End-to-End Implementation of A Physically Disaggregated Data Center Memory disaggregation: why now and what are the challenges