Similarity search in the blink of an eye with compressed indices

Proc. VLDB Endow. Pub Date: 2023-04-07. DOI: 10.48550/arXiv.2304.04759

C. Aguerrebere, Ishwar Bhati, Mark Hildebrand, Mariano Tepper, Ted L. Willke

Pages: 3433-3446

Citations: 0



Abstract

Nowadays, data is represented by vectors. Retrieving those vectors, among millions and billions, that are similar to a given query is a ubiquitous problem, known as similarity search, of relevance for a wide range of applications. Graph-based indices are currently the best performing techniques for billion-scale similarity search. However, their random-access memory pattern presents challenges to realize their full potential. In this work, we present new techniques and systems for creating faster and smaller graph-based indices. To this end, we introduce a novel vector compression method, Locally-adaptive Vector Quantization (LVQ), that uses per-vector scaling and scalar quantization to improve search performance with fast similarity computations and a reduced effective bandwidth, while decreasing memory footprint and barely impacting accuracy. LVQ, when combined with a new high-performance computing system for graph-based similarity search, establishes the new state of the art in terms of performance and memory footprint. For billions of vectors, LVQ outcompetes the second-best alternatives: (1) in the low-memory regime, by up to 20.7x in throughput with up to a 3x memory footprint reduction, and (2) in the high-throughput regime by 5.8x with 1.4x less memory.
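
To make the idea of "per-vector scaling and scalar quantization" concrete, here is a minimal Python sketch. The abstract does not give the details of LVQ, so everything specific here is an assumption for illustration: taking the bounds from each vector's own minimum and maximum, using a uniform grid of 2^bits levels, and the function names quantize_vector, dequantize_vector and approx_inner_product. It is not the authors' implementation.

```python
from typing import List, Tuple


def quantize_vector(vec: List[float], bits: int = 8) -> Tuple[List[int], float, float]:
    """Scalar-quantize one vector on a uniform grid whose range is taken
    from that vector alone (assumed form of per-vector scaling)."""
    lo, hi = min(vec), max(vec)
    levels = (1 << bits) - 1
    # Per-vector step size; a constant vector gets a dummy step of 1.0
    # so that we never divide by zero.
    step = (hi - lo) / levels if hi > lo else 1.0
    # Round each component to the nearest grid point and clamp to the code range.
    codes = [min(levels, max(0, round((x - lo) / step))) for x in vec]
    return codes, lo, step


def dequantize_vector(codes: List[int], lo: float, step: float) -> List[float]:
    """Reconstruct approximate float values from integer codes."""
    return [lo + c * step for c in codes]


def approx_inner_product(query: List[float], codes: List[int],
                         lo: float, step: float) -> float:
    """Inner product of a float query with a quantized vector, computed
    on the integer codes: <q, lo + step*c> = lo*sum(q) + step*<q, c>."""
    return lo * sum(query) + step * sum(q * c for q, c in zip(query, codes))
```

Under these assumptions each stored vector costs `bits` per component plus two floats (`lo` and `step`), and each component is reconstructed to within about half a step. Because the step is derived from the vector's own range, a vector with a narrow range gets a finer grid than a single dataset-wide range would give it.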
