Bridging Dense and Sparse Maximum Inner Product Search

ACM Transactions on Information Systems (IF 5.4; Zone 2, Computer Science; JCR Q1, Computer Science, Information Systems). Pub Date: 2024-05-17. DOI: 10.1145/3665324

Sebastian Bruch, Franco Maria Nardini, Amir Ingber, Edo Liberty


Citation count: 0

Abstract

Maximum inner product search (MIPS) over dense vectors and over sparse vectors has progressed independently in a bifurcated literature for decades; the latter is better known as top-\(k\) retrieval in Information Retrieval. This duality exists because sparse and dense vectors serve different end goals, even though they are manifestations of the same mathematical problem. In this work, we ask whether algorithms for dense vectors can be applied effectively to sparse vectors, particularly those that violate the assumptions underlying top-\(k\) retrieval methods. We study clustering-based approximate MIPS, where vectors are partitioned into clusters and only a fraction of the clusters are searched during retrieval. We conduct a comprehensive analysis of dimensionality reduction for sparse vectors, and examine standard and spherical KMeans for partitioning. Our experiments demonstrate that clustering-based retrieval serves as an efficient solution for sparse MIPS. As byproducts, we identify two research opportunities and explore their potential. First, we cast the clustering-based paradigm as dynamic pruning and turn that insight into a novel organization of the inverted index for approximate MIPS over general sparse vectors. Second, we offer a unified regime for MIPS over vectors that have dense and sparse subspaces, one that is robust to query distributions.
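The clustering-based pipeline described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the Gaussian random projection used for dimensionality reduction, the synthetic data, and all names (`build_index`, `search`, `num_probe`, `sketch_dim`) are assumptions chosen for illustration. The sketch reduces the dimensionality of the sparse vectors, partitions the reduced vectors with spherical KMeans, and at query time scores only the vectors in the few clusters whose centroids have the largest inner product with the query.

```python
import numpy as np

def normalize_rows(x):
    """Scale each row to unit L2 norm (all-zero rows stay zero)."""
    norms = np.linalg.norm(x, axis=1, keepdims=True)
    return x / np.maximum(norms, 1e-12)

def spherical_kmeans(points, num_clusters, iters=10, seed=0):
    """Cluster unit-normalized points by cosine similarity."""
    rng = np.random.default_rng(seed)
    unit = normalize_rows(points)
    centroids = unit[rng.choice(len(unit), size=num_clusters, replace=False)]
    for _ in range(iters):
        assign = np.argmax(unit @ centroids.T, axis=1)
        for c in range(num_clusters):
            members = unit[assign == c]
            if len(members) > 0:  # an empty cluster keeps its old centroid
                centroids[c] = members.sum(axis=0)
        centroids = normalize_rows(centroids)
    assign = np.argmax(unit @ centroids.T, axis=1)
    return centroids, assign

def build_index(data, num_clusters, sketch_dim, seed=0):
    """Sketch the vectors (assumed: Gaussian random projection), then cluster."""
    rng = np.random.default_rng(seed)
    proj = rng.standard_normal((data.shape[1], sketch_dim)) / np.sqrt(sketch_dim)
    centroids, assign = spherical_kmeans(data @ proj, num_clusters, seed=seed)
    return proj, centroids, assign

def search(query, data, index, k, num_probe):
    """Return indices of the top-k candidates found in the probed clusters."""
    proj, centroids, assign = index
    # Route: rank clusters by inner product of the sketched query with centroids.
    probed = np.argsort(-(centroids @ (query @ proj)))[:num_probe]
    candidates = np.flatnonzero(np.isin(assign, probed))
    # Score: exact inner products on the original vectors, candidates only.
    scores = data[candidates] @ query
    return candidates[np.argsort(-scores)[:k]]

# Synthetic sparse data (about 2% nonzero entries), stored densely for brevity.
rng = np.random.default_rng(0)
n, d = 2000, 500
data = rng.random((n, d)) * (rng.random((n, d)) < 0.02)
query = rng.random(d) * (rng.random(d) < 0.02)

index = build_index(data, num_clusters=20, sketch_dim=64)
approx = search(query, data, index, k=10, num_probe=4)
exact = np.argsort(-(data @ query))[:10]
print("recall@10:", len(set(approx) & set(exact)) / 10)
```

Raising `num_probe` trades speed for accuracy: probing every cluster reduces to exhaustive search, while probing fewer clusters scores fewer vectors at the risk of missing some of the true top-\(k\).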

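Source journal: ACM Transactions on Information Systems (Engineering and Technology, Computer Science: Information Systems)

CiteScore: 9.40

Self-citation rate: 14.30%

Publication count: 165

Review time: >12 weeks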

Journal description: The ACM Transactions on Information Systems (TOIS) publishes papers on information retrieval (such as search engines, recommender systems) that contain: new principled information retrieval models or algorithms with sound empirical validation; observational, experimental and/or theoretical studies yielding new insights into information retrieval or information seeking; accounts of applications of existing information retrieval techniques that shed light on the strengths and weaknesses of the techniques; formalization of new information retrieval or information seeking tasks and of methods for evaluating the performance on those tasks; development of content (text, image, speech, video, etc.) analysis methods to support information retrieval and information seeking; development of computational models of user information preferences and interaction behaviors; creation and analysis of evaluation methodologies for information retrieval and information seeking; or surveys of existing work that propose a significant synthesis. The information retrieval scope of ACM Transactions on Information Systems (TOIS) appeals to industry practitioners for its wealth of creative ideas, and to academic researchers for its descriptions of their colleagues' work.
