DIMS: Distributed Index for Similarity Search in Metric Spaces

IF 8.9 | Zone 2 (Computer Science) | JCR Q1: Computer Science, Artificial Intelligence | IEEE Transactions on Knowledge and Data Engineering | Pub Date: 2024-10-29 | DOI: 10.1109/TKDE.2024.3487759
Yifan Zhu;Chengyang Luo;Tang Qian;Lu Chen;Yunjun Gao;Baihua Zheng
IEEE Transactions on Knowledge and Data Engineering, vol. 37, no. 1, pp. 210-225. Journal Article. https://ieeexplore.ieee.org/document/10737368/
Citations: 0

Abstract

Similarity search finds objects that are similar to a given query object under a similarity metric. As the amount and variety of data continue to grow, similarity search in metric spaces has gained significant attention. Metric spaces can accommodate any type of data and support flexible distance metrics, which makes similarity search in metric spaces beneficial for many real-world applications, such as multimedia retrieval, personalized recommendation, trajectory analytics, data mining, decision planning, and distributed servers. However, existing studies mostly focus on indexing metric spaces on a single machine, which limits efficiency and scalability as data volume and query load increase. Recent work on similarity search has turned toward distributed methods, but these face challenges including inefficient local data management, unbalanced workload, and low concurrent search efficiency. To this end, we propose DIMS, an efficient Distributed Index for similarity search in Metric Spaces. First, we design a novel three-stage heterogeneous partition to achieve workload balance. Then, we present an effective three-stage indexing structure to manage objects efficiently. We also develop concurrent search methods with filtering and validation techniques that support efficient distributed similarity search. Additionally, we devise a cost-based optimization model to balance communication and computation costs. Extensive experiments demonstrate that DIMS significantly outperforms existing distributed similarity search approaches.
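The abstract mentions search with filtering and validation but gives no algorithmic detail. As a rough illustration of that general idea only, the sketch below shows the classic pivot-based range query used in many metric-space indexes: precomputed object-to-pivot distances yield a triangle-inequality lower bound that filters out candidates, and the survivors are validated with a real distance computation. This is a single-machine sketch under stated assumptions, not the DIMS index or its partitioning scheme; the names PivotTable, range_query, and edit_distance, the choice of pivots, and the sample data are all hypothetical.

```python
from typing import Callable, List, Sequence


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance, which is a metric on strings."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                # deletion
                            curr[j - 1] + 1,            # insertion
                            prev[j - 1] + (ca != cb)))  # substitution
        prev = curr
    return prev[-1]


class PivotTable:
    """Hypothetical pivot table: stores d(object, pivot) for every object."""

    def __init__(self, objects: Sequence[str], pivots: Sequence[str],
                 dist: Callable[[str, str], float]) -> None:
        self.objects = list(objects)
        self.pivots = list(pivots)
        self.dist = dist
        # Precompute the distance from each object to each pivot.
        self.table = [[dist(o, p) for p in self.pivots] for o in self.objects]
        self.verified = 0  # distance computations spent on validation

    def range_query(self, q: str, r: float) -> List[str]:
        """Return all objects o with d(q, o) <= r."""
        q_to_pivots = [self.dist(q, p) for p in self.pivots]
        self.verified = 0
        results = []
        for o, row in zip(self.objects, self.table):
            # Filtering: by the triangle inequality,
            # |d(q, p) - d(o, p)| <= d(q, o), so if this lower bound
            # exceeds r for any pivot, o cannot be a result.
            if any(abs(dq - do) > r for dq, do in zip(q_to_pivots, row)):
                continue
            # Validation: compute the real distance for the survivors.
            self.verified += 1
            if self.dist(q, o) <= r:
                results.append(o)
        return results


# Illustrative data; the pivots are picked by hand for the example.
words = ["book", "back", "cook", "look", "metric", "matrix", "space"]
index = PivotTable(words, pivots=["book", "metric"], dist=edit_distance)
hits = index.range_query("hook", r=1)
print(hits, "validated:", index.verified, "of", len(words))
```

Because the filter only discards objects whose lower bound already exceeds the radius, the result always matches a brute-force scan; the benefit is fewer full distance computations, which matters when the metric is expensive.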
Source journal
IEEE Transactions on Knowledge and Data Engineering (Engineering & Technology: Engineering, Electrical & Electronic)
CiteScore: 11.70
Self-citation rate: 3.40%
Articles published: 515
Review time: 6 months
Journal description: The IEEE Transactions on Knowledge and Data Engineering encompasses knowledge and data engineering aspects within computer science, artificial intelligence, electrical engineering, computer engineering, and related fields. It provides an interdisciplinary platform for disseminating new developments in knowledge and data engineering and explores the practicality of these concepts in both hardware and software. Specific areas covered include knowledge-based and expert systems, AI techniques for knowledge and data management, tools and methodologies, distributed processing, real-time systems, architectures, data management practices, database design, query languages, security, fault tolerance, statistical databases, algorithms, performance evaluation, and applications.