Hyper-distance oracles in hypergraphs

Giulia Preti, Gianmarco De Francisci Morales, Francesco Bonchi
The VLDB Journal. Published 2024-04-19. DOI: 10.1007/s00778-024-00851-2 (https://doi.org/10.1007/s00778-024-00851-2)

Abstract

We study point-to-point distance estimation in hypergraphs, where each query is parameterized by a positive integer s that defines how many vertices two hyperedges must share to be considered adjacent. To answer s-distance queries, we first explore an oracle based on the line graph of the given hypergraph and discuss its main limitation: the line graph is typically orders of magnitude larger than the original hypergraph. We then introduce HypED, a landmark-based oracle with a predefined size that is built directly on the hypergraph and thus avoids materializing the line graph. Our framework approximately answers vertex-to-vertex, vertex-to-hyperedge, and hyperedge-to-hyperedge s-distance queries for any value of s. A key observation underlying our framework is that the hypergraph becomes more fragmented as s increases. We show how to exploit this fragmentation to improve the placement of landmarks by identifying the s-connected components of the hypergraph. For this task, we devise an efficient algorithm based on the union-find technique and a dynamic inverted index. We experimentally evaluate HypED on several real-world hypergraphs and demonstrate its versatility in answering s-distance queries for different values of s. Our framework answers such queries in fractions of a millisecond, while allowing fine-grained control, at creation time, of the trade-off between index size and approximation error. Finally, we demonstrate the usefulness of the s-distance oracle in two applications: hypergraph-based recommendation, and the approximation of the s-closeness centrality of vertices and hyperedges in the context of protein-protein interactions.
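
The fragmentation effect mentioned in the abstract can be illustrated with a small sketch. The code below is not the authors' algorithm: it is a simplified, static illustration that assumes hyperedges are given as Python sets, builds a plain (non-dynamic) inverted index from vertices to hyperedges, counts pairwise overlaps, and merges hyperedges that share at least s vertices with union-find. The names `UnionFind` and `s_connected_components` are hypothetical. Unlike the method described above, this sketch enumerates overlapping hyperedge pairs explicitly, so it is only practical for small inputs.

```python
from collections import defaultdict
from itertools import combinations


class UnionFind:
    """Minimal union-find over the integers 0..n-1."""

    def __init__(self, n):
        self.parent = list(range(n))

    def find(self, x):
        while self.parent[x] != x:
            # Path halving: point x at its grandparent as we walk up.
            self.parent[x] = self.parent[self.parent[x]]
            x = self.parent[x]
        return x

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra != rb:
            self.parent[rb] = ra


def s_connected_components(hyperedges, s):
    """Group hyperedge indices into s-connected components.

    Two hyperedges are treated as s-adjacent when they share at
    least s vertices (an assumption matching the abstract's wording).
    """
    # Inverted index: vertex -> indices of the hyperedges containing it.
    index = defaultdict(list)
    for i, edge in enumerate(hyperedges):
        for v in edge:
            index[v].append(i)

    # For every pair of hyperedges that co-occur on a vertex,
    # count how many vertices they share.
    overlap = defaultdict(int)
    for members in index.values():
        for i, j in combinations(members, 2):
            overlap[(i, j)] += 1

    # Merge the pairs whose overlap reaches the threshold s.
    uf = UnionFind(len(hyperedges))
    for (i, j), shared in overlap.items():
        if shared >= s:
            uf.union(i, j)

    groups = defaultdict(list)
    for i in range(len(hyperedges)):
        groups[uf.find(i)].append(i)
    return sorted(groups.values())


# Toy hypergraph (made-up data, for illustration only).
toy = [{1, 2, 3}, {2, 3, 4}, {4, 5}, {5, 6}]
components_by_s = {s: s_connected_components(toy, s) for s in (1, 2, 3)}
```

On this toy input, the number of components grows as s increases, which is the property a landmark-placement strategy could take into account when distributing landmarks across components.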
