Hyper-distance oracles in hypergraphs
Giulia Preti, Gianmarco De Francisci Morales, Francesco Bonchi
The VLDB Journal, published 2024-04-19. DOI: 10.1007/s00778-024-00851-2
Citations: 0
Abstract
We study point-to-point distance estimation in hypergraphs, where the query is parameterized by a positive integer s that defines the required level of overlap for two hyperedges to be considered adjacent. To answer s-distance queries, we first explore an oracle based on the line graph of the given hypergraph and discuss its limitations: the line graph is typically orders of magnitude larger than the original hypergraph. We then introduce HypED, a landmark-based oracle with a predefined size, built directly on the hypergraph, thus avoiding the materialization of the line graph. Our framework can approximately answer vertex-to-vertex, vertex-to-hyperedge, and hyperedge-to-hyperedge s-distance queries for any value of s. A key observation underlying our framework is that, as s increases, the hypergraph becomes more fragmented. We show how this can be exploited to improve the placement of landmarks by identifying the s-connected components of the hypergraph. For this latter task, we devise an efficient algorithm based on the union-find technique and a dynamic inverted index. We experimentally evaluate HypED on several real-world hypergraphs and demonstrate its versatility in answering s-distance queries for different values of s. Our framework answers such queries in fractions of a millisecond while allowing fine-grained control, at creation time, of the trade-off between index size and approximation error. Finally, we demonstrate the usefulness of the s-distance oracle in two applications, namely hypergraph-based recommendation and the approximation of the s-closeness centrality of vertices and hyperedges in the context of protein-protein interactions.
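To make the abstract's terminology concrete, here is a minimal Python sketch. It is not HypED itself and works under these stated assumptions: hyperedges are given as a list of vertex sets; two hyperedges are s-adjacent when they share at least s vertices; and the hyperedge-to-hyperedge s-distance is the number of hops on a shortest chain of s-adjacent hyperedges. All names (`build_index`, `s_neighbors`, `s_components`, `s_bfs`, `LandmarkOracle`) and the landmark-selection rule (the largest hyperedges of each s-connected component) are hypothetical. A static vertex-to-hyperedge inverted index stands in for the paper's dynamic one, and the oracle is built for one fixed s rather than for all values of s at once.

```python
import math
from collections import defaultdict, deque


def build_index(edges):
    """Inverted index: vertex -> ids of the hyperedges containing it."""
    index = defaultdict(list)
    for i, e in enumerate(edges):
        for v in e:
            index[v].append(i)
    return index


def s_neighbors(edges, index, i, s):
    """Hyperedges sharing at least s vertices with hyperedge i.

    Overlaps are counted through the inverted index, so the
    s-line graph is never materialized.
    """
    overlap = defaultdict(int)
    for v in edges[i]:
        for j in index[v]:
            if j != i:
                overlap[j] += 1
    return [j for j, c in overlap.items() if c >= s]


class UnionFind:
    def __init__(self, n):
        self.parent = list(range(n))

    def find(self, x):
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra != rb:
            self.parent[rb] = ra


def s_components(edges, index, s):
    """Component representative of every hyperedge in the s-line graph."""
    uf = UnionFind(len(edges))
    for i in range(len(edges)):
        for j in s_neighbors(edges, index, i, s):
            uf.union(i, j)
    return [uf.find(i) for i in range(len(edges))]


def s_bfs(edges, index, src, s):
    """Exact s-distances (in hops) from hyperedge src to every s-reachable hyperedge."""
    dist = {src: 0}
    queue = deque([src])
    while queue:
        i = queue.popleft()
        for j in s_neighbors(edges, index, i, s):
            if j not in dist:
                dist[j] = dist[i] + 1
                queue.append(j)
    return dist


class LandmarkOracle:
    """Hypothetical landmark oracle for hyperedge-to-hyperedge s-distance queries."""

    def __init__(self, edges, s, per_component=1):
        index = build_index(edges)
        self.comp = s_components(edges, index, s)
        members = defaultdict(list)
        for i, c in enumerate(self.comp):
            members[c].append(i)
        # Assumed heuristic: the largest hyperedges of each s-connected
        # component become its landmarks, so every component has at least one.
        self.tables = defaultdict(list)
        for c, group in members.items():
            ranked = sorted(group, key=lambda i: len(edges[i]), reverse=True)
            for lm in ranked[:per_component]:
                self.tables[c].append(s_bfs(edges, index, lm, s))

    def query(self, a, b):
        if a == b:
            return 0
        c = self.comp[a]
        if c != self.comp[b]:
            return math.inf  # no chain of s-adjacent hyperedges connects a and b
        # Triangle inequality: d(a, b) <= d(a, L) + d(L, b) for every landmark L.
        return min(t[a] + t[b] for t in self.tables[c])
```

Because each landmark's distance table is computed inside a single s-connected component, queries across components return infinity without touching any table, while within a component the landmark sum is only an upper bound. For example, with `edges = [{1, 2, 3}, {2, 3, 4}, {3, 4, 5}, {5, 6}, {7, 8}]` and `s=2`, `LandmarkOracle(edges, 2).query(0, 2)` returns the exact value 2, `query(1, 2)` returns the upper bound 3 even though those two hyperedges are 2-adjacent, and `query(0, 3)` returns `inf` because the two hyperedges lie in different 2-connected components.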