Maximilian Franzke, Tobias Emrich, Andreas Züfle, M. Renz
Title: Indexing multi-metric data
DOI: 10.1109/ICDE.2016.7498318 (https://doi.org/10.1109/ICDE.2016.7498318)
Venue: 2016 IEEE 32nd International Conference on Data Engineering (ICDE), pp. 1122-1133
Published: 2016-05-16
Citations: 8
Abstract
The proliferation of Web 2.0 and the ubiquity of social media yield a huge flood of heterogeneous data that is voluntarily published and shared by billions of individual users all over the world. As a result, the representation of an entity (such as a real person) in this data may consist of various data types, including location and other numeric attributes, textual descriptions, images, videos, social network information and other types of information. Searching for similar entities in this multi-enriched data while exploiting the information of multiple representations simultaneously promises to yield more interesting and relevant information than searching each data type individually. While efficient similarity search on single representations is a well-studied problem, existing studies lack appropriate solutions for multi-enriched data that take into account the combination of all representations as a whole. In this paper, we address the problem of index-supported similarity search on multi-enriched (a.k.a. multi-represented) objects based on a set of metrics, one metric for each representation. We define multi-metric similarity search queries by employing a user-defined weight function that specifies the impact of each metric at query time. Our main contribution is an index structure which combines all metrics into a single multi-dimensional access method that works for arbitrary weight preferences. The experimental evaluation shows that our proposed index structure is more efficient than existing multi-metric access methods under different cost criteria and tremendously outperforms traditional approaches when querying very large sets of multi-enriched objects.
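To make the query model in the abstract concrete, the following is a minimal sketch of a multi-metric k-nearest-neighbor query answered by a plain linear scan. It is not the index structure proposed in the paper; it only illustrates what such an index would have to answer. Several things here are assumptions made for illustration: the weight function is taken to be a weighted sum of the per-representation distances, each object has two representations (a 2-D location and a set of tags), and the names `combined_distance`, `knn_scan`, `euclidean`, and `jaccard_distance` are hypothetical.

```python
import heapq
import math
from typing import Callable, List, Sequence, Tuple

# One metric per representation; each takes two values of that representation.
Metric = Callable[[object, object], float]


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two points (assumed location metric)."""
    return math.dist(a, b)


def jaccard_distance(a: set, b: set) -> float:
    """Jaccard distance between two sets (assumed metric for tag sets)."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def combined_distance(x: Sequence, y: Sequence,
                      metrics: Sequence[Metric],
                      weights: Sequence[float]) -> float:
    """Weighted sum of per-representation distances (assumed weight function)."""
    return sum(w * d(xi, yi) for w, d, xi, yi in zip(weights, metrics, x, y))


def knn_scan(query: Sequence, objects: Sequence[Sequence],
             metrics: Sequence[Metric], weights: Sequence[float],
             k: int) -> List[Tuple[float, int]]:
    """Return the k smallest (distance, object index) pairs via a full scan."""
    scored = ((combined_distance(query, obj, metrics, weights), i)
              for i, obj in enumerate(objects))
    return heapq.nsmallest(k, scored)


# Toy data: each object is (location, tags).
objects = [
    ((0.0, 0.0), {"jazz", "coffee"}),
    ((3.0, 4.0), {"jazz", "coffee"}),
    ((0.0, 1.0), {"hiking"}),
]
query = ((0.0, 0.0), {"jazz", "coffee"})
metrics = (euclidean, jaccard_distance)

# Weights are chosen at query time, so the same data yields different rankings.
by_location = knn_scan(query, objects, metrics, (1.0, 0.0), k=2)
by_tags = knn_scan(query, objects, metrics, (0.0, 1.0), k=2)
```

With all weight on location, the two spatially closest objects are returned; with all weight on tags, the two objects sharing the query's tags are returned. The scan costs one combined distance computation per object for every query, which is the cost an index that supports arbitrary query-time weights aims to avoid.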