High dimensional similarity search with space filling curves

Proceedings 17th International Conference on Data Engineering. Pub Date: 2001-04-02. DOI: 10.1109/ICDE.2001.914876

Swanwa Liao, M. Lopez, Scott T. Leutenegger


Citations: 100

Abstract

We present a new approach for approximate nearest neighbor queries for sets of high dimensional points under any $L_t$-metric, $t = 1, \ldots, \infty$. The proposed algorithm is efficient and simple to implement. The algorithm uses multiple shifted copies of the data points and stores them in up to $(d+1)$ B-trees, where $d$ is the dimensionality of the data, sorted according to their position along a space filling curve. This is done in a way that allows us to guarantee that a neighbor within an $O(d^{1+1/t})$ factor of the exact nearest can be returned with at most $(d+1)\log_p n$ page accesses, where $p$ is the branching factor of the B-trees. In practice, for real data sets, our approximate technique finds the exact nearest neighbor between 87% and 99% of the time and a point no farther than the third nearest neighbor between 98% and 100% of the time. Our solution is dynamic, allowing insertion or deletion of points in $O(d \log_p n)$ page accesses, and generalizes easily to find approximate k-nearest neighbors.
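
As a rough illustration of the idea in the abstract, the following Python sketch keeps d+1 shifted copies of the points, each sorted by position along a space-filling curve, and answers a query by inspecting the curve neighbors of the query in every copy. Several details are assumptions rather than the authors' method: the abstract names neither the curve nor the shift vectors, so the sketch uses a Z-order (Morton) key and translates copy j by j/(d+1) on every axis; sorted Python lists stand in for the B-trees; points are assumed to lie in [0, 1)^d; and the names `ShiftedCurveIndex`, `morton_key`, `lt_dist`, `bits`, and `window` are hypothetical. The sketch makes no claim to the approximation guarantee stated in the abstract.

```python
from bisect import bisect_left


def morton_key(cell, bits):
    """Interleave the bits of integer coordinates into one Z-order key."""
    key = 0
    for b in range(bits - 1, -1, -1):  # most significant bit first
        for c in cell:
            key = (key << 1) | ((c >> b) & 1)
    return key


def lt_dist(a, b, t):
    """L_t distance; t = float('inf') gives the maximum metric."""
    diffs = [abs(x - y) for x, y in zip(a, b)]
    if t == float("inf"):
        return max(diffs)
    return sum(v ** t for v in diffs) ** (1.0 / t)


class ShiftedCurveIndex:
    """Hypothetical index: d+1 shifted copies, each sorted by curve key."""

    def __init__(self, points, bits=10):
        self.points = [tuple(p) for p in points]
        self.d = len(self.points[0])
        self.bits = bits
        # Assumed shift scheme: copy j is translated by j/(d+1) on every axis.
        self.shifts = [j / (self.d + 1) for j in range(self.d + 1)]
        # One sorted list of (curve key, point id) per copy, standing in
        # for one B-tree per copy.
        self.lists = [
            sorted((self._key(p, s), i) for i, p in enumerate(self.points))
            for s in self.shifts
        ]

    def _key(self, p, shift):
        # Points in [0, 1)^d land in [0, 2)^d after shifting; quantize that cube.
        scale = (1 << self.bits) / 2.0
        top = (1 << self.bits) - 1
        cell = [min(int((x + shift) * scale), top) for x in p]
        return morton_key(cell, self.bits)

    def query(self, q, t=2, window=1):
        """Return (point id, distance) of the best candidate found.

        In each copy, only the `window` entries before and after the
        query's curve position are examined.
        """
        best, best_dist = None, float("inf")
        for shift, keys in zip(self.shifts, self.lists):
            pos = bisect_left(keys, (self._key(q, shift), -1))
            for _, i in keys[max(0, pos - window): pos + window]:
                dist = lt_dist(q, self.points[i], t)
                if dist < best_dist:
                    best, best_dist = i, dist
        return best, best_dist
```

A call such as `ShiftedCurveIndex(points).query(q, t=1)` inspects at most two candidates per copy, which mirrors the one-lookup-per-tree cost described in the abstract; raising `window` trades more distance computations for a better chance of finding the exact nearest neighbor.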
