Ivo Marinchev, G. Agre. International Conference on Computer Systems and Technologies, 2015-06-25. DOI: 10.1145/2812428.2812464
On speeding up the implementation of nearest neighbour search and classification
The paper presents practical approaches and techniques for speeding up implementations of the nearest neighbour search/classification algorithm for high-dimensional data and/or many training examples. Such settings often appear in the fields of big data and data mining. We apply a fast iterative form of polar decomposition and use the computed matrix to pre-select a smaller number of candidate classes for the query element. We show that an additional speed-up can be achieved when the training classes consist of many instances: they are subdivided into subclasses by a fast approximation of a clustering algorithm, and the resulting partition is used for building the decomposition matrix. Our pre-processing step (which depends linearly or near-linearly on the number of examples and dimensions) and pre-selection step (which depends on the number of classes) can be used with any well-known indexing method, such as the annulus method, kd-trees, metric trees, R-trees, cover trees, etc., to limit the training instances used in the search/classification process. Finally, we introduce what we name a cluster index and show that in practice it extends the applicability of indexing structures with higher-order complexity to bigger datasets.
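The overall idea of pre-selecting candidate classes before running an exact nearest-neighbour search can be sketched as follows. This is a minimal illustration, not the paper's implementation: it uses plain class centroids as a stand-in for the polar-decomposition matrix the authors compute, since the abstract does not give that construction in detail. The function names and parameters are hypothetical.

```python
import numpy as np

def preselect_classes(X, y, query, k_classes=3):
    """Rank classes by centroid distance to the query and keep the top k.

    Stand-in for the paper's polar-decomposition-based pre-selection:
    the point is only that a cheap per-class score (here, distance to
    the class centroid) prunes most classes before the exact search.
    """
    labels = np.unique(y)
    centroids = np.array([X[y == c].mean(axis=0) for c in labels])
    dists = np.linalg.norm(centroids - query, axis=1)
    return labels[np.argsort(dists)[:k_classes]]

def nn_classify(X, y, query, k_classes=3):
    """1-NN classification restricted to the pre-selected candidate classes.

    The brute-force scan below could be replaced by any of the indexing
    methods mentioned in the abstract (kd-trees, cover trees, ...),
    applied only to the surviving instances.
    """
    candidates = preselect_classes(X, y, query, k_classes)
    mask = np.isin(y, candidates)
    Xc, yc = X[mask], y[mask]
    nearest = np.argmin(np.linalg.norm(Xc - query, axis=1))
    return yc[nearest]
```

With well-separated classes, restricting the exact search to two or three candidate classes typically leaves the predicted label unchanged while scanning only a fraction of the training set; subdividing large classes into subclasses (as the paper proposes via fast approximate clustering) tightens the centroids and makes this pruning more accurate.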