Fast and Adaptive Bulk Loading of Multidimensional Points
Moin Hussain Moti, Dimitris Papadias
arXiv - CS - Databases, 2024-09-14 (arxiv-2409.09447)
Existing methods for bulk loading disk-based multidimensional points involve
multiple applications of external sorting. In this paper, we propose techniques
that rely on linear scans instead and are therefore significantly faster. The
resulting FMBI index has several desirable properties, including almost full
and square nodes with zero overlap, and it offers excellent query performance.
As a second contribution, we develop an adaptive version, AMBI, which uses the
query workload to build a partial index only for parts of the data space that
contain query results. Finally, we extend FMBI and AMBI to parallel bulk
loading and query processing in distributed systems. An extensive experimental
evaluation with real datasets confirms that FMBI and AMBI clearly outperform
competitors in terms of combined index construction and query processing cost,
sometimes by orders of magnitude.
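The abstract does not describe how FMBI or AMBI choose their splits. The following in-memory Python sketch illustrates the properties the abstract lists, under stated assumptions. All names (`split_by_rank`, `refine`, `bulk_load`, `range_query`, `leaves`, and the leaf capacity `cap`) are hypothetical. The sketch partitions points recursively by rank using quickselect-style selection rather than sorting. Each node is cut along the longest side of its region, so sibling regions never overlap and stay roughly square. When `refine` is called lazily from `range_query`, the tree is built only where queries land, loosely mimicking AMBI's partial index. This is not the authors' algorithm, and it ignores disk I/O, external memory, and parallelism.

```python
import math
import random
from dataclasses import dataclass, field
from typing import List, Optional

# Hypothetical illustration only: NOT the FMBI/AMBI algorithms from the paper.


def split_by_rank(pts, k, dim):
    """Return (left, right) with len(left) == k and left[dim] <= right[dim].
    Quickselect-style: each round scans the candidates and drops one side,
    so the expected cost is linear in len(pts); nothing is fully sorted."""
    left_acc, right_acc = [], []
    while True:
        pivot = random.choice(pts)[dim]
        lo = [p for p in pts if p[dim] < pivot]
        eq = [p for p in pts if p[dim] == pivot]
        hi = [p for p in pts if p[dim] > pivot]
        if k < len(lo):                    # rank-k cut lies below the pivot
            right_acc += eq + hi
            pts = lo
        elif k <= len(lo) + len(eq):       # cut falls among pivot-equal points
            take = k - len(lo)
            return left_acc + lo + eq[:take], eq[take:] + hi + right_acc
        else:                              # rank-k cut lies above the pivot
            left_acc += lo + eq
            k -= len(lo) + len(eq)
            pts = hi


@dataclass
class Node:
    lo: List[float]                  # lower corner of the node's region
    hi: List[float]                  # upper corner of the node's region
    pts: Optional[list] = None       # points, kept only while the node is a leaf
    children: list = field(default_factory=list)


def refine(node, cap):
    """Split a leaf holding more than `cap` points into two disjoint children."""
    if node.pts is None or len(node.pts) <= cap:
        return
    # Cut the longest side of the region so that regions stay roughly square.
    dim = max(range(len(node.lo)), key=lambda d: node.hi[d] - node.lo[d])
    # Give the left child a multiple of `cap` points so its leaves come out full.
    k = math.ceil(math.ceil(len(node.pts) / cap) / 2) * cap
    left, right = split_by_rank(node.pts, k, dim)
    cut = min(p[dim] for p in right)   # every left point is <= cut
    left_hi, right_lo = list(node.hi), list(node.lo)
    left_hi[dim] = right_lo[dim] = cut
    node.children = [Node(list(node.lo), left_hi, left), Node(right_lo, list(node.hi), right)]
    node.pts = None


def make_root(points):
    """Unindexed root whose region is the bounding box of all points."""
    lo = [min(c) for c in zip(*points)]
    hi = [max(c) for c in zip(*points)]
    return Node(lo, hi, list(points))


def bulk_load(points, cap):
    """Eager build: refine every node until each leaf holds at most `cap` points."""
    root = make_root(points)
    stack = [root]
    while stack:
        node = stack.pop()
        refine(node, cap)
        stack.extend(node.children)
    return root


def range_query(root, qlo, qhi, cap, adaptive=False):
    """Report points in the box [qlo, qhi]; if adaptive, refine only touched nodes."""
    d = len(qlo)
    out, stack = [], [root]
    while stack:
        node = stack.pop()
        if any(node.hi[i] < qlo[i] or qhi[i] < node.lo[i] for i in range(d)):
            continue                       # region is disjoint from the query box
        if adaptive:
            refine(node, cap)              # lazy, query-driven refinement
        if node.children:
            stack.extend(node.children)
        else:
            out.extend(p for p in node.pts if all(qlo[i] <= p[i] <= qhi[i] for i in range(d)))
    return out


def leaves(root):
    """Yield every node that has no children."""
    stack = [root]
    while stack:
        node = stack.pop()
        if node.children:
            stack.extend(node.children)
        else:
            yield node
```

For example, `bulk_load(pts, cap=32)` builds the full tree up front. By contrast, `range_query(make_root(pts), qlo, qhi, cap=32, adaptive=True)` starts from a single unindexed leaf and splits only the nodes whose regions intersect the query box. Because the left child always receives a multiple of `cap` points, the eager build produces at most one leaf that is not completely full.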