{"title":"基于lsm的存储和索引:一个具有及时优势的老想法","authors":"Sattam Alsubaiee, M. Carey, Chen Li","doi":"10.1145/2786006.2786007","DOIUrl":null,"url":null,"abstract":"With the social-media data explosion, near real-time queries, particularly those of a spatio-temporal nature, can be challenging. In this paper, we show how to efficiently answer queries that target recent data within very large data sets. We describe a solution that exploits a natural partitioning property that LSM-based indexes have for components, allowing us to filter out many components when answering queries. Our solution is generalizable to any LSM-based index structure, and can be applied not just on temporal fields (e.g., based on recency), but on any \"time-correlated fields\" such as Universally Unique Identifiers (UUIDs), user-provided integer ids, etc. We have implemented and experimentally evaluated the solution in the context of the AsterixDB system.","PeriodicalId":443011,"journal":{"name":"Second International ACM Workshop on Managing and Mining Enriched Geo-Spatial Data","volume":"24 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2015-05-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"16","resultStr":"{\"title\":\"LSM-Based Storage and Indexing: An Old Idea with Timely Benefits\",\"authors\":\"Sattam Alsubaiee, M. Carey, Chen Li\",\"doi\":\"10.1145/2786006.2786007\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"With the social-media data explosion, near real-time queries, particularly those of a spatio-temporal nature, can be challenging. In this paper, we show how to efficiently answer queries that target recent data within very large data sets. We describe a solution that exploits a natural partitioning property that LSM-based indexes have for components, allowing us to filter out many components when answering queries. Our solution is generalizable to any LSM-based index structure, and can be applied not just on temporal fields (e.g., based on recency), but on any \\\"time-correlated fields\\\" such as Universally Unique Identifiers (UUIDs), user-provided integer ids, etc. We have implemented and experimentally evaluated the solution in the context of the AsterixDB system.\",\"PeriodicalId\":443011,\"journal\":{\"name\":\"Second International ACM Workshop on Managing and Mining Enriched Geo-Spatial Data\",\"volume\":\"24 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2015-05-31\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"16\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Second International ACM Workshop on Managing and Mining Enriched Geo-Spatial Data\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1145/2786006.2786007\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Second International ACM Workshop on Managing and Mining Enriched Geo-Spatial Data","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/2786006.2786007","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
LSM-Based Storage and Indexing: An Old Idea with Timely Benefits
With the social-media data explosion, near real-time queries, particularly those of a spatio-temporal nature, can be challenging. In this paper, we show how to efficiently answer queries that target recent data within very large data sets. We describe a solution that exploits a natural partitioning property that LSM-based indexes have for components, allowing us to filter out many components when answering queries. Our solution is generalizable to any LSM-based index structure, and can be applied not just on temporal fields (e.g., based on recency), but on any "time-correlated fields" such as Universally Unique Identifiers (UUIDs), user-provided integer ids, etc. We have implemented and experimentally evaluated the solution in the context of the AsterixDB system.