Thilina Buddhika, Matthew Malensek, S. Pallickara, S. Pallickara
Living on the Edge
ACM Transactions on Internet of Things, pp. 1-31, published 2021-07-01. DOI: 10.1145/3450767 (https://doi.org/10.1145/3450767)
Voluminous time-series data streams produced in continuous sensing environments pose challenges for ingestion, storage, and analytics. In this study, we present a holistic approach based on data sketching to address these issues. We propose a hyper-sketching algorithm that combines discretization and frequency-based sketching to produce compact representations of multi-feature time-series data streams. We generate an ensemble of data sketches to make effective use of the capabilities of resource-constrained edge devices, the links over which data are transmitted, and the server pool where these data must be stored. The data sketches can be queried to construct datasets that are amenable to processing with popular analytical engines. We include several performance benchmarks using real-world data from different domains to profile the suitability of our design decisions. The proposed methodology can achieve up to ~13× and ~2,207× reductions in data transfer and energy consumption, respectively, at edge devices. We observe up to a ~50% improvement in analytical job completion times, in addition to significant improvements in disk and network I/O.
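To make the combination of discretization and frequency-based sketching concrete, the following is a minimal illustrative sketch and not the authors' hyper-sketching algorithm. It assumes fixed, hand-picked bin boundaries per feature and uses a count-min sketch as a stand-in frequency sketch; the names `discretize`, `CountMinSketch`, and `sketch_stream`, as well as the feature names and boundaries in the usage note, are hypothetical.

```python
import hashlib
from bisect import bisect_right


def discretize(record, boundaries):
    """Map each feature value to the index of the bin it falls into.

    `boundaries` maps a feature name to a sorted list of bin edges
    (an assumption for illustration; the abstract does not say how bins are chosen).
    Features are visited in sorted-name order so the tuple layout is stable.
    """
    return tuple(bisect_right(boundaries[name], record[name])
                 for name in sorted(boundaries))


class CountMinSketch:
    """A small count-min sketch: fixed memory, estimates never undercount."""

    def __init__(self, width=256, depth=4):
        self.width = width
        self.depth = depth
        self.table = [[0] * width for _ in range(depth)]

    def _index(self, row, key):
        # One salted hash per row gives `depth` different hash functions.
        digest = hashlib.blake2b(key.encode("utf-8"), digest_size=8,
                                 salt=row.to_bytes(8, "little")).digest()
        return int.from_bytes(digest, "little") % self.width

    def add(self, key, count=1):
        for row in range(self.depth):
            self.table[row][self._index(row, key)] += count

    def estimate(self, key):
        # The minimum over rows is the least collision-inflated counter.
        return min(self.table[row][self._index(row, key)]
                   for row in range(self.depth))


def sketch_stream(records, boundaries, width=256, depth=4):
    """Discretize every record and count the resulting bin tuples."""
    sketch = CountMinSketch(width, depth)
    for record in records:
        bins = discretize(record, boundaries)
        sketch.add("|".join(map(str, bins)))
    return sketch
```

As a usage example under the same assumptions, with `boundaries = {"temperature": [0.0, 10.0, 20.0, 30.0], "humidity": [25.0, 50.0, 75.0]}`, the record `{"temperature": 21.4, "humidity": 40.2}` discretizes to `(1, 3)` (humidity bin first, because features are sorted by name). Only the fixed-size counter table would need to leave the device, which is the general reason a sketch can be far smaller than the raw stream; the specific reductions reported above come from the authors' own design, not from this illustration.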