Two-Level Data Compression using Machine Learning in Time Series Database

Xinyang Yu, Yanqing Peng, Feifei Li, Sheng Wang, Xiaowei Shen, Huijun Mai, Yue Xie
Published in: 2020 IEEE 36th International Conference on Data Engineering (ICDE), pp. 1333-1344
Publication date: 2020-04-01
DOI: 10.1109/ICDE48307.2020.00119 (https://doi.org/10.1109/ICDE48307.2020.00119)
Citations: 13

Abstract

The explosive growth of time series data has driven the development of time series databases. To reduce storage overhead in these systems, data compression is widely adopted. Most existing compression algorithms exploit the overall characteristics of the entire time series to achieve a high compression ratio, but ignore the local context around individual points. As a result, they are effective for certain data patterns but may degrade when patterns change, as they inherently do in real-world time series. A compression method that consistently achieves a high compression ratio in the presence of pattern diversity is therefore strongly desired.

In this paper, we propose a two-level compression model that selects a proper compression scheme for each individual point, so that diverse patterns can be captured at a fine granularity. Based on this model, we design and implement the AMMMO framework, in which a set of control parameters is defined to distill and categorize data patterns. At the top level, we evaluate each sub-sequence to fill in these parameters, generating a set of compression scheme candidates (i.e., major mode selection). At the bottom level, we choose the best scheme from these candidates for each data point individually (i.e., sub-mode selection). To handle diverse data patterns effectively, we introduce a reinforcement learning based approach to learn parameter values automatically. Our experimental evaluation shows that our approach improves compression ratio by up to 120% (with an average of 50%) compared to other time-series compression methods.
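The two-level idea in the abstract can be illustrated with a small sketch. It is not the AMMMO implementation: the abstract does not name the actual schemes, control parameters, or cost model, so the three schemes (`raw`, `delta`, `dod`), the `MAJOR_MODES` candidate sets, and the `bit_cost` function below are all hypothetical. The top level here also uses an exhaustive search over candidate sets rather than the reinforcement learning approach the paper describes.

```python
from math import ceil, log2

# Hypothetical per-point schemes: each returns the residual that would be stored.
def raw(prev, prev2, x):
    return x

def delta(prev, prev2, x):
    return x - prev

def delta_of_delta(prev, prev2, x):
    return (x - prev) - (prev - prev2)

SCHEMES = {"raw": raw, "delta": delta, "dod": delta_of_delta}

# Hypothetical major modes: each one is a set of candidate schemes.
MAJOR_MODES = {
    "smooth": ("delta", "dod"),
    "mixed": ("raw", "delta", "dod"),
    "noisy": ("raw",),
}

def bit_cost(residual):
    """Toy cost model (an assumption): one sign bit plus the magnitude bits."""
    return 1 + abs(residual).bit_length()

def encode_block(values, candidates):
    """Bottom level (sub-mode selection): cheapest candidate scheme per point."""
    # Each point also pays for a selector that says which candidate was used.
    selector_bits = ceil(log2(len(candidates))) if len(candidates) > 1 else 0
    prev = prev2 = 0
    total, choices = 0, []
    for x in values:
        best = min(candidates,
                   key=lambda name: bit_cost(SCHEMES[name](prev, prev2, x)))
        total += selector_bits + bit_cost(SCHEMES[best](prev, prev2, x))
        choices.append(best)
        prev2, prev = prev, x
    return total, choices

def compress_block(values):
    """Top level (major mode selection): candidate set with the lowest total cost.

    Exhaustive search stands in for the learned parameter selection.
    """
    results = {mode: encode_block(values, cands)
               for mode, cands in MAJOR_MODES.items()}
    best_mode = min(results, key=lambda mode: results[mode][0])
    total, choices = results[best_mode]
    return best_mode, total, choices
```

Under this toy cost model, `compress_block([100, 101, 102, 103, 104])` selects the `smooth` mode at 18 bits, against 40 bits when only `raw` is allowed. A smaller candidate set costs fewer selector bits per point, which is why narrowing the candidates per sub-sequence can pay off before the per-point choice is made.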