Olivia Del Guercio, Rafael Orozco, A. Sim, Kesheng Wu
Proceedings of the ACM Workshop on Systems and Network Telemetry and Analytics, 2019-06-17. DOI: 10.1145/3322798.3329252 (https://doi.org/10.1145/3322798.3329252)
Similarity-based Compression with Multidimensional Pattern Matching
Sensors typically record their measurements with more precision than the accuracy of the sensing techniques supports. As a result, experimental and observational data often contain noise that appears random and cannot be easily compressed. This noise increases storage requirements as well as computation time for analyses. In this work, we describe a line of research to develop data reduction techniques that preserve the key features of the data while reducing the storage requirements. Our core observation is that the noise in such cases can be characterized by a small number of patterns based on statistical similarity. In earlier tests, this approach was shown to reduce the storage requirements by over 100-fold for one-dimensional sequences. In this work, we explore a set of different similarity measures for multidimensional sequences. In our tests with standard quality measures such as Peak Signal to Noise Ratio (PSNR), we observe that the new compression methods reduce the storage requirements by over 100-fold while maintaining relatively low errors as measured by PSNR. Thus, we believe that this is an effective strategy for constructing data reduction techniques.
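The general idea of replacing noisy blocks with a small set of stored patterns, and of scoring the result with PSNR, can be illustrated with a minimal sketch. This is not the authors' method: the abstract does not name the similarity measures it explores, so the test used here (block means and standard deviations agreeing within a tolerance) is an assumption chosen for illustration, as are the function names `similar`, `compress`, `decompress`, and `psnr`, and the parameters `block_size`, `tol`, and `peak`. The sketch works on a one-dimensional sequence for brevity; a multidimensional variant would compare tiles instead of runs.

```python
import math
import random
import statistics


def similar(block_a, block_b, tol):
    """Hypothetical similarity test: means and standard deviations within tol."""
    mean_gap = abs(statistics.fmean(block_a) - statistics.fmean(block_b))
    std_gap = abs(statistics.pstdev(block_a) - statistics.pstdev(block_b))
    return mean_gap <= tol and std_gap <= tol


def compress(values, block_size, tol):
    """Split values into blocks; store a block only if no stored pattern is similar."""
    patterns = []  # representative blocks kept verbatim
    indices = []   # for every input block, the index of the pattern that stands in for it
    for start in range(0, len(values), block_size):
        block = values[start:start + block_size]
        for i, pattern in enumerate(patterns):
            if len(pattern) == len(block) and similar(pattern, block, tol):
                indices.append(i)
                break
        else:
            # No stored pattern was similar enough, so this block becomes a new pattern.
            patterns.append(block)
            indices.append(len(patterns) - 1)
    return patterns, indices


def decompress(patterns, indices):
    """Rebuild a sequence by concatenating the pattern referenced by each index."""
    restored = []
    for i in indices:
        restored.extend(patterns[i])
    return restored


def psnr(original, reconstructed, peak):
    """PSNR in dB: 10 * log10(peak^2 / MSE). peak must be positive."""
    if len(original) != len(reconstructed):
        raise ValueError("sequences must have the same length")
    mse = sum((a - b) ** 2 for a, b in zip(original, reconstructed)) / len(original)
    if mse == 0:
        return math.inf  # identical sequences
    return 10 * math.log10(peak ** 2 / mse)


# Synthetic example: a constant signal plus small Gaussian noise.
random.seed(0)
signal = [10 + random.gauss(0, 0.1) for _ in range(1024)]
patterns, indices = compress(signal, block_size=32, tol=0.1)
restored = decompress(patterns, indices)
print("blocks:", len(indices), "stored patterns:", len(patterns))
print("PSNR (dB):", psnr(signal, restored, peak=max(signal)))
```

In this sketch the compressed form is the list of stored patterns plus one small index per block, so the achievable reduction depends on how many blocks pass the similarity test. The reconstruction is lossy: a replaced block keeps the statistics of the original noise but not its exact values, which is why a quality measure such as PSNR is needed to judge the result. The choice of `peak` (here the maximum of the original signal) is also an assumption; other conventions use the value range.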