Pub Date: 2024-03-01. Epub Date: 2024-05-21. DOI: 10.1109/dcc58796.2024.00020
Adrián Goga, Lore Depuydt, Nathaniel K Brown, Jan Fostier, Travis Gagie, Gonzalo Navarro
MONI (Rossi et al., JCB 2022) is a BWT-based compressed index for computing the matching statistics and maximal exact matches (MEMs) of a pattern (usually a DNA read) with respect to a highly repetitive text (usually a database of genomes) using two operations: LF-steps and longest common extension (LCE) queries on a grammar-compressed representation of the text. In practice, most of the operations are constant-time LF-steps but most of the time is spent evaluating LCE queries. In this paper we show how (a variant of) the latter can be evaluated lazily, so as to bound the total time MONI needs to process the pattern in terms of the number of MEMs between the pattern and the text, while maintaining logarithmic latency.
{"title":"Faster Maximal Exact Matches with Lazy LCP Evaluation.","authors":"Adrián Goga, Lore Depuydt, Nathaniel K Brown, Jan Fostier, Travis Gagie, Gonzalo Navarro","doi":"10.1109/dcc58796.2024.00020","DOIUrl":"10.1109/dcc58796.2024.00020","url":null,"abstract":"<p><p>MONI (Rossi et al., <i>JCB</i> 2022) is a BWT-based compressed index for computing the matching statistics and maximal exact matches (MEMs) of a pattern (usually a DNA read) with respect to a highly repetitive text (usually a database of genomes) using two operations: LF-steps and longest common extension (LCE) queries on a grammar-compressed representation of the text. In practice, most of the operations are constant-time LF-steps but most of the time is spent evaluating LCE queries. In this paper we show how (a variant of) the latter can be evaluated lazily, so as to bound the total time MONI needs to process the pattern in terms of the number of MEMs between the pattern and the text, while maintaining logarithmic latency.</p>","PeriodicalId":91161,"journal":{"name":"Proceedings. Data Compression Conference","volume":"2024 ","pages":"123-132"},"PeriodicalIF":0.0,"publicationDate":"2024-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11328106/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142001556","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Prefix-free parsing is useful for a wide variety of purposes, including building the BWT, constructing the suffix array, and supporting compressed suffix tree operations. This linear-time algorithm uses a rolling hash to break an input string into substrings, where the resulting set of unique substrings has the property that none of the substrings' suffixes (of more than a certain length) is a proper prefix of any other substring's suffix; hence the name prefix-free parsing. This set of unique substrings is referred to as the dictionary. The parse is the ordered list of dictionary strings that defines the input string. Prior empirical results demonstrated that, for large, repetitive inputs, the size of the parse is more burdensome than the size of the dictionary. Hence, the question arises as to how the size of the parse can be made to scale satisfactorily with the input. Here, we describe our algorithm, recursive prefix-free parsing, which accomplishes this by computing the prefix-free parse of the parse produced by prefix-free parsing an input string. Although conceptually simple, building the BWT from the parse-of-the-parse and the dictionaries is significantly more challenging. We solve this problem and implement our solution. Our experimental results show that recursive prefix-free parsing is extremely effective in reducing the memory needed to build the run-length-encoded BWT of the input. Our implementation is open source and available at https://github.com/marco-oliva/r-pfbwt.
{"title":"Recursive Prefix-Free Parsing for Building Big BWTs.","authors":"Marco Oliva, Travis Gagie, Christina Boucher","doi":"","DOIUrl":"","url":null,"abstract":"<p><p>Prefix-free parsing is useful for a wide variety of purposes including building the BWT, constructing the suffix array, and supporting compressed suffix tree operations. This linear-time algorithm uses a rolling hash to break an input string into substrings, where the resulting set of unique substrings has the property that none of the substrings' suffixes (of more than a certain length) is a proper prefix of any of the other substrings' suffixes. Hence, the name prefix-free parsing. This set of unique substrings is referred to as the <i>dictionary</i>. The <i>parse</i> is the ordered list of dictionary strings that defines the input string. Prior empirical results demonstrated the size of the parse is more burdensome than the size of the dictionary for large, repetitive inputs. Hence, the question arises as to how the size of the parse can scale satisfactorily with the input. Here, we describe our algorithm, <i>recursive prefix-free parsing</i>, which accomplishes this by computing the prefix-free parse of the parse produced by prefix-free parsing an input string. Although conceptually simple, building the BWT from the parse-of-the-parse and the dictionaries is significantly more challenging. We solve and implement this problem. Our experimental results show that recursive prefix-free parsing is extremely effective in reducing the memory needed to build the run-length encoded BWT of the input. Our implementation is open source and available at https://github.com/marco-oliva/r-pfbwt.</p>","PeriodicalId":91161,"journal":{"name":"Proceedings. Data Compression Conference","volume":"2023 ","pages":"62-70"},"PeriodicalIF":0.0,"publicationDate":"2023-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11328891/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142001555","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2021-03-01. Epub Date: 2021-05-10. DOI: 10.1109/dcc50243.2021.00027
Christina Boucher, Travis Gagie, Tomohiro I, Dominik Köppl, Ben Langmead, Giovanni Manzini, Gonzalo Navarro, Alejandro Pacheco, Massimiliano Rossi
Computing the matching statistics of patterns with respect to a text is a fundamental task in bioinformatics, but a formidable one when the text is a highly compressed genomic database. Bannai et al. gave an efficient solution for this case, which Rossi et al. recently implemented, but it uses two passes over the patterns and buffers a pointer for each character during the first pass. In this paper, we simplify their solution and make it streaming, at the cost of slowing it down slightly. This means that, first, we can compute the matching statistics of several long patterns (such as whole human chromosomes) in parallel while still using a reasonable amount of RAM; second, we can compute matching statistics online with low latency and thus quickly recognize when a pattern becomes incompressible relative to the database. Our code is available at https://github.com/koeppl/phoni.
{"title":"PHONI: Streamed Matching Statistics with Multi-Genome References.","authors":"Christina Boucher, Travis Gagie, I Tomohiro, Dominik Köppl, Ben Langmead, Giovanni Manzini, Gonzalo Navarro, Alejandro Pacheco, Massimiliano Rossi","doi":"10.1109/dcc50243.2021.00027","DOIUrl":"10.1109/dcc50243.2021.00027","url":null,"abstract":"<p><p>Computing the matching statistics of patterns with respect to a text is a fundamental task in bioinformatics, but a formidable one when the text is a highly compressed genomic database. Bannai et al. gave an efficient solution for this case, which Rossi et al. recently implemented, but it uses two passes over the patterns and buffers a pointer for each character during the first pass. In this paper, we simplify their solution and make it streaming, at the cost of slowing it down slightly. This means that, first, we can compute the matching statistics of several long patterns (such as whole human chromosomes) in parallel while still using a reasonable amount of RAM; second, we can compute matching statistics online with low latency and thus quickly recognize when a pattern becomes incompressible relative to the database. Our code is available at https://github.com/koeppl/phoni.</p>","PeriodicalId":91161,"journal":{"name":"Proceedings. Data Compression Conference","volume":"2021 ","pages":"193-202"},"PeriodicalIF":0.0,"publicationDate":"2021-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1109/dcc50243.2021.00027","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"39624285","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
J. J. Sánchez-Hernández, V. Ruiz, J. Ortiz, D. Muller
This work focuses on remote browsing of JPEG2000 image sequences. It takes advantage of the spatial scalability of JPEG2000 to determine which precincts of each subsequent image should be transmitted and which should be reused from a previously reconstructed image. Our experiments demonstrate that the quality of the reconstructed images can be significantly increased by using motion compensation and conditional replenishment on the client side. The proposed algorithm is compatible with standard JPIP servers.
{"title":"Client-Driven Transmission of JPEG2000 Image Sequences Using Motion Compensated Conditional Replenishment","authors":"J. J. Sánchez-Hernández, V. Ruiz, J. Ortiz, D. Muller","doi":"10.1109/DCC.2019.00114","DOIUrl":"https://doi.org/10.1109/DCC.2019.00114","url":null,"abstract":"This is a work focused on remote browsing of JPEG2000 image sequences which takes advantage of the spatial scalability of JPEG2000 to determine which precincts of a subsequent image should be transmitted, and which precincts should be reused from a previously reconstructed image. The results of our experiments demonstrate that the quality of the reconstructed images can be significantly increased by using motion compensation and conditional replenishment on the client side. The proposed algorithm is compatible with standard JPIP servers.","PeriodicalId":91161,"journal":{"name":"Proceedings. Data Compression Conference","volume":"138 1","pages":"602"},"PeriodicalIF":0.0,"publicationDate":"2019-03-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"83274843","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2017-04-01. Epub Date: 2017-05-11. DOI: 10.1109/DCC.2017.82
Dmitri S Pavlichin, Amir Ingber, Tsachy Weissman
We propose an algorithm for lossless compression of tabular data, including, for example, machine-learning datasets, server logs, and genomic datasets. Superior compression ratios are achieved by exploiting dependencies between the fields (or "features") in the dataset. The algorithm compresses the records with respect to a probabilistic graphical model, specifically an optimized forest in which each feature is a node. The work extends the Chow-Liu tree method by incorporating a more accurate correction term into the cost function, corresponding to the size required to describe the model itself. Additional features of the algorithm are efficient coding of the metadata (such as probability distributions) and data relabeling to cope with large datasets and alphabets. We test the algorithm on several datasets and demonstrate compression rates between 2x and 5x better than gzip's. The larger improvements are observed for very large datasets, such as the Criteo click-prediction dataset, which was published as part of a recent Kaggle competition.
{"title":"Compressing Tabular Data via Pairwise Dependencies.","authors":"Dmitri S Pavlichin, Amir Ingber, Tsachy Weissman","doi":"10.1109/DCC.2017.82","DOIUrl":"https://doi.org/10.1109/DCC.2017.82","url":null,"abstract":"We propose a method and algorithm for lossless compression of tabular data – including, for example, machine learning datasets, server logs and genomic datasets. Superior compression ratios are achieved by exploiting dependencies between the fields (or \"features\") in the dataset. The algorithm compresses the records w.r.t. a probabilistic graphical model – specifically an optimized forest, where each feature is a node. The work extends a method known as a Chow-Liu tree by incorporating a more accurate correction term to the cost function, which corresponds to the size required to describe the model itself. Additional features of the algorithm are efficient coding of the metadata (such as probability distributions), as well as data relabeling in order to cope with large datasets and alphabets. We test the algorithm on several datasets, and demonstrate an improvement in the compression rates of between 2X and 5X compared to gzip. The larger improvements are observed for very large datasets, such as the Criteo click prediction dataset which was published as part of a recent Kaggle competition.","PeriodicalId":91161,"journal":{"name":"Proceedings. Data Compression Conference","volume":"2017 ","pages":"455"},"PeriodicalIF":0.0,"publicationDate":"2017-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1109/DCC.2017.82","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"35621699","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2017-04-01. Epub Date: 2017-05-11. DOI: 10.1109/DCC.2017.76
Reggy Long, Mikel Hernaez, Idoia Ochoa, Tsachy Weissman
The affordability of DNA sequencing has led to unprecedented volumes of genomic data, which must be stored, processed, and analyzed. The most popular format for genomic data is the SAM format, which contains information such as alignments, quality values, etc. These files are large (on the order of terabytes), which necessitates compression. In this work we propose a new reference-based compressor for SAM files that can accommodate different levels of compression based on the specific needs of the user. In particular, the proposed compressor, GeneComp, allows the user to perform lossy compression of the quality scores, which have been shown to occupy more than half of the compressed file (when losslessly compressed). We show that GeneComp overall achieves better compression ratios than previously proposed algorithms when run in lossless mode.
{"title":"GeneComp, a new reference-based compressor for SAM files.","authors":"Reggy Long, Mikel Hernaez, Idoia Ochoa, Tsachy Weissman","doi":"10.1109/DCC.2017.76","DOIUrl":"https://doi.org/10.1109/DCC.2017.76","url":null,"abstract":"<p><p>The affordability of DNA sequencing has led to unprecedented volumes of genomic data. These data must be stored, processed, and analyzed. The most popular format for genomic data is the SAM format, which contains information such as alignment, quality values, etc. These files are large (on the order of terabytes), which necessitates compression. In this work we propose a new reference-based compressor for SAM files, which can accommodate different levels of compression, based on the specific needs of the user. In particular, the proposed compressor GeneComp allows the user to perform lossy compression of the quality scores, which have been proven to occupy more than half of the compressed file (when losslessly compressed). We show that the proposed compressor GeneComp overall achieves better compression ratios than previously proposed algorithms when working on lossless mode.</p>","PeriodicalId":91161,"journal":{"name":"Proceedings. Data Compression Conference","volume":"2017 ","pages":"330-339"},"PeriodicalIF":0.0,"publicationDate":"2017-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1109/DCC.2017.76","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"35621698","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A Locavore Infrastructure is one whose elements are all within high-bandwidth, low-latency proximity of one another. It typically combines edge-computing elements with an adjacent access network. The growing number of communicating devices and things creates a large and often steady demand for collecting and integrating local information in a Locavore Infrastructure. Slices of this infrastructure can provide architectural advantages in security, meeting performance expectations, and billing. Dynamic slices can provide some of the same kinds of surge capabilities for which traditional cloud computing is prized. Slices can be implemented using a variety of orchestration techniques.
{"title":"Slicing in locavore infrastructures","authors":"Glenn Ricart","doi":"10.1145/2955193.2955207","DOIUrl":"https://doi.org/10.1145/2955193.2955207","url":null,"abstract":"A Locavore Infrastructure is one which has all of its elements in high-bandwidth and low-latency proximity. It typically combines edge computing elements with an adjacent access network. The growing number of communicating devices and things creates a large and often steady demand for collecting and integrating local information in a Locavore Infrastructure. Slices of this infrastructure can provide architectural advantages in security, meeting performance expectations, and billing. Dynamic slices can provide some of the same kinds of surge capabilities for which traditional cloud computing is prized. Slices can be implemented using a variety of orchestration techniques.","PeriodicalId":91161,"journal":{"name":"Proceedings. Data Compression Conference","volume":"87 1","pages":"4:1-4:6"},"PeriodicalIF":0.0,"publicationDate":"2016-07-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"76235105","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
In this paper we propose a practical, scalable, software-level mechanism for taking crash-consistent snapshots of a group of virtual machines. The group is dynamically defined at the software virtualization layer, allowing us to move the consistency-group abstraction from the hardware array layer into the hypervisor with very low overhead (~50 ms of VM freeze time). This low overhead allows us to take crash-consistent snapshots of large software-defined consistency groups at a reasonable frequency, guaranteeing low data loss for disaster recovery. To demonstrate practicality, we use our mechanism to take crash-consistent snapshots of multi-disk virtual machines running two database applications: PostgreSQL and Apache Cassandra. Deployment experiments confirm that our mechanism scales well with the number of VMs, and that snapshot times remain independent of virtual disk size and usage.
{"title":"Software-defined consistency group abstractions for virtual machines","authors":"Muntasir Raihan Rahman, Sudarsan Piduri, Ilya Languev, Rean Griffith, Indranil Gupta","doi":"10.1145/2955193.2955198","DOIUrl":"https://doi.org/10.1145/2955193.2955198","url":null,"abstract":"In this paper we propose a practical scalable software-level mechanism for taking crash-consistent snapshots of a group of virtual machines. The group is dynamically defined at the software virtualization layer allowing us to move the consistency group abstraction from the hardware array layer into the hypervisor with very low overhead (~ 50 msecs VM freeze time). This low overhead allows us to take crash-consistent snapshots of large software-defined consistency groups at a reasonable frequency, guaranteeing low data loss for disaster recovery. To demonstrate practicality, we use our mechanism to take crash-consistent snapshots of multi-disk virtual machines running two database applications: PostgreSQL, and Apache Cassandra. Deployment experiments confirm that our mechanism scales well with number of VMs, and snapshot times remain invariant of virtual disk size and usage.","PeriodicalId":91161,"journal":{"name":"Proceedings. Data Compression Conference","volume":"79 1","pages":"3:1-3:6"},"PeriodicalIF":0.0,"publicationDate":"2016-07-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"75242410","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Providing services for multiple tenants within a single or federated distributed cloud environment requires a variety of special considerations related to network design, provisioning, and operations. Especially important are topics concerning the implementation of many parallel programmable virtual networks for large numbers of tenants, who require autonomous management, control, and data planes. This paper provides an overview of some of the challenges that arise from developing and implementing parallel programmable virtual networks, describes experiences with several experimental techniques for addressing those challenges on large-scale distributed testbeds, and presents the results of the experiments that were conducted. The distributed environments used include a distributed cloud testbed, the Chameleon Cloud, sponsored by the National Science Foundation's NSFCloud program; the NSF's Global Environment for Network Innovations (GENI); an international distributed OpenFlow testbed; and the Open Science Data Cloud.
{"title":"Next generation virtual network architecture for multi-tenant distributed clouds: challenges and emerging techniques","authors":"J. Mambretti, J. Chen, F. Yeh","doi":"10.1145/2955193.2955194","DOIUrl":"https://doi.org/10.1145/2955193.2955194","url":null,"abstract":"Providing services for multiple tenants within a single or federated distributed cloud environment requires a variety of special considerations related to network design, provisioning, and operations. Especially important are multiple topics concerning the implementation of multiple parallel programmable virtual networks for large numbers of tenants, who require autonomous management, control, and data planes. This paper provides an overview of some of the challenges that arise from developing and implementing parallel programmable virtual networks, describes experiences with several experimental techniques for addressing those challenges based on large scale distributed testbeds, and presents the results of the experiments that were conducted. Distributed environments used include a distributed cloud testbed, the Chameleon Cloud, sponsored by the National Science Foundation's NSFCloud program, the NSF's Global Environment for Network Innovations (GENI), an international distributed OpenFlow testbed, and the Open Science Data Cloud.","PeriodicalId":91161,"journal":{"name":"Proceedings. Data Compression Conference","volume":"22 1","pages":"1:1-1:6"},"PeriodicalIF":0.0,"publicationDate":"2016-07-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"84404010","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
This paper presents a series of novel techniques for reducing the tail latency in stream processing systems like Apache Storm. Concretely, we present three mechanisms: (1) adaptive timeouts coupled with selective replay to catch straggler tuples; (2) shared queues among different tasks of the same operator to reduce overall queueing delay; and (3) latency-feedback-based load balancing, intended to mitigate heterogeneous scenarios. We have implemented these techniques in Apache Storm and present experimental results using sets of micro-benchmarks as well as two topologies from Yahoo! Inc. Our results show improvements in tail latency of up to 72.9%.
{"title":"New techniques to curtail the tail latency in stream processing systems","authors":"Guangxiang Du, Indranil Gupta","doi":"10.1145/2955193.2955206","DOIUrl":"https://doi.org/10.1145/2955193.2955206","url":null,"abstract":"This paper presents a series of novel techniques for reducing the tail latency in stream processing systems like Apache Storm. Concretely, we present three mechanisms: (1) adaptive timeout coupled with selective replay to catch straggler tuples; (2) shared queues among different tasks of the same operator to reduce overall queueing delay; (3) latency feedback-based load balancing, intended to mitigate heterogenous scenarios. We have implemented these techniques in Apache Storm, and present experimental results using sets of micro-benchmarks as well as two topologies from Yahoo! Inc. Our results show improvement in tail latency up to 72.9%.","PeriodicalId":91161,"journal":{"name":"Proceedings. Data Compression Conference","volume":"38 1","pages":"7:1-7:6"},"PeriodicalIF":0.0,"publicationDate":"2016-07-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"89972727","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}