Improving Multi-dimensional query processing with data migration in distributed cache infrastructure

Youngmoon Eom, Jinwoong Kim, Deukyeon Hwang, J. Kwak, Minho Shin, Beomseok Nam
{"title":"Improving Multi-dimensional query processing with data migration in distributed cache infrastructure","authors":"Youngmoon Eom, Jinwoong Kim, Deukyeon Hwang, J. Kwak, Minho Shin, Beomseok Nam","doi":"10.1109/HiPC.2014.7116906","DOIUrl":null,"url":null,"abstract":"In distributed query processing systems where caching infrastructure is distributed and scales with the number of servers, it is becoming more important to orchestrate and leverage a large number of cached objects in distributed caching systems seamlessly as the present trend is to build large scalable distributed systems by connecting small heterogeneous machines. With a large scale distributed caching system, a scheduling policy must consider both cache hit ratio and system load balance to optimize multiple queries. A scheduling policy that considers system load but not cache hit ratio often fails to reuse cached data by not assigning a query to the sever that has data objects the query needs. On the contrary, a scheduling policy that considers cache hit ratio but not system load balance may suffer from system load imbalance. To maximize the overall system throughput and to reduce query response time, a multiple query scheduling policy must balance system load and also leverage cached objects. In this paper, we present a distributed query processing framework that exhibits high cache hit ratio while achieving good system load balance. In order to seamlessly manage our distributed scalable caching system, our framework performs autonomic cached data migrations to improve cache hit ratio. Our experiments show that our proposed query scheduling policy and data migration policy significantly improve system throughput by achieving high cache hit ratio while avoiding system load imbalance.","PeriodicalId":337777,"journal":{"name":"2014 21st International Conference on High Performance Computing (HiPC)","volume":"15 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2014-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2014 21st International Conference on High Performance Computing (HiPC)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/HiPC.2014.7116906","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 0

Abstract

In distributed query processing systems whose caching infrastructure is distributed and scales with the number of servers, orchestrating and leveraging the large number of cached objects across the system is increasingly important, as the current trend is to build large, scalable distributed systems by connecting small heterogeneous machines. In a large-scale distributed caching system, a scheduling policy must consider both cache hit ratio and system load balance to optimize multiple queries. A policy that considers system load but not cache hit ratio often fails to reuse cached data because it does not assign a query to the server that holds the data objects the query needs. Conversely, a policy that considers cache hit ratio but not load balance may suffer from system load imbalance. To maximize overall system throughput and reduce query response time, a multi-query scheduling policy must balance system load while also leveraging cached objects. In this paper, we present a distributed query processing framework that exhibits a high cache hit ratio while achieving good system load balance. To seamlessly manage our distributed, scalable caching system, the framework performs autonomic cached-data migrations that improve the cache hit ratio. Our experiments show that the proposed query scheduling and data migration policies significantly improve system throughput by achieving a high cache hit ratio while avoiding system load imbalance.
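The trade-off the abstract describes can be illustrated with a small scheduling sketch. The snippet below is a minimal illustration, not the authors' actual algorithm: the `Server` class, the `alpha` weighting, and the representation of a query as a set of required object IDs are all assumptions made for the example. It simply scores each server by combining expected cache reuse with a load penalty, capturing the idea that a scheduler should weigh both factors rather than only one.

```python
# Minimal sketch of a cache- and load-aware query scheduler.
# NOT the paper's algorithm; the Server class, scoring weights, and
# query representation are illustrative assumptions.

from dataclasses import dataclass, field


@dataclass
class Server:
    name: str
    load: float                                # queued work, normalized to [0, 1]
    cached: set = field(default_factory=set)   # IDs of data objects cached here


def schedule(query_objects: set, servers: list, alpha: float = 0.7) -> Server:
    """Pick the server maximizing a weighted mix of cache reuse and idleness.

    alpha close to 1 favors cache hits (risking load imbalance);
    alpha close to 0 favors load balance (risking cache misses).
    """
    def score(s: Server) -> float:
        hit_ratio = len(query_objects & s.cached) / max(len(query_objects), 1)
        idleness = 1.0 - s.load
        return alpha * hit_ratio + (1 - alpha) * idleness

    return max(servers, key=score)


# Example: the query needs objects {"a", "b"}; s1 caches both but is busy.
servers = [Server("s1", load=0.9, cached={"a", "b"}),
           Server("s2", load=0.1, cached={"c"})]
print(schedule({"a", "b"}, servers).name)
```

With `alpha = 0.7` the example picks the busy server that holds the cached objects; lowering `alpha` shifts the choice toward the idle server, which is the imbalance-versus-reuse tension the paper's scheduling and migration policies are designed to resolve.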