Resource Scaling Strategies for Open-Source FaaS Platforms compared to Commercial Cloud Offerings
Pub Date: 2022-07-01 | DOI: 10.1109/CLOUD55607.2022.00020 | pp. 40-48
Johannes Manner, G. Wirtz
Open-source offerings are often investigated by comparing their features to commercial cloud offerings. However, performance benchmarking is rarely executed for open-source tools hosted on-premise, nor is a fair cost comparison possible, due to a lack of resource settings equivalent to cloud scaling strategies. Therefore, we first list the resource scaling strategies implemented by public and open-source FaaS platforms. Based on this, we propose a methodology to calculate an abstract performance measure for comparing two platforms with each other. Since all open-source platforms suggest a Kubernetes deployment, we use this measure to configure open-source FaaS platforms via Kubernetes limits. We tested our approach with CPU-intensive functions, considering the difference between single-threaded and multi-threaded functions to avoid wasting resources. In this regard, we also address the noisy-neighbor problem for open-source FaaS platforms by conducting an instance parallelization experiment. Our approach to limiting resources leads to consistent results while avoiding overbooking of resources.
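The abstract does not spell out the mapping itself; purely as a minimal sketch, assuming the commercial side scales CPU with configured memory (AWS Lambda documents roughly one full vCPU at 1,769 MB), the snippet below derives equivalent Kubernetes resource limits. The function name and rounding choices are illustrative, not the paper's methodology.

```python
# Illustrative sketch (not the paper's exact methodology): derive Kubernetes
# resource limits from a Lambda-style memory setting, assuming Lambda's
# documented proportionality of ~1 vCPU per 1,769 MB of configured memory.

LAMBDA_MB_PER_VCPU = 1769  # documented point at which a function gets 1 vCPU

def k8s_limits_for_lambda_memory(memory_mb: int) -> dict:
    """Return a Kubernetes 'resources' stanza approximating a Lambda config."""
    vcpus = memory_mb / LAMBDA_MB_PER_VCPU
    millicores = max(1, round(vcpus * 1000))  # Kubernetes CPU is expressed in millicores
    return {
        "limits": {"cpu": f"{millicores}m", "memory": f"{memory_mb}Mi"},
        "requests": {"cpu": f"{millicores}m", "memory": f"{memory_mb}Mi"},
    }

# Example: a 512 MB function maps to ~289m of CPU, mirroring cloud-side scaling.
print(k8s_limits_for_lambda_memory(512))
```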
{"title":"Resource Scaling Strategies for Open-Source FaaS Platforms compared to Commercial Cloud Offerings","authors":"Johannes Manner, G. Wirtz","doi":"10.1109/CLOUD55607.2022.00020","DOIUrl":"https://doi.org/10.1109/CLOUD55607.2022.00020","url":null,"abstract":"Open-source offerings are often investigated when comparing their features to commercial cloud offerings. However, performance benchmarking is rarely executed for open-source tools hosted on-premise nor is it possible to conduct a fair cost comparison due to a lack of resource settings equivalent to cloud scaling strategies.Therefore, we firstly list implemented resource scaling strategies for public and open-source FaaS platforms. Based on this we propose a methodology to calculate an abstract performance measure to compare two platforms with each other. Since all open-source platforms suggest a Kubernetes deployment, we use this measure for a configuration of open-source FaaS platforms based on Kubernetes limits. We tested our approach with CPU intensive functions, considering the difference between single-threaded and multi-threaded functions to avoid wasting resources. With regard to this, we also address the noisy neighbor problem for open-source FaaS platforms by conducting an instance parallelization experiment. Our approach to limit resources leads to consistent results while avoiding an overbooking of resources.","PeriodicalId":54281,"journal":{"name":"IEEE Cloud Computing","volume":"17 1","pages":"40-48"},"PeriodicalIF":0.0,"publicationDate":"2022-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"87816793","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
SAPPARCHI: an Osmotic Platform to Execute Scalable Applications on Smart City Environments
Pub Date: 2022-07-01 | DOI: 10.1109/CLOUD55607.2022.00051 | pp. 289-298
Arthur Souza, N. Cacho, T. Batista, R. Ranjan
In the Smart Cities context, a plethora of middleware platforms has been proposed to support application execution and data processing. Despite all the progress already made, the vast majority of solutions do not meet the requirements of application runtime, development, and deployment with respect to scalability. Some studies point out that just 1 of 97 reported platforms (1%) meets this entire set of requirements at the same time. This small number of platforms may be explained by several factors: i) Big Data: the huge amount of processed and stored data, with various data sources and data types; ii) Multiple Domains: the many domains involved (economy, traffic, health, security, agronomy, etc.); iii) Multiple Processing Methods: data flow, batch processing, services, and microservices; and iv) a High Degree of Distribution: the use of multiple IoT and Big Data tools, combined with execution at various computational levels (Edge, Fog, Cloud), leads applications to exhibit a high level of distribution. Aware of these challenges, we propose Sapparchi, an integrated architectural model for Smart City applications that defines multiple processing levels (Edge, Fog, and Cloud). We also present the Sapparchi middleware platform for developing, deploying, and running applications in the smart city environment with an osmotic multi-processing approach that scales applications from Cloud to Edge. Finally, an experimental evaluation exposes the main advantages of adopting Sapparchi.
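Sapparchi's internals are not given in the abstract; as a hypothetical illustration of what a single osmotic placement step could look like, the sketch below promotes a component toward the Cloud under load and demotes it toward the Edge when idle. The tier names, thresholds, and function are assumptions for illustration only.

```python
# Hypothetical sketch of an osmotic placement decision (not Sapparchi's actual
# algorithm): promote a component toward the Cloud when its tier is overloaded,
# demote it toward the Edge when there is ample headroom.

TIERS = ["edge", "fog", "cloud"]  # ordered from least to most capacity

def next_tier(current: str, cpu_load: float,
              promote_at: float = 0.8, demote_at: float = 0.3) -> str:
    """Return the tier a component should run on after one osmotic step."""
    i = TIERS.index(current)
    if cpu_load > promote_at and i < len(TIERS) - 1:
        return TIERS[i + 1]   # overloaded: move toward the cloud
    if cpu_load < demote_at and i > 0:
        return TIERS[i - 1]   # underloaded: move toward the edge, nearer the data
    return current

print(next_tier("fog", 0.9))   # -> 'cloud'
print(next_tier("fog", 0.1))   # -> 'edge'
```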
{"title":"SAPPARCHI: an Osmotic Platform to Execute Scalable Applications on Smart City Environments","authors":"Arthur Souza, N. Cacho, T. Batista, R. Ranjan","doi":"10.1109/CLOUD55607.2022.00051","DOIUrl":"https://doi.org/10.1109/CLOUD55607.2022.00051","url":null,"abstract":"In the Smart Cities context, a plethora of Middle-ware Platforms had been proposed to support applications execution and data processing. Despite all the progress already made, the vast majority of solutions have not met the requirements of Applications’ Runtime, Development, and Deployment when related to Scalability. Some studies point out that just 1 of 97 (1%) reported platforms reach this all this set of requirements at same time. This small number of platforms may be explained by some reasons: i) Big Data: The huge amount of processed and stored data with various data sources and data types, ii) Multi-domains: many domains involved (Economy, Traffic, Health, Security, Agronomy, etc.), iii) Multiple processing methods like Data Flow, Batch Processing, Services, and Microservices, and 4) High Distributed Degree: The use of multiple IoT and BigData tools combined with execution at various computational levels (Edge, Fog, Cloud) leads applications to present a high level of distribution. Aware of those great challenges, we propose Sapparchi, an integrated architectural model for Smart Cities applications that defines multi-processing levels (Edge, Fog, and Cloud). Also, it presents the Sapparchi middleware platform for developing, deploying, and running applications in the smart city environment with an osmotic multi-processing approach that scales applications from Cloud to Edge. Finally, an experimental evaluation exposes the main advantages of adopting Sapparchi.","PeriodicalId":54281,"journal":{"name":"IEEE Cloud Computing","volume":"70 1","pages":"289-298"},"PeriodicalIF":0.0,"publicationDate":"2022-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"76538206","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A State-aware Method for Flows with Fairness on NVMe SSDs with Load Balance
Pub Date: 2022-07-01 | DOI: 10.1109/CLOUD55607.2022.00017 | pp. 11-18
Chin-Hsien Wu, Liang-Ting Chen
Nowadays, solid-state drives (SSDs) have become the storage device of choice compared with hard-disk drives (HDDs). More and more scenarios adopt a multi-SSD architecture to improve performance and expand storage capacity for cloud services, data centers, distributed systems, and virtualized environments. When multiple users (flows) compete concurrently for multiple shared SSDs, an architecture that lacks a fairness strategy lets a user who takes up more resources affect the other users. Likewise, if the architecture lacks a load-balance strategy across the shared SSDs, some SSDs may receive so many I/O requests that their performance degrades and their lifespan shortens. Therefore, we propose a state-aware method that provides fairness among flows on NVMe SSDs while maintaining load balance.
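The authors' state-aware method is not specified in the abstract; as a minimal sketch of how fairness and load balance combine, the code below serves flows in round-robin order while routing each request to the least-loaded SSD. The class and its bookkeeping are illustrative stand-ins, not the paper's design.

```python
# Illustrative sketch (not the paper's exact method): serve flows fairly in
# round-robin order, and send each request to the least-loaded SSD so that no
# single device absorbs a disproportionate share of I/O.

from collections import deque

class FairBalancedDispatcher:
    def __init__(self, num_ssds: int):
        self.queue_depth = [0] * num_ssds   # outstanding requests per SSD
        self.flows = deque()                # round-robin order across flows

    def register_flow(self, flow_id: str):
        self.flows.append(flow_id)

    def dispatch(self) -> tuple:
        """Pick the next flow fairly and route it to the least-loaded SSD."""
        flow = self.flows[0]
        self.flows.rotate(-1)               # round-robin: this flow goes last
        ssd = min(range(len(self.queue_depth)), key=self.queue_depth.__getitem__)
        self.queue_depth[ssd] += 1          # request is now outstanding on ssd
        return flow, ssd

d = FairBalancedDispatcher(num_ssds=3)
d.register_flow("flow-A"); d.register_flow("flow-B")
print([d.dispatch() for _ in range(4)])
```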
{"title":"A State-aware Method for Flows with Fairness on NVMe SSDs with Load Balance","authors":"Chin-Hsien Wu, Liang-Ting Chen","doi":"10.1109/CLOUD55607.2022.00017","DOIUrl":"https://doi.org/10.1109/CLOUD55607.2022.00017","url":null,"abstract":"Nowadays, solid-state drives (SSDs) have become the best choice of storage devices, when compared with hard-disk drives (HDDs). More and more scenarios adopt a multi-SSD architecture to improve performance and expand storage capacity for cloud services, database centers, distributed systems and virtualized environments. When multiple users (flows) are competing for shared multiple SSDs concurrently, if the multi-SSD architecture lacks a fairness strategy among multiple users, a user that takes up more resources can affect other users. Meanwhile, if the multi-SSD architecture lacks a load-balance strategy among multiple shared SSDs, some specific SSDs may receive too many I/O requests to degrade the performance and shorten the lifespan. Therefore, we will propose a state-aware method to consider flows with fairness on NVMe SSDs with load balance.","PeriodicalId":54281,"journal":{"name":"IEEE Cloud Computing","volume":"91 1","pages":"11-18"},"PeriodicalIF":0.0,"publicationDate":"2022-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"83495168","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Localizing and Explaining Faults in Microservices Using Distributed Tracing
Pub Date: 2022-07-01 | DOI: 10.1109/CLOUD55607.2022.00072 | pp. 489-499
Jesus Rios, Saurabh Jha, L. Shwartz
Finding the exact location of a fault in a large distributed microservices application running in containerized cloud environments can be very difficult and time-consuming. We present a novel approach that uses distributed tracing to automatically detect, localize, and help explain application-level faults. We demonstrate the effectiveness of our proposed approach by injecting faults into a well-known microservice-based benchmark application. Our experiments show that the proposed fault localization algorithm correctly detects and localizes the microservice with the injected fault. We also compare our approach with other fault localization methods. In particular, we empirically show that our method outperforms methods that infer fault locations from error logs using a graph model of error propagation. Our work illustrates the value added by distributed tracing for localizing and explaining faults in microservices.
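The paper's algorithm is not reproduced in the abstract; as a toy sketch of the general idea behind trace-based localization, the code below blames the deepest erroring span in each trace, on the assumption that errors propagate from callee to caller. The span representation and scoring are illustrative, not the authors' method.

```python
# Toy sketch of trace-based fault localization (not the authors' algorithm):
# assume errors propagate upward from the faulty service, so the deepest
# erroring span in each trace is the likely fault origin; rank by vote count.

from collections import Counter

def localize(traces: list) -> list:
    """Each trace is a list of (service, depth, has_error) spans."""
    votes = Counter()
    for spans in traces:
        erroring = [s for s in spans if s[2]]
        if erroring:
            origin = max(erroring, key=lambda s: s[1])  # deepest erroring span
            votes[origin[0]] += 1
    return votes.most_common()

traces = [
    [("frontend", 0, True), ("cart", 1, True), ("db", 2, True)],
    [("frontend", 0, True), ("cart", 1, True)],
]
print(localize(traces))  # db and cart accumulate the blame
```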
{"title":"Localizing and Explaining Faults in Microservices Using Distributed Tracing","authors":"Jesus Rios, Saurabh Jha, L. Shwartz","doi":"10.1109/CLOUD55607.2022.00072","DOIUrl":"https://doi.org/10.1109/CLOUD55607.2022.00072","url":null,"abstract":"Finding the exact location of a fault in a large distributed microservices application running in containerized cloud environments can be very difficult and time-consuming. We present a novel approach that uses distributed tracing to automatically detect, localize and aid in explaining application-level faults. We demonstrate the effectiveness of our proposed approach by injecting faults into a well-known microservice-based benchmark application. Our experiments demonstrated that the proposed fault localization algorithm correctly detects and localize the microservice with the injected fault. We also compare our approach with other fault localization methods. In particular, we empirically show that our method outperforms methods in which a graph model of error propagation is used for inferring fault locations using error logs. Our work illustrates the value added by distributed tracing for localizing and explaining faults in microservices.","PeriodicalId":54281,"journal":{"name":"IEEE Cloud Computing","volume":"60 1","pages":"489-499"},"PeriodicalIF":0.0,"publicationDate":"2022-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"91279540","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A Study of Contributing Factors to Power Aware Vertical Scaling of Deadline Constrained Applications
Pub Date: 2022-07-01 | DOI: 10.1109/CLOUD55607.2022.00073 | pp. 500-510
Pradyumna Kaushik, S. Raghavendra, M. Govindaraju
The adoption of virtualization technologies in datacenters has increased dramatically in the past decade. Clouds have pivoted from being just an infrastructure rental to offering platforms and solutions, made possible by several layers of abstraction that let internal and external users focus on core business logic. Efficient resource management has in turn become salient in ensuring operational efficiency. In this work, we study key factors that can influence vertical scaling decisions, propose a policy to vertically scale deadline-constrained applications, and report our findings from experimentation. We observe that (a) the duration for which an application is profiled has an almost cyclic influence on the accuracy of behavior predictions and is inversely proportional to the time spent consuming backlog, (b) the duration for which an application is scaled can help achieve up to a 9.6% and 4.2% reduction in the 75th and 95th percentile of power usage, respectively, (c) reducing the tolerance towards accrual of backlog influences the application execution time and can at times reduce the number of SLA violations by 50% or even 100%, and (d) increasing the time to deadline offers power-saving opportunities and can help achieve a 9.3% improvement in the 75th percentile of power usage.
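The exact policy lives in the paper, not the abstract; the sketch below shows one plausible shape of such a vertical-scaling step, sizing the CPU allocation so remaining work plus accrued backlog finishes by the deadline. All names, units, and the floor allocation are assumptions.

```python
# Illustrative vertical-scaling step (not the paper's exact policy): choose a
# CPU allocation just large enough to finish remaining work plus accrued
# backlog before the deadline, capped by what the host can provide.

def cpu_allocation(remaining_work: float, backlog: float,
                   seconds_to_deadline: float, max_cpus: float) -> float:
    """Work is expressed in CPU-seconds; returns the CPUs to allocate now."""
    if seconds_to_deadline <= 0:
        return max_cpus                       # past due: run flat out
    needed = (remaining_work + backlog) / seconds_to_deadline
    return min(max(needed, 0.1), max_cpus)    # keep a small floor allocation

# 120 CPU-seconds left, 30 CPU-seconds of backlog, 100 s to the deadline:
print(cpu_allocation(120, 30, 100, max_cpus=4.0))  # -> 1.5 CPUs
```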
{"title":"A Study of Contributing Factors to Power Aware Vertical Scaling of Deadline Constrained Applications","authors":"Pradyumna Kaushik, S. Raghavendra, M. Govindaraju","doi":"10.1109/CLOUD55607.2022.00073","DOIUrl":"https://doi.org/10.1109/CLOUD55607.2022.00073","url":null,"abstract":"The adoption of virtualization technologies in datacenters has increased dramatically in the past decade. Clouds have pivoted from being just an infrastructure rental to offering platforms and solutions, made possible by having several layers of abstraction, providing internal and external users the ability to focus on core business logic. Efficient resource management has in turn become salient in ensuring operational efficiency. In this work, we study key factors that can influence vertical scaling decisions, propose a policy to vertically scale deadline constrained applications and surface our findings from experimentation. We observe that (a) the duration for which an application is profiled has an almost cyclic influence on the accuracy of behavior predictions and is inversely proportional to the time spent consuming backlog, (b) the duration for which an application is scaled can help achieve up to a 9.6% and 4.2% reduction in the 75th and 95th percentile of power usage respectively, (c) reducing the tolerance towards accrual of backlog influences the application execution time and can reduce the number of SLA violations by 50% or 100% at times and (d) increasing the time to deadline offers power saving opportunities and can help achieve a 9.3% improvement in the 75th percentile of power usage.","PeriodicalId":54281,"journal":{"name":"IEEE Cloud Computing","volume":"125 1","pages":"500-510"},"PeriodicalIF":0.0,"publicationDate":"2022-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"87514094","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Detecting Layered Bottlenecks in Microservices
Pub Date: 2022-07-01 | DOI: 10.1109/CLOUD55607.2022.00062 | pp. 385-396
T. Inagaki, Yohei Ueda, Moriyoshi Ohara, Sunyanan Choochotkaew, Marcelo Amaral, Scott Trent, Tatsuhiro Chiba, Qi Zhang
We propose a method to detect both software and hardware bottlenecks in a web service consisting of microservices. A bottleneck is a resource that limits the maximum performance of the entire web service. Bottlenecks include both software resources, such as threads, locks, and channels, and hardware resources, such as processors, memories, and disks. Bottlenecks form a layered structure, since a single request can utilize multiple software resources and a hardware resource simultaneously. The microservice architecture makes the detection of layered bottlenecks challenging due to the lack of a uniform analysis perspective across languages, libraries, frameworks, and middleware. We detect layered bottlenecks in microservices by profiling the number and status of working threads in each microservice and the dependencies among microservices via network connections. Our approach can be applied to various programming languages since it relies only on standard debugging tools. Nevertheless, our approach not only detects which microservice is a bottleneck but also enables us to understand why it becomes one. This is enabled by a novel visualization method that shows layered bottlenecks in microservices at a glance. We demonstrate that our approach successfully detects and visualizes layered bottlenecks in the state-of-the-art microservice benchmarks DeathStarBench and Acme Air. This enables us to optimize the microservices themselves to achieve a higher throughput per resource utilization rate compared with simply scaling the number of replicas.
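As a minimal illustration of the profiling signal the authors describe (thread counts and states per microservice), the sketch below flags a service as a candidate bottleneck when most of its worker threads are blocked waiting. The threshold and data layout are assumptions, not the paper's implementation.

```python
# Minimal sketch of the profiling signal (not the authors' full method): a
# service whose worker threads are mostly blocked while its callers queue up
# is a candidate layered bottleneck.

def candidate_bottlenecks(profiles: dict, blocked_ratio: float = 0.7) -> list:
    """profiles maps service -> list of thread states ('running'/'blocked')."""
    flagged = []
    for service, states in profiles.items():
        if not states:
            continue
        blocked = sum(1 for s in states if s == "blocked") / len(states)
        if blocked >= blocked_ratio:
            flagged.append((service, round(blocked, 2)))
    return sorted(flagged, key=lambda x: -x[1])

profiles = {
    "gateway": ["blocked"] * 9 + ["running"],      # waiting on a downstream call
    "auth":    ["running"] * 8 + ["blocked"] * 2,  # healthy
}
print(candidate_bottlenecks(profiles))  # [('gateway', 0.9)]
```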
{"title":"Detecting Layered Bottlenecks in Microservices","authors":"T. Inagaki, Yohei Ueda, Moriyoshi Ohara, Sunyanan Choochotkaew, Marcelo Amaral, Scott Trent, Tatsuhiro Chiba, Qi Zhang","doi":"10.1109/CLOUD55607.2022.00062","DOIUrl":"https://doi.org/10.1109/CLOUD55607.2022.00062","url":null,"abstract":"We propose a method to detect both software and hardware bottlenecks in a web service consisting of microservices. A bottleneck is a resource that limits the maximum performance of the entire web service. Bottlenecks often include both software resources such as threads, locks, and channels, and hardware resources such as processors, memories, and disks. Bottlenecks form a layered structure since a single request can utilize multiple software resources and a hardware resource simultaneously. The microservice architecture makes the detection of layered bottlenecks challenging due to the lack of a uniform analysis perspective across languages, libraries, frameworks, and middle-ware.We detect layered bottlenecks in microservices by profiling numbers and status of working threads in each microservice and dependency among microservices via network connections. Our approach can be applied to various programming languages since it relies only on standard debugging tools. Nevertheless, our approach not only detects which microservice is a bottleneck but also enables us to understand why it becomes a bottleneck. This is enabled by a novel visualization method to show layered bottlenecks in microservices at a glance. We demonstrate that our approach successfully detects and visualizes layered bottlenecks in the state-of-the-art microservice benchmarks, DeathStarBench and Acme Air microservices. This enables us to optimize the microservices themselves to achieve a higher throughput per re-source utilization rate compared with simply scaling the number of replicas of microservices.","PeriodicalId":54281,"journal":{"name":"IEEE Cloud Computing","volume":"44 1","pages":"385-396"},"PeriodicalIF":0.0,"publicationDate":"2022-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"86808106","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Latency-based Vector Scheduling of Many-task Applications for a Hybrid Cloud
Pub Date: 2022-07-01 | DOI: 10.1109/CLOUD55607.2022.00047 | pp. 257-262
Shifat P. Mithila, Gerald Baumgartner
A centralized scheduler can become a bottleneck for placing the tasks of a many-task application on heterogeneous cloud resources. We have previously demonstrated that a decentralized vector scheduling approach based on performance measurements can be used successfully for this task placement scenario. In this paper, we extend this approach to task placement based on latency measurements. Each node collects performance measurements from its neighbors on an overlay graph, measures the communication latency, and then makes local decisions on where to move tasks. We present a centralized algorithm for configuring the overlay graph based on latency measurements and extend the vector scheduling approach to take latency into consideration. Our experiments in CloudLab demonstrate that this approach results in better performance and resource utilization than scheduling without latency information.
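A node's local decision could look roughly like the sketch below: compare its own load against neighbors on the overlay, penalized by measured latency, and move a task only if that beats keeping it locally. This is a simplified stand-in for the paper's vector formulation; the cost model and weight are assumptions.

```python
# Simplified stand-in for the latency-aware vector step (not the paper's exact
# formulation): move a task to the neighbor with the lowest latency-penalized
# load, but only if that beats keeping the task on the local node.

def pick_target(own_load: float, neighbors: dict, latency_weight: float = 0.05):
    """neighbors maps node -> (load, latency_ms); returns a node or None."""
    best, best_cost = None, own_load        # staying local costs own_load
    for node, (load, latency_ms) in neighbors.items():
        cost = load + latency_weight * latency_ms  # penalize distant neighbors
        if cost < best_cost:
            best, best_cost = node, cost
    return best  # None means keep the task on this node

neighbors = {"n1": (0.4, 2.0), "n2": (0.2, 40.0)}
print(pick_target(own_load=0.9, neighbors=neighbors))  # -> 'n1'
```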
{"title":"Latency-based Vector Scheduling of Many-task Applications for a Hybrid Cloud","authors":"Shifat P. Mithila, Gerald Baumgartner","doi":"10.1109/CLOUD55607.2022.00047","DOIUrl":"https://doi.org/10.1109/CLOUD55607.2022.00047","url":null,"abstract":"A centralized scheduler can become a bottleneck for placing the tasks of a many-task application on heterogeneous cloud resources. We have previously demonstrated that a de-centralized vector scheduling approach based on performance measurements can be used successfully for this task placement scenario. In this paper, we extend this approach to task placement based on latency measurements. Each node collects the performance measurements from its neighbors on an overlay graph, measures the communication latency, and then makes local decisions on where to move tasks. We present a centralized algorithm for configuring the overlay graph based on latency measurements and extend the vector scheduling approach to take latency into considerations. Our experiments in CloudLab demonstrate that this approach results in better performance and resource utilization than without latency information.","PeriodicalId":54281,"journal":{"name":"IEEE Cloud Computing","volume":"51 1","pages":"257-262"},"PeriodicalIF":0.0,"publicationDate":"2022-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"82090489","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Applying Value-Based Deep Reinforcement Learning on KPI Time Series Anomaly Detection
Pub Date: 2022-07-01 | DOI: 10.1109/CLOUD55607.2022.00039 | pp. 197-202
Yu Zhang, Tianbo Wang
Time series anomaly detection has become more critical with the rapid development of network technology, especially in cloud monitoring. We focus on applying deep reinforcement learning (DRL) to this problem. Simply using a traditional value-based DRL method is not feasible, because such methods cannot accurately capture the important temporal information in time series. Most existing methods resort to an RNN mechanism, which in turn brings about the problem of sequence learning. In this paper, we conduct progressive research on applying value-based DRL to time series anomaly detection. First, because of the poor performance of the traditional DQN, we propose an improved method, DQN-D, whose performance is 62% better than DQN's. Second, for RNN-based DRL, we propose a method based on an improved experience replay pool (DRQN) to make up for the shortcomings of existing work and achieve excellent performance. Finally, we propose a Transformer-based DRL anomaly detection method to verify the effectiveness of the Transformer structure. Experimental results show that our DQN-D obtains performance close to RNN-based DRL, that DRQN and DTQN perform well on the dataset, and that all methods are effective.
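DQN-D, DRQN, and DTQN cannot be reconstructed from the abstract alone; the sketch below is a generic value-based baseline for windowed KPI labeling, assuming PyTorch, two actions (normal/anomalous), and a one-step reward, so the discount factor is zero. The network, reward, and synthetic data are all illustrative assumptions.

```python
# Generic value-based DRL skeleton for windowed KPI anomaly detection (not the
# authors' DQN-D): the state is a sliding window, the actions are {normal,
# anomalous}, and the reward is +1 for a correct label, -1 otherwise.

import random
import numpy as np
import torch
import torch.nn as nn

WINDOW = 32

class QNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(WINDOW, 64), nn.ReLU(), nn.Linear(64, 2))  # Q(s, a)

    def forward(self, x):
        return self.net(x)

def train_step(qnet, optimizer, replay, batch_size=16):
    """One DQN update; labeling a window is a one-step task, so the target
    is just the observed reward (effectively a discount factor of zero)."""
    if len(replay) < batch_size:
        return
    batch = random.sample(replay, batch_size)
    states = torch.tensor(np.array([b[0] for b in batch]), dtype=torch.float32)
    actions = torch.tensor([b[1] for b in batch])
    rewards = torch.tensor([b[2] for b in batch], dtype=torch.float32)
    q = qnet(states).gather(1, actions.unsqueeze(1)).squeeze(1)
    loss = nn.functional.mse_loss(q, rewards)
    optimizer.zero_grad(); loss.backward(); optimizer.step()

# Tiny synthetic run: level shifts are anomalies (label 1), noise is normal.
qnet = QNet(); opt = torch.optim.Adam(qnet.parameters(), lr=1e-3)
replay = []
for step in range(500):
    label = random.randint(0, 1)
    window = np.random.randn(WINDOW) + (8.0 if label else 0.0)
    with torch.no_grad():
        qvals = qnet(torch.tensor(window, dtype=torch.float32))
    # epsilon-greedy action selection over the two labels
    action = random.randint(0, 1) if random.random() < 0.1 else int(qvals.argmax())
    replay.append((window, action, 1.0 if action == label else -1.0))
    train_step(qnet, opt, replay)
```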
{"title":"Applying Value-Based Deep Reinforcement Learning on KPI Time Series Anomaly Detection","authors":"Yu Zhang, Tianbo Wang","doi":"10.1109/CLOUD55607.2022.00039","DOIUrl":"https://doi.org/10.1109/CLOUD55607.2022.00039","url":null,"abstract":"Time series anomaly detection has become more critical with the rapid development of network technology, especially in cloud monitoring. We focus on applying deep reinforcement learning (DRL) in this question. It is not feasible to simply use the traditional value-based DRL method because DRL cannot accurately capture important time information in time series. Most of the existing methods resort to the RNN mechanism, which in turn brings about the problem of sequence learning. In this paper, we conduct progressive research work on applying value-based DRL in time series anomaly detection. Firstly, because of the poor performance of traditional DQN, we propose an improved DQN-D method, whose performance is improved by 62% compared with DQN. Second, for RNN-based DRL, we propose a method based on improved experience replay pool (DRQN) to make up for the shortcomings of existing work and achieve excellent performance. Finally, we propose a Transformer-based DRL anomaly detection method to verify the effectiveness of the Transformer structure. Experimental results show that our DQN-D can obtain performance close to RNN-based DRL, DRQN and DTQN perform well on the dataset, and all methods are proven effective.","PeriodicalId":54281,"journal":{"name":"IEEE Cloud Computing","volume":"48 1","pages":"197-202"},"PeriodicalIF":0.0,"publicationDate":"2022-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"90476655","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Message from the CLOUD 2022 Chairs
Pub Date: 2022-07-01 | DOI: 10.1109/cloud55607.2022.00011
{"title":"Message from the CLOUD 2022 Chairs","authors":"","doi":"10.1109/cloud55607.2022.00011","DOIUrl":"https://doi.org/10.1109/cloud55607.2022.00011","url":null,"abstract":"","PeriodicalId":54281,"journal":{"name":"IEEE Cloud Computing","volume":"63 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2022-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"79684642","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Guaranteeing Service Level Agreements for Triangle Counting via Observation-based Admission Control Algorithm
Pub Date: 2022-07-01 | DOI: 10.1109/CLOUD55607.2022.00050 | pp. 283-288
Chinthaka Weerakkody, Miyuru Dayarathna, Sanath Jayasena, T. Suzumura
Maintaining guaranteed service level agreements for distributed graph processing under concurrent query execution is challenging because graph processing is by nature an unbalanced problem. In this paper, we investigate maintaining predefined service level agreements for graph processing workload mixtures, taking triangle counting as the example. We develop a Graph Query Scheduler Mechanism (GQSM) that maintains a guaranteed service level agreement in terms of overall latency on top of the JasmineGraph distributed graph database server. The proposed GQSM model is implemented using queuing theory. The main component of GQSM is a job scheduler responsible for listening to an incoming job queue and scheduling the jobs received. The proposed model has a calibration phase in which the Service Level Agreement (SLA) data, load average curve data, and the maximum load average that the hosts participating in the cluster can handle without violating the SLA are captured for the graphs in the system. Results show that for a single-host system the SLA is successfully maintained when the total number of users is less than 6.
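GQSM's scheduler is described only at a high level; as an illustrative admission-control check in the spirit of the calibration phase, the sketch below admits a job only if the forecast load average stays under the calibrated per-host maximum, and queues it otherwise. The class, fields, and forecast model are assumptions, not GQSM's implementation.

```python
# Illustrative admission-control check (not GQSM's exact logic): admit a
# triangle-counting job only if the host's predicted load average stays under
# the calibrated maximum the host can take without violating the SLA.

class AdmissionController:
    def __init__(self, max_load_avg: float):
        self.max_load_avg = max_load_avg   # found during the calibration phase
        self.queue = []                    # jobs deferred until load drops

    def submit(self, job_id: str, current_load: float, predicted_delta: float):
        """Run the job now only if the load forecast leaves SLA headroom."""
        if current_load + predicted_delta <= self.max_load_avg:
            return f"{job_id}: admitted"
        self.queue.append(job_id)
        return f"{job_id}: queued ({len(self.queue)} waiting)"

ctrl = AdmissionController(max_load_avg=4.0)
print(ctrl.submit("triangles-g1", current_load=2.5, predicted_delta=1.0))
print(ctrl.submit("triangles-g2", current_load=3.5, predicted_delta=1.0))
```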
{"title":"Guaranteeing Service Level Agreements for Triangle Counting via Observation-based Admission Control Algorithm","authors":"Chinthaka Weerakkody, Miyuru Dayarathna, Sanath Jayasena, T. Suzumura","doi":"10.1109/CLOUD55607.2022.00050","DOIUrl":"https://doi.org/10.1109/CLOUD55607.2022.00050","url":null,"abstract":"Maintaining guaranteed service level agreements on distributed graph processing for concurrent query execution is challenging because graph processing by nature is an unbalanced problem. In this paper we investigate on maintaining predefined service level agreements for graph processing workload mixtures taking triangle counting as the example. We develop a Graph Query Scheduler Mechanism (GQSM) which maintains a guaranteed service level agreement in terms of overall latency on top of JasmineGraph distributed graph database server. The proposed GQSM model is implemented using the queuing theory. Main component of GQSM is a job scheduler which is responsible for listening to an incoming job queue and scheduling the jobs received. The proposed model has a calibration phase where the Service Level Agreement (SLA) data, load average curve data, and maximum load average which can be handled by the hosts participating in the cluster without violating SLA is captured for the graphs in the system. Results show that for a single host system the SLA is successfully maintained when the total number of users is less than 6.","PeriodicalId":54281,"journal":{"name":"IEEE Cloud Computing","volume":"83 1","pages":"283-288"},"PeriodicalIF":0.0,"publicationDate":"2022-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"75317706","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}