Federated Campus Cloud Colombian Initiative
César O. Díaz, Carlos E. Gómez, Harold E. Castro, Carlos J. Barrios, H. Bolívar
The desktop cloud paradigm arises from combining cloud computing with volunteer computing systems in order to harvest the idle computational resources of volunteers' computers. Students usually underuse university computer rooms, so a desktop cloud can be seen as a form of high performance computing (HPC) at low cost. When the capacity of a single desktop cloud is insufficient to execute an HPC project, a new opportunity for collaborative work among universities appears: federating desktop cloud systems to create a significant pool of virtual resources from multiple providers on non-dedicated infrastructure. Even though cloud federation is an active research topic today, neither interoperability among different implementations of cloud computing nor the federation of desktop clouds is a resolved issue. Our initiative therefore gathers the existing, idle computing resources provided by the participating universities to form a cloud federation on non-dedicated infrastructure.
{"title":"Federated Campus Cloud Colombian Initiative","authors":"César O. Díaz, Carlos E. Gómez, Harold E. Castro, Carlos J. Barrios, H. Bolívar","doi":"10.1109/CCGrid.2016.48","DOIUrl":"https://doi.org/10.1109/CCGrid.2016.48","url":null,"abstract":"Desktop cloud paradigm arises from combining cloud computing with volunteer computing systems in order to harvest the idle computational resources of volunteers' computers Students usually underuse university computer rooms. As a result, a desktop cloud can be seen as a form of high performance computing (HPC) at a low cost. When the capacity of a desktop cloud is insufficient to execute a HPC project, a new opportunity for collaborative work among universities appears, resulting in a federation of desktop cloud systems to create a significant amount of virtual resources from multiple providers on non-dedicated infrastructure. Even though cloud federation generates research activity today, neither interoperability among several implementations of cloud computing nor the federation of desktop clouds are resolved issues. Therefore, our initiative is related to gathering the existing and idle computer resources provided by the universities that take part to form a cloud federation on non-dedicated infrastructure.","PeriodicalId":103641,"journal":{"name":"2016 16th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGrid)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-05-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130654656","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Service Level Agreement Assurance between Cloud Services Providers and Cloud Customers
A. A. Ibrahim, D. Kliazovich, P. Bouvry
Cloud service providers deliver services to customers on a pay-per-use model, while the quality of the provided services is defined in service level agreements (SLAs). Unfortunately, no standard mechanism exists to verify and assure, in an automatic way, that delivered services satisfy the signed SLA; there is no guarantee in terms of quality, and cloud applications expose many performance metrics. In this doctoral thesis, we propose a framework for SLA assurance that can be used by both cloud providers and cloud users. Within the proposed framework, we define the performance metrics for the different applications and assess application performance in different testing environments to assure the service quality specified in the SLA. The framework will be evaluated through simulations and testbed experiments. After measuring the performance metrics, we will study the time correlations between them.
{"title":"Service Level Agreement Assurance between Cloud Services Providers and Cloud Customers","authors":"A. A. Ibrahim, D. Kliazovich, P. Bouvry","doi":"10.1109/CCGrid.2016.56","DOIUrl":"https://doi.org/10.1109/CCGrid.2016.56","url":null,"abstract":"Cloud services providers deliver cloud services to cloud customers on pay-per-use model while the quality of the provided services are defined using service level agreements also known as SLAs. Unfortunately, there is no standard mechanism which exists to verify and assure that delivered services satisfy the signed SLA agreement in an automatic way. There is no guarantee in terms of quality. Those applications have many performance metrics. In this doctoral thesis, we propose a framework for SLA assurance, which can be used by both cloud providers and cloud users. Inside the proposed framework, we will define the performance metrics for the different applications. We will assess the applications performance in different testing environment to assure good services quality as mentioned in SLA. The proposed framework will be evaluated through simulations and using testbed experiments. After testing the applications performance by measuring the performance metrics, we will review the time correlations between those metrics.","PeriodicalId":103641,"journal":{"name":"2016 16th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGrid)","volume":"62 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-05-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116299387","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Fostering Collaboration in Energy Research and Technological Developments Applying New Exascale HPC Techniques
J. Cela, P. Navaux, A. Coutinho, R. Mayo-García
In recent years, High Performance Computing (HPC) resources have undergone a dramatic transformation, with an explosion of available parallelism and the use of special-purpose processors. Several international initiatives focus on redesigning hardware and software to achieve Exaflop capability. With this aim, the HPC4E project is applying new exascale HPC techniques to energy industry simulations, customizing them where necessary, and going beyond the state of the art in the HPC exascale simulations required for the energy sources that are the present and the future of energy: wind energy production and design, efficient combustion systems for biomass-derived fuels (biogas), and exploration geophysics for hydrocarbon reservoirs. HPC4E joins the efforts of several institutions based in Brazil and Europe.
{"title":"Fostering Collaboration in Energy Research and Technological Developments Applying New Exascale HPC Techniques","authors":"J. Cela, P. Navaux, A. Coutinho, R. Mayo-García","doi":"10.1109/CCGrid.2016.51","DOIUrl":"https://doi.org/10.1109/CCGrid.2016.51","url":null,"abstract":"During the last years, High Performance Computing (HPC) resources have undergone a dramatic transformation, with an explosion on the available parallelism and the use of special purpose processors. There are international initiatives focusing on redesigning hardware and software in order to achieve the Exaflop capability. With this aim, the HPC4E project is applying the new exascale HPC techniques to energy industry simulations, customizing them if necessary, and going beyond the state-of-the-art in the required HPC exascale simulations for different energy sources that are the present and the future of energy: wind energy production and design, efficient combustion systems for biomass-derived fuels (biogas), and exploration geophysics for hydrocarbon reservoirs. HPC4E joins efforts of several institutions settled in Brazil and Europe.","PeriodicalId":103641,"journal":{"name":"2016 16th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGrid)","volume":"6 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-05-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129588101","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Leveraging High Performance Computing for Bioinformatics: A Methodology that Enables a Reliable Decision-Making
Mariza Ferro, M. Nicolás, Guadalupe Del Rosario Q. Saji, A. Mury, B. Schulze
Bioinformatics could greatly benefit from the increased computational resources delivered by High Performance Computing. However, deciding which architecture delivers the best performance for a given set of Bioinformatics applications is a hard task. The traditional approach is to pick the architecture with the highest theoretical peak performance, obtained from benchmark tests. This is not a reliable basis for the decision, because each Bioinformatics application has different computational requirements, which frequently differ substantially from the usual benchmarks. We developed a methodology that assists researchers, even those whose specialty is not high performance computing, in choosing the computational infrastructure best suited to the requirements of their set of scientific applications. For this purpose, the methodology defines representative evaluation tests, including a model for selecting an appropriate benchmark when the tests endorsed by the methodology cannot be fully used. Further, a Gain Function enables reliable decision-making based on the performance of a set of applications across architectures, taking into account the relative importance between applications as well as between cost and performance.
{"title":"Leveraging High Performance Computing for Bioinformatics: A Methodology that Enables a Reliable Decision-Making","authors":"Mariza Ferro, M. Nicolás, Quadalupe Del Rosario Q. Saji, A. Mury, B. Schulze","doi":"10.1109/CCGrid.2016.69","DOIUrl":"https://doi.org/10.1109/CCGrid.2016.69","url":null,"abstract":"Bioinformatics could greatly benefit from increased computational resources delivered by High Performance Computing. However, the decision-making about which is the best architecture to deliver good performance for a set of Bioinformatics applications is a hard task. The traditional way is finding the architecture with a high theoretical peak of performance, obtained with benchmark tests. But, this is not an assured way for this decision, because each application of Bioinformatics has different computational requirements, which frequently are much different from usual benchmarks. We developed a methodology that assists researchers, even when their specialty is not high performance computing, to define the best computational infrastructure focused on their set of scientific application requirements. For this purpose, the methodology enables to define representative evaluation tests, including a model to define the correct benchmark, when the tests endorsed by the methodology could not be fully used. Further, a Gain Function allows a reliable decision-making based on the performances of a set of applications and architectures. It is also possible to consider the relative importance between applications and also between cost and performance.","PeriodicalId":103641,"journal":{"name":"2016 16th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGrid)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-05-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132453115","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Design and Experimental Evaluation of Distributed Heterogeneous Graph-Processing Systems
Yong Guo, A. Varbanescu, D. Epema, A. Iosup
Graph processing is increasingly used in a variety of domains, from engineering to logistics and from scientific computing to online gaming. To process graphs efficiently, GPU-enabled graph-processing systems such as TOTEM and Medusa exploit the GPU or the combined CPU+GPU capabilities of a single machine. Unlike scalable distributed CPU-based systems such as Pregel and GraphX, existing GPU-enabled systems are restricted to the resources of a single machine, including the limited amount of GPU memory, and thus cannot analyze the increasingly large-scale graphs we see in practice. To address this problem, we design and implement three families of distributed heterogeneous graph-processing systems that can use both the CPUs and GPUs of multiple machines. We further focus on graph partitioning, for which we compare existing graph-partitioning policies and a new policy specifically targeted at heterogeneity. We implement all our distributed heterogeneous systems based on the programming model of the single-machine TOTEM, to which we add (1) a new communication layer for CPUs and GPUs across multiple machines to support distributed graphs, and (2) a workload partitioning method that uses offline profiling to distribute the work on the CPUs and the GPUs. We conduct a comprehensive real-world performance evaluation of all three families. To ensure representative results, we select three typical algorithms and five datasets with different characteristics. Our results include algorithm run time, performance breakdown, scalability, graph partitioning time, and comparison with other graph-processing systems. They demonstrate the feasibility of distributed heterogeneous graph processing and show evidence of the high performance that can be achieved by combining CPUs and GPUs in a distributed environment.
{"title":"Design and Experimental Evaluation of Distributed Heterogeneous Graph-Processing Systems","authors":"Yong Guo, A. Varbanescu, D. Epema, A. Iosup","doi":"10.1109/CCGrid.2016.53","DOIUrl":"https://doi.org/10.1109/CCGrid.2016.53","url":null,"abstract":"Graph processing is increasingly used in a variety of domains, from engineering to logistics and from scientific computing to online gaming. To process graphs efficiently, GPU-enabled graph-processing systems such as TOTEM and Medusa exploit the GPU or the combined CPU+GPU capabilities of a single machine. Unlike scalable distributed CPU-based systems such as Pregel and GraphX, existing GPU-enabled systems are restricted to the resources of a single machine, including the limited amount of GPU memory, and thus cannot analyze the increasingly large-scale graphs we see in practice. To address this problem, we design and implement three families of distributed heterogeneous graph-processing systems that can use both the CPUs and GPUs of multiple machines. We further focus on graph partitioning, for which we compare existing graph-partitioning policies and a new policy specifically targeted at heterogeneity. We implement all our distributed heterogeneous systems based on the programming model of the single-machine TOTEM, to which we add (1) a new communication layer for CPUs and GPUs across multiple machines to support distributed graphs, and (2) a workload partitioning method that uses offline profiling to distribute the work on the CPUs and the GPUs. We conduct a comprehensive real-world performance evaluation for all three families. To ensure representative results, we select 3 typical algorithms and 5 datasets with different characteristics. Our results include algorithm run time, performance breakdown, scalability, graph partitioning time, and comparison with other graph-processing systems. They demonstrate the feasibility of distributed heterogeneous graph processing and show evidence of the high performance that can be achieved by combining CPUs and GPUs in a distributed environment.","PeriodicalId":103641,"journal":{"name":"2016 16th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGrid)","volume":"14 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-05-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132237314","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
I-HASTREAM: Density-Based Hierarchical Clustering of Big Data Streams and Its Application to Big Graph Analytics Tools
Marwan Hassani, Pascal Spaus, A. Cuzzocrea, T. Seidl
Big data streams are now very popular, spurred by a plethora of modern applications such as sensor networks, scientific computing tools, Web intelligence, and social network analysis and mining tools. Here, the main research issue is how to effectively and efficiently extract useful knowledge from streaming big data in order to support innovative big data analytics platforms. To this end, clustering analysis is a well-known tool for extracting knowledge from big data streams, as confirmed by recent trends in the active literature. A special applicative case is represented by so-called graph-shaped (big) data streams, which are produced by graph sources providing both structure- and content-oriented knowledge. On top of such sources, big graph analytics is a leading scientific area to consider. At the convergence of these emerging topics, this paper provides the following contributions: (i) I-HASTREAM, a novel density-based hierarchical clustering algorithm for evolving big data streams that builds on its predecessor, HASTREAM, and (ii) the architecture of a big graph analytics engine that embeds I-HASTREAM in its core layer.
{"title":"I-HASTREAM: Density-Based Hierarchical Clustering of Big Data Streams and Its Application to Big Graph Analytics Tools","authors":"Marwan Hassani, Pascal Spaus, A. Cuzzocrea, T. Seidl","doi":"10.1109/CCGrid.2016.102","DOIUrl":"https://doi.org/10.1109/CCGrid.2016.102","url":null,"abstract":"Big Data Streams are very popular at now, as stirred-up by a plethora of modern applications such as sensor networks, scientific computing tools, Web intelligence, social network analysis and mining tools, and so forth. Here, the main research issue consists in how to effectively and efficiently extract useful knowledge from (streaming) big data, in order to support innovative big data analytics platforms. To this end, clustering analysis is a well-known tool for extracting knowledge from big data streams, as also confirmed by recent trends in active literature. A special applicative case is represented by so-called graph-shaped data (big) streams, which are produced by graph sources providing both structure-and content-oriented knowledge. On top of such sources, big graph analytics is a leading scientific area to be considered. At the convergence of these emerging topics, in this paper we provide the following contributions: (i) I-HASTREAM, a novel density-based hierarchical clustering algorithm for evolving big data streams that founds on it predecessor, namely HASTREAM, (ii) the architecture of a big graph analytics engine that embeds I-HASTREAM in its core layer.","PeriodicalId":103641,"journal":{"name":"2016 16th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGrid)","volume":"64 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-05-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115009380","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
SHMEMPMI -- Shared Memory Based PMI for Improved Performance and Scalability
S. Chakraborty, H. Subramoni, Jonathan L. Perkins, D. Panda
Dense systems with a large number of cores per node are becoming increasingly popular. Existing designs of the Process Management Interface (PMI) show poor scalability, in both performance and memory consumption, on such systems when a large number of processes concurrently access the PMI interface. Our analysis shows that the local socket-based communication scheme used by PMI is a major bottleneck. While a shared-memory-based channel can avoid this bottleneck and thus reduce memory consumption and improve performance, such a design poses several challenges. We investigate several alternatives and propose a novel design based on a hybrid socket-plus-shared-memory communication protocol that uses multiple shared memory regions. This design reduces memory usage per node by a factor equal to the number of processes per node. Our evaluations show that memory consumption per node can be reduced by an estimated 1 GB with one million MPI processes at 16 processes per node. Additionally, the performance of PMI_Get improves by 1,000 times compared to the existing design. The proposed design is backward compatible, secure, and imposes negligible overhead.
{"title":"SHMEMPMI -- Shared Memory Based PMI for Improved Performance and Scalability","authors":"S. Chakraborty, H. Subramoni, Jonathan L. Perkins, D. Panda","doi":"10.1109/CCGrid.2016.99","DOIUrl":"https://doi.org/10.1109/CCGrid.2016.99","url":null,"abstract":"Dense systems with large number of cores per node are becoming increasingly popular. Existing designs of the Process Management Interface (PMI) show poor scalability in terms of performance and memory consumption on such systems with large number of processes concurrently accessing the PMI interface. Our analysis shows the local socket-based communication scheme used by PMI to be a major bottleneck. While using a shared memory based channel can avoid this bottleneck and thus reduce memory consumption and improve performance, there are several challenges associated with such a design. We investigate several such alternatives and propose a novel design that is based on a hybrid socket+shared memory based communication protocol and uses multiple shared memory regions. This design can reduce the memory usage per node by a factor of Processes per Node. Our evaluations show that memory consumption per node can be reduced by an estimated 1GB with 1 million MPI processes and 16 processes per node. Additionally, performance of PMI Get is improved by 1,000 times compared to the existing design. The proposed design is backward compatible, secure, and imposes negligible overhead.","PeriodicalId":103641,"journal":{"name":"2016 16th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGrid)","volume":"44 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-05-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116450108","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Managing Big Data Analytics Workflows with a Database System
C. Ordonez, Javier García-García
A big data analytics workflow is long and complex, with many programs, tools, and scripts interacting together. In modern organizations, a significant amount of big data analytics processing is performed outside a database system, which creates many issues for managing and processing big data analytics workflows; moreover, data preprocessing is generally the most time-consuming task in such a workflow. In this work, we defend the idea of preprocessing, computing models, and scoring data sets inside a database system. In addition, we discuss recommendations and experiences for improving big data analytics workflows by pushing data preprocessing (i.e., data cleaning, aggregation, and column transformation) into the database system. We present a discussion of practical issues and common solutions when transforming and preparing data sets to improve big data analytics workflows. As a case-study validation, based on experience from real-life big data analytics projects, we compare the pros and cons of running big data analytics workflows inside and outside the database system, and we highlight which tasks in a big data analytics workflow are easier to manage and faster when processed by the database system rather than externally.
{"title":"Managing Big Data Analytics Workflows with a Database System","authors":"C. Ordonez, Javier García-García","doi":"10.1109/CCGrid.2016.63","DOIUrl":"https://doi.org/10.1109/CCGrid.2016.63","url":null,"abstract":"A big data analytics workflow is long and complex, with many programs, tools and scripts interacting together. In general, in modern organizations there is a significant amount of big data analytics processing performed outside a database system, which creates many issues to manage and process big data analytics workflows. In general, data preprocessing is the most time-consuming task in a big data analytics workflow. In this work, we defend the idea of preprocessing, computing models and scoring data sets inside a database system. In addition, we discuss recommendations and experiences to improve big data analytics workflows by pushing data preprocessing (i.e. data cleaning, aggregation and column transformation) into a database system. We present a discussion of practical issues and common solutions when transforming and preparing data sets to improve big data analytics workflows. As a case study validation, based on experience from real-life big data analytics projects, we compare pros and cons between running big data analytics workflows inside and outside the database system. We highlight which tasks in a big data analytics workflow are easier to manage and faster when processed by the database system, compared to external processing.","PeriodicalId":103641,"journal":{"name":"2016 16th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGrid)","volume":"36 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-05-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128218727","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Multiobjective Workflow Scheduling in a Federation of Heterogeneous Green-Powered Data Centers
S. Iturriaga, Sergio Nesmachnow, Andrei Tchernykh, B. Dorronsoro
The energy consumption of large data centers has been increasing over the last decades and is currently a major concern for economic and environmental reasons. Accurate scheduling of data center operation and the use of renewable energy sources are promising solutions to this problem. In this paper we study the problem of scheduling workflows of tasks in distributed heterogeneous data centers that are partially powered by renewable energy sources. The problem takes into account quality of service, infrastructure usage, and the power consumption of machines and cooling devices. We propose a mathematical model for computing accurate scheduling solutions.
{"title":"Multiobjective Workflow Scheduling in a Federation of Heterogeneous Green-Powered Data Centers","authors":"S. Iturriaga, Sergio Nesmachnow, Andrei Tchernykh, B. Dorronsoro","doi":"10.1109/CCGrid.2016.34","DOIUrl":"https://doi.org/10.1109/CCGrid.2016.34","url":null,"abstract":"The energy consumption of large data centers has been increasing for the last decades and currently is a major concern for economic and environmental reasons. Accurate scheduling of the data center operation and use of renewable energy sources present themselves as promising solutions for this problem. In this paper we study the problem of scheduling workflows of tasks in distributed heterogeneous data centers which are partially powered by renewable energy sources. This problem takes into account quality of service, infrastructure usage, and power consumption of machines and cooling devices. We propose a mathematical model for accurate scheduling solutions.","PeriodicalId":103641,"journal":{"name":"2016 16th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGrid)","volume":"25 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-05-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128080138","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Machine Learning Approach for Cloud NoSQL Databases Performance Modeling
V. A. E. Farias, F. R. C. Sousa, J. G. R. Maia, J. Gomes, Javam C. Machado
Cloud computing is a successful, emerging paradigm that supports on-demand services with a pay-as-you-go model. With the exponential growth of data, NoSQL databases have been used to manage data in the cloud. In these newly emerging settings, mechanisms to guarantee quality of service rely heavily on performance predictability, i.e., the ability to estimate the impact of concurrent query execution on the performance of individual queries in a continuously evolving workload. This paper presents a performance modeling approach for NoSQL databases that captures the non-linear effects caused by concurrency and distribution. Experimental results confirm that our performance model accurately predicts mean response time measurements under a wide range of workload configurations.
{"title":"Machine Learning Approach for Cloud NoSQL Databases Performance Modeling","authors":"V. A. E. Farias, F. R. C. Sousa, J. G. R. Maia, J. Gomes, Javam C. Machado","doi":"10.1109/CCGrid.2016.83","DOIUrl":"https://doi.org/10.1109/CCGrid.2016.83","url":null,"abstract":"Cloud computing is a successful, emerging paradigm that supports on-demand services with pay-as-you-go model. With the exponential growth of data, NoSQL databases have been used to manage data in the cloud. In these newly emerging settings, mechanisms to guarantee Quality of Service heavily relies on performance predictability, i.e., the ability to estimate the impact of concurrent query execution on the performance of individual queries in a continuously evolving workload. This paper presents a performance modeling approach for NoSQL databases in terms of performance metrics which is capable of capturing the non-linear effects caused by concurrency and distribution aspects. Experimental results confirm that our performance modeling can accurately predict mean response time measurements under a wide range of workload configurations.","PeriodicalId":103641,"journal":{"name":"2016 16th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGrid)","volume":"31 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-05-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133622312","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}