The need to share data across applications is becoming increasingly evident. Current cloud isolation mechanisms focus solely on protection: containers isolate at the OS level, and virtual machines isolate through the hypervisor. By focusing rigidly on protection, however, these approaches do not provide for controlled sharing. This paper presents how Information Flow Control (IFC) offers a flexible alternative. As a data-centric mechanism, it enables strong isolation when required while providing continuous, fine-grained control of the data being shared. An IFC-enabled cloud platform would ensure that policies are enforced as data flows across all applications, without requiring any special sharing mechanisms.
{"title":"Information Flow Control for Strong Protection with Flexible Sharing in PaaS","authors":"Thomas Pasquier, Jatinder Singh, J. Bacon","doi":"10.1109/IC2E.2015.64","DOIUrl":"https://doi.org/10.1109/IC2E.2015.64","journal":{"name":"2015 IEEE International Conference on Cloud Engineering"},"publicationDate":"2015-03-09"}
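The data-centric check that IFC relies on can be illustrated with a small sketch. It assumes a common tag-based formulation of IFC (secrecy and integrity tag sets attached to each entity), which the abstract itself does not spell out; the `Labels` class, the `can_flow` function, and the example tags are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Labels:
    """Hypothetical IFC labels: sets of secrecy and integrity tags."""
    secrecy: frozenset = frozenset()
    integrity: frozenset = frozenset()


def can_flow(src: Labels, dst: Labels) -> bool:
    """Allow a flow only if the destination is at least as secret as the
    source and claims no integrity tag that the source lacks."""
    return src.secrecy <= dst.secrecy and dst.integrity <= src.integrity


# Illustrative entities (assumed, not taken from the paper).
medical_app = Labels(secrecy=frozenset({"medical"}))
analytics = Labels(secrecy=frozenset({"medical", "research"}))
public_log = Labels()
```

Under these assumptions, data may flow from `medical_app` to `analytics`, but not from `medical_app` to `public_log`, because the log does not carry the `medical` secrecy tag. The same check applies to every flow, so sharing needs no separate mechanism.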
Ioannis Mytilinis, Dimitrios Tsoumakos, Verena Kantere, Anastassios Nanos, N. Koziris
Big Data applications receive an ever-increasing amount of attention and are becoming a dominant class of applications deployed over virtualized environments. Cloud environments entail considerable complexity with respect to I/O performance. Big Data workloads further complicate I/O management, characterization and prediction: as I/O operations become increasingly dominant in such applications, the intricacies of virtualization, different storage back ends and deployment setups significantly hinder our ability to analyze and correctly predict I/O performance. To that end, this work proposes an end-to-end modeling technique to predict the performance of I/O-intensive Big Data applications running over cloud infrastructures. We develop a model tuned over application and infrastructure dimensions: primitive I/O operations, data access patterns, storage back ends and deployment parameters. The trained model can be used to predict both I/O and general task performance. Our evaluation results show that for jobs dominated by I/O operations, such as I/O-bound MapReduce jobs, our model predicts execution time with an accuracy close to 90%, which decreases as application processing becomes more complex.
{"title":"I/O Performance Modeling for Big Data Applications over Cloud Infrastructures","authors":"Ioannis Mytilinis, Dimitrios Tsoumakos, Verena Kantere, Anastassios Nanos, N. Koziris","doi":"10.1109/IC2E.2015.29","DOIUrl":"https://doi.org/10.1109/IC2E.2015.29","journal":{"name":"2015 IEEE International Conference on Cloud Engineering"},"publicationDate":"2015-03-09"}
Considerable effort has been spent on designing architectures to manage heterogeneous resources across multiple administrative domains. Specific fields of application include federated cloud computing (Intercloud) approaches and distributed testbeds. An important interoperability challenge that arises in this context is the exchange of information about the provided resources and their dependencies. Existing work usually rests upon schematic data models, which impede the discovery and management of heterogeneous resources between autonomous sites. One way of addressing this issue is to exchange semantic information models. In this paper, we exploit such approaches to formally define federations, including their infrastructures and the life-cycle of the offered resources and services. The requirements of this work have been derived from several research projects, and the results are in the process of being standardized by an international body. The main contribution of this work is a higher-level (upper) ontology and initial integration concepts for it. These contributions form a basis for further work in the general context of distributed semantic resource management.
{"title":"FIDDLE: Federated Infrastructure Discovery and Description Language","authors":"A. Willner, R. Loughnane, T. Magedanz","doi":"10.1109/IC2E.2015.77","DOIUrl":"https://doi.org/10.1109/IC2E.2015.77","journal":{"name":"2015 IEEE International Conference on Cloud Engineering"},"publicationDate":"2015-03-09"}
To track, control, and compel reuse of web APIs, we investigate a new approach to API governance: combined policy, implementation, and deployment control of web APIs. Our approach, called EAGER, provides a software architecture that integrates into PaaS platforms to support systemwide, deployment-time enforcement of governance policies. Specifically, EAGER checks for and prevents backward-incompatible API changes from being deployed into production PaaS clouds, enforces service reuse, and facilitates enforcement of other best practices in software maintenance via policies. Our experiments with an EAGER prototype show that enforcing API governance at deployment time in PaaS clouds is efficient and scalable to thousands of APIs and policies.
{"title":"EAGER: Deployment-Time API Governance for Modern PaaS Clouds","authors":"Hiranya Jayathilaka, C. Krintz, R. Wolski","doi":"10.1109/IC2E.2015.69","DOIUrl":"https://doi.org/10.1109/IC2E.2015.69","journal":{"name":"2015 IEEE International Conference on Cloud Engineering"},"publicationDate":"2015-03-09"}
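A deployment-time check for backward-incompatible API changes, as mentioned in the EAGER abstract, might look roughly like the sketch below. This is not EAGER's implementation or policy language: the dictionary-based API description, the `breaking_changes` function, and the three rules it applies (no removed operations, no newly required parameters, no removed response fields) are assumptions made for illustration.

```python
def breaking_changes(old: dict, new: dict) -> list[str]:
    """Compare two hypothetical API descriptions and list the changes
    that would break existing clients of the old version."""
    problems = []
    for op, old_sig in old.items():
        new_sig = new.get(op)
        if new_sig is None:
            problems.append(f"operation removed: {op}")
            continue
        # Clients written against the old API will not send new required parameters.
        added_required = new_sig["required"] - old_sig["required"]
        if added_required:
            problems.append(f"{op}: new required parameters {sorted(added_required)}")
        # Clients may depend on any response field the old API returned.
        dropped = old_sig["returns"] - new_sig["returns"]
        if dropped:
            problems.append(f"{op}: response fields removed {sorted(dropped)}")
    return problems


# Illustrative API versions (assumed, not from the paper).
v1 = {"GET /users": {"required": {"id"}, "returns": {"name", "email"}}}
v2 = {
    "GET /users": {"required": {"id", "tenant"}, "returns": {"name"}},
    "GET /teams": {"required": set(), "returns": {"members"}},
}
```

A deployment gate built on this sketch would reject `v2` because `breaking_changes(v1, v2)` is non-empty, while adding the new `GET /teams` operation alone would pass.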
Advances in cloud platforms have changed the application development landscape. Cloud platforms abstract the complexity of application delivery to enable rapid development and easy management. This changes the way development teams need to think about and deal with the underlying resources while building and managing their applications. This research describes a new methodology, supported by a modeling framework, that enables organizations building cloud applications (e.g., SaaS providers) to exploit the cloud platform building blocks in an unbiased way and to leverage the flexibility, reliability and scalability that these platforms provide to the application layer.
{"title":"A Bird's-Eye View on Modelling Malleable Multi-cloud Applications","authors":"Mohammad Hamdaqa","doi":"10.1109/IC2E.2015.94","DOIUrl":"https://doi.org/10.1109/IC2E.2015.94","journal":{"name":"2015 IEEE International Conference on Cloud Engineering"},"publicationDate":"2015-03-09"}
J. Rey, M. Cogorno, Sergio Nesmachnow, L. Steffenel
Prototyping and testing distributed systems is considered a hard task because it is not always possible to reproduce a given sequence of events. While simulations may help with this task, they cannot replace testing and validation with real systems. In this paper we present Docker-Hadoop, a container-based virtualization platform designed to prototype, test and deploy MapReduce applications and systems. This tool allowed us to test and reproduce fault-tolerance scenarios that are especially interesting in the context of the PER-MARE project, which aims at adapting the Hadoop framework to pervasive systems. Indeed, we developed a fault-tolerant component that can circumvent the limitations of the original Hadoop and prevent job scheduling from stalling in the case of failures or network disconnections. Thanks to Docker-Hadoop, we could easily prototype and test our improved Hadoop; the first scalability and speedup results are presented in this paper.
{"title":"Efficient Prototyping of Fault Tolerant Map-Reduce Applications with Docker-Hadoop","authors":"J. Rey, M. Cogorno, Sergio Nesmachnow, L. Steffenel","doi":"10.1109/IC2E.2015.73","DOIUrl":"https://doi.org/10.1109/IC2E.2015.73","journal":{"name":"2015 IEEE International Conference on Cloud Engineering"},"publicationDate":"2015-03-09"}
Cloud-related legal documents, such as terms of service or customer agreements, are usually managed as plain text files. Hence, extensive manual effort is required to monitor cloud service performance by cross-referencing the metrics and measures agreed upon in these documents. We have significantly automated the process of managing and monitoring cloud Service Level Agreements (SLAs) using semantic web technologies such as OWL, RDF and SPARQL. In this paper, we describe in detail the cloud SLA ontology and the prototype that we have developed to illustrate how SLA measures can be automatically extracted from the legal Terms of Service available on cloud provider websites.
{"title":"Automating Cloud Service Level Agreements Using Semantic Technologies","authors":"K. Joshi, C. Pearce","doi":"10.1109/IC2E.2015.63","DOIUrl":"https://doi.org/10.1109/IC2E.2015.63","journal":{"name":"2015 IEEE International Conference on Cloud Engineering"},"publicationDate":"2015-03-09"}
In recent years, researchers have contributed promising new techniques for allocating cloud resources in more robust, efficient, and ecologically sustainable ways. Unfortunately, the widespread use of these techniques in production systems has, to date, remained elusive. One reason for this is that the state of the art for investigating these innovations at scale often relies solely on model-driven simulation. Production-grade cloud software, however, demands certainty and precision for development and business planning that only comes from validating simulation against empirical observation. In this work, we take an alternative approach to facilitating cloud research and engineering in order to transition innovations to production deployment faster. In particular, we present a new methodology that complements existing model-driven simulation with platform-specific and statistically trustworthy results. We simulate systems at scales and on time frames that are testable, and then, based on the statistical validation of these simulations, investigate scenarios beyond those feasibly observable in practice. We demonstrate the approach by developing an energy-aware cloud scheduler and evaluating it using production and synthetic traces in faster than real time. Our results show that we can accurately simulate a production IaaS system, ease capacity planning, and expedite the reliable development of its components and extensions.
{"title":"Using Trustworthy Simulation to Engineer Cloud Schedulers","authors":"A. Pucher, Emre Gul, R. Wolski, C. Krintz","doi":"10.1109/IC2E.2015.14","DOIUrl":"https://doi.org/10.1109/IC2E.2015.14","journal":{"name":"2015 IEEE International Conference on Cloud Engineering"},"publicationDate":"2015-03-09"}
Big data processing tools have evolved rapidly in recent years. MapReduce has proven very successful but is not optimized for many important analytics, especially those involving iteration. In this regard, Iterative MapReduce frameworks improve the performance of MapReduce job chains through caching. Further, Pregel, Giraph and GraphLab abstract data as a graph and process it in iterations. But all these tools are designed with a fixed data abstraction and have limited collective communication support to synchronize application data and algorithm control states among parallel processes. In this paper, we introduce a collective communication abstraction layer which provides efficient collective communication operations on several common data abstractions such as arrays, key-values and graphs, and define a Map Collective programming model which serves the diverse collective communication demands of different parallel algorithms. We implement a library called Harp to provide the features above and plug it into Hadoop so that applications abstracted in the Map Collective model can be easily developed on top of the MapReduce framework and conveniently integrated with other tools in the Apache Big Data Stack. With improved expressiveness in the abstraction and an efficient implementation, we can support various applications, from HPC to cloud systems, with high performance.
{"title":"Harp: Collective Communication on Hadoop","authors":"Bingjing Zhang, Yang Ruan, J. Qiu","doi":"10.1109/IC2E.2015.35","DOIUrl":"https://doi.org/10.1109/IC2E.2015.35","journal":{"name":"2015 IEEE International Conference on Cloud Engineering"},"publicationDate":"2015-03-09"}
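To make the idea of a collective communication step concrete, the sketch below simulates an allreduce over array data in a single process. Allreduce is used here as a standard example of a collective operation; the abstract does not list which operations Harp provides, and this is not Harp's API (Harp is a Hadoop plug-in). The function name and the list-of-lists representation of per-worker arrays are assumptions.

```python
def allreduce(partials: list[list[float]]) -> list[list[float]]:
    """Simulate an allreduce: sum the per-worker arrays element-wise and
    hand every worker its own copy of the combined result."""
    total = [sum(column) for column in zip(*partials)]
    return [list(total) for _ in partials]


# Two simulated workers, each holding a partial result from its map phase.
worker_partials = [[1.0, 2.0], [3.0, 4.0]]
synced = allreduce(worker_partials)
```

In an iterative algorithm, each worker would compute its partial array from local data, call the collective to synchronize, and then start the next iteration from the shared result instead of writing intermediate output and launching a new job.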
In recent decades, virtualization as an abstraction from physical hardware has become a popular solution to resource isolation and server consolidation. With the surge in adoption of virtualization technologies, ensuring High Availability (HA) for applications hosted in virtualized environments has emerged as an important problem and has garnered substantial attention. In this paper, we present a brief comparison of virtualization technologies from an HA perspective. The state-of-the-art HA solutions in two mainstream types of virtualized platforms (i.e., hypervisor-based and container-based platforms) are investigated in terms of their limitations and of features such as live migration, failure detection, and checkpoint/restore. One of our key findings is that, compared with hypervisor-based platforms, HA features in container-based platforms are far from sufficient. From an HA perspective, extensions on top of container technologies are required.
{"title":"Comparing Containers versus Virtual Machines for Achieving High Availability","authors":"Wubin Li, A. Kanso","doi":"10.1109/IC2E.2015.79","DOIUrl":"https://doi.org/10.1109/IC2E.2015.79","journal":{"name":"2015 IEEE International Conference on Cloud Engineering"},"publicationDate":"2015-03-09"}