Title: Fogbow: A Middleware for the Federation of IaaS Clouds
Authors: F. Brasileiro, G. Silva, Francisco Araujo, Marcos Nobrega, Igor Silva, Gustavo Rocha
DOI: 10.1109/CCGrid.2016.12
This paper presents a new middleware, called Fogbow, designed to support large federations of Infrastructure-as-a-Service (IaaS) cloud providers. Fogbow follows a novel approach that implements federation functionality outside the cloud orchestrator. This approach provides great flexibility, since plug-ins define precise interaction points between the federation middleware and the underlying cloud orchestrator. The resulting architecture, which relies on standards to reconcile the peculiarities of different orchestrators, is thereby able to provide a common API that decouples federation functionality from orchestrator functionality. In the demonstration we showcase how Fogbow has been used to implement several cloud federations with different requirements.
{"title":"Fogbow: A Middleware for the Federation of IaaS Clouds","authors":"F. Brasileiro, G. Silva, Francisco Araujo, Marcos Nobrega, Igor Silva, Gustavo Rocha","doi":"10.1109/CCGrid.2016.12","DOIUrl":"https://doi.org/10.1109/CCGrid.2016.12","url":null,"abstract":"This paper presents a new middleware, called Fogbow, designed to support large federations of Infrastructure-as-a-service (IaaS) cloud providers. Fogbow follows a novel approach that implements federation functionalities outside the cloud orchestrator. This approach provides great flexibility, since it can use plug-ins that allow for the definition of precise interaction points between the federation middleware and the underlying cloud orchestrator. The resulting architecture, which relies on standards for conciliating different orchestrators' peculiarities, is thereby able to provide a common API to decouple federation functionalities from the orchestrator functionalities. In the demonstration we will showcase how Fogbow has been used to implement several cloud federations, with different requirements.","PeriodicalId":103641,"journal":{"name":"2016 16th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGrid)","volume":"70 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-05-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132823725","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Title: Reusing Resource Coalitions for Efficient Scheduling on the Intercloud
Authors: Teodora Selea, Adrian F. Spataru, M. Frîncu
DOI: 10.1109/CCGrid.2016.45
The envisioned intercloud, which bridges numerous cloud providers and offers clients the ability to run their applications on configurations unavailable from any single cloud, poses challenges in selecting the appropriate resources for deploying VMs. Reasons include the large distributed scale and VM performance fluctuations. Reusing previously "successful" resource coalitions may be an alternative to the brute-force search employed by many existing scheduling algorithms. Reuse is motivated by an implicit trust in previous successful executions that did not suffer the VM performance fluctuations described in many research papers on cloud performance. Furthermore, the data deluge coming from services that monitor the load and availability of resources forces a shift away from traditional centralized and decentralized resource management and emphasizes the need for edge computing: only metadata is sent to the resource management system for resource matchmaking. In this paper we propose a bottom-up monitoring architecture and a proof-of-concept platform for scheduling applications based on resource coalition reuse. We consider static coalitions and neglect interference from other coalitions, considering only the historical behavior of a particular coalition rather than the past or present state of the overall system. We test our prototype on real traces, compare it against a random approach, and discuss the results by outlining its benefits as well as future work on runtime coalition adaptation and global influences.
Title: A Quality-Driven Approach for Building Heterogeneous Distributed Databases: The Case of Data Warehouses
Authors: Sabrina Abdellaoui, Ladjel Bellatreche, Fahima Nader
DOI: 10.1109/CCGrid.2016.79
A Data Warehouse (DW) is a collection of data, consolidated from several heterogeneous sources, used to perform data analysis and support decision making in an organization. The Extract-Transform-Load (ETL) phase plays a crucial role in designing a DW. To overcome the complexity of the ETL phase, several recent studies have proposed the use of ontologies. Ontology-based ETL approaches reduce heterogeneity between data sources and enable automation of the ETL process. Existing studies in semantic ETL have largely focused on fulfilling functional requirements; however, they have not sufficiently considered the quality dimension of the ETL process. As the amount of data has exploded with the advent of the big data era, dealing with quality challenges in the early stages of process design becomes more important than ever. To address this issue, we propose to keep data quality requirements at the center of the ETL phase design. In this paper we present an approach that defines the ETL process at the ontological level. We define a set of quality indicators and quantitative measures that can anticipate data quality problems and identify causes of deficiencies. Our approach checks the quality of data before loading it into the target data warehouse, to avoid the propagation of corrupted data. Finally, our proposal is validated through a case study using Oracle Semantic DataBase sources (SDBs), where each source references the Lehigh University BenchMark ontology (LUBM).
Title: Increasing the Performance of Data Centers by Combining Remote GPU Virtualization with Slurm
Authors: Sergio Iserte, Javier Prades, C. Reaño, F. Silla
DOI: 10.1109/CCGrid.2016.26
The use of Graphics Processing Units (GPUs) has several side effects, such as increased acquisition costs and larger space requirements. Furthermore, GPUs consume a non-negligible amount of energy even while idle, and GPU utilization is usually low for most applications. Using the virtual GPUs provided by the remote GPU virtualization mechanism may address these concerns. However, in the same way that workload managers map GPU resources to applications, virtual GPUs should also be scheduled before job execution; current workload managers, nevertheless, cannot deal with virtual GPUs. In this paper we analyze the performance attained by a cluster using the rCUDA remote GPU virtualization middleware and a modified version of the Slurm workload manager, which is now able to map remote virtual GPUs to jobs. Results show that cluster throughput is doubled while total energy consumption is reduced by up to 40%. GPU utilization is also increased.
Title: AMRZone: A Runtime AMR Data Sharing Framework for Scientific Applications
Authors: Wenzhao Zhang, Houjun Tang, Steve Harenberg, S. Byna, Xiaocheng Zou, D. Devendran, Daniel F. Martin, Kesheng Wu, Bin Dong, S. Klasky, N. Samatova
DOI: 10.1109/CCGrid.2016.62
Frameworks that facilitate runtime data sharing across multiple applications are of great importance for scientific data analytics. Although existing frameworks work well over uniform mesh data, they cannot effectively handle adaptive mesh refinement (AMR) data. The challenges in constructing an AMR-capable framework include: (1) designing an architecture that facilitates online AMR data management, (2) achieving a load-balanced AMR data distribution for the data staging space at runtime, and (3) building an effective online index to support the unique spatial data retrieval requirements of AMR data. To address these challenges and support runtime AMR data sharing across scientific applications, we present the AMRZone framework. Experiments over real-world AMR datasets demonstrate AMRZone's effectiveness at achieving a balanced workload distribution, reading and writing large-scale datasets with thousands of parallel processes, and satisfying queries with spatial constraints. Moreover, AMRZone's performance and scalability are comparable with existing state-of-the-art work when tested over uniform mesh data with up to 16,384 cores; in the best case, our framework achieves a 46% performance improvement.
{"title":"AMRZone: A Runtime AMR Data Sharing Framework for Scientific Applications","authors":"Wenzhao Zhang, Houjun Tang, Steve Harenberg, S. Byna, Xiaocheng Zou, D. Devendran, Daniel F. Martin, Kesheng Wu, Bin Dong, S. Klasky, N. Samatova","doi":"10.1109/CCGrid.2016.62","DOIUrl":"https://doi.org/10.1109/CCGrid.2016.62","url":null,"abstract":"Frameworks that facilitate runtime data sharingacross multiple applications are of great importance for scientificdata analytics. Although existing frameworks work well overuniform mesh data, they can not effectively handle adaptive meshrefinement (AMR) data. Among the challenges to construct anAMR-capable framework include: (1) designing an architecturethat facilitates online AMR data management, (2) achievinga load-balanced AMR data distribution for the data stagingspace at runtime, and (3) building an effective online indexto support the unique spatial data retrieval requirements forAMR data. Towards addressing these challenges to supportruntime AMR data sharing across scientific applications, wepresent the AMRZone framework. Experiments over real-worldAMR datasets demonstrate AMRZone's effectiveness at achievinga balanced workload distribution, reading/writing large-scaledatasets with thousands of parallel processes, and satisfyingqueries with spatial constraints. Moreover, AMRZone's performance and scalability are even comparable with existing state-of-the-art work when tested over uniform mesh data with up to16384 cores, in the best case, our framework achieves a 46% performance improvement.","PeriodicalId":103641,"journal":{"name":"2016 16th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGrid)","volume":"48 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-05-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124317301","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Title: Efficient Heuristics for Placing Large-Scale Distributed Applications on Multiple Clouds
Authors: Pedro Silva, Christian Pérez, F. Desprez
DOI: 10.1109/CCGrid.2016.77
With the fast growth of demand for Cloud computing services, the Cloud has become a very popular platform for developing distributed applications. Features that in the past were available only to big corporations, such as fast scalability, availability, and reliability, are now accessible to any customer, including individuals and small companies. To place an application, a designer must choose, among the VM types offered by private and public cloud providers, those capable of hosting the application or its parts, using application requirements, VM prices, and VM resources as criteria. This procedure becomes more complicated when the objective is to place large component-based applications on multiple clouds: the number of possible configurations explodes, making automation of the placement necessary. In this context, scalability plays a central role, since the placement problem is a generalization of the NP-hard multi-dimensional bin-packing problem. In this paper we propose efficient greedy heuristics, based on first-fit-decreasing and best-fit algorithms, that compute near-optimal solutions for very large applications, with the objective of minimizing costs while meeting application performance requirements. Through a meticulous evaluation, we show that the greedy heuristics take a few seconds to calculate near-optimal solutions to placements that would require hours or even days with state-of-the-art approaches, namely exact algorithms and meta-heuristics.
Title: Optimizing Massively Parallel Simulations of Infection Spread Through Air-Travel for Policy Analysis
Authors: A. Srinivasan, C. D. Sudheer, S. Namilae
DOI: 10.1109/CCGrid.2016.23
Project VIPRA [1] uses a new approach to modeling the potential spread of infections in airplanes, which involves tracking the detailed movements of individual passengers. Inherent uncertainties are parameterized, and a parameter sweep is carried out in this space to identify potential vulnerabilities. Simulation time is a major bottleneck for exploring 'what-if' scenarios in a policy-making context under real-world time constraints. This paper identifies important bottlenecks to efficient computation: an inefficient workflow, parallel I/O, and load imbalance. Our solutions include modifying the workflow, optimizing parallel I/O, and a new scheme for predicting computational time, which enables efficient load balancing on fewer nodes than previously required. Our techniques reduce the computational time for the same computation from several hours on 69,000 cores to around 20 minutes on around 39,000 cores of the Blue Waters machine. The significance of this paper lies in identifying performance bottlenecks in this class of applications, which is crucial to public health, and in presenting a solution that is effective in practice.
Title: Towards a Resource Manager for Scheduling Frameworks
Authors: Aleksandra Kuzmanovska, R. H. Mak, D. Epema
DOI: 10.1109/CCGrid.2016.70
Due to the diversity of applications that run in large distributed environments, many different application frameworks have been developed, such as MapReduce for data-intensive batch jobs and Spark for interactive data analytics. After initial deployment, a framework executes a large set of jobs that are submitted over time. When multiple such frameworks with time-varying resource demands are consolidated in a large distributed environment, static allocation of resources on a per-framework basis leads to low system utilization and to resource fragmentation. The goal of my PhD research is to improve system utilization and framework performance in such consolidated environments by using dynamic resource allocation for efficient resource sharing among frameworks. My contribution towards this goal is the design and implementation of a scalable resource manager that dynamically balances resources across a set of diverse frameworks in a large distributed environment, based on resource requirements, system utilization, or performance levels in the deployed frameworks.
Title: RMA-MT: A Benchmark Suite for Assessing MPI Multi-threaded RMA Performance
Authors: Matthew G. F. Dosanjh, Taylor L. Groves, Ryan E. Grant, R. Brightwell, P. Bridges
DOI: 10.1109/CCGrid.2016.84
Reaching Exascale will require massive parallelism, potentially combined with asynchronous communication, to achieve scalability at such large levels of concurrency. MPI is a good candidate for providing the mechanisms to support communication at such scales. Two existing MPI mechanisms are particularly relevant to Exascale: multi-threading, to support massive concurrency, and Remote Memory Access (RMA), to support asynchronous communication. Unfortunately, multi-threaded MPI RMA code has not been extensively studied, partly because no public benchmarks or proxy applications exist to assess its performance. The contributions of this paper are the design and demonstration of the first available proxy applications and micro-benchmark suite for multi-threaded RMA in MPI, a study of multi-threaded RMA performance across different MPI implementations, and an evaluation of how these benchmarks can be used to test development for both performance and correctness.
{"title":"RMA-MT: A Benchmark Suite for Assessing MPI Multi-threaded RMA Performance","authors":"Matthew G. F. Dosanjh, Taylor L. Groves, Ryan E. Grant, R. Brightwell, P. Bridges","doi":"10.1109/CCGrid.2016.84","DOIUrl":"https://doi.org/10.1109/CCGrid.2016.84","url":null,"abstract":"Reaching Exascale will require leveraging massive parallelism while potentially leveraging asynchronous communication to help achieve scalability at such large levels of concurrency. MPI is a good candidate for providing the mechanisms to support communication at such large scales. Two existing MPI mechanisms are particularly relevant to Exascale: multi-threading, to support massive concurrency, and Remote Memory Access (RMA), to support asynchronous communication. Unfortunately, multi-threaded MPI RMA code has not been extensively studied. Part of the reason for this is that no public benchmarks or proxy applications exist to assess its performance. The contributions of this paper are the design and demonstration of the first available proxy applications and micro-benchmark suite for multi-threaded RMA in MPI, a study of multi-threaded RMA performance of different MPI implementations, and an evaluation of how these benchmarks can be used to test development for both performance and correctness.","PeriodicalId":103641,"journal":{"name":"2016 16th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGrid)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-05-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128893820","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Title: HPC-Reuse: Efficient Process Creation for Running MPI and Hadoop MapReduce on Supercomputers
Authors: Thanh-Chung Dao, S. Chiba
DOI: 10.1109/CCGrid.2016.72
Hadoop and Spark analytics are widely used for large-scale data processing on commodity clusters. In terms of productivity and maturity, running them on supercomputers is a better choice than developing new frameworks from scratch. YARN, a key component of Hadoop, is responsible for resource management and adopts dynamic management for job execution and scheduling. We identify three Ds (3D) of dynamic characteristics in YARN-like management: on-Demand (processes created during job execution), Diverse jobs, and Detailed (fine-grained allocation). This dynamic management does not fit typical resource managers on supercomputers, such as PBS, which we identify as having three Ss (3S) of static characteristics: Stationary (no processes created during execution), Single job, and Shallow (coarse-grained allocation). In this paper, we propose HPC-Reuse, positioned between YARN-like and PBS-like resource managers, to provide better support for dynamic management. HPC-Reuse helps avoid process creation, such as via MPI-Spawn, and enables MPI communication over Hadoop processes. Our experimental results show that HPC-Reuse can reduce the execution time of iterative PageRank by 26%.