Pub Date: 2019-11-09 | DOI: 10.1007/s10586-019-03013-0
George Roumelis, Polychronis Velentzas, M. Vassilakopoulos, A. Corral, Athanasios Fevgas, Y. Manolopoulos
{"title":"Parallel processing of spatial batch-queries using xBR+-trees in solid-state drives","authors":"George Roumelis, Polychronis Velentzas, M. Vassilakopoulos, A. Corral, Athanasios Fevgas, Y. Manolopoulos","doi":"10.1007/s10586-019-03013-0","DOIUrl":"https://doi.org/10.1007/s10586-019-03013-0","url":null,"abstract":"","PeriodicalId":92128,"journal":{"name":"Proceedings. IEEE International Conference on Cluster Computing","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2019-11-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"90444729","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
F. C. Heinrich, Tom Cornebize, A. Degomme, Arnaud Legrand, Alexandra Carpen-Amarie, S. Hunold, Anne-Cécile Orgerie, M. Quinson
Monitoring and assessing the energy efficiency of supercomputers and data centers is crucial in order to limit and reduce their energy consumption. Applications from the domain of High Performance Computing (HPC), such as MPI applications, account for a significant fraction of the overall energy consumed by HPC centers. Simulation is a popular approach for studying the behavior of these applications in a variety of scenarios, and it is therefore advantageous to be able to study their energy consumption in a cost-efficient, controllable, and also reproducible simulation environment. Alas, simulators supporting HPC applications commonly lack the capability of predicting the energy consumption, particularly when target platforms consist of multi-core nodes. In this work, we aim to accurately predict the energy consumption of MPI applications via simulation. Firstly, we introduce the models required for meaningful simulations: the computation model, the communication model, and the energy model of the target platform. Secondly, we demonstrate that by carefully calibrating these models on a single node, the predicted energy consumption of HPC applications at a larger scale is very close (within a few percent) to real experiments. We further show how to integrate such models into the SimGrid simulation toolkit. In order to obtain good execution time predictions on multi-core architectures, we also establish that it is vital to correctly account for memory effects in simulation. The proposed simulator is validated through an extensive set of experiments with well-known HPC benchmarks. Lastly, we show that the simulator can be used to study applications at scale, which allows researchers to save both time and resources compared to real experiments.
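The single-node calibration idea above can be illustrated with a minimal sketch, assuming a simple linear power model (the function name, constants, and parameters are illustrative, not taken from the paper):

```python
# Illustrative linear energy model: a node draws an idle baseline power
# plus a per-core dynamic power while cores are busy. Calibrating the two
# constants on a single node lets one extrapolate to larger runs.

def predict_energy_joules(runtime_s, busy_cores, p_idle_w=80.0, p_core_w=15.0):
    """Predicted energy for one node over a run of runtime_s seconds."""
    dynamic = p_core_w * busy_cores          # power drawn by active cores
    return (p_idle_w + dynamic) * runtime_s  # energy = power * time

# Extrapolate a single-node calibration to a hypothetical 64-node run:
per_node = predict_energy_joules(runtime_s=120.0, busy_cores=8)
cluster_total = 64 * per_node
```

The paper's actual models (computation, communication, energy) are richer than this, but the calibrate-once, extrapolate-to-scale workflow is the same in spirit.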
{"title":"Predicting the Energy-Consumption of MPI Applications at Scale Using Only a Single Node","authors":"F. C. Heinrich, Tom Cornebize, A. Degomme, Arnaud Legrand, Alexandra Carpen-Amarie, S. Hunold, Anne-Cécile Orgerie, M. Quinson","doi":"10.1109/CLUSTER.2017.66","DOIUrl":"https://doi.org/10.1109/CLUSTER.2017.66","url":null,"abstract":"Monitoring and assessing the energy efficiency of supercomputers and data centers is crucial in order to limit and reduce their energy consumption. Applications from the domain of High Performance Computing (HPC), such as MPI applications, account for a significant fraction of the overall energy consumed by HPC centers. Simulation is a popular approach for studying the behavior of these applications in a variety of scenarios, and it is therefore advantageous to be able to study their energy consumption in a cost-efficient, controllable, and also reproducible simulation environment. Alas, simulators supporting HPC applications commonly lack the capability of predicting the energy consumption, particularly when target platforms consist of multi-core nodes. In this work, we aim to accurately predict the energy consumption of MPI applications via simulation. Firstly, we introduce the models required for meaningful simulations: The computation model, the communication model, and the energy model of the target platform. Secondly, we demonstrate that by carefully calibrating these models on a single node, the predicted energy consumption of HPC applications at a larger scale is very close (within a few percents) to real experiments. We further show how to integrate such models into the SimGrid simulation toolkit. In order to obtain good execution time predictions on multi-core architectures, we also establish that it is vital to correctly account for memory effects in simulation. The proposed simulator is validated through an extensive set of experiments with wellknown HPC benchmarks. 
Lastly, we show the simulator can be used to study applications at scale, which allows researchers to save both time and resources compared to real experiments.","PeriodicalId":92128,"journal":{"name":"Proceedings. IEEE International Conference on Cluster Computing","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2017-09-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"75522239","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2017-09-01 | Epub Date: 2017-09-26 | DOI: 10.1109/CLUSTER.2017.28
Willian Barreiros, George Teodoro, Tahsin Kurc, Jun Kong, Alba C M A Melo, Joel Saltz
We investigate efficient sensitivity analysis (SA) of algorithms that segment and classify image features in a large dataset of high-resolution images. Algorithm SA is the process of evaluating variations of methods and parameter values to quantify differences in the output. An SA can be very compute-demanding because it requires re-processing the input dataset several times with different parameters to assess variations in output. In this work, we introduce strategies to efficiently speed up SA via runtime optimizations targeting distributed hybrid systems and reuse of computations from runs with different parameters. We evaluate our approach using a cancer image analysis workflow on a hybrid cluster with 256 nodes, each with an Intel Phi and a dual-socket CPU. The SA attained a parallel efficiency of over 90% on 256 nodes. The cooperative execution using the CPUs and the Phi available in each node with smart task assignment strategies resulted in an additional speedup of about 2×. Finally, multi-level computation reuse led to an additional speedup of up to 2.46× on the parallel version. The level of performance attained with the proposed optimizations will allow the use of SA in large-scale studies.
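The "reuse of computations from runs with different parameters" can be sketched with simple memoization: if two parameter sets share the same value for an early pipeline stage, that stage runs once. The stage functions below are hypothetical stand-ins, not the paper's actual workflow:

```python
from functools import lru_cache

# Illustrative two-stage pipeline with computation reuse: a parameter
# sweep over (threshold, model) pairs re-executes segmentation only once
# per distinct threshold. Call counters show how much work is saved.

calls = {"segment": 0, "classify": 0}

@lru_cache(maxsize=None)
def segment(threshold):
    calls["segment"] += 1
    return f"mask(threshold={threshold})"

@lru_cache(maxsize=None)
def classify(threshold, model):
    calls["classify"] += 1
    return f"labels({segment(threshold)}, model={model})"

# A 2x2 parameter sweep: 4 runs, but only 2 segmentations are executed.
results = [classify(t, m) for t in (0.4, 0.6) for m in ("svm", "rf")]
```

The paper's multi-level reuse works across distributed runs rather than a single process, but the principle of keying intermediate results by stage parameters is the same.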
{"title":"Parallel and Efficient Sensitivity Analysis of Microscopy Image Segmentation Workflows in Hybrid Systems.","authors":"Willian Barreiros, George Teodoro, Tahsin Kurc, Jun Kong, Alba C M A Melo, Joel Saltz","doi":"10.1109/CLUSTER.2017.28","DOIUrl":"https://doi.org/10.1109/CLUSTER.2017.28","url":null,"abstract":"<p><p>We investigate efficient sensitivity analysis (SA) of algorithms that segment and classify image features in a large dataset of high-resolution images. Algorithm SA is the process of evaluating variations of methods and parameter values to quantify differences in the output. A SA can be very compute demanding because it requires re-processing the input dataset several times with different parameters to assess variations in output. In this work, we introduce strategies to efficiently speed up SA via runtime optimizations targeting distributed hybrid systems and reuse of computations from runs with different parameters. We evaluate our approach using a cancer image analysis workflow on a hybrid cluster with 256 nodes, each with an Intel Phi and a dual socket CPU. The SA attained a parallel efficiency of over 90% on 256 nodes. The cooperative execution using the CPUs and the Phi available in each node with smart task assignment strategies resulted in an additional speedup of about 2×. Finally, multi-level computation reuse lead to an additional speedup of up to 2.46× on the parallel version. The level of performance attained with the proposed optimizations will allow the use of SA in large-scale studies.</p>","PeriodicalId":92128,"journal":{"name":"Proceedings. 
IEEE International Conference on Cluster Computing","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2017-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1109/CLUSTER.2017.28","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"35648091","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2013-09-01 | DOI: 10.1109/CLUSTER.2013.6702606
Craig Stewart
On behalf of the organizing committee, I am pleased to welcome you to Indianapolis and the 15th IEEE International Conference on Cluster Computing. I hope you enjoy your visit to our beautiful city. Indianapolis has undergone a real renaissance in recent years with many new buildings and an array of new highlights including excellent museums related to culture, the arts, and sports.
{"title":"Letter from the general chair","authors":"Craig Stewart","doi":"10.1109/CLUSTER.2013.6702606","DOIUrl":"https://doi.org/10.1109/CLUSTER.2013.6702606","url":null,"abstract":"On behalf of the organizing committee, I am pleased to welcome you to Indianapolis and the 15th IEEE International Conference on Cluster Computing. I hope you enjoy your visit to our beautiful city. Indianapolis has undergone a real renaissance in recent years with many new buildings and an array of new highlights including excellent museums related to culture, the arts, and sports.","PeriodicalId":92128,"journal":{"name":"Proceedings. IEEE International Conference on Cluster Computing","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2013-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"83314788","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2009-10-16 | DOI: 10.1109/CLUSTR.2009.5289126
D. Jamsek, E. V. Hensbergen
The complexity of modern microprocessor design, involving billions of transistors at increasingly denser scales, creates many challenges, particularly in the areas of design reliability and predictable yields. Researchers at IBM's Austin Research Lab have increasingly depended on software-based simulation of various aspects of the design and manufacturing process to help address these challenges. The computational complexity and sheer scale of these simulations have led to the exploration of the application of high-performance hybrid computing clusters to accelerate the design process. Currently, the hybrid clusters in use are composed primarily of commodity workstations and servers incorporating commodity NVIDIA-based GPU graphics cards and TESLA GPU computational accelerators. We have also been experimenting with blade clusters composed of both general purpose servers and PowerXcell accelerators leveraging the computational throughput of the Cell processor. In this paper we will detail our experiences with accelerating our workloads on these hybrid cluster platforms. We will discuss our initial approach of combining hybrid runtimes such as CUDA with MPI to address cluster computation. We will also describe a custom cluster hybrid infrastructure we are developing to deal with some of the perceived shortcomings of MPI and other traditional cluster tools when dealing with hybrid computing environments.
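A common pattern when combining CUDA with MPI on clusters of multi-GPU nodes is to map each MPI rank to one local accelerator so that ranks sharing a node do not contend for the same device. A minimal sketch of that mapping, without either dependency (names and node shape are illustrative, not from the paper):

```python
# Map a global MPI rank to a GPU index on its own node, so that ranks
# co-located on a node spread evenly across the node's accelerators.

def select_local_gpu(global_rank, ranks_per_node, gpus_per_node):
    """Return the GPU index this rank should bind to on its node."""
    local_rank = global_rank % ranks_per_node
    return local_rank % gpus_per_node

# 8 ranks, 4 ranks per node, 2 GPUs per node:
assignment = [select_local_gpu(r, 4, 2) for r in range(8)]
```

In a real CUDA+MPI program the returned index would be passed to the device-selection call of the GPU runtime before any kernels launch.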
{"title":"Experiences with hybrid clusters","authors":"D. Jamsek, E. V. Hensbergen","doi":"10.1109/CLUSTR.2009.5289126","DOIUrl":"https://doi.org/10.1109/CLUSTR.2009.5289126","url":null,"abstract":"The complexity of modern microprocessor design involving billions of transistors at increasingly denser scales creates many challenges particularly in the area of design reliability and predictable yields. Researchers at IBM's Austin Research Lab have increasingly depended on software based simulation of various aspects of the design and manufacturing process to help address these challenges. The computational complexity and sheer scale of these simulations have lead to the exploration of the application of high-performance hybrid computing clusters to accelerate the design process. Currently, the hybrid clusters in use are composed primarily of commodity workstations and servers incorporating commodity NVIDIA-based GPU graphics cards and TESLA GPU computational accelerators. We have also been experimenting with blade clusters composed of both general purpose servers and PowerXcell accelerators leveraging the computational throughput of the Cell processor. In this paper we will detail our experiences with accelerating our workloads on these hybrid cluster platforms. We will discuss our initial approach of combining hybrid runtimes such as CUDA with MPI to address cluster computation. We will also describe a custom cluster hybrid infrastructure we are developing to deal with some of the perceived shortcomings of MPI and other traditional cluster tools when dealing with hybrid computing environments.","PeriodicalId":92128,"journal":{"name":"Proceedings. 
IEEE International Conference on Cluster Computing","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2009-10-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"90504882","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2009-08-01 | DOI: 10.1109/CLUSTR.2009.5289149
S. Loebman, D. Nunley, YongChul Kwon, B. Howe, M. Balazinska, J. Gardner
As the datasets used to fuel modern scientific discovery grow increasingly large, they become increasingly difficult to manage using conventional software. Parallel database management systems (DBMSs) and massive-scale data processing systems such as MapReduce hold promise to address this challenge. However, since these systems have not been expressly designed for scientific applications, their efficacy in this domain has not been thoroughly tested. In this paper, we study the performance of these engines in one specific domain: massive astrophysical simulations. We develop a use case that comprises five representative queries. We implement this use case in one distributed DBMS and in the Pig/Hadoop system. We compare the performance of the tools to each other and to hand-written IDL scripts. We find that certain representative analyses are easy to express in each engine's high-level language, and that both systems provide competitive performance and improved scalability relative to current IDL-based methods.
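A filter-and-aggregate over simulation particles is the kind of representative analysis that is easy to express in both SQL and Pig Latin. A minimal in-memory sketch of one such query (field names and values are illustrative, not the paper's actual schema):

```python
# "Total particle mass per snapshot, restricted to high-density regions" --
# the sort of group-by query the study expressed in a DBMS and in Pig.

particles = [
    {"snapshot": 1, "mass": 2.0, "density": 9.5},
    {"snapshot": 1, "mass": 1.0, "density": 3.0},
    {"snapshot": 2, "mass": 4.0, "density": 8.1},
]

def mass_by_snapshot(rows, min_density=5.0):
    """Filter rows by density, then sum mass grouped by snapshot."""
    totals = {}
    for row in rows:
        if row["density"] >= min_density:
            totals[row["snapshot"]] = totals.get(row["snapshot"], 0.0) + row["mass"]
    return totals

result = mass_by_snapshot(particles)
```

In SQL this is a `WHERE` plus `GROUP BY`; in Pig Latin a `FILTER`, `GROUP`, and `FOREACH ... SUM` -- which is why such analyses port cleanly to both engines.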
{"title":"2009 IEEE International Conference on Cluster Computing and Workshops","authors":"S. Loebman, D. Nunley, YongChul Kwon, B. Howe, M. Balazinska, J. Gardner","doi":"10.1109/CLUSTR.2009.5289149","DOIUrl":"https://doi.org/10.1109/CLUSTR.2009.5289149","url":null,"abstract":"As the datasets used to fuel modern scientific discovery grow increasingly large, they become increasingly difficult to manage using conventional software. Parallel database management systems (DBMSs) and massive-scale data processing systems such as MapReduce hold promise to address this challenge. However, since these systems have not been expressly designed for scientific applications, their efficacy in this domain has not been thoroughly tested. In this paper, we study the performance of these engines in one specific domain: massive astrophysical simulations. We develop a use case that comprises five representative queries. We implement this use case in one distributed DBMS and in the Pig/Hadoop system. We compare the performance of the tools to each other and to hand-written IDL scripts. We find that certain representative analyses are easy to express in each engine's highlevel language and both systems provide competitive performance and improved scalability relative to current IDL-based methods.","PeriodicalId":92128,"journal":{"name":"Proceedings. IEEE International Conference on Cluster Computing","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2009-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"89406090","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2008-10-31 | DOI: 10.1109/CLUSTR.2008.4663749
D. Reed
Without doubt, scientific discovery, business practice and social interactions are moving rapidly from a world of homogeneous and local systems to a world of distributed software, virtual organizations and cloud computing infrastructure, all powered by multicore processors and large-scale infrastructure. In science, a tsunami of new experimental and computational data and a suite of increasingly ubiquitous sensors pose vexing problems in data analysis, transport, visualization and collaboration. In society and business, software as a service and cloud computing are empowering distributed groups. Let's step back and think about the longer-term future. Where is the technology going and what are the implications? What architectures are appropriate? How do we manage power and scale? What are the right-sized building blocks? How do we come to grips with the fact that our clusters and data centers are now bigger than the Internet was just a few years ago? How do we develop and support malleable software? What is the ecosystem of components in which distributed, data-rich applications will operate? How do we optimize performance and reliability? How do we program these systems?
{"title":"Clouds, clusters and ManyCore: The revolution ahead","authors":"D. Reed","doi":"10.1109/CLUSTR.2008.4663749","DOIUrl":"https://doi.org/10.1109/CLUSTR.2008.4663749","url":null,"abstract":"Without doubt, scientific discovery, business practice and social interactions are moving rapidly from a world of homogeneous and local systems to a world of distributed software, virtual organizations and cloud computing infrastructure, all powered by multicore processors and large-scale infrastructure. In science, a tsunami of new experimental and computational data and a suite of increasingly ubiquitous sensors pose vexing problems in data analysis, transport, visualization and collaboration. In society and business, software as a service and cloud computing are empowering distributed groups. Letpsilas step back and think about the longer term future. Where is the technology going and what are the implications? What architectures are appropriate? How to we manage power and scale? What are the right size building blocks? How do we come to grips with the fact that our clusters and data centers are now bigger than the Internet was just a few years ago? How do we develop and support malleable software? What is the ecosystem of components in which distributed, data rich applications will operate? How do we optimize performance and reliability? How do we program these systems?","PeriodicalId":92128,"journal":{"name":"Proceedings. 
IEEE International Conference on Cluster Computing","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2008-10-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"85360213","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2008-10-31 | DOI: 10.1109/CLUSTR.2008.4663772
D. Panda
Clusters with commodity multi-core processors and commodity networking technologies are providing cost-effective solutions for building next-generation high-end systems including HPC clusters, servers, parallel file systems and multi-tier data centers. The talk focuses on two emerging networking technologies (InfiniBand and 10 GE/iWARP) and their associated protocols for designing such systems. In this talk, we critically examine the current and future trends of these technologies and their applicability for designing next-generation petascale clusters. The talk starts with the motivations behind these technologies and then focuses on their architectural aspects and applicability to SAN-, LAN- and WAN-based clusters. Designing next-generation clusters with high performance, scalability and RAS (reliability, availability and serviceability) capabilities by using these technologies will be examined. Current and future trends of InfiniBand and iWARP products are highlighted. The emerging OpenFabrics software stack, covering both of these technologies in an integrated manner, is presented. Finally, a set of case studies in designing various clusters with these networking technologies is presented to outline the associated opportunities and challenges.
{"title":"Designing next generation clusters with InfiniBand and 10GE/iWARP: Opportunities and challenges","authors":"D. Panda","doi":"10.1109/CLUSTR.2008.4663772","DOIUrl":"https://doi.org/10.1109/CLUSTR.2008.4663772","url":null,"abstract":"Clusters with commodity multi-core processors and commodity networking technologies are providing cost-effective solutions for building next generation high-end systems including HPC clusters, servers, parallel file systems and multi-tier data-centers. The talk focus on two emerging networking technologies (InfiniBand and 10 GE/iWARP) and their associated protocols for designing such systems. In this talk, we critically examine the current and future trends of these technologies and their applicability for designing next generation petascale clusters. The talk start with the motivations behind these technologies and then focus on their architectural aspects and applicability to SAN, LAN and WAN-based clusters. Designing next generation clusters with high performance, scalability and RAS (reliability, availability and serviceability) capabilities by using these technologies will be examined. Current and future trends of InfiniBand and iWARP products was highlighted. The emerging OpenFabrics software stack, focusing both these technologies in an integrated manner, was presented. Finally, a set of case studies in designing various clusters with these networking technologies was presented to outline the associated opportunities and challenges.","PeriodicalId":92128,"journal":{"name":"Proceedings. 
IEEE International Conference on Cluster Computing","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2008-10-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"82353436","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2007-09-17 | DOI: 10.1109/CLUSTR.2007.4629271
Ryan E. Grant, A. Afsahi
The performance of the emerging commercial chip multithreaded multiprocessors is of great importance to the high performance computing community. However, the growing power consumption of such systems is of increasing concern, and techniques that could be effectively used to increase overall system power efficiency while sustaining performance are very desirable.
{"title":"Improving system efficiency through scheduling and power management","authors":"Ryan E. Grant, A. Afsahi","doi":"10.1109/CLUSTR.2007.4629271","DOIUrl":"https://doi.org/10.1109/CLUSTR.2007.4629271","url":null,"abstract":"The performance of the emerging commercial chip multithreaded multiprocessors is of great importance to the high performance computing community. However, the growing power consumption of such systems is of increasing concern, and techniques that could be effectively used to increase overall system power efficiency while sustaining performance are very desirable.","PeriodicalId":92128,"journal":{"name":"Proceedings. IEEE International Conference on Cluster Computing","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2007-09-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"87591797","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}