Accelerating outlier detection with intra- and inter-node parallelism
Pub Date: 2014-07-21 · DOI: 10.1109/HPCSim.2014.6903723
F. Angiulli, S. Basta, Stefano Lodi, Claudio Sartori
Outlier detection is a data mining task that consists of discovering observations which deviate substantially from the rest of the data; it has many important practical applications. Outlier detection in very large data sets is, however, computationally very demanding, and the size limit of the data that can be processed is pushed considerably forward by combining three ingredients: efficient algorithms, intra-CPU parallelism of high-performance architectures, and network-level parallelism. In this paper we propose an outlier detection algorithm able to exploit both the internal parallelism of a GPU and the external parallelism of a cluster of GPUs. The algorithm evolves our previous solutions, which considered either GPU or network-level parallelism alone. We discuss a set of large-scale experiments executed on a supercomputing facility and show the speedup obtained with a varying number of nodes.
{"title":"Accelerating outlier detection with intra- and inter-node parallelism","authors":"F. Angiulli, S. Basta, Stefano Lodi, Claudio Sartori","doi":"10.1109/HPCSim.2014.6903723","DOIUrl":"https://doi.org/10.1109/HPCSim.2014.6903723","url":null,"abstract":"Outlier detection is a data mining task consisting in the discovery of observations which deviate substantially from the rest of the data, and has many important practical applications. Outlier detection in very large data sets is however computationally very demanding and the size limit of the data that can be elaborated is considerably pushed forward by mixing three ingredients: efficient algorithms, intra-cpu parallelism of high-performance architectures, network level parallelism. In this paper we propose an outlier detection algorithm able to exploit the internal parallelism of a GPU and the external parallelism of a cluster of GPU. The algorithm is the evolution of our previous solutions which considered either GPU or network level parallelism. We discuss a set of large scale experiments executed in a supercomputing facility and show the speedup obtained with varying number of nodes.","PeriodicalId":6469,"journal":{"name":"2014 International Conference on High Performance Computing & Simulation (HPCS)","volume":"11 1","pages":"476-483"},"PeriodicalIF":0.0,"publicationDate":"2014-07-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"82204292","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
MIPT: Rapid exploration and evaluation for migrating sequential algorithms to multiprocessing systems with multi-port memories
Pub Date: 2014-07-21 · DOI: 10.1109/HPCSim.2014.6903767
Gorker Alp Malazgirt, A. Yurdakul, S. Niar
Research has shown that memory load/store instructions account for a significant share of execution time and energy consumption. Extracting the available parallelism at different granularities has been an important approach for designing next-generation highly parallel systems. In this work, we present MIPT, an architecture exploration framework that leverages the instruction parallelism of memory and ALU operations in a sequential algorithm's execution trace. MIPT heuristics recommend memory port sizes and issue slot sizes for memory and ALU operations. Its custom simulator evaluates the recommended parallel version of the execution trace, measuring the performance improvement over a dual-port memory. MIPT's architecture exploration criterion is to improve performance by utilizing systems with multi-port memories and multi-issue ALUs. Design exploration tools such as Multi2Sim and Trimaran already exist; these simulators offer customization of multi-port memory architectures, but designers' initial starting points are usually unclear. MIPT can therefore suggest an initial starting point for customization in those design exploration systems. In addition, given two different implementations of the same application, their execution times can be compared with the MIPT simulator.
{"title":"MIPT: Rapid exploration and evaluation for migrating sequential algorithms to multiprocessing systems with multi-port memories","authors":"Gorker Alp Malazgirt, A. Yurdakul, S. Niar","doi":"10.1109/HPCSim.2014.6903767","DOIUrl":"https://doi.org/10.1109/HPCSim.2014.6903767","url":null,"abstract":"Research has shown that the memory load/store instructions consume an important part in execution time and energy consumption. Extracting available parallelism at different granularity has been an important approach for designing next generation highly parallel systems. In this work, we present MIPT, an architecture exploration framework that leverages instruction parallelism of memory and ALU operations from a sequential algorithm's execution trace. MIPT heuristics recommend memory port sizes and issue slot sizes for memory and ALU operations. Its custom simulator simulates and evaluates the recommended parallel version of the execution trace for measuring performance improvements versus dual port memory. MIPT's architecture exploration criteria is to improve performance by utilizing systems with multi-port memories and multi-issue ALUs. There exists design exploration tools such as Multi2Sim and Trimaran. These simulators offer customization of multi-port memory architectures but designers' initial starting points are usually unclear. Thus, MIPT can suggest initial starting point for customization in those design exploration systems. In addition, given same application with two different implementations, it is possible to compare their execution time by the MIPT simulator.","PeriodicalId":6469,"journal":{"name":"2014 International Conference on High Performance Computing & Simulation (HPCS)","volume":"1 1","pages":"776-783"},"PeriodicalIF":0.0,"publicationDate":"2014-07-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"85507415","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
An automated infrastructure to support high-throughput bioinformatics
Pub Date: 2014-07-21 · DOI: 10.1109/HPCSim.2014.6903742
G. Cuccuru, Simone Leo, L. Lianas, Michele Muggiri, Andrea Pinna, L. Pireddu, P. Uva, A. Angius, G. Fotia, G. Zanetti
The number of domains affected by the big data phenomenon is constantly increasing, both in science and industry, with high-throughput DNA sequencers being among the most massive data producers. Building analysis frameworks that can keep up with such a high production rate, however, is only part of the problem: current challenges include dealing with articulated data repositories where objects are connected by multiple relationships, managing complex processing pipelines where each step depends on a large number of configuration parameters, and ensuring reproducibility, error control, and usability by non-technical staff. Here we describe an automated infrastructure built to address the above issues in the context of the analysis of the data produced by the CRS4 next-generation sequencing facility. The system integrates open source tools, either written by us or publicly available, into a framework that can handle the whole data transformation process, from raw sequencer output to primary analysis results.
{"title":"An automated infrastructure to support high-throughput bioinformatics","authors":"G. Cuccuru, Simone Leo, L. Lianas, Michele Muggiri, Andrea Pinna, L. Pireddu, P. Uva, A. Angius, G. Fotia, G. Zanetti","doi":"10.1109/HPCSim.2014.6903742","DOIUrl":"https://doi.org/10.1109/HPCSim.2014.6903742","url":null,"abstract":"The number of domains affected by the big data phenomenon is constantly increasing, both in science and industry, with high-throughput DNA sequencers being among the most massive data producers. Building analysis frameworks that can keep up with such a high production rate, however, is only part of the problem: current challenges include dealing with articulated data repositories where objects are connected by multiple relationships, managing complex processing pipelines where each step depends on a large number of configuration parameters and ensuring reproducibility, error control and usability by non-technical staff. Here we describe an automated infrastructure built to address the above issues in the context of the analysis of the data produced by the CRS4 next-generation sequencing facility. The system integrates open source tools, either written by us or publicly available, into a framework that can handle the whole data transformation process, from raw sequencer output to primary analysis results.","PeriodicalId":6469,"journal":{"name":"2014 International Conference on High Performance Computing & Simulation (HPCS)","volume":"21 1","pages":"600-607"},"PeriodicalIF":0.0,"publicationDate":"2014-07-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"86619025","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Evaluation of vectorization potential of Graph500 on Intel's Xeon Phi
Pub Date: 2014-07-21 · DOI: 10.1109/HPCSim.2014.6903668
Milan Stanic, Oscar Palomar, Ivan Ratković, M. Duric, O. Unsal, A. Cristal, M. Valero
Graph500 is a data-intensive application for high performance computing, and it is an increasingly important workload because graphs are a core part of most analytic applications. So far no work has examined whether Graph500 is suitable for vectorization, mostly due to a lack of vector memory instructions for irregular memory accesses. The Xeon Phi is a massively parallel processor recently released by Intel with new features such as a wide 512-bit vector unit and vector scatter/gather instructions. The Xeon Phi thus allows the parallelization of Graph500 to be combined with vectorization more efficiently. In this paper we vectorize Graph500 and analyze the impact of vectorization and prefetching on the Xeon Phi. We also show that the combination of parallelization, vectorization, and prefetching yields a speedup of 27% over a parallel version with prefetching that does not leverage the vector capabilities of the Xeon Phi.
{"title":"Evaluation of vectorization potential of Graph500 on Intel's Xeon Phi","authors":"Milan Stanic, Oscar Palomar, Ivan Ratković, M. Duric, O. Unsal, A. Cristal, M. Valero","doi":"10.1109/HPCSim.2014.6903668","DOIUrl":"https://doi.org/10.1109/HPCSim.2014.6903668","url":null,"abstract":"Graph500 is a data intensive application for high performance computing and it is an increasingly important workload because graphs are a core part of most analytic applications. So far there is no work that examines if Graph500 is suitable for vectorization mostly due a lack of vector memory instructions for irregular memory accesses. The Xeon Phi is a massively parallel processor recently released by Intel with new features such as a wide 512-bit vector unit and vector scatter/gather instructions. Thus, the Xeon Phi allows for more efficient parallelization of Graph500 that is combined with vectorization. In this paper we vectorize Graph500 and analyze the impact of vectorization and prefetching on the Xeon Phi. We also show that the combination of parallelization, vectorization and prefetching yields a speedup of 27% over a parallel version with prefetching that does not leverage the vector capabilities of the Xeon Phi.","PeriodicalId":6469,"journal":{"name":"2014 International Conference on High Performance Computing & Simulation (HPCS)","volume":"29 1","pages":"47-54"},"PeriodicalIF":0.0,"publicationDate":"2014-07-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"86678215","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
AIDD: A novel generic attack modeling approach
Pub Date: 2014-07-21 · DOI: 10.1109/HPCSim.2014.6903738
Samih Souissi, A. Serhrouchni
In recent years, information systems have become more diverse and complex, making them a privileged target of network and computer attacks. These attacks have increased tremendously and have turned out to be more sophisticated, evolving in unpredictable ways. This work presents an attack model called AIDD (Attacks Identification, Description and Defense). It offers generic attack modeling to classify, help identify, and defend against computer and network attacks. Our approach takes several attack properties into account in order to simplify attack handling and aggregate defense mechanisms. The originality of our work is that it introduces a target-centric classification, which raises the level of abstraction in order to offer a generic model for describing complex attacks.
{"title":"AIDD: A novel generic attack modeling approach","authors":"Samih Souissi, A. Serhrouchni","doi":"10.1109/HPCSim.2014.6903738","DOIUrl":"https://doi.org/10.1109/HPCSim.2014.6903738","url":null,"abstract":"In recent years, information systems have become more diverse and complex making them a privileged target of network and computer attacks. These attacks have increased tremendously and turned out to be more sophisticated and evolving in an unpredictable manner. This work presents an attack model called AIDD (Attacks Identification Description and Defense). It offers a generic attack modeling to classify, help identify and defend against computer and network attacks. Our approach takes into account several attack properties in order to simplify attack handling and aggregate defense mechanisms. The originality in our work is that it introduces a target centric classification which increases the level of abstraction in order to offer a generic model to describe complex attacks.","PeriodicalId":6469,"journal":{"name":"2014 International Conference on High Performance Computing & Simulation (HPCS)","volume":"3 1","pages":"580-583"},"PeriodicalIF":0.0,"publicationDate":"2014-07-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"79194614","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Performance evaluation of a SIP-based constrained peer-to-peer overlay
Pub Date: 2014-07-21 · DOI: 10.1109/HPCSim.2014.6903717
S. Cirani, Luca Davoli, Marco Picone, L. Veltri
In recent years, thanks to developments and innovation in hardware and software, the scenario of a global worldwide network capable of interconnecting both traditional nodes and new Smart Objects (the Internet of Things) is coming true. The Internet of Things (IoT) will involve billions of communicating heterogeneous devices using different protocols, enabling new forms of interaction between things and people. In this context, due to scalability, fault-tolerance, and self-configuration requirements, peer-to-peer (P2P) architectures are very appealing in many large-scale IoT scenarios. However, due to the memory, processing, and power limitations of constrained devices, the use of specific signaling protocols for the maintenance of the P2P overlay is a critical point. In this paper we present a performance evaluation of a real DHT-based P2P overlay in order to understand the benefits, in terms of bandwidth consumption and transmitted/received data, when a constrained SIP-based protocol, denoted CoSIP, is used as the P2P signaling protocol.
{"title":"Performance evaluation of a SIP-based constrained peer-to-peer overlay","authors":"S. Cirani, Luca Davoli, Marco Picone, L. Veltri","doi":"10.1109/HPCSim.2014.6903717","DOIUrl":"https://doi.org/10.1109/HPCSim.2014.6903717","url":null,"abstract":"In recent years, due to the development and innovation in hardware and software, the scenario of a global worldwide network capable of interconnecting both traditional nodes and new Smart Objects (the Internet of Things) is coming true. The Internet of Things (IoT) will involve billions of communicating heterogeneous devices, using different protocols in order to enable new forms of interaction between things and people. In this context, due to scalability, fault-tolerance, and self-configuration requirements, peer-to-peer(P2P) architectures are very appealing in many large-scale IoT scenarios. However, due to memory, processing, and power limitations of constrained devices, the use of specific signaling protocols for the maintenance of the P2P overlay is a critical point. In this paper we present a performance evaluation of a real DHT-based P2P overlay in order to understand the benefits in terms of bandwidth consumption and transmitted/received data when a constrained SIP-based protocol, denoted as CoSIP, is used as P2P signaling protocol.","PeriodicalId":6469,"journal":{"name":"2014 International Conference on High Performance Computing & Simulation (HPCS)","volume":"1 1","pages":"432-435"},"PeriodicalIF":0.0,"publicationDate":"2014-07-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"88914621","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Analyzing and modeling BitTorrent: A game theory approach
Pub Date: 2014-07-21 · DOI: 10.1109/HPCSim.2014.6903718
Farag Azzedin, Mohammed Onimisi Yahaya
Although BitTorrent is gaining popularity and continuous usage as a file sharing system, it is bedeviled by several challenges, and developers and researchers alike are exerting effort to address them. The exploitation of BitTorrent by free riders is a widely acknowledged problem; in particular, free riders misuse BitTorrent through both optimistic and regular unchokes. In this article, we use game theory to model the BitTorrent choking algorithm and conduct extensive performance evaluation experiments to assess its performance compared with the original BitTorrent choking algorithm.
{"title":"Analyzing and modeling BitTorrent: A game theory approach","authors":"Farag Azzedin, Mohammed Onimisi Yahaya","doi":"10.1109/HPCSim.2014.6903718","DOIUrl":"https://doi.org/10.1109/HPCSim.2014.6903718","url":null,"abstract":"Although BitTorrent is gaining popularity and continuous usage as a file sharing system, it is bedeviled with some challenges. Developers and researchers alike are exerting efforts to address such challenges. Free riders exploitation of BitTorrent is a widely acknowledged problem. In particular, free riders misuse BitTorrent through both optimistic and regular unchoke. In this article, we use game theory to model the BitTorrent choking algorithm and conduct extensive performance evaluation experiments to assess its performance as compared with the original BitTorrent choking algorithm.","PeriodicalId":6469,"journal":{"name":"2014 International Conference on High Performance Computing & Simulation (HPCS)","volume":"37 1","pages":"436-443"},"PeriodicalIF":0.0,"publicationDate":"2014-07-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"86372893","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
The roofline model for oceanic climate applications
Pub Date: 2014-07-21 · DOI: 10.1109/HPCSim.2014.6903762
I. Epicoco, S. Mocavero, F. Macchia, G. Aloisio
The present work describes the analysis and optimisation of the PELAGOS025 configuration, based on the coupling of NEMO, the physical component modelling the ocean dynamics, with the BFM (Biogeochemical Flux Model), a sophisticated biogeochemical model that can simulate both pelagic and benthic processes. The methodology followed here comprises a performance analysis of the original parallel code in terms of strong scalability; the identification of the bottlenecks that limit scalability as the number of processes increases; and an analysis of the features of the most computationally intensive kernels through the Roofline model, which provides an insightful visual performance model for multicore architectures and allows the performance of one or more computational kernels to be measured and compared across different hardware architectures.
{"title":"The roofline model for oceanic climate applications","authors":"I. Epicoco, S. Mocavero, F. Macchia, G. Aloisio","doi":"10.1109/HPCSim.2014.6903762","DOIUrl":"https://doi.org/10.1109/HPCSim.2014.6903762","url":null,"abstract":"The present work describes the analysis and optimisation of the PELAGOS025 configuration based on the coupling of the NEMO physic component of the ocean dynamics and the BFM (Biogeochemical Flux Model), a sophisticated biogeochemical model that can simulate both pelagic and benthic processes. The methodology here followed is characterised by the performance analysis of the original parallel code, in terms of strong scalability, the definition of the bottlenecks limiting the scalability when the number of processes increases, the analysis of the features of the most computational intensive kernels through the Roofline model which provides an insightful visual performance model for multicore architectures and which allows to measure and compare the performance of one or more computational kernels run on different hardware architectures.","PeriodicalId":6469,"journal":{"name":"2014 International Conference on High Performance Computing & Simulation (HPCS)","volume":"41 1","pages":"732-737"},"PeriodicalIF":0.0,"publicationDate":"2014-07-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"86441061","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Fault tolerance management in distributed systems: A new leader-based consensus algorithm
Pub Date: 2014-07-21 · DOI: 10.1109/HPCSim.2014.6903691
Fouad Hanna, J. Lapayre, L. Droz-Bartholet
It is well known that consensus algorithms are fundamental building blocks for fault-tolerant distributed systems. In the consensus literature, many algorithms have been proposed to solve this problem in different system models, but few attempts have been made to analyze their performance. In this paper we present a new leader-based consensus algorithm (the FLC algorithm) for the crash-stop failure model. Our algorithm uses the leader oracle Ω and adopts a decentralized communication pattern. In addition, we analyze the performance of our algorithm and compare it with four of the most well-known consensus algorithms for asynchronous distributed systems under the crash-stop failure model. Our results give a global idea of the performance of these algorithms and show that our algorithm performs best when process crashes take place in a system using a multicast network model. At the same time, our algorithm delivers very acceptable performance when crashes occur in a unicast network model, and also when no process crashes happen within the system.
{"title":"Fault tolerance management in distributed systems: A new leader-based consensus algorithm","authors":"Fouad Hanna, J. Lapayre, L. Droz-Bartholet","doi":"10.1109/HPCSim.2014.6903691","DOIUrl":"https://doi.org/10.1109/HPCSim.2014.6903691","url":null,"abstract":"It is well known that consensus algorithms are fundamental building blocks for fault tolerant distributed systems. In the literature of consensus, many algorithms have been proposed to solve this problem in different system models but few attempts have been made to analyze their performance. In this paper we present a new leader-based consensus algorithm (FLC algorithm) for the crash-stop failure model. Our algorithm uses the leader oracle Ω and adapts a decentralized communication pattern. In addition, we analyze and compare the performance of our algorithm to four of the most well-known consensus algorithms among asynchronous distributed systems of the crash-stop failure model. Our results give a global idea of the performance of these algorithms and show that our algorithm gives the best performance when process crashes take place in a system using a multicast network model. At the same time, our algorithm also gives a very acceptable performance, even when crashes occur in a unicast network model and in the case where no process crashes happen within the system.","PeriodicalId":6469,"journal":{"name":"2014 International Conference on High Performance Computing & Simulation (HPCS)","volume":"39 1","pages":"234-242"},"PeriodicalIF":0.0,"publicationDate":"2014-07-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"81705573","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Distributed scheduling and data sharing in late-binding overlays
Pub Date: 2014-07-21 · DOI: 10.1109/HPCSim.2014.6903678
A. D. Peris, J. Hernández, E. Huedo
Pull-based late-binding overlays are used in some of today's largest computational grids. Job agents are submitted to resources with the duty of retrieving the real workload from a central queue at runtime. This helps overcome the problems of these complex environments: heterogeneity, imprecise status information, and relatively high failure rates. In addition, the late job assignment allows dynamic adaptation to changes in grid conditions or user priorities. However, as the scale grows, the central assignment queue may become a bottleneck for the whole system. This article presents a distributed scheduling architecture for late-binding overlays that addresses this issue by letting execution nodes build a distributed hash table and delegating job matching and assignment to them. This reduces the load on the central server and makes the system much more scalable and robust. Scalability makes fine-grained scheduling possible and enables new functionalities, like the implementation of a distributed data cache on the execution nodes, which helps relieve the commonly congested grid storage services.
{"title":"Distributed scheduling and data sharing in late-binding overlays","authors":"A. D. Peris, J. Hernández, E. Huedo","doi":"10.1109/HPCSim.2014.6903678","DOIUrl":"https://doi.org/10.1109/HPCSim.2014.6903678","url":null,"abstract":"Pull-based late-binding overlays are used in some of today's largest computational grids. Job agents are submitted to resources with the duty of retrieving real workload from a central queue at runtime. This helps overcome the problems of these complex environments: heterogeneity, imprecise status information and relatively high failure rates. In addition, the late job assignment allows dynamic adaptation to changes in grid conditions or user priorities. However, as the scale grows, the central assignment queue may become a bottleneck for the whole system. This article presents a distributed scheduling architecture for late-binding overlays, which addresses this issue by letting execution nodes build a distributed hash table and delegating job matching and assignment to them. This reduces the load on the central server and makes the system much more scalable and robust. Scalability makes fine-grained scheduling possible and enables new functionalities, like the implementation of a distributed data cache on the execution nodes, which helps alleviate the commonly congested grid storage services.","PeriodicalId":6469,"journal":{"name":"2014 International Conference on High Performance Computing & Simulation (HPCS)","volume":"5 1","pages":"129-136"},"PeriodicalIF":0.0,"publicationDate":"2014-07-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"89706978","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}