J. Schneible, L. Ríha, Maria Malik, T. El-Ghazawi, A. Alexandru
2012 International Conference on High Performance Computing & Simulation (HPCS), 2 July 2012. DOI: 10.1109/HPCSim.2012.6266960
A method for communication efficient work distributions in stencil operation based applications on heterogeneous clusters
In recent years, the use of accelerators in conjunction with CPUs, known as heterogeneous computing, has brought significant performance increases to scientific applications. One of the best examples is Lattice Quantum Chromodynamics (QCD), a simulation based on stencil operations. These simulations have a large memory footprint, which necessitates the use of many graphics processing units (GPUs) in parallel and therefore a heterogeneous cluster with one or more GPUs per node. To obtain optimal performance, it is necessary to determine an efficient communication pattern both between GPUs on the same node and between nodes. In this paper we present a performance-model-based method for minimizing the communication time of stencil-operation-based applications, such as Lattice QCD, on heterogeneous computing systems with a non-blocking InfiniBand interconnection network. By improving the overlap of communication and computation, the proposed method increases the performance of the most computationally intensive kernel of Lattice QCD by 25 percent.
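The abstract does not spell out the performance model itself. As a rough illustration of the kind of reasoning involved, the Python sketch below enumerates ways of splitting a 4D lattice across the GPUs of a cluster, estimates halo-exchange time separately for same-node and cross-node neighbours, and picks the distribution with the lowest estimated step time. It is not the authors' model. The block mapping of GPUs to nodes, the per-message latency-plus-bandwidth cost, the full-overlap assumption `max(t_comp, t_comm)`, the helper names (`factorizations`, `step_time`, `best_distribution`) and every numeric constant are hypothetical placeholders.

```python
from itertools import product
from math import prod

# Hypothetical parameters: placeholders only, not values from the paper.
LATTICE = (32, 32, 32, 64)        # global lattice extent (x, y, z, t)
NODES = 4                         # number of cluster nodes
GPUS_PER_NODE = 2                 # GPUs per node
BYTES_PER_FACE_SITE = 96          # bytes exchanged per boundary site (assumed)
FLOPS_PER_SITE = 1320             # flops per site per stencil application (assumed)
GPU_FLOPS = 100e9                 # sustained GPU throughput, flop/s (assumed)
INTRA_BW, INTRA_LAT = 5e9, 10e-6  # same-node GPU-to-GPU bandwidth (B/s), latency (s)
INTER_BW, INTER_LAT = 3e9, 5e-6   # InfiniBand bandwidth (B/s), latency (s)


def factorizations(n, dims=4):
    """All ordered tuples of `dims` positive integers whose product is n."""
    divisors = [d for d in range(1, n + 1) if n % d == 0]
    return [t for t in product(divisors, repeat=dims) if prod(t) == n]


def step_time(intra, inter):
    """Estimate (step, compute, comm) seconds for the worst-case GPU.

    `intra` splits the lattice among GPUs inside a node, `inter` among nodes;
    the total split along dimension d is intra[d] * inter[d].
    """
    parts = [g * n for g, n in zip(intra, inter)]
    if any(L % p for L, p in zip(LATTICE, parts)):
        return None  # lattice does not divide evenly
    local = [L // p for L, p in zip(LATTICE, parts)]
    volume = prod(local)
    t_comp = volume * FLOPS_PER_SITE / GPU_FLOPS
    t_comm = 0.0
    for d, (g, n) in enumerate(zip(intra, inter)):
        if g * n == 1:
            continue  # dimension not split: no halo exchange along d
        face_bytes = (volume // local[d]) * BYTES_PER_FACE_SITE
        # Worst-case GPU sits on a node boundary along d whenever n > 1.
        n_inter = 0 if n == 1 else (2 if g == 1 else 1)
        n_intra = 2 - n_inter
        t_comm += n_inter * (INTER_LAT + face_bytes / INTER_BW)
        t_comm += n_intra * (INTRA_LAT + face_bytes / INTRA_BW)
    # Assume communication fully overlaps computation: the slower one dominates.
    return max(t_comp, t_comm), t_comp, t_comm


def best_distribution():
    """Return (key, intra, inter, estimate) minimising step time, then comm time."""
    best = None
    for intra in factorizations(GPUS_PER_NODE):
        for inter in factorizations(NODES):
            est = step_time(intra, inter)
            if est is None:
                continue
            key = (est[0], est[2])
            if best is None or key < best[0]:
                best = (key, intra, inter, est)
    return best


if __name__ == "__main__":
    _, intra, inter, (t, t_comp, t_comm) = best_distribution()
    print(f"intra-node grid {intra}, inter-node grid {inter}")
    print(f"step {t * 1e3:.3f} ms (compute {t_comp * 1e3:.3f} ms, comm {t_comm * 1e3:.3f} ms)")
```

Because the sketch assumes full overlap, many distributions can tie on step time when computation dominates; the tie-break on communication time then selects the layout that leaves the most slack. Changing `INTER_BW` relative to `INTRA_BW` can change which dimensions the sketch prefers to keep inside a node. A real model would calibrate these constants from measurements on the target cluster.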