
Latest publications: 2020 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW)

Squeezing performance out of Arkouda
Pub Date : 2020-05-01 DOI: 10.1109/IPDPSW50202.2020.00119
Elliot Ronaghan
This talk will highlight optimizations made to Arkouda, a Python package backed by Chapel that provides a key subset of the popular NumPy and Pandas interfaces at HPC scales. Optimizations such as aggregating communication have significantly improved Arkouda’s performance across a wide range of architectures. Key optimizations and benchmark results will be shown on architectures including a single-node server, Ethernet and InfiniBand clusters, and a 512-node Cray supercomputer.
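The abstract's key optimization, aggregating communication, can be illustrated with a toy cost model (not Arkouda or Chapel code; the latency and per-element costs are hypothetical units chosen for illustration): batching many fine-grained remote updates into a few large messages amortizes the fixed per-message latency.

```python
# Illustrative sketch: why aggregating fine-grained remote updates
# into batched messages reduces communication cost. Cost model
# (hypothetical): each message pays a fixed latency plus a
# per-element transfer cost.

LATENCY = 1.0      # fixed cost per message (arbitrary units)
PER_ELEM = 0.01    # incremental cost per transferred element

def cost_unaggregated(n_updates):
    # One message per element-wise update.
    return n_updates * (LATENCY + PER_ELEM)

def cost_aggregated(n_updates, batch=1024):
    # Updates buffered locally and flushed in batches of `batch`.
    n_msgs = -(-n_updates // batch)  # ceiling division
    return n_msgs * LATENCY + n_updates * PER_ELEM

if __name__ == "__main__":
    n = 1_000_000
    print(cost_unaggregated(n), cost_aggregated(n))
```

Under this model, a million one-element messages are dominated by latency, while the batched version pays latency only about a thousand times, which is the effect the talk's aggregation optimization exploits at scale.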
Citations: 0
Population Count on Intel® CPU, GPU and FPGA
Pub Date : 2020-05-01 DOI: 10.1109/IPDPSW50202.2020.00081
Zheming Jin, H. Finkel
Population count is a primitive used in many applications. Commodity processors have dedicated instructions for achieving high-performance population count. Motivated by the productivity of high-level synthesis and the importance of population count, in this paper we investigated OpenCL implementations of population count algorithms and evaluated their performance and resource utilization on an FPGA. Based on the results, we selected the most efficient implementation. We then derived a reduction pattern from a representative application of population count. We parallelized the reduction with atomic functions and optimized it with vectorized memory accesses, tree reduction, and compute-unit duplication. We evaluated the performance of the reduction kernel on an Intel® Xeon® CPU, an Intel® Iris™ Pro integrated GPU, and an FPGA card that features an Intel® Arria® 10 FPGA. When DRAM memory bandwidth is comparable on the three computing platforms, the FPGA can achieve the highest kernel performance for large workloads. On the other hand, we describe performance bottlenecks on the FPGA. To make FPGAs more competitive in raw performance with high-performance CPU and GPU platforms, it is important to increase external memory bandwidth, minimize data movement between host and device, and reduce OpenCL runtime overhead on the FPGA.
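The popcount-plus-reduction pattern the paper studies can be sketched in plain Python rather than OpenCL (the tree-reduction shape mirrors what a work-group reduction kernel would do; the function names are illustrative):

```python
# Minimal sketch of the popcount-reduction pattern: count the total
# number of set bits over a buffer of machine words via tree reduction.

def popcount(x: int) -> int:
    # Per-word population count; hardware exposes this as a single
    # instruction (e.g. POPCNT), Python 3.10+ as int.bit_count().
    return bin(x).count("1")

def popcount_reduce(words):
    # Pairwise tree reduction over per-word counts, mirroring the
    # work-group reduction an OpenCL kernel would perform.
    counts = [popcount(w) for w in words]
    while len(counts) > 1:
        if len(counts) % 2:
            counts.append(0)  # pad to an even length
        counts = [counts[i] + counts[i + 1] for i in range(0, len(counts), 2)]
    return counts[0] if counts else 0

print(popcount_reduce([0b1011, 0xFF, 0]))  # 3 + 8 + 0 = 11
```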
Citations: 1
Revisiting dynamic DAG scheduling under memory constraints for shared-memory platforms
Pub Date : 2020-05-01 DOI: 10.1109/IPDPSW50202.2020.00102
Gabriel Bathie, L. Marchal, Y. Robert, Samuel Thibault
This work focuses on dynamic DAG scheduling under memory constraints. We target a shared-memory platform equipped with p parallel processors. We aim at bounding the maximum amount of memory that may be needed by any schedule using p processors to execute the DAG. We refine the classical model that computes maximum cuts by introducing two types of memory edges in the DAG: black edges for regular precedence constraints and red edges for actual memory consumption during execution. A valid edge cut cannot include more than p red edges. This limitation had never been taken into account in previous works, and it dramatically changes the complexity of the problem, which was polynomial and becomes NP-hard. We introduce an Integer Linear Program (ILP) to solve it, together with an efficient heuristic based on rounding the rational solution of the ILP. In addition, we propose an exact polynomial algorithm for series-parallel graphs. We provide an extensive set of experiments, both with randomly-generated graphs and with graphs arising from practical applications, which demonstrate the impact of resource constraints on peak memory usage.
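The quantity being bounded can be illustrated with a much simpler special case (a sequential schedule of a DAG whose edges carry data sizes; the function names and the assumption that inputs are freed only after a task's outputs are allocated are illustrative, not the paper's model):

```python
# Hedged sketch: peak memory of one given schedule of a task DAG,
# where each edge (u, v) carries the size of the data u produces
# for v. A real scheduler bounds this over all p-processor schedules.

def peak_memory(edges, schedule):
    # edges: dict {(u, v): size}; schedule: topological order of tasks.
    alive = {}      # edge -> size currently resident in memory
    peak = cur = 0
    for task in schedule:
        # Outputs of `task` become resident before its inputs are freed.
        for (u, v), size in edges.items():
            if u == task:
                alive[(u, v)] = size
                cur += size
        peak = max(peak, cur)
        for (u, v) in [e for e in alive if e[1] == task]:
            cur -= alive.pop((u, v))
    return peak

# Diamond DAG a -> {b, c} -> d with unit-size edges:
edges = {("a", "b"): 1, ("a", "c"): 1, ("b", "d"): 1, ("c", "d"): 1}
print(peak_memory(edges, ["a", "b", "c", "d"]))  # 3
```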
Citations: 5
Chapel on Accelerators
Pub Date : 2020-05-01 DOI: 10.1109/IPDPSW50202.2020.00121
Rahul Ghangas, Josh Milthorpe
Chapel’s high-level data-parallel constructs make parallel programming productive for general programmers. This talk introduces the “Chapel on Accelerators” project, which proposes compiler enhancements to extend data-parallel constructs to hardware accelerators including GPUs. Previous attempts to extend Chapel to GPUs [1]–[3] have not been successfully integrated, and any such extension needs to maintain portability and consistency with the Chapel design philosophy and implementation.
Citations: 2
SALSA: A Domain Specific Architecture for Sequence Alignment
Pub Date : 2020-05-01 DOI: 10.1109/IPDPSW50202.2020.00033
Lorenzo Di Tucci, Riyadh Baghdadi, Saman P. Amarasinghe, M. Santambrogio
The explosion of genomic data is fostering research in fields such as personalized medicine and agritech, raising the need for more performant, power-efficient, and easy-to-use architectures. Devices such as GPUs and FPGAs deliver major performance improvements; however, GPUs exhibit notable power consumption, while FPGAs lack programmability. In this paper, we present SALSA, a Domain-Specific Architecture for sequence alignment that is completely configurable and extensible and is based on the RISC-V ISA. SALSA delivers good performance even at 200 MHz, outperforming Rocket, an open-source core, and an Intel Xeon by factors of up to 350x in performance and 790x in power efficiency.
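For background on the workload such an architecture accelerates, a minimal scoring-only Smith-Waterman local alignment in plain Python (the scoring parameters are common textbook defaults, not SALSA's configuration):

```python
# Background sketch of the sequence-alignment kernel: Smith-Waterman
# local alignment scoring via dynamic programming.

def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-1):
    # H[i][j] is the best local alignment score ending at a[i-1], b[j-1];
    # scores are clamped at zero, which is what makes the alignment local.
    rows, cols = len(a) + 1, len(b) + 1
    H = [[0] * cols for _ in range(rows)]
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            diag = H[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            H[i][j] = max(0, diag, H[i - 1][j] + gap, H[i][j - 1] + gap)
            best = max(best, H[i][j])
    return best

print(smith_waterman_score("GATTACA", "GCATGCU"))
```

The anti-diagonal dependency structure of the H matrix is exactly what domain-specific hardware like SALSA can exploit with systolic-style parallelism.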
Citations: 1
Linear Algebraic Louvain Method in Python
Pub Date : 2020-05-01 DOI: 10.1109/IPDPSW50202.2020.00050
Tze Meng Low, Daniele G. Spampinato, Scott McMillan, Michel Pelletier
We show that a linear algebraic formulation of the Louvain method for community detection can be derived systematically from the linear algebraic definition of modularity. Using the pygraphblas interface, a high-level Python wrapper for the GraphBLAS C Application Programming Interface (API), we demonstrate that the linear algebraic formulation of the Louvain method can be rapidly implemented.
Citations: 2
Accelerating Towards Larger Deep Learning Models and Datasets – A System Platform View Point
Pub Date : 2020-05-01 DOI: 10.1109/IPDPSW50202.2020.00169
S. Vinod, M. Naveen, A. K. Patra, Anto Ajay Raj John
Deep Learning (DL) is a rapidly evolving field under the umbrella of Artificial Intelligence (AI) with proven real-world use cases in supervised and unsupervised learning tasks. As the complexity of the learning tasks increases, the DL models become deeper or wider, with millions of parameters, and use larger datasets. Neural networks like AmoebaNet with 557M parameters and GPT-2 with 1.5 billion parameters are some recent examples of large models. DL training is generally run on accelerator hardware such as GPUs, TPUs, or FPGAs, which can satisfy the high computational demands of neural network training. But accelerators are limited in their memory capacities. The larger the model, the larger the memory required to train it. Hence, large DL models and large datasets cannot fit into the limited memory available on GPUs. However, there are techniques designed to overcome this limitation, such as compression, using CPU memory as a data swap, and recomputation within the GPUs. But the efficiency of each of these techniques also depends on the capabilities of the underlying system platform. In this paper we present the observations from our study of training large DL models using the data-swap method on different system platforms. This study showcases the characteristics of large models and presents the system viewpoint of large deep learning model training by studying the relation of the software techniques to the underlying system platform. The results presented in the paper show that for training large Deep Learning models, the communication link between CPU and GPU is critical, and training performance can be improved by using a platform with a high-bandwidth link for this communication. The results presented are based on two DL models: the 3DUnetCNN model for medical image segmentation and the DeepLabV3+ model for semantic image segmentation.
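A back-of-the-envelope estimate shows why models like GPT-2 exceed a single GPU's memory. The per-parameter multipliers below (fp32 weights, gradients, and two Adam optimizer states) and the 16 GiB device size are illustrative assumptions, and the estimate deliberately ignores activations, which depend on batch size:

```python
# Rough training-memory estimate: weights + gradients + optimizer
# state, 4 bytes each in fp32, activations excluded.

BYTES_FP32 = 4

def training_bytes(params, optimizer_states=2):
    # 2 copies (weights, gradients) plus `optimizer_states` extra
    # per-parameter buffers (e.g. Adam's first and second moments).
    return params * BYTES_FP32 * (2 + optimizer_states)

gpt2_params = 1_500_000_000  # "1.5 billion parameters" from the abstract
need_gib = training_bytes(gpt2_params) / 2**30
print(f"~{need_gib:.1f} GiB before activations vs a 16 GiB GPU")
```

Even under these conservative assumptions the state alone overflows the device, which is why the paper's data-swap technique moves tensors between CPU and GPU memory and why the CPU-GPU link bandwidth dominates performance.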
Citations: 2
Workshop 6: HIPS High-level Parallel Programming Models and Supportive Environments
Pub Date : 2020-05-01 DOI: 10.1109/ipdpsw50202.2020.00064
Dong Li, Heike Jagode
The 25th HIPS workshop, a full-day meeting on May 18th at the IEEE IPDPS 2020 conference in New Orleans (now virtual), focuses on high-level programming of multiprocessors, compute clusters, and massively parallel machines. Like previous workshops in the series, which was established in 1996, this event serves as a forum for research in the areas of parallel applications, language design, compilers, runtime systems, and programming tools. It provides a timely forum for scientists and engineers to present the latest ideas and findings in these rapidly changing fields. In our call for papers, we especially encouraged innovative approaches in the areas of emerging programming models for large-scale parallel systems and many-core architectures.
Citations: 0
Design and Comparison of Resilient Scheduling Heuristics for Parallel Jobs
Pub Date : 2020-05-01 DOI: 10.1109/IPDPSW50202.2020.00099
A. Benoit, Valentin Le Fèvre, P. Raghavan, Y. Robert, Hongyang Sun
This paper focuses on the resilient scheduling of parallel jobs on high-performance computing (HPC) platforms to minimize the overall completion time, or makespan. We revisit the classical problem while assuming that jobs are subject to transient or silent errors, and hence may need to be re-executed each time they fail to complete successfully. This work generalizes the classical framework where jobs are known offline and do not fail: in the classical framework, list scheduling that gives priority to the longest jobs is known to be a 3-approximation when schedules are restricted to shelves, and a 2-approximation without this restriction. We show that when jobs can fail, using shelves can be arbitrarily bad, but unrestricted list scheduling remains a 2-approximation. The paper focuses on the design of several heuristics, some list-based and some shelf-based, along with different priority rules and backfilling strategies. We assess and compare their performance through an extensive set of simulations, using both synthetic jobs and log traces from the Mira supercomputer.
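The failure-free baseline the paper generalizes, list scheduling with priority to the longest jobs, can be sketched for sequential jobs on p identical processors (the classical LPT heuristic; this sketch omits the paper's job parallelism and re-execution on failure):

```python
# List scheduling, longest job first, on p identical processors:
# each job goes to the processor that becomes free earliest.

import heapq

def lpt_makespan(job_lengths, p):
    finish_times = [0.0] * p          # min-heap of processor finish times
    heapq.heapify(finish_times)
    for length in sorted(job_lengths, reverse=True):
        earliest = heapq.heappop(finish_times)
        heapq.heappush(finish_times, earliest + length)
    return max(finish_times)

print(lpt_makespan([7, 5, 4, 3, 3], 2))  # 12
```

Under failures, each job length becomes a random number of re-executions, which is what breaks the shelf-based guarantee while leaving unrestricted list scheduling a 2-approximation.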
Citations: 7
Message from the HCW Technical Program Committee Chair
Pub Date : 2020-05-01 DOI: 10.1109/ipdpsw50202.2020.00010
F. Ciorba
Welcome to the 29th International Heterogeneity in Computing Workshop (HCW). Heterogeneity is one of the most important aspects of modern and emerging parallel and distributed computing systems. Exposing and expressing software parallelism as well as efficiently managing and exploiting hardware parallelism in heterogeneous parallel and distributed computing systems represent both challenges and exciting opportunities for advancing scientific discovery and for impactful innovation.
Citations: 0